Maximize insights in Oracle Analytics Cloud with Data Lakehouses Part 3 – Medallion Architecture

Maximize insights in Oracle Analytics Cloud with Data Lakehouses Part 3 – Medallion Architecture

This is the third in my series of blogs on Data Lakehouses. My first two articles lay the conceptual framework for Data Lakes and Data Warehouses. This article discusses the Data Lake medallion design pattern.

Medallion architecture is a Data Lake design approach with tiers that are named after the familiar Olympic Medal types. Unlike the Olympics, your data assets don't race each other. In fact, the Bronze data generally comes in first πŸ˜€ There are multiple schools of thought for structuring Data Lakes but the prevailing architecture is moving towards the 3 tier medallion architecture approach. In fact, Oracle's forthcoming Intelligent Data Lake (IDL) is leveraging this approach.

What is the Oracle Intelligent Data Lake (IDL)?
The Intelligent Data Lake was one of my key takeaways from Oracle CloudWorld 24. So what is it, and why does it matter? The answer to the former is that the Intelligent Data Lake (IDL) is Oracle’s solution to building a comprehensive Data Lakehouse platform using modern data management tools.

Regardless of the products you use to create your Data Lake, you need to structure it in a logical way. The medallion approach consists of 3 tiers:

  • Bronze Layer: Raw data from external source system that is as close to its natural form as possible. The data will be source system fields along with load information such as load ID, load data, source name, etc. Having a Bronze layer allows for data archiving of source data and reprocessing of data without having to pull data from the source. Data integration tools often include replication/mass ingestion capabilities that are specifically designed to hydrate this layer of the Data Lake. You might also see this layer referred to as the Landing Zone or Raw Zone, however, given its critical nature I think it is important to put it on the podium, hence Bronze Layer.
  • Silver Layer: The raw data from the the Bronze Layer has been cleansed of data quality issues and gets written to the Silver layer. Data quality routines include items such as data deduplication and data type enforcement. This data will be the source for downstream analytical workloads. For instance, data scientists will use this layer to perform advanced analytical workloads.
  • Gold Layer: Also called the curated layer, it is used by business users to consume the data. The data from the Silver Layer is processed into domain specific data models (aka star schema or Subject Area). This includes aggregating and partitioning the Silver layer data to maximize performance. In a pure Data Lake structure all this data will be processed and stored in a format like JSON, Parquet or Iceberg. In a Data Lakehouse architecture, the Gold Layer could be a hybrid. For instance there would be some Oracle Object Storage JSON files and some dimensionally modelled tables in an Oracle Autonomous Data Warehouse.

I had a client that had acquired multiple organizations. They wanted to implement a universal commission and expenses policy but lacked an easy way to see all the data together. This simple example below shows how you could create a gold layer with the analytical data sets needed to make the compensation and expenses decisions. First, you must get the requisite raw data, then you need to combine employees in the silver layer in addition to some data cleanup in the various other data extracts. Finally, three aggregated Subject Areas would be created in the Gold Layer for business user consumption in a tool like OAC. In the next post in this series, I will review ways to create a Gold Zone.

Oracle Data Lake Medallion Architecture
Medallion Data Lake Architecture