Maximize insights in Oracle Analytics Cloud with Data Lakehouses Part 1 - The Data Lake

Photo by C Boyd / Unsplash

The variety, velocity, and volume of data have increased dramatically over the last twenty-plus years. The Data Lake was the architectural answer to this evolving data landscape. It lets data engineers use streaming and batch data pipelines to land data from almost any type of source in a single location. The data is generally stored in its raw format, whether structured, semi-structured, or unstructured. This approach avoids the need to immediately transform the data and load it into a data warehouse. You can instead store the data files in an object store such as Oracle Object Storage, which allows large amounts of data to be stored simply and cost-effectively.
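To make the "single location" idea a little more concrete, here is a minimal sketch of one way raw files could be named when they land in an object store. The layout (`raw/<source>/<yyyy>/<mm>/<dd>/<filename>`) and the helper `raw_object_name` are my own illustrative assumptions, not an Oracle convention; the point is only that a predictable prefix keeps files from many different sources organized.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_object_name(source: str, ingest_date: date, filename: str, prefix: str = "raw") -> str:
    """Build an object name for a raw file landing in the Data Lake.

    Hypothetical layout: <prefix>/<source>/<yyyy>/<mm>/<dd>/<filename>.
    Object stores treat "/" as part of the name, so this acts like a folder path.
    """
    dated = f"{ingest_date:%Y/%m/%d}"  # e.g. 2024/01/05
    return str(PurePosixPath(prefix, source, dated, filename))


# Example: a batch of sensor readings and an application log from the same day
print(raw_object_name("sensors", date(2024, 1, 5), "machine-17.json"))
print(raw_object_name("app-logs", date(2024, 1, 5), "orders-service.log"))
```

Grouping by source and ingest date is just one option; the right layout depends on how the data will be read later.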

[Figure: Oracle Data Lake Example]

The image above shows some of the types of data that can be loaded into the Data Lake. The size of this data can grow very quickly. Sensors on manufacturing machines constantly stream large amounts of data, as does the logging of software applications. Videos, photos, and documents can all be large files and can add up quickly for certain organizations. As of the writing of this blog, 1 TB of Oracle frequent access Object Storage costs about $25/month (the latest pricing can be found at the Oracle Cloud Cost Estimator). You can save even more money by storing some of your lesser-used data in Oracle infrequent access Object Storage, which is less than half the cost of frequent access storage.
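The storage arithmetic can be sketched in a few lines. The $25 per TB per month figure is the approximate frequent access price quoted above; the infrequent access rate of $12 is a placeholder I picked only because it is "less than half" of $25, so treat it as an assumption and check the Oracle Cloud Cost Estimator for real numbers. The function name is also just for illustration.

```python
def monthly_storage_cost(
    frequent_tb: float,
    infrequent_tb: float,
    frequent_rate: float = 25.0,    # approx. USD per TB per month, from the text above
    infrequent_rate: float = 12.0,  # ASSUMED placeholder: "less than half" of 25
) -> float:
    """Estimate monthly storage cost in USD for data split across two tiers."""
    return frequent_tb * frequent_rate + infrequent_tb * infrequent_rate


# 10 TB kept entirely in frequent access storage
all_frequent = monthly_storage_cost(10, 0)

# Same 10 TB, with 6 TB of lesser-used data moved to infrequent access
tiered = monthly_storage_cost(4, 6)

print(f"All frequent: ${all_frequent:.2f}/month")
print(f"Tiered:       ${tiered:.2f}/month")
```

This ignores request and retrieval charges, which may apply to infrequent access storage, so it is only a rough estimate of the storage portion.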

Many modern analytical workloads require this type of raw data, especially Machine Learning (ML) and Artificial Intelligence (AI) solutions. Oracle's AI Vision service must have the raw photos to do its image processing. The OCI Generative AI Service Large Language Model (LLM) can be trained with your corporate documents to provide more relevant responses. Monitoring sensor data can help identify when a machine needs adjustments or preventive maintenance, but that requires algorithms that monitor the large streams of data being generated in real time. These are just a handful of examples of what can be done with the variety of data available to organizations. Oracle has an excellent website that covers Data Lake concepts along with a couple of other use cases.
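As a rough illustration of the sensor monitoring idea, the sketch below flags readings that sit far from the recent rolling average. This is a simple rolling z-score check that I chose for brevity, not a description of any Oracle service; the function name, window size, and threshold are all assumptions, and a production system would more likely run on a streaming platform with a properly tuned model.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=20, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean.

    A reading is flagged when it is more than `threshold` standard
    deviations away from the mean of the previous `window` readings.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip the check when the window is perfectly flat (sigma == 0)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


# Simulated temperature stream: steady around 10.0 with one spike
stream = [10.0, 10.1, 9.9, 10.0] * 5 + [15.0] + [10.0] * 3
for index, value in detect_anomalies(stream):
    print(f"Reading {index} looks unusual: {value}")
```

Because the function is a generator, it processes one reading at a time and only keeps the last `window` values in memory, which suits data that never stops arriving.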

Hopefully you see the benefits of having a repository of all of your different types of data in one place. However, I often get asked: why would you take structured data and put it into a Data Lake? A Data Lake at its core is just an organized collection of file objects. Isn't that a very inefficient way to report on highly structured data such as monthly sales? That is a very good question, and it will be answered in Part 2 of Maximize insights in Oracle Analytics Cloud with Data Lakehouses, where I will focus on how an Oracle Data Warehouse complements a Data Lake. In future posts I'll discuss how the Lakehouse can be leveraged by Oracle Analytics Cloud to further your analytics capabilities.