Data Analytics: The Rise of Data Lakehouses
‘Analytics Query Accelerators’ is Gartner’s name for a category of enterprise software that speeds up queries and analytics on data stored in a data lake architecture. More informally, these products are called ‘Data Lakehouses’.
Data Lakes are created by extracting and transforming the data held in an enterprise’s applications and repositories and bringing it together in a single location. A data lake often holds too much data to analyze directly, so a subset is identified and then moved into a data warehouse for processing.
But the two-step approach of creating a data lake and then processing subsets of data in a data warehouse has problems. For one, it’s hard to keep the data in the data lake and data warehouse consistent and current. The data is also very difficult to maintain, and data warehouse software doesn’t usually integrate well with machine learning and AI, a requirement that is becoming ever more important.
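The consistency problem can be illustrated with a small sketch. It is only a toy model: the in-memory lists, the record fields, and the helper names `load_warehouse` and `stale_ids` are hypothetical stand-ins, not part of any vendor's product. It shows how a subset copied into a warehouse can drift once the lake is updated.

```python
import copy

# Hypothetical in-memory stand-ins (assumption): a real data lake would be
# object storage and a real warehouse a separate database system.
data_lake = [
    {"id": 1, "source": "crm", "amount": 120},
    {"id": 2, "source": "erp", "amount": 75},
    {"id": 3, "source": "crm", "amount": 40},
]


def load_warehouse(lake, source):
    """Second step of the two-step approach: copy a subset of the lake."""
    return [copy.deepcopy(row) for row in lake if row["source"] == source]


def stale_ids(lake, warehouse):
    """Return the ids whose warehouse copy no longer matches the lake."""
    current = {row["id"]: row for row in lake}
    return sorted(
        row["id"] for row in warehouse if current.get(row["id"]) != row
    )


warehouse = load_warehouse(data_lake, "crm")
in_sync_before = stale_ids(data_lake, warehouse)

# A later update lands in the lake only; the warehouse copy is not refreshed.
data_lake[0]["amount"] = 150
drifted_after = stale_ids(data_lake, warehouse)

print(in_sync_before)
print(drifted_after)
```

Until the subset is reloaded, any report run against the warehouse copy works from the older value, which is the kind of drift the two-step approach has to manage.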
That’s where the Data Lakehouse comes in. A Data Lakehouse effectively combines Data Lakes and Warehouses into a single entity that can support many different data formats and has business intelligence, reporting, and AI features built into it. Two vendors with Data Lakehouse products are Databricks and Snowflake.
Bernard Marr, a well-known business consultant and author, said that “the data lakehouse approach is one that’s likely to become increasingly popular as more organizations begin to understand the value of using unstructured data together with AI and machine learning. In the analytics journey, it’s a step up in maturity from the combined data lake and data warehouse model that until recently has been seen as the only option for organizations that want to continue with legacy BI and analytics workflows while also migrating towards smart, automated data initiatives.”
Adam Ronthal, a Gartner analyst, said that “we are moving in the direction where the data lakehouse becomes a best practice, but everyone is moving at a different speed.”