
Data Lakehouse







A data lakehouse is an architectural pattern for scalable data storage that builds on data lake architecture and adds the data management features of a data warehouse, such as ACID transactions, so that the same data can serve traditional business intelligence (BI) and machine learning workloads. A data lakehouse is characterized by 1) open, directly accessible data formats, such as Apache Parquet, Apache ORC, Delta Lake, and Apache Iceberg; 2) first-class support for business intelligence and data science workloads; and 3) the ability to process structured, semi-structured, and unstructured data.[1][2][3][4]
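Open data files in a lake combined with warehouse-style management are commonly implemented as a table format: immutable data files are paired with a log of commits that defines which files make up each version of a table. The following is a minimal Python sketch of that idea under stated assumptions. It is loosely modeled on the add/remove actions of Delta Lake's transaction log, but it is not the on-disk format of Delta Lake, Apache Iceberg, or any other product. The class `ToyTable` and its methods are hypothetical, an in-memory dict stands in for object storage such as Amazon S3, and JSON lines stand in for a columnar format such as Apache Parquet.

```python
import json
from typing import Dict, List, Optional


class ToyTable:
    """Hypothetical, simplified table format: immutable data files plus an
    ordered commit log. Not the real layout of Delta Lake or Iceberg."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = schema               # column name -> expected Python type
        self.store: Dict[str, str] = {}    # stand-in for object storage (path -> contents)
        self.log: List[dict] = []          # commit for version N is stored at index N

    def _check_schema(self, rows: List[dict]) -> None:
        # Schema enforcement on write: reject rows whose columns or types differ.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"column mismatch: {sorted(row)}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"column {col!r} expected {typ.__name__}")

    def commit(self, add_rows: Optional[List[dict]] = None,
               remove_files: Optional[List[str]] = None) -> int:
        """Validate rows, write a new immutable data file, then publish one
        log entry. Readers only see files referenced by the log, so a commit
        rejected by validation leaves the table unchanged."""
        add_rows = add_rows or []
        remove_files = remove_files or []
        self._check_schema(add_rows)
        version = len(self.log)
        added: List[str] = []
        if add_rows:
            path = f"data/part-{version:05d}.jsonl"
            self.store[path] = "\n".join(json.dumps(r) for r in add_rows)
            added.append(path)
        self.log.append({"version": version, "add": added,
                         "remove": list(remove_files)})
        return version

    def live_files(self, version: Optional[int] = None) -> List[str]:
        # Replay the log up to `version` to reconstruct the set of live files.
        last = len(self.log) - 1 if version is None else version
        files: List[str] = []
        for entry in self.log[: last + 1]:
            files = [f for f in files if f not in entry["remove"]] + entry["add"]
        return files

    def snapshot(self, version: Optional[int] = None) -> List[dict]:
        """Read the table as of a given version (a simple form of time travel)."""
        rows: List[dict] = []
        for path in self.live_files(version):
            rows.extend(json.loads(line) for line in self.store[path].splitlines())
        return rows


if __name__ == "__main__":
    t = ToyTable({"id": int, "city": str})
    t.commit(add_rows=[{"id": 1, "city": "Oslo"}])   # version 0
    t.commit(add_rows=[{"id": 2, "city": "Lima"}])   # version 1
    # One commit removes the old file and adds its replacement together.
    t.commit(add_rows=[{"id": 1, "city": "Bergen"}],
             remove_files=["data/part-00000.jsonl"])  # version 2
    print(t.snapshot())            # current state of the table
    print(t.snapshot(version=1))   # state as of version 1
```

Because readers reconstruct the table by replaying the log, a write that fails schema validation never becomes visible, and earlier versions remain readable. Production table formats add concurrency control, checkpoints, and file statistics that this sketch omits.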

Background

While the technologies used in modern data lakehouse architectures have been present since 2016,[5][6][7] the term data lakehouse was popularized by Databricks in 2020[8] to promote the Delta Lake storage layer. Delta Lake combined the benefits of data lakes, MPP-style data warehouses, and streaming analytics, and was positioned as a solution to the data quality issues that occurred in large data repositories.[9] Other organizations, such as AWS,[10] Onehouse,[11] Dremio,[12] Meta (formerly Facebook),[13] Google Cloud,[14] and Dell,[15] subsequently adopted the term to describe new or existing products.

Notable examples

Traditional data lake providers advertise data lakehouse offerings, such as:

Cloud-native companies whose products run on top of existing cloud data lakes, such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage, also provide data lakehouse offerings:

References

  1. Armbrust, Michael; Ghodsi, Ali; Xin, Reynold; Zaharia, Matei (2021). "Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics" (PDF). In Proceedings of CIDR 2021, January 2021.
  2. Levins, M.; Srivastava, R.; Inmon, B. (2021). Building the Data Lakehouse. United States: Technics Publications.
  3. Marr, Bernard (2022-07-18). "What Is A Data Lakehouse? A Super-Simple Explanation For Anyone". Forbes.
  4. Woodie, Alex (2023-02-15). "Open Table Formats Square Off in Lakehouse Data Smackdown". Datanami.
  5. Kerner, Sean Michael (2022-02-22). "Onehouse emerges with managed Apache Hudi data lake service". TechTarget.
  6. Armbrust, Michael; Chambers, Bill; Zaharia, Matei (2017-10-25). "Databricks Delta: A Unified Data Management System for Real-time Big Data". Databricks Blog.
  7. Woodie, Alex (2021-02-08). "Apache Iceberg: The Hub of an Emerging Data Service Ecosystem?". Datanami.
  8. Databricks (2020-01-30). "What Is a Lakehouse?". Databricks Blog.
  9. Woodie, Alex (2019-04-08). "How Databricks Keeps Data Quality High with Delta". Datanami.
  10. Woodie, Alex (2020-12-07). "AWS Bolsters Its Lakehouse". Datanami.
  11. Miller, Rob (2022-02-02). "With $8M seed, Onehouse builds open source data lakehouse, eyes managed service". TechCrunch.
  12. Sharma, Shubham (2022-03-02). "Dremio launches free data lakehouse service for enterprises". VentureBeat.
  13. Chattopadhyay, Biswapesh; Pedreira, Pedro; Agarwal, Sameer; Sun, Yutian (2023-01-26). "Shared Foundations: Modernizing Meta's Data Lakehouse" (PDF). CIDR '23. Amsterdam, The Netherlands: Conference on Innovative Data Systems Research.
  14. Goodison, Donna (2022-04-06). "Google Cloud just built a data lakehouse on BigQuery". Protocol.
  15. Mellor, Chris (2022-06-27). "Dell builds its own partner-based data lakehouse". Blocks & Files. Situation Publishing.
  16. Lardinois, Frederic (2022-08-17). "Cloudera launches its all-in-one SaaS data lakehouse". TechCrunch.
  17. "Databricks Simplifies the Path to Building Lakehouses for Business Intelligence and Machine Learning" (Press release). San Francisco: Databricks. Business Wire. 2020-02-24.
  18. "Founded by Ex-Uber Data Architect and Apache Hudi Creator, Onehouse Supercharges Data Lakes for AI and Machine Learning With $8 Million in Seed Funding From Greylock and Addition" (Press release). Menlo Park, CA: Onehouse. Global Newswire. 2022-02-02.


This article "Data Lakehouse" is from Wikipedia. The list of its authors can be seen in its page history and/or on the page Edithistory:Data Lakehouse. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace, not the main namespace.