Delta Lake (Software)

Delta Lake
Original author(s): Michael Armbrust, Databricks
Initial release: April 2019
Written in: Scala, Python
Operating system: Cross-platform
Type: Data warehouse, Data lake
License: Apache License 2.0

    Delta Lake is an open-source storage framework that enables building a data lakehouse architecture with various compute engines and APIs. It brings ACID transactions and scalable metadata handling to big data workloads and addresses common issues with data lakes such as data quality, schema evolution, and concurrency control.[1] Delta Lake is a project under the Linux Foundation and is released under the Apache License.[2][3]


    History

    Databricks open-sourced Delta Lake in April 2019 and contributed it to the Linux Foundation, but kept some features proprietary. In June 2022, Databricks open-sourced all of Delta Lake.[4][5]

    Features

    Delta Lake supports multiple compute engines, such as Apache Spark, Presto, Apache Flink, Trino, and Apache Hive. It also provides APIs for several programming languages, including Scala, Java, Python, Rust, and Ruby. Delta Lake stores table data in Apache Parquet files and adds a file-based transaction log that records every change to the table, so that readers never observe a partially applied write.
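
    One way a file-based log can keep concurrent writers from overwriting each other's commits is to give every commit a numbered file and let only one writer create each number. The following Python sketch illustrates that idea on a local filesystem. It is a simplified illustration, not Delta Lake's implementation: the function names (log_versions, try_commit, commit_with_retry) are hypothetical, and a real writer would also check whether the winning commit logically conflicts with its own changes before retrying.

```python
import json
import os
import tempfile


def log_versions(log_dir):
    """Return the sorted version numbers of the commit files in log_dir."""
    versions = []
    for name in os.listdir(log_dir):
        stem, ext = os.path.splitext(name)
        if ext == ".json" and stem.isdigit():
            versions.append(int(stem))
    return sorted(versions)


def try_commit(log_dir, version, actions):
    """Try to create the commit file for `version`.

    Returns False if another writer has already claimed that version.
    """
    # Commit files are named by a zero-padded, 20-digit version number.
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Mode "x" is exclusive creation: it raises FileExistsError if the
        # file is already there, so at most one writer wins each version.
        # (Assumption: a local filesystem; object stores need their own
        # put-if-absent mechanism.)
        with open(path, "x", encoding="utf-8") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
    except FileExistsError:
        return False
    return True


def commit_with_retry(log_dir, actions, max_attempts=10):
    """Claim the next free version, retrying if a concurrent writer wins."""
    for _ in range(max_attempts):
        versions = log_versions(log_dir)
        next_version = versions[-1] + 1 if versions else 0
        if try_commit(log_dir, next_version, actions):
            return next_version
    raise RuntimeError("gave up after repeated commit conflicts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        first = commit_with_retry(log_dir, [{"add": {"path": "part-0.parquet"}}])
        second = commit_with_retry(log_dir, [{"add": {"path": "part-1.parquet"}}])
        print(first, second)
        # A late writer that still believes version 0 is free is rejected.
        print(try_commit(log_dir, 0, [{"add": {"path": "stale.parquet"}}]))
```

    Because the loser of a race gets an explicit failure instead of silently replacing the winner's file, every version number in the log corresponds to exactly one commit.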

    Architecture

    Internally, Delta Lake pairs Parquet data files with a file-based transaction log (the "delta log") that records every change to the table and provides ACID guarantees. The log consists of JSON files, one per commit, that describe the actions performed on the table, such as adding a data file, removing a data file, recording a transaction identifier, and recording commit information. So that readers do not have to replay the entire log, Delta Lake also periodically writes checkpoints, stored in Parquet format, that capture the state of the table as of a given version.[6]
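
    The way a reader derives the current table contents from such a log can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions, not a Delta Lake client: it handles only the add and remove actions, ignores checkpoints (reading Parquet would need a third-party library), and the helper names write_commit and live_files are hypothetical.

```python
import json
import os
import tempfile


def write_commit(log_dir, version, actions):
    """Write one commit file containing one JSON action per line."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w", encoding="utf-8") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def live_files(log_dir, as_of=None):
    """Replay commits in version order and return the set of data files
    that make up the table, optionally stopping at version `as_of`."""
    commits = []
    for name in os.listdir(log_dir):
        stem, ext = os.path.splitext(name)
        if ext == ".json" and stem.isdigit():
            commits.append((int(stem), name))

    files = set()
    for version, name in sorted(commits):
        if as_of is not None and version > as_of:
            break
        with open(os.path.join(log_dir, name), encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
                # Other action types (metadata, commit info, ...) do not
                # change the file set and are skipped in this sketch.
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        # Version 0: the table starts with two data files.
        write_commit(log_dir, 0, [
            {"add": {"path": "part-0.parquet"}},
            {"add": {"path": "part-1.parquet"}},
            {"commitInfo": {"operation": "WRITE"}},
        ])
        # Version 1: one file is rewritten, so it is removed and replaced.
        write_commit(log_dir, 1, [
            {"remove": {"path": "part-0.parquet"}},
            {"add": {"path": "part-2.parquet"}},
        ])
        print(sorted(live_files(log_dir)))
        print(sorted(live_files(log_dir, as_of=0)))
```

    Since data files are never modified in place, replaying the log up to an earlier version yields the table as it existed at that version. A checkpoint plays the role of a precomputed starting point for this replay.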

    References

    1. Armbrust, Michael; Das, Tathagata; Sun, Liwen; Yavuz, Burak; Zhu, Shixiong (2020-08-01). "Delta Lake: high-performance ACID table storage over cloud object stores" (PDF). Proceedings of the VLDB Endowment. 13 (12): 3411–3424. doi:10.14778/3415478.3415560.
    2. "Delta Lake GitHub License". The Apache Software Foundation. 5 October 2022. Retrieved 5 October 2022.
    3. Woodie, Alex (2023-02-15). "Open Table Formats Square Off in Lakehouse Data Smackdown". Datanami.
    4. Armbrust, Michael; Ghodsi, Ali (2022-06-30). "Open Sourcing All of Delta Lake". News. Databricks. Retrieved 2023-03-02.
    5. Ghoshal, Anirban (2022-06-28). "Open Sourcing All of Delta Lake". InfoWorld. International Data Group.
    6. "Build Lakehouses with Delta Lake". Retrieved 2023-03-02.


    This article "Delta Lake (Software)" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:Delta Lake (Software). Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.