Databricks Open Sources Delta Lake for Data Lake Reliability

Databricks, a leader in unified analytics founded by the original creators of Apache Spark™, has announced a new open source project called Delta Lake to bring reliability to data lakes. Delta Lake is the first production-ready open source technology to provide data lake reliability for both batch and streaming data. The project will enable organizations to transform their existing messy data lakes into clean Delta Lakes with high-quality data, accelerating their data and machine learning initiatives.

While attractive as an initial sink for data, data lakes suffer from reliability challenges. Unreliable data prevents organizations from deriving business insights quickly and significantly slows strategic machine learning initiatives. These reliability challenges stem from failed writes, schema mismatches, and the data inconsistencies that arise when mixing batch and streaming data or supporting multiple writers and readers simultaneously.
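Delta Lake addresses these failure modes with an ACID transaction log layered over the files in the lake. The sketch below is a minimal, illustrative toy (the `TransactionLog` class and file layout are hypothetical, not Delta Lake's actual implementation) showing the core idea: a write becomes visible only through an atomic rename of a commit file, so a failed write leaves no partial state for readers to see.

```python
import json
import os
import tempfile


class TransactionLog:
    """Toy append-only commit log: each commit is a numbered JSON file.

    A commit becomes visible only via an atomic rename, so a crashed
    (failed) write leaves nothing behind -- the same all-or-nothing
    property a transaction log gives writes to a data lake.
    """

    def __init__(self, log_dir):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def _next_version(self):
        # Committed files are named "<version>.json"; temp files have no
        # .json suffix, so an in-flight write is never counted.
        versions = [int(name.split(".")[0])
                    for name in os.listdir(self.log_dir)
                    if name.endswith(".json")]
        return max(versions, default=-1) + 1

    def commit(self, actions):
        """Write actions to a temp file, then atomically rename it into place."""
        version = self._next_version()
        fd, tmp_path = tempfile.mkstemp(dir=self.log_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(actions, f)
        # os.rename is atomic on POSIX: readers see the whole commit or nothing.
        os.rename(tmp_path, os.path.join(self.log_dir, f"{version:020d}.json"))
        return version

    def snapshot(self):
        """Replay committed versions in order to reconstruct the table state."""
        commits = sorted(name for name in os.listdir(self.log_dir)
                         if name.endswith(".json"))
        state = []
        for name in commits:
            with open(os.path.join(self.log_dir, name)) as f:
                state.extend(json.load(f))
        return state
```

Because readers only ever replay fully renamed commit files, batch and streaming writers can append concurrently while readers always observe a consistent snapshot; the real Delta Lake protocol builds on this same write-temp-then-publish pattern.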
