Business Strategy

Databricks Introduces Data Lineage For Unity Catalog

Databricks
Databricks, the data and AI company that pioneered the data lakehouse concept, has unveiled data lineage for Unity Catalog, significantly expanding the lakehouse's data governance capabilities. Data lineage describes how data moves through an organization. With this new Unity Catalog capability, customers can see where data in their lakehouse came from, who created it and when, how it has been modified over time, how it is being used, and more. Unity Catalog data lineage is now available in preview on AWS and Microsoft Azure.

Organizations deal with a flood of data from numerous sources, and it is very difficult to understand where that data came from, how it is moving and changing, who has access to it, and how it is being used. That knowledge is nonetheless critical for establishing trust and assessing risk. With data lineage for Unity Catalog, data teams can view all of the downstream consumers affected by a data change (for example, apps, dashboards, machine learning models, or data sets), rapidly assess the extent of the impact, and alert the appropriate stakeholders.
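The downstream impact assessment described above can be pictured as a graph traversal. The following is a minimal sketch, not Unity Catalog's actual API: the asset names, the `DOWNSTREAM` mapping, and the `affected_consumers` helper are all hypothetical and exist only to illustrate the idea.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
# Every name here is invented for illustration.
DOWNSTREAM = {
    "raw.orders": ["clean.orders"],
    "clean.orders": ["dashboard.sales", "ml.churn_model"],
    "dashboard.sales": [],
    "ml.churn_model": ["app.retention_alerts"],
    "app.retention_alerts": [],
}


def affected_consumers(changed, edges):
    """Return every asset reachable downstream of `changed`, breadth-first."""
    seen = []
    queue = deque(edges.get(changed, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already recorded via another path
        seen.append(asset)
        queue.extend(edges.get(asset, []))
    return seen


if __name__ == "__main__":
    # Assets that would need a heads-up if clean.orders changes.
    print(affected_consumers("clean.orders", DOWNSTREAM))
```

In this toy graph, a change to `clean.orders` reaches the sales dashboard, the churn model, and the alerting app that depends on that model, which is the kind of list a data team would use to decide whom to notify.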

Data lineage enables data consumers, such as data scientists, data engineers, and data analysts, to perform context-aware analysis, leading to higher-quality results. Data stewards can also identify data sets that are no longer used or have become outdated and retire them, lowering risk and ensuring end users consume only high-quality data. The new Unity Catalog features give enterprises a comprehensive picture of the full data lifecycle, allowing data executives to understand how data is gathered, whether it has been updated, and the methods employed.

"Governance capabilities such as data lineage are critical as we work to build the industry's most robust lakehouse platform. Without good data lineage, it is challenging to track the business and verification processes that data-driven organizations need to be successful. Our goal is to ensure our customers can focus on insights, and move toward proactive data management practices through a unified, transparent view of their entire data ecosystem."

Matei Zaharia, Co-Founder and Chief Technologist at Databricks

Unity Catalog's key features include automatic run-time lineage, which captures all lineage generated in Databricks and is more accurate and efficient than manually tagging data. Lineage is collected for tables, views, and columns to provide a detailed picture of upstream and downstream data flows. It also works across all Databricks workloads, including SQL, Python, R, and Scala, enabling any data persona to enrich their tools with data intelligence and deeper insights. This includes tracking the history of entities such as notebooks, processes, and dashboards.
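To make column-level lineage concrete, here is a small sketch of how lineage records might be stored and queried for upstream sources. It is an illustration under stated assumptions, not the Unity Catalog data model: the `ColumnEdge` record, the sample `EDGES`, and the `upstream_columns` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnEdge:
    """One hypothetical lineage record captured when a query runs."""
    source: str  # "table.column" that was read
    target: str  # "table.column" that was written
    job: str     # notebook or dashboard that ran the query


# Invented sample records for illustration only.
EDGES = [
    ColumnEdge("raw.orders.amount", "clean.orders.amount_usd", "notebook:clean_orders"),
    ColumnEdge("raw.fx.rate", "clean.orders.amount_usd", "notebook:clean_orders"),
    ColumnEdge("clean.orders.amount_usd", "report.sales.revenue", "dashboard:sales"),
]


def upstream_columns(column, edges):
    """Collect every column that directly or indirectly feeds `column`."""
    found = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for edge in edges:
            if edge.target == current and edge.source not in found:
                found.add(edge.source)
                stack.append(edge.source)
    return found


if __name__ == "__main__":
    print(sorted(upstream_columns("report.sales.revenue", EDGES)))
```

Recording the job alongside each edge is what would let a reader of the lineage see not only which columns feed a report but also which notebook or dashboard produced the dependency.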


Other News
Business Intelligence

EAGLYS, Mitsui and Quantinuum Collaborate to Build a Quantum-Resistant Data Analytics (AI) Platform Using Quantum Computing Hardened Encryption Keys

Mitsui & Co., Ltd | December 13, 2023

In partnership, EAGLYS, Inc., Mitsui & Co., Ltd., and Quantinuum have integrated Quantum Origin into EAGLYS' secure computation product DataArmor, strengthening the platform against the quantum threat to encrypted data.

Mitsui and EAGLYS have used secure computation technology to build a platform that allows research institutes and businesses to securely collaborate using each other's data and AI models. DataArmor uniquely maintains the confidentiality and security of sensitive data and AI models using homomorphic encryption, a technology that allows analysis to be performed on data while it is still encrypted. This protects encrypted data from being revealed when shared, safeguarding organizations and their intellectual property against advanced cyber threats. EAGLYS has now integrated Quantum Origin, the world's only solution that uses the power of quantum computing processes to provably strengthen encryption keys, as part of its Quantum-Resistant Data Analytics (AI) Platform. This integration strengthens the resilience of DataArmor against the threat of a quantum-computing-based attack.

Quantum computers are expected to empower significant innovation in the future and create approximately 100 trillion yen in value by 2035. At the same time, the development of quantum computing technology presents a new threat to the cryptographic security measures that protect the confidentiality of encrypted data and communications. RSA, one of the most widely used cryptographic algorithms, may one day be broken by cybercriminals, exposing confidential data. It is unclear when a quantum computer will be capable of cracking existing encryption, but organizations are increasingly concerned about cyber security attacks called 'hack now, decrypt later.'

A recent Deloitte survey revealed that over 50% of cyber professionals believe their organization is at risk from this attack, in which malicious third parties steal and store encrypted data to decrypt it once quantum computing technology is available. As a countermeasure against such potential threats, organizations must strengthen encryption protection for data and AI models by using post-quantum cryptographic algorithms and hardening cryptographic keys.

Research institutes and businesses increasingly need to collaborate using each other's data and AI models to accelerate innovation in chemical materials development, drug discovery, financial analysis, and retail trends. To maintain the confidentiality and security of the data and AI models used in these collaborations, EAGLYS' DataArmor platform combines fully homomorphic encryption based on lattice cryptography with Quantum Origin's quantum-derived entropy for key generation to strengthen protection against a quantum-computing-based attack. Going forward, the three companies will continue to develop new use cases utilizing this platform.

"We are pleased to collaborate on this advanced project with Quantinuum, a world-class quantum computing company, and our valued partner Mitsui & Co. This collaboration is an important initiative for our business and will enhance our Homomorphic Encryption platform, enabling us to create new value for our customers through AI and data in a highly secure environment. By deploying the Quantum-Resilient AI Platform together with the three companies, we hope to further increase customer value through secure data sharing and AI in the chemical, medical, financial, and retail industries," said Hiroki Imabayashi, Founder and CEO, EAGLYS Inc.

"Through this collaboration, we are able to present a quantum-resistant data analytics (AI) platform capable of creating new value using EAGLYS' secure computation technology. We are convinced that by combining EAGLYS' secure computation technology with Quantum Origin, DataArmor will become an increasingly important solution to prepare for potential threats that may accompany the commercialization of quantum computers. We will continue collaborating with both companies and focus on realizing new customer value by combining Secure Computation and Quantum Technology," said Koji Naniwada, Deputy General Manager & Senior Tech Lead, Quantum Innovation Dept., Corporate Development Div., Mitsui & Co., Ltd.

Duncan Jones, Head of Cybersecurity, Quantinuum, added, "Hardening encryption keys is critical to protecting sensitive data in the post-quantum era, and Quantum Origin is the world's only technology that provably strengthens key generation. By integrating Quantum Origin, EAGLYS is future-proofing the security and integrity of its customers' data."

About Quantinuum

Quantinuum is one of the world's largest integrated quantum computing companies, formed by the combination of Honeywell Quantum Solutions' world-leading hardware and Cambridge Quantum's class-leading middleware and applications. Quantinuum accelerates quantum computing and the development of applications across chemistry, cybersecurity, finance, and optimization. The company employs over 480 individuals, including 350 scientists, at nine sites across the United States, Europe, and Japan.

About Mitsui & Co., Ltd.

Mitsui & Co. is a global trading and investment company with a presence in more than 60 countries and a diverse business portfolio covering a wide range of industries. The company identifies, develops, and grows its businesses in partnership with a global network of trusted partners, including world-leading companies, combining its geographic and cross-industry strengths to create long-term sustainable value for its stakeholders. Mitsui has set three key strategic initiatives for its current Medium-term Management Plan: supporting industries to grow and evolve with stable supplies of resources and materials, and providing infrastructure; promoting a global transition to low-carbon and renewable energy; and empowering people to lead healthy lives through the delivery of quality healthcare and access to good nutrition.

About EAGLYS Inc.

EAGLYS provides an AI platform that combines AI and homomorphic encryption. The company creates new value by enabling data collaboration across industries such as chemical, manufacturing, medical, and retail. Particularly in the field of chemistry, EAGLYS provides solutions that significantly change industry practices.


Big Data Management

data.world Integrates with Snowflake Data Quality Metrics to Bolster Data Trust

data.world | January 24, 2024

data.world, the data catalog platform company, today announced an integration with Snowflake, the Data Cloud company, that brings new data quality metrics and measurement capabilities to enterprises. The data.world Snowflake Collector now empowers enterprise data teams to measure data quality across their organization on demand, unifying data quality and analytics. Customers can now achieve greater trust in their data quality and downstream analytics to support mission-critical applications, confident data-driven decision-making, and AI initiatives.

Data quality remains one of the top concerns for chief data officers and a critical barrier to creating a data-driven culture. Traditionally, data quality assurance has relied on manual oversight, a process that is tedious and prone to inefficiency.

The data.world Data Catalog Platform now delivers Snowflake data quality metrics directly to customers, streamlining quality assurance timelines and accelerating data-first initiatives. Data consumers can access contextual information in the catalog or directly within tools such as Tableau and PowerBI via Hoots, data.world’s embedded trust badges, which broadcast data health status and catalog context, bolstering transparency and trust. Additionally, teams can link certification and DataOps workflows to Snowflake's data quality metrics to automate manual workflows and quality alerts.

Backed by a knowledge graph architecture, data.world provides greater insight into data quality scores via intelligence on data provenance, usage, and context, all of which support DataOps and governance workflows.

“Data trust is increasingly crucial to every facet of business and data teams are struggling to verify the quality of their data, facing increased scrutiny from developers and decision-makers alike on the downstream impacts of their work, including analytics – and soon enough, AI applications,” said Jeff Hollan, Director, Product Management at Snowflake. “Our collaboration with data.world enables data teams and decision-makers to verify and trust their data’s quality to use in mission-critical applications and analytics across their business.”

“High-quality data has always been a priority among enterprise data teams and decision-makers. As enterprise AI ambitions grow, the number one priority is ensuring the data powering generative AI is clean, consistent, and contextual,” said Bryon Jacob, CTO at data.world. “Alongside Snowflake, we’re taking steps to ensure data scientists, analysts, and leaders can confidently feed AI and analytics applications data that delivers high-quality insights, and supports the type of decision-making that drives their business forward.”

The integration builds on the robust collaboration between data.world and Snowflake. Most recently, the companies announced an exclusive offering for joint customers, streamlining adoption timelines and offering a new, attractive price point. data.world's knowledge graph-powered data catalog already offers unique benefits for Snowflake customers, including support for Snowpark.

This offering is now available to all data.world enterprise customers using the Snowflake Collector, as well as customers taking advantage of the Snowflake-only offering. To learn more about the data quality integration or the data.world data catalog platform, visit data.world.

About data.world

data.world is the data catalog platform built for your AI future. Its cloud-native SaaS (software-as-a-service) platform combines a consumer-grade user experience with a powerful Knowledge Graph to deliver enhanced data discovery, agile data governance, and actionable insights. data.world is a Certified B Corporation and public benefit corporation and home to the world’s largest collaborative open data community with more than two million members, including ninety percent of the Fortune 500. The company has 76 patents and has been named one of Austin’s Best Places to Work seven years in a row.


Data Visualization

DataOps.live to Deliver a Streamlined Data Management Process for Snowflake's Global Technical Sales Teams

DataOps.live | December 11, 2023

DataOps.live, The Data Products Company™, today announced that Snowflake has licensed the DataOps.live platform to prepare and deliver technical sales demonstrations for its global customers. The speed and productivity gains of the enhanced process, called Snowflake Solutions Center, are cost-effective and will accelerate sales performance, with an impact on the business insights made available to management.

"Snowflake's collaboration with DataOps.live will enhance the efficiency, productivity, and value our Sales Engineers bring to our selling efficiency," said Eve Besant, SVP Worldwide Sales Engineering at Snowflake.

The Snowflake Solutions Center, supported by the DataOps.live platform, will enable Snowflake users to seamlessly configure, build, test, and deploy data projects and data products for sales demonstration purposes. This comprehensive and collaborative solution includes the following features:

Solutions Catalog: Enables users to build and maintain a solutions catalog of Snowflake data product demonstrations.
One-Click Deployment: Allows Sales Engineers to effortlessly deploy new instances of Snowflake data product demonstrations with a single click.
Lifecycle Management: Manages the entire lifecycle of these data products.
Declarative Infrastructure Management: Utilizes DataOps.live's SOLE engine for full declarative management of Snowflake infrastructure.
CI/CD and Orchestration: Capabilities that enable efficient project management.
Python Orchestration: Delivers streamlined workflows.
Git Workflow and Governance: Ensures effective governance with full Git workflow and governance features.
DataOps Development Environment: Allows users to utilize the unique cloud-based IDE using VS Code and Git for enhanced productivity.
Snowpark Applications: Users can now build Snowpark applications with ease.
Streamlit and Snowflake Native Apps: Effortlessly deploy Streamlit and Snowflake Native Apps.

The solution provided to Snowflake has been tested for many months by a core team of Sales Engineering (SE) leaders and participants from across the globe in a collaborative process to ensure success. A total of 750 Snowflake SEs are anticipated to be trained and onboarded in the coming months to leverage the benefits of the DataOps.live platform in their sales process.

"We have worked closely as a partner of Snowflake since 2017 and welcomed Snowflake Ventures as an investor in DataOps.live in 2021, so this collaborative solution is a great honor for us. As the platform to support the Snowflake Solutions Center, we look forward to helping them enhance the efficiency and productivity of their solution selling process and drive real value on a daily basis," said Justin Mullen, co-Founder and CEO, DataOps.live.

"We are so proud to expand our collaboration with Snowflake and to now include them as a strategic customer," said John F. Marchese, EVP Strategic Business, Alliances & Channels, DataOps.live. "This agreement marks a significant step forward in Snowflake's commitment to provide their worldwide SEs and other customer-facing technical sales staff with cutting-edge DataOps capabilities. Their belief in, and selection of, DataOps.live is an incredible endorsement for our unique ability to create value and drive ROI in business outcomes."

About DataOps.live

DataOps.live, the Data Products company, delivers productivity breakthroughs for data teams by enabling agile DevOps automation (#TrueDataOps) and a powerful Developer Experience (DX) to modern data platforms. The DataOps.live SaaS platform brings automation, orchestration, continuous testing and unified observability to deliver the Data Products you want at the speed the business needs. DataOps.live is a global company funded by Notion Capital, Anthos Capital and Snowflake Ventures, with enterprise clients including Roche Diagnostics and OneWeb.


Big Data Management

Dremio Partners With Carahsoft to Bring Modern Data Infrastructure to the Public Sector

Dremio | January 05, 2024

Dremio, the easy and open data lakehouse, and Carahsoft Technology Corp., The Trusted Government IT Solutions Provider, today announced a partnership. Under the agreement, Carahsoft will serve as Dremio’s Master Government Aggregator®, making the company’s complete cloud and software portfolio for Government, Defense, Intelligence and Education available through Carahsoft’s reseller partners and NASA Solutions for Enterprise-Wide Procurement (SEWP) V, Information Technology Enterprise Solutions – Software 2 (ITES-SW2), National Association of State Procurement Officials (NASPO) ValuePoint, National Cooperative Purchasing Alliance (NCPA) and OMNIA Partners contracts.

This collaboration paves the way for Public Sector organizations to harness cutting-edge data analytics capabilities that empower them to make smarter decisions and significantly enhance operational efficiency through lightning-fast data access.

Dremio brings a state-of-the-art data lakehouse architecture to Public Sector organizations. By transitioning to Dremio's solutions, organizations can enjoy sub-second query performance and a remarkable 10-fold improvement in price performance. The new environment eliminates costly and complex legacy data lake solutions and implements a flexible, highly modern architecture. Dremio provides cost-effective self-service analytics and data management, simplifying data pipelines and ETL complexity while accelerating insights across diverse storage locations.

With a proven track record of modernizing legacy Hadoop environments and implementing the modern data lakehouse solution, Dremio and Carahsoft aim to transform Public Sector data management by breaking down data silos. Embracing a data mesh concept enables efficient handling of various data sources, fostering cross-collaboration and decentralized data control. Dremio’s expertise supports Public Sector agencies in implementing these principles, eradicating data silos for a more collaborative and efficient approach to data analytics.

"Public sector organizations face a range of data infrastructure challenges, many of which are common to both Government and non-government entities. These issues often hinder effective data management, analysis, and decision-making,” said Roger Frey, Vice President of Alliances at Dremio. “Dremio's mission is to make data easily accessible and analyzable for all users, regardless of where it resides. We are excited to partner with Carahsoft to bring our state-of-the-art data analytics solutions to the Public Sector."

“Within the Public Sector’s intricate data landscape, complexities often impede efficient data management and decision making,” said Laura Howton, Sales Director who leads the Analytics and Data Management Team at Carahsoft. “By adding Dremio to our AI and Machine Learning portfolio, our reseller partners can now provide modern, cost-effective and easily accessible data analytics tools to Government customers, bolstering their modernization efforts.”

Dremio’s solutions are available through Carahsoft’s SEWP V contracts NNG15SC03B and NNG15SC27B, ITES-SW2 Contract W52P1J-20-D-0042, NASPO ValuePoint Master Agreement #AR2472, NCPA Contract NCPA01-86 and OMNIA Partners Contract #R191902. For more information, contact the Carahsoft team at 571-591-6430 or Dremio@carahsoft.com; or register for a complimentary webcast, “The Key Steps in Decreasing Costs and Improving Performance Through Data Lake Modernization,” Thursday, January 25, at 2 p.m.

About Dremio

Dremio is the easy and open data lakehouse, providing self-service analytics with data warehouse functionality and data lake flexibility across all of your data. Use Dremio's lightning-fast SQL query service and any other processing engine on the same data. Dremio increases agility with a revolutionary data-as-code approach that enables Git-like data experimentation, version control, and governance. In addition, Dremio eliminates data silos by enabling queries across data lakes, databases, and data warehouses, and by simplifying ingestion into the lakehouse. Dremio's fully managed service helps organizations get started with analytics in minutes, and automatically optimizes data for every workload. As the original creator of Apache Arrow and committed to Arrow and Iceberg’s community-driven standards, Dremio is on a mission to reinvent SQL for data lakes and meet customers where they are on their lakehouse journey.
