BIG DATA MANAGEMENT

Databricks Recognized as a Leader in 2021 Gartner® Magic Quadrant™ for Cloud Database Management Systems

Databricks | December 20, 2021

Databricks, the Data and AI company and pioneer of data lakehouse architecture, today announced that Gartner has positioned Databricks as a Leader in its 2021 Magic Quadrant for Cloud Database Management Systems (DBMS) report for the first time. Combined with its positioning as a Leader in the 2021 Gartner® Magic Quadrant for Data Science and Machine Learning Platforms (DSML) report earlier this year, Databricks is now the only cloud-native vendor to be recognized as a Leader in both Magic Quadrant reports.

The Gartner report evaluated 20 different vendors based on completeness of vision and ability to execute within the rapidly evolving market.

"We consider our positioning as a Leader in both of these reports to be a defining moment for the Databricks Lakehouse Platform and confirmation of the vision for lakehouse as the data architecture of the future. We're honored to be recognized by Gartner as we've brought this lakehouse vision to life. We will continue to invest in simplifying customers' data platform through our unified, open approach."

Ali Ghodsi, CEO and Co-Founder of Databricks

We believe the uniqueness of the achievement is in how it was accomplished. It is not uncommon for vendors to appear in multiple Magic Quadrants each year across many domains, but they are assessed on disparate products in their portfolios that individually meet the specific criteria of each report. The results show definitively that one copy of data, one processing engine, and one approach to management and governance, built on open source and open standards across all clouds, can deliver class-leading outcomes for both data warehousing and data science/machine learning workloads.

We feel our position as a Leader in the 2021 Magic Quadrant for Cloud DBMS underscores a year of substantial growth for the company. Milestones in 2021 include the announcement of Delta Sharing, the company's fifth major open source project; the acquisition of 8080 Labs, a cutting-edge German low-code/no-code startup; and a total of $2.6 billion in funding raised at a current valuation of $38 billion to accelerate global adoption of the lakehouse platform.

Gartner, "Magic Quadrant for Cloud Database Management Systems," Henry Cook, Merv Adrian, Rick Greenwald, Adam Ronthal, Philip Russom, December 14, 2021

Gartner, "Magic Quadrant for Data Science and Machine Learning Platforms," Peter Krensky, Carlie Idoine, Erick Brethenoux, Pieter den Hamer, Farhan Choudhary, Afraz Jaffri, Shubhangi Vashisth, March 1, 2021

Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's Research & Advisory organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

About Databricks
Databricks is the data and AI company. More than 5,000 organizations worldwide — including Comcast, Condé Nast, H&M, and over 40% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world's toughest problems.

Spotlight

Companies that thrive in the next decade will be those that value a data-driven workforce. In recent years, we've seen the swift rise of the data scientist, the data engineer and many other roles in data science. But what exactly does a career in data science involve....


Other News
DATA ARCHITECTURE

Matillion Announces Matillion ETL for Databricks Partner Connect and public preview of Matillion Data Loader for Databricks

Matillion | June 29, 2022

Matillion, the leading enterprise cloud data integration platform, announced that Matillion ETL is available now on Databricks Partner Connect, a one-stop portal for discovering and connecting validated data, analytics, and AI tools. Availability in Partner Connect lets customers easily bring business-critical data from applications, files, and databases into the Databricks Lakehouse Platform without any pre-configuration.

Matillion ETL for Delta Lake on Databricks delivers easy-to-use, cloud-native data integration and transformation for the lakehouse, enabling more users to take advantage of the lakehouse architecture. Matillion's platform supports multiple use cases that help businesses achieve faster time to value on their cloud data journey. Matillion ETL's high-efficiency, code-optional environment gives developers everything they need to perform more complex loading tasks and apply business rules to data pipelines at scale. Similarly, Matillion Data Loader's no-code simplicity empowers data scientists, analysts, and line-of-business managers to quickly load data into the cloud without coding.

"We are excited to bring on Matillion as a new partner in Partner Connect. They enable a new class of users, those who are not comfortable writing code or don't want to, to ingest and transform data on the Databricks Lakehouse Platform. Now, access to Matillion's GUI-based data integration platform is a matter of a few clicks, empowering anyone to make data ready and available for analytics using the power of the lakehouse architecture."

Adam Conway, SVP of Products at Databricks

Matillion users can perform transformations to prepare data for BI, analytics, machine learning, and artificial intelligence. With the Universal Connectivity feature, which lets users create custom connectors inside Matillion ETL in minutes, users can connect to virtually any data source and quickly build data pipelines to ingest data into Delta Lake on Databricks.
"As enterprises continue to move data workflows to the cloud, there is an enormous opportunity for better performance, speed, and scalability by leveraging the data lakehouse and Matillion's low-code/no-code approach to data integration," said Ciaran Dynes, chief product officer at Matillion. "With these two cloud technologies, enterprises can accelerate their analytics projects, empowering their teams to focus on delivering BI and data science results to their business stakeholders."

In addition to the release of Matillion ETL for Databricks Partner Connect, Matillion announced the public preview of Matillion Data Loader, which quickly and easily ingests data at speed and scale into Delta Lake on Databricks. The unified loading experience, with both batch and change data capture pipelines in the same interface, helps increase user productivity and accelerate time to value. When paired with Matillion's low-code transformation capabilities in Matillion ETL for Databricks, the offering now provides a complete solution for loading and transforming data in the Databricks lakehouse.

About Matillion
Matillion makes the world's data useful with an easy-to-use, cloud-native data integration and transformation platform. Optimized for modern enterprise data teams, only Matillion is built on native integrations to cloud data platforms such as Snowflake, Delta Lake on Databricks, Amazon Redshift, Google BigQuery, and Microsoft Azure Synapse to enable new levels of efficiency and productivity across any organization.


BIG DATA MANAGEMENT

Qlik Expands Real-Time Data Integration and Cloud Analytics Services for Snowflake

Qlik | June 16, 2022

Qlik® today announced multiple new and enhanced capabilities that help customers maximize the value of their investment in the Snowflake Data Cloud. These services expand the ability to seamlessly feed Snowflake with near real-time data and to more easily access that data in real time and act on it for decision-making across the enterprise.

“Customers are looking to augment their investment in Snowflake with services that accelerate the access and availability of near real-time data for modern analytics. We are excited about Qlik’s latest SaaS platform integrations with Snowflake, which can make it easier for customers to leverage near real-time data through Snowflake to improve decision-making across the organization with up-to-date insights.”

Tarik Dwiek, Head of Technology Alliances at Snowflake

“Since the start of our collaboration with Qlik and Snowflake, we’ve seen some immense results. By moving the database to the cloud and having it work with Qlik’s automated solutions, we can unlock new data sources in our data warehouse within five days instead of 20, and create new incremental data marts in days rather than weeks. We also achieved a 75% cost reduction on ETL development,” said Maikel Jaspers, BI Engineer at Ewals Cargo Care. “Now not only can we run reports more efficiently, but we can also fulfil more requests. This has given us more agility, and ultimately makes the entire company much more effective, something we’ve been striving to do for a very long time.”

Qlik enhanced its Cloud Analytics Services for Snowflake with two new features that help customers drive more value from near real-time data when deploying Qlik’s cloud platform with Snowflake. Direct Query enables users to automatically generate pushdown SQL to query Snowflake on demand from within Qlik Sense. This combines the ability to optimize Snowflake queries with immediate access to the most recent data, while allowing the creation of best-in-class visualizations and dashboards in Qlik Sense. Qlik’s FinOps and query optimization analytic applications expand on Qlik’s existing Snowflake usage dashboards and give CDOs and data leaders a comprehensive understanding of what drives Snowflake usage and how best to optimize Snowflake workloads to improve the user experience.

Qlik also released new and enhanced Qlik Cloud Data Services capabilities for Snowflake, including real-time change data capture and movement to Snowflake from major databases, SAP, mainframes and SaaS applications; data warehouse automation for model-driven code generation, which dramatically reduces the time, cost and risk of realizing the full potential of Snowflake; and reverse ETL to replicate enriched data from the Snowflake platform back to the operational systems of record.

At QlikWorld, Qlik’s recent customer and partner event, industry leaders such as ABB, Best Buy Canada, CSS, Harman, Novartis, SDI and Urban Outfitters showcased how they leverage Qlik solutions with Snowflake to activate data for insights and action. ABB demonstrated how it combines Qlik Data Integration feeding Snowflake with SAP and other data sources to unlock the hidden value of data at scale and create deeper insights faster. Best Buy Canada showed how it partnered with Accenture, Qlik and Snowflake to accelerate delivery of the initial phase of a multi-phase data ecosystem modernization journey in the cloud. CSS outlined how it built a modern cloud data lakehouse with near real-time data ingestion and fully automated creation of analytics structures for rapid business insights using Qlik Data Integration with Snowflake.
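To make the Direct Query idea concrete: instead of importing a table into the analytics tool, a dashboard selection is translated into SQL that Snowflake itself executes, so only the summarized result leaves the warehouse. The sketch below illustrates that pattern only; the function and parameter names are hypothetical and are not Qlik's actual API.

```python
# Hypothetical sketch of "pushdown SQL": a dashboard selection becomes an
# aggregate query that the warehouse runs, rather than data pulled locally.
def build_pushdown_sql(table: str, measure: str, dimension: str,
                       selections: dict) -> str:
    """Translate dashboard filter selections into a GROUP BY query."""
    where = " AND ".join(
        f"{col} IN ({', '.join(repr(v) for v in vals)})"
        for col, vals in selections.items()
    ) or "TRUE"
    return (
        f"SELECT {dimension}, SUM({measure}) AS total "
        f"FROM {table} WHERE {where} GROUP BY {dimension}"
    )

# A user clicking year=2022 and two sales channels in a dashboard would,
# under this sketch, push this query down to Snowflake:
sql = build_pushdown_sql(
    table="sales", measure="amount", dimension="region",
    selections={"year": [2022], "channel": ["web", "store"]},
)
```

The design point is that aggregation happens where the data lives, which is what allows dashboards to stay current without bulk extracts.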
SDI showcased ZEUS, its MRO technology platform, which digitizes a portion of the supply chain by leveraging Qlik and Snowflake to uniquely summarize activity for any supply chain manager.

“We’re seeing continued customer success and growth in demand to leverage Qlik and Snowflake together to advance cloud data analytics strategies,” said Itamar Ankorion, SVP of Technology Alliances at Qlik. “These recent new features and enhancements, including Direct Query capabilities and Cloud Data Integration Services, are part of our continued investments to innovate and help customers accelerate time to value and drive more insights and action from their data.”

About Qlik
Qlik’s vision is a data-literate world, where everyone can use data and analytics to improve decision-making and solve their most challenging problems. A private company, Qlik offers real-time data integration and analytics solutions, powered by Qlik Cloud, to close the gaps between data, insights and action. By transforming data into Active Intelligence, businesses can drive better decisions, improve revenue and profitability, and optimize customer relationships. Qlik serves more than 38,000 active customers in over 100 countries.


BIG DATA MANAGEMENT

Komprise Automates Unstructured Data Discovery with Smart Data Workflows

Komprise | May 20, 2022

Komprise, the leader in analytics-driven unstructured data management and mobility, today announced Komprise Smart Data Workflows, a systematic process to discover relevant file and object data across cloud, edge and on-premises datacenters and feed that data, in its native format, to AI and machine learning (ML) tools and data lakes.

Industry analysts predict that at least 80% of the world’s data will be unstructured by 2025. This data is critical for AI- and ML-driven applications and insights, yet much of it is locked away in disparate data storage silos. The result is an unstructured data blind spot, and billions of dollars in missed big data opportunities.

Komprise has expanded Deep Analytics Actions to include copy and confine operations based on Deep Analytics queries, added the ability to execute external functions (such as natural language processing) via API, and expanded global tagging and search to support these workflows. Komprise Smart Data Workflows let you define and execute a process with as many of these steps as needed, in any sequence, including external functions at the edge, in the datacenter or in the cloud. Together, the Komprise Global File Index and Smart Data Workflows reduce the time it takes to find, enrich and move the right unstructured data by up to 80%.

“Komprise has delivered a rapid way to visualize our petabytes of instrument data and then automate processes such as tiering and deletion for optimal savings,” says Jay Smestad, senior director of information technology at PacBio. “Now, the ability to automate workflows so we can further define this data at a more granular level and then feed it into analytics tools to help meet our scientists’ needs is a game changer.”

Komprise Smart Data Workflows are relevant across many sectors. Here’s an example from the pharmaceutical industry:

1) Search: Define and execute a custom query across on-prem, edge and cloud data silos to find all data for Project X with Komprise Deep Analytics and the Komprise Global File Index.
2) Execute & Enrich: Execute an external function on Project X data to look for a specific DNA sequence for a mutation and tag matching data as “Mutation XYZ”.
3) Cull & Mobilize: Move only Project X data tagged with “Mutation XYZ” to the cloud using Komprise Deep Analytics Actions for central processing.
4) Manage Data Lifecycle: Move the data to a lower storage tier for cost savings once the analysis is complete.

Other Smart Data Workflow use cases include:

Legal divestiture: Find and tag all files related to a divestiture project, move sensitive data to an object-locked storage bucket, and move the rest to a writable bucket.
Autonomous vehicles: Find crash test data related to abrupt stopping of a specific vehicle model and copy it to the cloud for further analysis. Execute an external function to identify and tag data with Reason = Abrupt Stop, and move only the relevant data to the cloud data lakehouse to reduce the time and cost of moving and analyzing unrelated data.

“Whether it’s massive volumes of genomics data, surveillance data, IoT, GDPR or user shares across the enterprise, Komprise Smart Data Workflows orchestrate the information lifecycle of this data in the cloud to efficiently find, enrich and move the data you need for analytics projects. We are excited to move to this next phase of our product journey, making it much easier to manage and mobilize massive volumes of unstructured data for cost reduction, compliance and business value.”

Kumar Goswami, CEO of Komprise

About Komprise
Komprise is a provider of unstructured data management and mobility software that frees enterprises to easily analyze, mobilize, and monetize the right file and object data across clouds without shackling data to any vendor. With Komprise Intelligent Data Management, enterprises can cut 70% of storage, backup and cloud costs while making data easily available to cloud-based data lakes and analytics tools.
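The four-step pharmaceutical workflow described earlier can be sketched as a simple pipeline. Every function and field name below is hypothetical, standing in for an operation (index query, external function, data movement, tiering) that the platform performs; none of it is Komprise's actual API.

```python
# Hypothetical sketch of the four-step Smart Data Workflow:
# search -> enrich (tag) -> mobilize (move tagged data) -> tier down.

def search(files, project):
    # 1) Search: query the global file index for one project's data
    return [f for f in files if f["project"] == project]

def enrich(files, tag, predicate):
    # 2) Execute & Enrich: run an external function, tag matching files
    for f in files:
        if predicate(f):
            f["tags"].add(tag)
    return files

def mobilize(files, tag):
    # 3) Cull & Mobilize: move only tagged files to the cloud
    for f in files:
        if tag in f["tags"]:
            f["location"] = "cloud"
    return [f for f in files if tag in f["tags"]]

def tier_down(files):
    # 4) Manage Data Lifecycle: demote analyzed data to a cheaper tier
    for f in files:
        f["tier"] = "cold"
    return files

catalog = [
    {"project": "X", "seq": "ACGTXYZ", "tags": set(), "location": "edge", "tier": "hot"},
    {"project": "X", "seq": "ACGT",    "tags": set(), "location": "nas",  "tier": "hot"},
    {"project": "Y", "seq": "XYZ",     "tags": set(), "location": "nas",  "tier": "hot"},
]
hits  = search(catalog, "X")
hits  = enrich(hits, "Mutation XYZ", lambda f: "XYZ" in f["seq"])
moved = tier_down(mobilize(hits, "Mutation XYZ"))
```

The point of the sequence is the culling at step 3: only data that the external function actually tagged gets moved, which is how the workflow avoids paying to transfer and analyze unrelated data.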


DATA SCIENCE

Saturn Cloud and Bodo.ai Partner to Bring Extreme Performance Python to Data Scientists

Saturn Cloud | June 02, 2022

Saturn Cloud, the data science and machine learning platform, and bodo.ai, a parallel data compute platform providing extreme scale and speed for Python, have announced a partnership to take Python analytics performance to the next level for data science teams.

Data scientists develop multiple workflows across teams and rely on Saturn Cloud for a collaborative environment and computing resources. With this partnership, those teams now have seamless access to the Bodo platform, allowing them to scale prototypes to petabyte-scale parallel-processing production without any tuning or re-coding.

Saturn Cloud's pre-built tools allow data science teams to collaborate and scale easily without locking users into patterns. Instead, the platform accommodates the workflow the user already has, while providing an environment where they don't need to rely on dev resources or manage compute environments. It prioritizes keeping the data scientist self-sufficient while making it easy to collaborate and share work.

Bodo offers a parallel compute platform with extreme scale and speed, but with the simplicity and flexibility of native Python. In contrast to libraries and frameworks like Spark, Bodo is a new type of compiler offering automatic parallelism and high efficiency beyond 10,000 cores. Bodo can also be used natively with analytics packages such as Pandas, NumPy, scikit-learn, and more.

The joint solution is available immediately, with bodo.ai software running within Saturn Cloud resources. Saturn Cloud provides a pre-built template with Bodo already installed and configured. Users can then access bodo.ai functionality within JupyterLab or via SSH from VS Code, PyCharm, or the terminal. Through Saturn Cloud, users get up to 4 TB of RAM and 128 vCPUs backing Bodo's software.
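As a rough illustration of the "native Python, automatic parallelism" pattern described above, ordinary NumPy code can be decorated with Bodo's JIT compiler and run unchanged. This is a sketch, not vendor sample code: the Monte Carlo estimate is our own illustrative workload, and the `try/except` fallback lets it run as plain single-core Python wherever Bodo is not installed.

```python
import numpy as np

try:
    import bodo           # Bodo's JIT compiler, if available
    jit = bodo.jit
except ImportError:
    jit = lambda f: f     # fallback: run as plain single-core Python

@jit
def calc_pi(n):
    # Ordinary NumPy code; when Bodo is present it compiles and
    # parallelizes this function without any rewriting.
    x = 2 * np.random.ranf(n) - 1
    y = 2 * np.random.ranf(n) - 1
    return 4 * np.mean(x * x + y * y < 1.0)

pi_est = calc_pi(1_000_000)
```

The appeal of this model, versus a framework like Spark, is that the decorated function is still the prototype code: scaling it up is a deployment decision rather than a rewrite.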
Two examples are available to try right away: using Bodo to speed up feature engineering and model training, and using Bodo to speed up data manipulation and analysis.

"Our partnership is focused on providing massive speed and productivity improvements to data scientists struggling with large-scale analytics projects. Bodo's platform adds terabyte-scale processing with unheard-of infrastructure efficiencies for Saturn Cloud users."

Behzad Nasre, CEO, Bodo

"We not only want to provide a flexible workspace for data science teams, but enable greater Python scaling capabilities to increase productivity in projects that are more demanding. This joint offering with Bodo will give users an opportunity to take their work to the next level with automatic parallelization for better overall performance," says Sebastian Metti, one of the Saturn Cloud founders.

About Saturn Cloud
Saturn Cloud is a data science and machine learning platform flexible enough for any team. Collaborate in the cloud on analyses and model training, then deploy your code, using the same patterns you're used to but with cloud scale.

About Bodo
Founded in 2019, bodo.ai is an extreme-performance parallel compute platform for data analytics, scaling past 10,000 cores and petabytes of data with unprecedented efficiency and linear scaling. Leveraging automatic parallelization and the first inferential compiler, Bodo is helping Fortune 500 customers solve some of the world's largest data analysis problems in a fraction of the traditional time, complexity, and cost, all while leveraging the simplicity and flexibility of native Python. Developers can deploy Bodo on any infrastructure, from a laptop to a public cloud.


