Big Data Management
data.world | January 24, 2024
data.world, the data catalog platform company, today announced an integration with Snowflake, the Data Cloud company, that brings new data quality metrics and measurement capabilities to enterprises. The data.world Snowflake Collector now empowers enterprise data teams to measure data quality across their organization on-demand, unifying data quality and analytics. Customers can now achieve greater trust in their data quality and downstream analytics to support mission-critical applications, confident data-driven decision-making, and AI initiatives.
Data quality remains one of the top concerns for chief data officers and a critical barrier to creating a data-driven culture. Traditionally, data quality assurance has relied on manual oversight – a process that’s tedious and fraught with inefficacy. The data.world Data Catalog Platform now delivers Snowflake data quality metrics directly to customers, streamlining quality assurance timelines and accelerating data-first initiatives.
Data consumers can access contextual information in the catalog or directly within tools such as Tableau and PowerBI via Hoots – data.world’s embedded trust badges – that broadcast data health status and catalog context, bolstering transparency and trust. Additionally, teams can link certification and DataOps workflows to Snowflake's data quality metrics to automate manual workflows and quality alerts. Backed by a knowledge graph architecture, data.world provides greater insight into data quality scores via intelligence on data provenance, usage, and context – all of which support DataOps and governance workflows.
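As a rough illustration of how quality metrics can drive an automated alert, the sketch below computes two simple measures for a column and maps them to a status. It is plain Python, not the data.world or Snowflake API; the function name, the `QualityReport` type, and both thresholds are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    null_rate: float       # share of rows with a missing value
    duplicate_rate: float  # share of rows repeating an earlier non-null value
    status: str            # "healthy" or "alert"


def measure_column(values, max_null_rate=0.05, max_duplicate_rate=0.10):
    """Compute two simple quality metrics and map them to a status.

    The thresholds are illustrative assumptions, not product defaults.
    """
    total = len(values)
    if total == 0:
        # An empty column cannot be certified, so flag it for review.
        return QualityReport(0.0, 0.0, "alert")
    nulls = sum(1 for v in values if v is None)
    non_null = [v for v in values if v is not None]
    duplicates = len(non_null) - len(set(non_null))
    null_rate = nulls / total
    duplicate_rate = duplicates / total
    healthy = null_rate <= max_null_rate and duplicate_rate <= max_duplicate_rate
    return QualityReport(null_rate, duplicate_rate, "healthy" if healthy else "alert")


# Hypothetical column: ten rows, one missing value, one repeated value.
report = measure_column(["a1", "a2", None, "a2", "a3", "a4", "a5", "a6", "a7", "a8"])
print(report.status)
```

In a real workflow the status would feed a certification step or a notification rather than a print statement.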
“Data trust is increasingly crucial to every facet of business and data teams are struggling to verify the quality of their data, facing increased scrutiny from developers and decision-makers alike on the downstream impacts of their work, including analytics – and soon enough, AI applications,” said Jeff Hollan, Director, Product Management at Snowflake. “Our collaboration with data.world enables data teams and decision-makers to verify and trust their data’s quality to use in mission-critical applications and analytics across their business.”
“High-quality data has always been a priority among enterprise data teams and decision-makers. As enterprise AI ambitions grow, the number one priority is ensuring the data powering generative AI is clean, consistent, and contextual,” said Bryon Jacob, CTO at data.world. “Alongside Snowflake, we’re taking steps to ensure data scientists, analysts, and leaders can confidently feed AI and analytics applications data that delivers high-quality insights, and supports the type of decision-making that drives their business forward.”
The integration builds on the existing collaboration between data.world and Snowflake. Most recently, the companies announced an exclusive offering for joint customers, streamlining adoption timelines and offering a new, attractive price point. data.world's knowledge graph-powered data catalog already offers unique benefits for Snowflake customers, including support for Snowpark.
This offering is now available to all data.world enterprise customers using the Snowflake Collector, as well as customers taking advantage of the Snowflake-only offering. To learn more about the data quality integration or the data.world data catalog platform, visit data.world.
About data.world
data.world is the data catalog platform built for your AI future. Its cloud-native SaaS (software-as-a-service) platform combines a consumer-grade user experience with a powerful Knowledge Graph to deliver enhanced data discovery, agile data governance, and actionable insights. data.world is a Certified B Corporation and public benefit corporation and home to the world’s largest collaborative open data community with more than two million members, including ninety percent of the Fortune 500. The company has 76 patents and has been named one of Austin’s Best Places to Work seven years in a row.
Data Architecture
SingleStore | January 25, 2024
SingleStore, the database that allows you to transact, analyze and contextualize data, today announced powerful new capabilities, making it the industry’s only real-time data platform. With its latest release, dubbed SingleStore Pro Max, the company announced features such as indexed vector search, an on-demand compute service for GPUs and CPUs, and a new free shared tier, among several other new products. Together, these capabilities shrink development cycles while providing the performance and scale that customers need for building applications.
In an explosive generative AI landscape, companies are looking for a modern data platform that’s ready for enterprise AI use cases — one with best-available tooling to accelerate development, simultaneously allowing them to marry structured or semi-structured data residing in enterprise systems with unstructured data lying in data lakes.
“We believe that a data platform should both create new revenue streams while also decreasing technological costs and complexity for customers. And this can only happen with simplicity at the core,” said Raj Verma, CEO, SingleStore. “This isn’t just a product update, it’s a quantum leap… SingleStore is offering truly transformative capabilities in a single platform for customers to build all kinds of real-time applications, AI or otherwise.”
“At Adobe, we aim to change the world through digital experiences,” said Matt Newman, Principal Data Architect, Adobe. “SingleStore’s latest release is exciting as it pushes what is possible when it comes to database technology, real-time analytics and building modern applications that support AI workloads. We’re looking forward to these new features as more and more of our customers are seeking ways to take full advantage of generative AI capabilities.”
Key new features launched include:
Indexed vector search. SingleStore has announced support for vector search using Approximate Nearest Neighbor (ANN) vector indexing algorithms, delivering vector search that is 800-1,000x faster than exact k-nearest-neighbor (KNN) methods. With both full-text and indexed vector search capabilities, SingleStore offers developers true hybrid search that takes advantage of the full power of SQL for queries, joins, filters and aggregations. These capabilities firmly place SingleStore above vector-only databases that require niche query languages and are not designed to meet enterprise security and resiliency needs.
Free shared tier. SingleStore has announced a new cloud-based Free Shared Tier that’s designed for startups and developers to quickly bring their ideas to life — without the need to commit to a paid plan.
On-demand compute service for GPUs and CPUs. SingleStore announces a compute service that works alongside SingleStore’s native Notebooks to let developers spin up GPUs and CPUs to run database-adjacent workloads including data preparation, ETL, third-party native application frameworks, etc. This capability brings compute to algorithms, rather than the other way around, enabling developers to build highly performant AI applications safely and securely using SingleStore — without unnecessary data movement.
New CDC capabilities for data ingest and egress. To ease the burden and costs of moving data in and out of SingleStore, SingleStore is adding native capabilities for real-time Change Data Capture (CDC) in from MongoDB® and MySQL, along with ingestion from Apache Iceberg, without requiring third-party CDC tools. SingleStore will also support CDC out capabilities that ease migrations and enable the use of SingleStore as a source for other applications and databases like data warehouses and lakehouses.
SingleStore Kai™. SingleStore Kai is now generally available and ready for both analytical and transactional processing for apps originally built on MongoDB. Announced in public preview in early 2023, SingleStore Kai is an API that delivers over 100x faster analytics on MongoDB® with no query changes or data transformations required. Today, SingleStore Kai supports the BSON data format natively, has improved transactional performance, offers increased performance for arrays and provides industry-leading compatibility with the MongoDB query language.
Projections. To further advance as the world’s fastest HTAP database, SingleStore has added Projections. Projections allow developers to greatly speed up range filters and group by operations by introducing secondary sort and shard keys. Query performance improves by 2-3x or more, depending on the size of the table.
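To illustrate why a secondary sort key helps a range filter, as described for Projections above, here is a minimal sketch in plain Python. It is not SingleStore syntax and does not reflect how SingleStore implements the feature; the rows, column meanings, and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (order_id, amount). The base table is sorted by order_id.
rows = [(1, 40.0), (2, 15.5), (3, 99.9), (4, 15.5), (5, 62.0), (6, 7.25)]


def range_filter_scan(table, low, high):
    """Full scan: every row is checked because amount is unsorted in the base table."""
    return sorted(r for r in table if low <= r[1] <= high)


def build_projection(table):
    """A second copy of the rows kept sorted by amount (the secondary sort key)."""
    by_amount = sorted(table, key=lambda r: r[1])
    keys = [r[1] for r in by_amount]
    return keys, by_amount


def range_filter_projection(projection, low, high):
    """Binary-search the sorted keys, then read only the matching slice."""
    keys, by_amount = projection
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return sorted(by_amount[start:end])


projection = build_projection(rows)
matches = range_filter_projection(projection, 15.0, 62.0)
print(matches)
```

Both functions return the same rows; the projection version locates the range with two binary searches instead of inspecting every row, at the cost of storing a second sorted copy.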
With this latest release, SingleStore becomes the industry’s first and only real-time data platform designed for all applications, analytics and AI. SingleStore supports high-throughput ingest performance, ACID transactions and low-latency analytics; and structured, semi-structured (JSON, BSON, text) and unstructured data (vector embeddings of audio, video, images, PDFs, etc.). Finally, SingleStore’s data platform is designed not just with developers in mind, but also ML engineers, data engineers and data scientists.
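The difference between exact (KNN) and approximate (ANN) vector search mentioned in the feature list can be sketched with a toy inverted-file index. This is an assumption-laden illustration in plain Python, not SingleStore's indexing algorithm: the embeddings, the two fixed centroids, and all function names are hypothetical.

```python
import math

# Hypothetical 2-D embeddings keyed by document id.
vectors = {
    "doc_a": (0.9, 0.1),
    "doc_b": (0.8, 0.2),
    "doc_c": (0.1, 0.9),
    "doc_d": (0.2, 0.8),
    "doc_e": (0.55, 0.45),
}

# Assumed cluster centers; a real index would learn these from the data.
centroids = [(1.0, 0.0), (0.0, 1.0)]


def exact_knn(query, k):
    """Exact search: compare the query with every stored vector."""
    ranked = sorted(vectors, key=lambda doc: math.dist(query, vectors[doc]))
    return ranked[:k]


def nearest_centroid(point):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def build_index():
    """Group document ids into one bucket per centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for doc, vec in vectors.items():
        buckets[nearest_centroid(vec)].append(doc)
    return buckets


def approximate_knn(index, query, k):
    """Approximate search: rank only the documents in the query's nearest bucket."""
    candidates = index[nearest_centroid(query)]
    ranked = sorted(candidates, key=lambda doc: math.dist(query, vectors[doc]))
    return ranked[:k]


index = build_index()
print(exact_knn((0.95, 0.05), 2), approximate_knn(index, (0.95, 0.05), 2))
print(exact_knn((0.45, 0.55), 1), approximate_knn(index, (0.45, 0.55), 1))
```

For the first query both methods agree. For the second, the approximate search inspects only one bucket and misses the true nearest neighbor, which sits just across the bucket boundary. That is the trade-off ANN indexes make: fewer comparisons in exchange for occasionally imperfect recall.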
“Our new features and capabilities advance SingleStore’s mission of offering a real-time data platform for the next wave of gen AI and data applications,” said Nadeem Asghar, SVP, Product Management + Strategy at SingleStore. “New features, including vector search, Projections, Apache Iceberg, Scheduled Notebooks, autoscaling, GPU compute services, SingleStore Kai™, and the Free Shared Tier allow startups — as well as global enterprises — to quickly build and scale enterprise-grade real-time AI applications. We make data integration with third-party databases easy with both CDC in and CDC out support.”
"Although generative AI, LLM, and vector search capabilities are early stage, they promise to deliver a richer data experience with translytical architecture," states the 2023 report, “Translytical Architecture 2.0 Evolves To Support Distributed, Multimodel, And AI Capabilities,” authored by Noel Yuhanna, Vice President and Principal Analyst at Forrester Research. "Generative AI and LLM can help democratize data through natural language query (NLQ), offering a ChatGPT-like interface. Also, vector storage and index can be leveraged to perform similarity searches to support data intelligence."
SingleStore has been on a fast track leading innovation around generative AI. The company’s product evolution has been accompanied by high-momentum customer growth, and the company surpassed $100M in ARR late last year. SingleStore also recently ranked #2 in the emerging category of vector databases, and was recognized by TrustRadius as a top vector database in 2023. Finally, SingleStore was a winner of InfoWorld’s Technology of the Year in the database category.
About SingleStore
SingleStore empowers the world’s leading organizations to build and scale modern applications using the only database that allows you to transact, analyze and contextualize data in real time. With streaming data ingestion, support for both transactions and analytics, horizontal scalability and hybrid vector search capabilities, SingleStore helps deliver 10-100x better performance at 1/3 the costs compared to legacy architectures. Hundreds of customers worldwide, including Fortune 500 companies and global data leaders, use SingleStore to power real-time applications and analytics. Learn more at singlestore.com or follow @SingleStoreDB on Twitter.
Machine Learning
InterSystems | January 12, 2024
InterSystems, a creative data technology provider dedicated to helping customers solve their most critical scalability, interoperability, and speed problems, today announced general availability of the InterSystems IRIS Cloud SQL and InterSystems IRIS Cloud IntegratedML® services. These fully managed cloud-native smart data services empower developers to build cloud-native database and machine learning (ML) applications in SQL environments with ease.
With Cloud SQL and Cloud IntegratedML, developers can access a next generation relational database-as-a-service (DBaaS) that is fast and easy to provision and use. Embedded AutoML capabilities allow developers to easily develop and execute machine learning models with just a few SQL-like commands in a fully-managed, elastic cloud-native environment.
Shaping a complete data management portfolio for mission-critical applications
As part of the InterSystems Cloud portfolio of smart data services, Cloud SQL and Cloud IntegratedML give application developers access to InterSystems' proven enterprise-class capabilities as self-service, fully managed offerings on Amazon Web Services (AWS), while providing a fast and seamless on-ramp to the full suite of capabilities in InterSystems IRIS® data platform.
InterSystems IRIS is a next-generation data platform designed for organizations implementing smart data fabrics that provide powerful database management, integration, and application development capabilities. By consolidating these capabilities into a single product, InterSystems IRIS accelerates the time it takes to realize value from data, simplifies overall system architectures, and reduces both maintenance effort and costs.
“We are excited for the capabilities of InterSystems IRIS to be exposed through these new, easy to deploy and easy to use services,” said Scott Gnau, Global Head of Data Platforms at InterSystems. “With native support for AutoML, we give developers the power to build comprehensive, predictive, and prescriptive applications.”
Fully managed, enterprise-class reliability with InterSystems IRIS Cloud SQL
Cloud SQL makes it easy for application developers to leverage advanced relational database capabilities as a fully managed, secure, scalable, high performance, highly available cloud-native database-as-a-service (DBaaS).
Cloud SQL delivers the following benefits for SQL developers:
Extremely high performance, especially for ingesting and processing incoming data and performing SQL queries on the data with low latency at scale
Fast and easy to provision and use
Ability to easily connect client applications via JDBC, ODBC, DB-API, and ADO.NET drivers
Automated security, data encryption, and backups
Automation of machine learning tasks with InterSystems IRIS Cloud IntegratedML
Available as an additional cloud managed service for InterSystems IRIS Cloud SQL customers, Cloud IntegratedML extends the capabilities of Cloud SQL to enable SQL developers to quickly build, tune, and execute machine learning models with just a few SQL-like commands, without moving or copying data to a different environment. A significant advantage of Cloud IntegratedML is the elimination of the need to transfer or replicate data to an external platform to build ML models, or to move ML models to a different environment for execution.
Cloud IntegratedML delivers the following benefits for SQL developers:
Automation of machine learning processes and resource-intensive tasks such as feature engineering, model development, and fine-tuning
Seamless integration of models developed and trained with Cloud IntegratedML within Cloud SQL, facilitating real-time predictive insights and prescriptive actions in response to events and transactions
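The "few SQL-like commands" workflow can be sketched from a Python client. The statement forms below follow the CREATE MODEL, TRAIN MODEL, and PREDICT() pattern as understood from the IntegratedML documentation and should be checked against the current reference; the model, column, and table names are hypothetical, and `run` assumes only a generic DB-API 2.0 cursor.

```python
import re

# Plain or schema-qualified identifiers only, since names are interpolated into SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def _check(name):
    """Reject anything that is not a simple identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def integratedml_statements(model, target, train_table, score_table):
    """Return SQL text to create, train, and apply a model (assumed syntax)."""
    model, target = _check(model), _check(target)
    train_table, score_table = _check(train_table), _check(score_table)
    return [
        f"CREATE MODEL {model} PREDICTING ({target}) FROM {train_table}",
        f"TRAIN MODEL {model}",
        f"SELECT PREDICT({model}) FROM {score_table}",
    ]


def run(cursor, statements):
    """Execute each statement in order on any DB-API 2.0 cursor."""
    for sql in statements:
        cursor.execute(sql)


# Hypothetical names for illustration only.
statements = integratedml_statements(
    "ChurnModel", "Churned", "Sales.Customers", "Sales.NewCustomers"
)
for sql in statements:
    print(sql)
```

Because training and prediction are expressed as statements sent to the database, the data stays where it is; the client only submits SQL and reads back results.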
Together, these services make the InterSystems Cloud portfolio of smart data services an optimal choice for SQL developers seeking a robust, high-performance database solution tailored to their needs. The new Cloud SQL and Cloud IntegratedML services are available through InterSystems Developer Hub.
About InterSystems
Established in 1978, InterSystems is the leading provider of next-generation solutions for enterprise digital transformations in the healthcare, finance, manufacturing, and supply chain sectors. Its cloud-first data platforms solve interoperability, speed, and scalability problems for large organizations around the globe. InterSystems is committed to excellence through its award-winning, 24×7 support for customers and partners in more than 80 countries. Privately held and headquartered in Cambridge, Massachusetts, InterSystems has 38 offices in 28 countries worldwide. For more information, please visit InterSystems.com.