Snorkel AI Introduces New Foundation Model Data Platform to Bring the Power of Programmatic Data Development to Generative AI

Snorkel AI, the data-centric AI company, introduced the Foundation Model Data Platform, powered by its unique programmatic data development approach. With Snorkel AI's Foundation Model Data Platform, any company can now use its proprietary enterprise data and knowledge to build custom foundation models (FMs) or large language models (LLMs), or to improve the accuracy of leading commercial or open-source models for domain-specific generative AI and predictive AI applications.

Despite the Cambrian explosion of generative AI applications, accuracy and privacy are the top challenges for enterprise adoption [1]. Nearly 40 percent of enterprises are already considering building enterprise-specific LLMs or adapting existing ones with their proprietary data [2]. The biggest blocker for model development is the manual preparation of the data that models are trained on. Snorkel AI brings an unparalleled track record of transforming manual data development processes into programmatic solutions. Some of the world's largest enterprises, including five of the top ten US banks, Memorial Sloan Kettering, BNY Mellon, and Wayfair, use Snorkel Flow to programmatically label data and train models for mission-critical predictive AI applications with production-grade accuracy.

Snorkel AI's new Foundation Model Data Platform expands programmatic data development beyond labeling for predictive AI with two core solutions: Snorkel GenFlow for building generative AI applications and Snorkel Foundry for developing custom LLMs with proprietary data. Together, Snorkel Flow, GenFlow, and Foundry support the critical data development work behind every way enterprises want to leverage FMs and LLMs.

"Wayfair's partnership with Snorkel AI underscores our commitment to machine learning innovation, continually enhancing our customers' on-site search experience among our vast array of 40 million products," said Tulia Plumettaz, Director of Machine Learning at Wayfair. "Snorkel's programmatic labeling approach helps our data scientists improve catalog content automation and overcome accuracy, consistency, and efficiency challenges. In addition, Snorkel's data-centric AI platform supports our mission to utilize foundation models in future developments."

Snorkel AI now offers the full stack of solutions for foundation model data development, including:

  • Snorkel Flow to rapidly build, manage, and deploy predictive AI applications (e.g., classification, information extraction) using programmatic labeling, fine-tuning, and distillation. Enterprises can unlock production-grade accuracy for mission-critical business applications such as financial document analysis, clinical trial analytics, and KYC (know your customer). Snorkel AI's customers have reduced AI development time from months to days and cut costs by hundreds of thousands of dollars per project.
  • Snorkel GenFlow to rapidly build, manage, and deploy generative AI applications (e.g., summarization, question answering, chat) by programmatically curating, scoring, filtering, and sampling instructions and responses for instruction tuning with RLHF and other methods. Enterprises can improve performance and reliability on specific tasks using their proprietary data.
  • Snorkel Foundry to build custom FMs/LLMs by programmatically sampling, filtering, cleaning, and augmenting proprietary data for domain-specific pre-training. Enterprises can use their data as a differentiator by adapting powerful but generic base models into domain-specific specialist models that can serve as a base for all internal AI applications—predictive and generative.

"Today, everyone uses nearly the same models, algorithms, and approaches for training FMs and LLMs—but it's the data that they train on at all stages which is the differentiator, and the secret sauce that AI-first companies are investing in and guarding most heavily," said Alex Ratner, CEO and co-founder of Snorkel AI. "Our Foundation Model Data Platform enables every enterprise to use their unique, proprietary data and knowledge to build or adapt FMs and LLMs with production-level accuracy on their data and workloads, unlike off-the-shelf FMs. Proprietary data and knowledge is the one durable moat in AI today, and we enable enterprises to own and use this themselves."

Snorkel AI has collaborated with Microsoft to enable Azure AI customers to use proprietary data to fine-tune and customize machine learning models and applications.

“Microsoft delivers the world's most capable foundation models and empowers AI developers focused on deeply customized domain-specific use cases to ground these models with their own data. The advancements made by Snorkel AI in this space have the potential to be transformative across the industry,” said John Montgomery, Corporate Vice President, Program Management, AI Platform at Microsoft. “Snorkel AI's new foundation model platform has the potential to significantly enhance how Azure customers build, fine-tune, and apply large language models across their business. This could fundamentally shift the current paradigm, making AI more accessible and customizable for every enterprise, regardless of size or industry. The power of Snorkel AI’s innovations combined with Microsoft's AI platform is a game-changer.”

Learn more about Snorkel AI’s Foundation Model Data Platform at www.snorkel.ai.

About Snorkel AI

Founded by a team spun out of the Stanford AI Lab, Snorkel AI makes AI development fast and practical by transforming manual AI development processes into programmatic solutions. Snorkel AI enables enterprises to develop AI that works for their unique workloads using their proprietary data and knowledge, 10-100x faster. Backed by Addition, Greylock, GV, In-Q-Tel, Lightspeed Venture Partners and funds and accounts managed by BlackRock, the company is based in Palo Alto. For more information on Snorkel AI, please visit: https://www.snorkel.ai/ or follow @SnorkelAI.

Other News

Airbyte Racks Up Awards from InfoWorld, BigDATAwire, Built In; Builds Largest and Fastest-Growing User Community

Airbyte | January 30, 2024

Airbyte, creators of the leading open-source data movement infrastructure, today announced a series of accomplishments and awards reinforcing its standing as the largest and fastest-growing data movement community. With a focus on innovation, community engagement, and performance enhancement, Airbyte continues to revolutionize the way data is handled and processed across industries.

“Airbyte proudly stands as the front-runner in the data movement landscape with the largest community of more than 5,000 daily users and over 125,000 deployments, with monthly data synchronizations of over 2 petabytes,” said Michel Tricot, co-founder and CEO, Airbyte. “This unparalleled growth is a testament to Airbyte's widespread adoption by users and the trust placed in its capabilities.”

The Airbyte community has more than 800 code contributors and 12,000 stars on GitHub. Recently, the company held its second annual virtual conference, move(data), which attracted over 5,000 attendees.

In October, Airbyte was named a finalist for the InfoWorld Technology of the Year Award in the Data Management – Integration category, which honors cutting-edge products that are changing how IT organizations work and how companies do business. At the start of this year, the company was named to the Built In 2024 Best Places To Work Award in San Francisco – Best Startups to Work For, recognizing its commitment to fostering a positive work environment, remote and flexible work opportunities, and programs for diversity, equity, and inclusion. Today, the company received the BigDATAwire Readers/Editors Choice Award – Big Data and AI Startup, which recognizes companies and products that have made a difference.

Other key milestones in 2023 include the following:

  • Availability of more than 350 data connectors, making Airbyte the platform with the most connectors in the industry. The company aims to support 500 high-quality connectors by the end of this year.
  • More than 2,000 custom connectors created with the Airbyte No-Code Connector Builder, which enables data connectors to be built in minutes.
  • A tenfold increase in database replication speed to support larger datasets.
  • Support for five vector databases, in addition to unstructured data sources, making Airbyte the first company to build a bridge between data movement platforms and artificial intelligence (AI).

Looking ahead, Airbyte will introduce data lakehouse destinations, as well as a new Publish feature to push data to API destinations.

About Airbyte

Airbyte is the open-source data movement infrastructure leader running in the safety of your cloud and syncing data from applications, APIs, and databases to data warehouses, lakes, and other destinations. Airbyte offers four products: Airbyte Open Source, Airbyte Self-Managed, Airbyte Cloud, and Powered by Airbyte. Airbyte was co-founded by Michel Tricot (former director of engineering and head of integrations at LiveRamp and RideOS) and John Lafleur (a serial entrepreneur in dev tools and B2B). The company is headquartered in San Francisco with a distributed team around the world. To learn more, visit airbyte.com.