
Airbyte Makes Hundreds of Data Sources Available for Artificial Intelligence Applications

Airbyte, creators of the fastest-growing open-source data movement platform, today made connectors for the Pinecone and Chroma vector databases available as destinations, so that data moved from hundreds of sources can be accessed by artificial intelligence (AI) models.

“We are the first general-purpose data movement platform to add support for vector databases – the first to build a bridge between data movement platforms and AI,” said Michel Tricot, CEO, Airbyte. “Now, Pinecone and Chroma users don’t have to struggle with creating custom code to bring in data; they can use the new Airbyte connector to select the data sources they want.”

Because vector databases store data as embeddings that capture semantic relationships, they are increasingly popular with users seeking to extract more meaning from their data. Vector databases are well suited to applications like recommendation systems, anomaly detection, and natural language processing, and as sources for AI applications – specifically large language models (LLMs).
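To make "relationships" concrete: a vector database stores each item as an embedding vector and answers queries by returning the stored vectors nearest to a query vector. Here is a minimal sketch of the underlying similarity computation in plain NumPy (the four-dimensional vectors below are toy values standing in for real embeddings, which typically have hundreds of dimensions):

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # 1.0 means the vectors point the same way; near 0.0 means unrelated.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy "embeddings" for a query and two stored documents.
    query = np.array([0.9, 0.1, 0.0, 0.2])
    docs = {
        "refund policy": np.array([0.8, 0.2, 0.1, 0.1]),
        "api reference": np.array([0.1, 0.9, 0.7, 0.0]),
    }

    # A vector database performs this nearest-neighbor ranking at scale, using indexes.
    for name, vec in sorted(docs.items(), key=lambda kv: -cosine_similarity(query, kv[1])):
        print(name, round(cosine_similarity(query, vec), 3))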

The vector database destination in Airbyte now enables users to configure the full ELT pipeline through a single, user-friendly interface: extracting records from a wide variety of sources, separating unstructured from structured data, preparing and embedding the text contents of records, and finally loading them into vector databases. These vector databases can then be accessed by LLMs (a sketch of the equivalent flow follows the list below). All existing advantages of the Airbyte platform now extend to vector databases, including:

  • The largest catalog of data sources, connectable within minutes and optimized for performance.
  • Availability of the no-code connector builder, which makes it quick and easy to create new connectors for data integrations, addressing the “long-tail” of data sources.
  • Ability to do incremental syncs to only extract changes in the data from a previous sync.
  • Built-in resiliency: if a session moving data is disrupted, the connection resumes from the point of the disruption.
  • Secure authentication for data access.
  • Ability to schedule and monitor status of all syncs.
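For readers who want to see what such an extract–embed–load flow looks like end to end, here is a minimal sketch using the open-source Chroma Python client directly. It stands in for what the Airbyte destination automates and is not Airbyte's own code; the record contents and collection name are invented for illustration:

    import chromadb

    # Ephemeral in-memory client; Chroma also offers persistent and server modes.
    client = chromadb.Client()

    # Chroma applies a default embedding model to `documents` when they are added.
    collection = client.create_collection(name="support_articles")

    # Stand-ins for records extracted from a source such as a helpdesk or CRM.
    records = [
        {"id": "1", "text": "How to reset your password", "source": "zendesk"},
        {"id": "2", "text": "Refunds are processed within 5 business days", "source": "zendesk"},
    ]

    collection.add(
        ids=[r["id"] for r in records],
        documents=[r["text"] for r in records],  # unstructured text gets embedded
        metadatas=[{"source": r["source"]} for r in records],  # structured fields ride along
    )

    # An LLM application can now retrieve the records closest to a user question.
    results = collection.query(query_texts=["how do refunds work?"], n_results=1)
    print(results["documents"])

An Airbyte sync to a vector database destination covers the same extract, prepare, embed, and load steps as a managed, schedulable pipeline rather than a one-off script.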

Airbyte continues to innovate and support cutting-edge technologies to empower organizations in their data integration journey. The addition of vector database support marks another significant milestone in Airbyte's commitment to providing powerful and efficient solutions for data integration and analysis.

The vector database destination is currently in alpha. It supports Pinecone on both Airbyte Cloud and the Open Source Software (OSS) version of Airbyte, and Chroma and the embedded DocArray database on Airbyte OSS, with more options to come.

Airbyte makes moving data easy and affordable across almost any source and destination, helping enterprises provide their users with access to the right data for analysis and decision-making. Airbyte has the largest data engineering contributor community – with more than 800 contributors – and the best tooling to build and maintain connectors.

About Airbyte

Airbyte is the open-source data movement leader running in the safety of your cloud and syncing data from applications, APIs, and databases to data warehouses, lakes, and other destinations. Airbyte offers four products: Airbyte Open Source, Airbyte Enterprise, Airbyte Cloud, and Powered by Airbyte. Airbyte was co-founded by Michel Tricot (former director of engineering and head of integrations at LiveRamp and rideOS) and John Lafleur (a serial entrepreneur in dev tools and B2B). The company is headquartered in San Francisco with a distributed team around the world. To learn more, visit airbyte.com.

Related News

Big Data

Provider Density Data from LexisNexis Risk Solutions Shows Inequality of Provider Availability Across Regions

PR Newswire | October 06, 2023

LexisNexis® Risk Solutions, a leading provider of data and analytics, released new insights on the latest national and regional provider density trends for primary and specialty care. The analysis explores how often prescriber data changes, the metropolitan areas seeing the biggest change in the number of primary care providers (PCPs), and the metropolitan areas with the highest and lowest number of heart disease patients per cardiologist.

Outflows of providers and coverage ratios can impact a community's ability to deliver accessible and efficient care, and with a looming shortfall of PCPs[1], it's important to understand where the existing PCPs are located. The analysis reveals the five metropolitan areas with the highest percent increase and decrease of PCPs between June 2022 and June 2023. According to the data, the Vallejo-Fairfield, CA area topped the list with a nearly 40% increase in PCPs. Conversely, the Fayetteville, NC area saw the highest decrease, losing nearly 12% of its PCPs.

As chronic diseases continue to increase, the density of specialty providers becomes paramount. The provider density analysis examines the number of patients with heart disease per cardiologist in metropolitan statistical areas (MSAs) spanning large, medium, small, and micropolitan areas. The data shows that as MSAs get smaller, the number of patients per cardiologist increases substantially, with many rural communities having thousands of heart disease patients per cardiologist. Among major metropolitan areas, Boston has the best ratio with 196 heart disease patients per cardiologist, and Las Vegas has the worst with 824.

Additionally, the analysis found significant degradation of prescriber data in a short period of time. Over a quarter of prescribers (26%) had at least one change in their contact or license information within a 90-day period. This finding is based on the primary location of more than 2 million prescribers and illustrates the potential for data inaccuracies, creating an additional challenge for patients navigating the healthcare ecosystem.

"Data is an essential element to fueling healthcare's success, but the continuously changing nature of provider data, when left unchecked, poses a threat to care coordination, patient experience, and health outcomes," said Jonathan Shannon, associate vice president of healthcare strategy, LexisNexis Risk Solutions. "Our recent analysis emphasizes the criticality of ensuring provider information is clean and accurate in real-time. With consistently updated provider data, healthcare organizations can develop meaningful strategies to improve provider availability, equitable access, and patient experience, particularly for vulnerable populations."


Big Data Management

IBM Releases Watsonx AI with Generative AI Models for Data Governance

IBM | September 08, 2023

IBM announces plans to enhance its Watsonx AI and data platform, with a focus on scaling AI impact for enterprises. Key improvements include new generative AI models, integration of foundation models, and features like the Tuning Studio and a synthetic data generator. IBM emphasizes trust, transparency, and governance in training, and plans to incorporate AI into its hybrid cloud solutions, although implementation difficulty and cost may be concerns.

IBM reveals its plans to introduce new generative AI foundation models and enhancements to its Watsonx AI and data platform. The goal is to provide enterprises with the tools they need to scale and accelerate the impact of AI in their operations. These improvements include a technical preview of watsonx.governance, new generative AI data services in watsonx.data, and the integration of watsonx.ai foundation models into select software and infrastructure products. Developers will have the opportunity to explore these capabilities and models at the IBM TechXchange Conference, scheduled for September 11–14 in Las Vegas.

The upcoming AI models and features include:

1. Granite series models: IBM plans to launch its Granite series models, built on the decoder-only architecture that underpins large language models (LLMs). These models will support enterprise natural language processing (NLP) tasks such as summarization, content generation, and insight extraction, with planned availability in Q3 2023.

2. Third-party models: IBM is currently offering Meta's Llama 2-chat 70-billion-parameter model and the StarCoder LLM for code generation within watsonx.ai on IBM Cloud.

IBM places a strong emphasis on trust and transparency in its training process for foundation models, following rigorous data collection procedures and including control points to ensure responsible deployments with respect to governance, risk assessment, privacy, bias mitigation, and compliance.

IBM also intends to introduce new features across the watsonx platform:

For watsonx.ai:

  • Tuning Studio: IBM plans to release the Tuning Studio, featuring prompt tuning, which allows clients to adapt foundation models to their specific enterprise data and tasks. Expected availability is 3Q 2023.
  • Synthetic Data Generator: IBM has launched a synthetic data generator that lets users create artificial tabular data sets for AI model training, reducing risk and accelerating decision-making.

For watsonx.data:

  • Generative AI: IBM aims to incorporate generative AI capabilities into watsonx.data to help users discover, augment, visualize, and refine data for AI through a self-service, natural language interface. A technical preview is planned for 4Q 2023.
  • Vector database capability: IBM plans to integrate vector database capabilities into watsonx.data to support watsonx.ai retrieval-augmented generation use cases, also expected in technical preview in 4Q 2023 (a generic sketch of that retrieval pattern follows below).

For watsonx.governance:

  • Model risk governance for generative AI: IBM is launching a tech preview of watsonx.governance, providing automated collection and documentation of foundation model details along with model risk governance capabilities.

Dinesh Nirmal, Senior Vice President, Products, IBM Software, stated that IBM is dedicated to supporting clients throughout the AI lifecycle, from establishing foundational data strategies to model tuning and governance.

Additionally, IBM will offer AI assistants to help clients scale AI's impact across enterprise use cases such as application modernization, customer care, and HR and talent management. IBM also intends to integrate watsonx.ai innovations into its hybrid cloud software and infrastructure products, including intelligent IT automation and developer services.

IBM's upgrades to the Watsonx AI and data platform offer promise but come with potential drawbacks. Implementation complexity and the need for additional training may create a steep learning curve, and the cost of advanced technology could be prohibitive for smaller organizations. The introduction of generative AI and synthetic data raises data privacy and security concerns. And despite efforts toward responsible AI, the risk of bias in models necessitates ongoing vigilance to avoid legal and ethical issues.
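Retrieval-augmented generation itself is a vendor-neutral pattern: retrieve the stored passages most similar to a question, then inline them into the prompt sent to an LLM. For reference, here is a minimal sketch of that retrieval-and-prompt-assembly step using the open-source Chroma client; it is illustrative only, does not use any IBM watsonx API, and the documents are invented:

    import chromadb

    client = chromadb.Client()
    kb = client.create_collection(name="policies")  # hypothetical knowledge base
    kb.add(
        ids=["a", "b"],
        documents=[
            "Employees may work remotely up to three days per week.",
            "Expense reports are due by the fifth business day of the month.",
        ],
    )

    question = "When are expense reports due?"

    # Retrieval: fetch the stored passage(s) most similar to the question.
    hits = kb.query(query_texts=[question], n_results=1)
    context = "\n".join(hits["documents"][0])

    # Augmentation: inline the retrieved context into the LLM prompt.
    # (The generation call itself is omitted; any LLM could consume this.)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    print(prompt)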


Data Visualization

Salesforce Unveils Einstein 1 Platform: Transforming CRM Experiences

Salesforce | September 14, 2023

Salesforce introduces the groundbreaking Einstein 1 Platform, built on a robust metadata framework. The Einstein 1 Data Cloud supports large-scale data and high-speed automation, unifying customer data, enterprise content, and more. The latest iteration of Einstein includes Einstein Copilot and Einstein Copilot Studio.

On September 12, 2023, Salesforce unveiled the Einstein 1 Platform, introducing significant enhancements to the Salesforce Data Cloud and Einstein AI capabilities. The platform is built on Salesforce's underlying metadata framework. Einstein 1 is a trusted AI platform for customer-centric companies: it empowers organizations to securely connect diverse datasets, build AI-driven applications with low-code development, and deliver entirely novel CRM experiences.

Salesforce's metadata framework plays a crucial role in helping companies organize and comprehend data across various Salesforce applications, establishing a common language to facilitate communication among applications built on the core platform. Data from disparate systems is mapped to the metadata framework, creating a unified view of enterprise data. Organizations can then tailor user experiences and leverage data for various purposes using low-code platform services, including Einstein for AI predictions and content generation, Flow for automation, and Lightning for user interfaces. Importantly, these customizations are readily accessible to other core applications within the organization, eliminating the need for costly and fragile integration code.

In today's business landscape, customer data is exceedingly fragmented. On average, companies employ 1,061 different applications, yet only 29% of them are integrated. The complexity of enterprise data systems has increased, and previous computing revolutions – cloud computing, social media, and mobile technologies – generated isolated pockets of customer data.

Furthermore, Salesforce ensures automatic upgrades three times a year, with the metadata framework safeguarding integrations, customizations, and security models from disruptions. This enables organizations to seamlessly incorporate, expand, and evolve their use of Salesforce as the platform evolves.

The Einstein 1 Data Cloud, which supports large-scale data and high-speed automation, paves the way for a new era of data-driven AI applications. This real-time hyperscale data engine combines and harmonizes customer data, enterprise content, telemetry data, Slack conversations, and other structured and unstructured data into a unified customer view. The platform is already processing 30 trillion transactions per month and connecting and unifying 100 billion records daily.

The Data Cloud is now natively integrated with the Einstein 1 Platform, unlocking previously isolated data sources and enabling comprehensive customer profiles and entirely fresh CRM experiences. The Einstein 1 Platform has been expanded to support thousands of metadata-enabled objects per customer, each able to manage trillions of rows. Furthermore, Marketing Cloud and Commerce Cloud, which joined Salesforce's Customer 360 portfolio through acquisitions, have been reengineered onto the Einstein 1 Platform.

Massive volumes of data from external systems can now be seamlessly integrated into the platform and transformed into actionable Salesforce objects. Automation at scale is achieved by triggering flows in response to changes in any object – even events from IoT devices or AI predictions – at a rate of up to 20,000 events per second. These flows can interact with any enterprise system, including legacy systems, through MuleSoft.

Analytics also benefit from this scalability. Salesforce provides a range of insights and analytics solutions – reports and dashboards, Tableau, CRM Analytics, and Marketing Cloud reports – and with the Einstein 1 Platform's common metadata schema and access model, these solutions can operate on the same data at scale, delivering valuable insights for various use cases.

Salesforce has additionally made Data Cloud accessible at no cost to every customer with Enterprise Edition or higher. This allows customers to commence data ingestion, harmonization, and exploration, leveraging Data Cloud and Tableau to extend the influence of their data across all business segments and kickstart their AI journey.

Salesforce's latest iteration of Einstein introduces a conversational AI assistant to every CRM application and customer experience. This includes:

  • Einstein Copilot: An out-of-the-box conversational AI assistant integrated into every Salesforce application's user experience. Einstein Copilot enhances productivity by assisting users within their workflow, enabling natural language inquiries, and providing pertinent, trustworthy responses grounded in proprietary company data from the Data Cloud. It also proactively takes action and offers additional options beyond the user's query.
  • Einstein Copilot Studio: Enables companies to create a new generation of AI-powered apps with custom prompts, skills, and AI models – accelerating sales processes, streamlining customer service, auto-generating websites based on personalized browsing history, or transforming natural language prompts into code. It is configurable so Einstein Copilot can be made available across consumer-facing channels such as websites and messaging platforms like Slack, WhatsApp, or SMS.

Both Einstein Copilot and Einstein Copilot Studio operate within the secure Einstein Trust Layer, an AI architecture seamlessly integrated into the Einstein 1 Platform that lets teams leverage generative AI while maintaining stringent data privacy and security standards.

The metadata framework within the Einstein 1 Platform expedites AI adoption by providing a flexible, dynamic, and context-rich environment for machine learning algorithms. Metadata describes the structure, relationships, and behaviors of data within the system, allowing AI models to better grasp the context of customer interactions, business processes, and interaction outcomes. This understanding enables fine-tuning of large language models over time, delivering continually improved results.
