Comet Introduces Kangas, An Open Source Smart Data Exploration, Analysis and Model Debugging Tool for Machine Learning

Comet, provider of the leading MLOps platform for machine learning (ML) teams from startup to enterprise, today announced a bold new product: Kangas. Open sourced to democratize large-scale visual dataset exploration and analysis for the computer vision and machine learning community, Kangas helps users understand and debug their data in a new and highly intuitive way. With Kangas, visualizations are generated in real time, enabling ML practitioners to group, sort, filter, query and interpret their structured and unstructured data to derive meaningful information and accelerate model development.

Data scientists often need to analyze large-scale datasets during both data preparation and model training, which can be overwhelming and time-consuming. Kangas makes it possible to intuitively explore, debug and analyze data in real time to quickly gain insights, leading to better, faster decisions. With Kangas, users are able to transform datasets of any scale into clear visualizations.

“A key component of data-centric Machine Learning is being able to understand how your training data impacts model results and where your model predictions are wrong. Kangas accomplishes both of these goals and dramatically improves the experience for ML practitioners.”

Gideon Mendels, CEO and co-founder of Comet

Putting Large Scale Machine Learning Dataset Analysis at Your Fingertips

Developed with the unique needs of ML practitioners in mind, Kangas is a scalable, dynamic and interoperable tool for discovering patterns buried deep within large collections of data. With Kangas, data scientists can query their large-scale datasets in a manner that is natural to their problem, allowing them to interact and engage with their data in novel ways.

Noteworthy benefits of Kangas include:


Unparalleled Scalability: Kangas was developed to handle large datasets with high performance.
Purpose Built: Computer Vision/ML concepts like scoring, bounding boxes and more are supported out-of-the-box, and statistics/charts are generated automatically.
Support for Different Forms of Media: Kangas is not limited to traditional text queries. It also supports images, videos and more.
Interoperability: Kangas can run in a notebook, as a standalone local app or even deployed as a web app. It ingests data in a simple format that makes it easy to work with whatever tooling data scientists already use.
Open Source: Kangas is 100% open source and is built by and for the ML community.

Kangas was designed for the entire community, to be embraced by students, researchers and the enterprise. As individuals and teams work to further their ML initiatives, they will be able to leverage the full benefits of Kangas. Because Kangas is open source, anyone can contribute to it and enhance it further.

“Interoperability and flexibility are inherent in Comet’s value proposition, and Comet aims to expand on that value through open source contributions,” added Mendels. “Kangas is a continuation of all of our efforts, and we couldn’t wait to get its capabilities into the hands of as many data scientists, data engineers and ML engineers as possible. We believe by open sourcing it, Comet can help teams get the most out of their ML projects in ways that have not been possible previously.”

Kangas is available as an open source package for any type of use case. It will be available under the Apache License 2.0 and is open to contributions from community members.

About Comet
Comet provides an MLOps platform that data scientists and machine learning teams use to manage, optimize, and accelerate the development process across the entire ML lifecycle, from training runs to monitoring models in production. Comet’s platform is trusted by over 150 enterprise customers including Affirm, Cepsa, Etsy, Uber and Zappos. Individuals and academic teams use Comet’s platform to advance research in their fields of study. Founded in 2017, Comet is headquartered in New York, NY with a remote workforce in nine countries on four continents. Comet is free to individuals and academic teams. Startup, team, and enterprise licensing is also available.
