BUSINESS STRATEGY

Tellius and Databricks Partner to Deliver AI-powered Decision Intelligence for the Data Lakehouse

Tellius | June 28, 2022

Tellius, the AI-driven decision intelligence platform, today announced a partnership with Databricks to give joint customers the ability to run Tellius natural language search queries and automated insights directly on the Databricks Lakehouse Platform, powered by Delta Lake, without the need to move any data.

Answering ad-hoc business questions with data typically requires wading through dashboards and reports or writing custom SQL queries, which is time-consuming and may require tool switching or handoffs. Even after all this work, organizations only know what happened in the data. Understanding why metrics changed requires further investigation and the advanced analysis skills needed to derive key drivers, trends, and anomalies.

The partnership between Databricks and Tellius changes this paradigm. Now, anyone – business users and technical users alike – can use Tellius to engage directly with their data and models in Databricks. With Tellius, organizations can quickly search and analyze their data to identify what is happening with natural language queries, understand why metrics are changing via AI-powered Insights, and determine next best actions with deep insights and AutoML. Connecting to Delta Lake on Databricks takes only a few clicks, after which users can run natural language searches over their unaggregated data to answer their own questions. They can drill down to arbitrarily granular insights, use single-click AI analysis to uncover trends, key drivers, and anomalies in their data, and even create predictive models via AutoML in Tellius. These answers and insights can then be written back to source applications to operationalize insights and drive action.

With Tellius and Databricks, customers get:

  • Augmented lakehouse analytics — Allows users to analyze petabytes of live data securely without moving it out of the Databricks Lakehouse Platform
  • Zero-data movement insights — Makes it easier to unearth actionable insights from a multitude of sources on the Databricks Lakehouse Platform through AI-powered analysis of disparate structured and unstructured data
  • Faster data collaboration — Democratizes data access across entire analytics teams without worrying about performance or IT maintenance
  • More consumable AI and ML — Insights and predictions from machine learning workloads built in Databricks can be accessed through natural language search, getting them into production with business users quicker than ever

“With our partners at Databricks, we are delivering cloud-native data analytics to accelerate positive business impact from AI and machine learning,” said Ajay Khanna, Founder and CEO of Tellius. “Business users and data professionals can now focus on deriving insights across their multiple data sources and enterprise applications and on taking action based on automated recommendations without compromising on analytics performance.”

About Tellius
Tellius is an AI-driven decision intelligence platform that enables anyone to get faster insights from their data. The company helps organizations across industries, including financial services, pharmaceutical and life sciences, retail, healthcare, and high technology, accelerate their journey from data to decisions by augmenting human expertise and curiosity with intelligent automation. The company’s platform combines AI- and ML-driven automation with a search interface for ad hoc exploration, allowing users to ask questions of their business data, analyze billions of records in seconds, and gain comprehensive, automated insights in a single platform. Founded in 2016, Tellius is backed by Sands Capital Ventures, Grotech Ventures, and Veraz Investments.

Spotlight

A new study by The Economist Intelligence Unit, commissioned by Wipro, finds a strong relationship between earnings growth and the strategic use of data.


Other News
BUSINESS STRATEGY

Vertica Announces Vertica 12 for Future-Proof Analytics

Vertica | June 08, 2022

Vertica, a Micro Focus line of business, today announced the release of version 12 of the Vertica analytical database. Vertica 12 includes major new features and enhancements for analytics and machine learning across multi-cloud, hybrid on-premises and cloud, and multi-regional deployments. The announcement was made during Vertica Unify 2022, the organization's annual user conference, where attendees learned that Vertica 12 users can now choose from the broadest range of deployment options on the market, with improved automation capabilities, to future-proof analytics against constantly changing technology requirements.

"While many companies are being forced to choose their analytics deployment strategy – to commit to one thing, whether public cloud, on-premises, or hybrid – no one knows exactly what the future may hold. With Vertica 12, we have developed a completely flexible platform that is seamlessly hybrid. It is as capable of deploying in a SaaS model as it is on-premises. The continuous advancement of our analytical capabilities means that no matter what your future data strategies may hold, Vertica brings powerful analytics to your data."
Scott Richards, Senior Vice President and General Manager, Vertica at Micro Focus

In addition to supporting more on-premises object stores, Vertica 12 expands its Kubernetes support beyond AWS S3 to Google Cloud Storage (GCS), Azure Blob Storage, and the Hadoop Distributed File System (HDFS), making it fully cloud-native in any environment. Vertica's cloud-optimized architecture has also been enhanced with intelligent subclustering to better manage variable workloads and data sharing, helping to assign costs to owners in a logical way. On the integration front, Vertica 12 deepens its interaction with the data analytics ecosystem. Customers benefit from key proprietary and open-source technologies working seamlessly together, including a new version of VerticaPy, the Vertica Python and Jupyter Notebook interface, as well as an enhanced Spark connector and broadened PMML support.

About Vertica
The core analytical database within the Micro Focus software portfolio, Vertica is the Unified Analytics Platform, based on a massively scalable architecture with the broadest set of analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning. Vertica enables many customers – from Agoda to Philips to many others – to easily apply these powerful functions to the largest and most demanding analytical workloads, arming businesses and their customers with predictive business insights faster than any analytical database or data warehouse on the market.


BIG DATA MANAGEMENT

Datafold Launches Open Source data-diff to Compare Tables of Any Size Across Databases

Datafold | June 23, 2022

Datafold, a data reliability company, today announced data-diff, a new open source cross-database diffing package. The new product is an open source extension of Datafold's original Data Diff tool for comparing data sets; it validates the consistency of data across databases using high-performance algorithms.

In the modern data stack, companies extract data from sources, load it into a warehouse, and transform it for analysis, activation, or data science use cases. Datafold has focused on automated testing during the transformation step with Data Diff, ensuring that a change to a data model does not break a dashboard or feed a predictive algorithm the wrong data. With the launch of open source data-diff, Datafold can now help with the extract and load parts of the process: it verifies that the loaded data matches the source it was extracted from. All parts of the data stack need testing for data engineers to create reliable data products, and Datafold now gives them coverage throughout the extract, load, transform (ELT) process.

"data-diff fulfills a need that wasn't previously being met," said Gleb Mezhanskiy, Datafold founder and CEO. "Every data-savvy business today replicates data between databases in some way, for example, to integrate all available data in a warehouse or data lake to leverage it for analytics and machine learning. Replicating data at scale is a complex and often error-prone process, and although multiple vendors and open source tools provide replication solutions, there was no tooling to validate the correctness of such replication. As a result, engineering teams resorted to manual one-off checks and tedious investigations of discrepancies, and data consumers couldn't fully trust the data replicated from other systems."

Mezhanskiy continued, "data-diff solves this problem elegantly by providing an easy way to validate consistency of data sets across databases at scale. It relies on state-of-the-art algorithms to achieve incredible speed: e.g., comparing one-billion-row data sets across different databases takes less than five minutes on a regular laptop. And, as an open source tool, it can be easily embedded into existing workflows and systems."

Answering an Important Need
Today's organizations use data replication to consolidate information from multiple sources into data warehouses or data lakes for analytics. They are integrating operational systems with real-time data pipelines, consolidating data for search, and migrating data from legacy systems to modern databases. Thanks to tools like Fivetran, Airbyte, and Stitch, it is easier than ever to sync data across multiple systems and applications. Most data synchronization scenarios call for 100% guaranteed data integrity, yet in practice any interconnected system sometimes loses records due to dropped packets, general replication issues, or configuration errors. Ensuring data integrity therefore requires validation checks with a data diff tool.

Datafold's approach is a significant step forward for developers and data analysts who want to compare multiple databases rapidly and efficiently without building a makeshift diff tool themselves. Currently, data engineers use comparison methods ranging from simple row counts to comprehensive row-level analysis: the former is fast but not comprehensive, whereas the latter is slow but guarantees complete validation. Open source data-diff is both fast and complete.

Open Source data-diff for Building and Managing Data Quality
Available today, data-diff uses checksums to verify 100% consistency between two data sources quickly and efficiently. This method allows a row-level comparison of 100 million records to complete in just a few seconds, without sacrificing the granularity of the resulting comparison. Datafold has released data-diff under the MIT license. The software currently includes connectors for Postgres, MySQL, Snowflake, BigQuery, Redshift, Presto, and Oracle, and Datafold plans to invite contributors to build connectors for additional data sources and for specific business applications.

About Datafold
Datafold is a data reliability platform that helps data teams deliver reliable data products faster. It has a unique ability to identify, prioritize, and investigate data quality issues proactively before they affect production. Founded in 2020 by veteran data engineers, Datafold has raised $22 million from investors including NEA, Amplify Partners, and Y Combinator. Customers include Thumbtack, Patreon, Truebill, Faire, and Dutchie.
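The general idea behind checksum-based diffing – compare segment checksums first, and only drill into a segment when its checksums disagree – can be sketched as follows. This is an illustrative toy, not Datafold's actual implementation: the in-memory dicts stand in for database tables, and `diff_tables`, `segment_checksum`, and the MD5 choice are all assumptions made for the sketch.

```python
import hashlib

def segment_checksum(rows):
    """Hash a list of (key, value) rows into a single digest."""
    h = hashlib.md5()
    for key, value in rows:
        h.update(f"{key}:{value}".encode())
    return h.hexdigest()

def diff_tables(a, b, lo, hi, threshold=4):
    """Compare the key range [lo, hi) of two 'tables' (dicts of key -> value).

    Returns the set of keys whose values differ or that exist on one side only.
    Matching checksums let whole segments be skipped without row-by-row work.
    """
    keys = sorted({k for k in a if lo <= k < hi} | {k for k in b if lo <= k < hi})
    rows_a = [(k, a.get(k)) for k in keys]
    rows_b = [(k, b.get(k)) for k in keys]
    if segment_checksum(rows_a) == segment_checksum(rows_b):
        return set()                      # segment matches: nothing to inspect
    if hi - lo <= threshold:              # small segment: fall back to row level
        return {k for k in keys if a.get(k) != b.get(k)}
    mid = (lo + hi) // 2                  # checksums differ: split and recurse
    return diff_tables(a, b, lo, mid, threshold) | diff_tables(a, b, mid, hi, threshold)
```

For example, copying a 100-row table, corrupting one row, and dropping another lets `diff_tables(src, dst, 0, 100)` report exactly those two keys while skipping every untouched segment after a single checksum comparison. In a real cross-database setting the checksums would be computed inside each database (one aggregate query per segment), which is what makes the approach fast over a network.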


DATA VISUALIZATION

Skykit Survey: Sharing Data Dashboards Broadly with Employees is Challenging, but Reaps Big Rewards

Skykit LLC | July 14, 2022

The survey found that when IT managers invest in sharing data visualizations via digital signage, nearly three-fourths can tie the solution directly to ROI, with 81% saying it contributed to increased employee engagement, 67% to faster movement of goods, and 58% to reduced time spent on maintenance. Of those sharing data dashboards on digital signage solutions, 32% said their solution can update data across the enterprise in real time, while 39% said the tool updated data every minute.

The most popular methods for sharing data visualizations, however, do not support real-time data updates: 42% of respondents still share graphics using USB drives and another 42% email graphics. At this time, only 40% of companies share their data via broadcast tools, like digital signage, some of which have real-time update capabilities.

"The shift to sharing data visualizations via digital signage is underway, and the results speak for themselves – more than four-fifths of users have more engaged employees," said Irfan Khan, Skykit CEO. "Whether dashboards are shared in the boardroom, the breakroom, or on the warehouse floor, we're seeing that true ROI is tied to the ability to share data dashboards in real time."

Still, IT managers say some digital signage providers aren't quite meeting their expectations for broadcasting data dashboards: 97% of users said they eliminated a potential solution during the sales process based on the provider's weak security protocols. Security challenges will increase as more businesses return to the office post COVID-19. Three in five (60%) said they will need to share real-time updates with remote and hybrid workers. Businesses will also invest more in screens to modernize their workspaces, with 47% saying they will purchase new screens to increase employee engagement, and 36% looking to communicate business-critical information via screens throughout the enterprise.

"We know from the findings that users need their digital signage tools working as quickly as possible, with as few additional investments as possible," Khan said. "It's on providers to ensure we're safely and efficiently rolling out signage technology – getting solutions live quickly, with easy integration into existing data visualization tools."

About Skykit LLC
Skykit is a leading provider of workplace experience tools and cloud-based digital signage solutions that streamline customer and employee communication. The company's award-winning platform is scalable, making it a fit for businesses of all sizes across a variety of industries. Launched in 2016, Skykit currently provides digital signage solutions and workplace experience software to hundreds of organizations using tens of thousands of screens around the world.


BIG DATA MANAGEMENT

Komprise Automates Unstructured Data Discovery with Smart Data Workflows

Komprise | May 20, 2022

Komprise, the leader in analytics-driven unstructured data management and mobility, today announced Komprise Smart Data Workflows, a systematic process to discover relevant file and object data across cloud, edge, and on-premises datacenters and feed that data, in native format, to AI and machine learning (ML) tools and data lakes.

Industry analysts predict that at least 80% of the world's data will be unstructured by 2025. This data is critical for AI- and ML-driven applications and insights, yet much of it is locked away in disparate data storage silos, creating an unstructured data blind spot and billions of dollars in missed big data opportunities. Komprise has expanded Deep Analytics Actions to include copy and confine operations based on Deep Analytics queries, added the ability to execute external functions (for example, running natural language processing functions via API), and expanded global tagging and search to support these workflows. Komprise Smart Data Workflows let you define and execute a process with as many of these steps as needed, in any sequence, including external functions at the edge, datacenter, or cloud. Together, the Komprise Global File Index and Smart Data Workflows reduce the time it takes to find, enrich, and move the right unstructured data by up to 80%.

"Komprise has delivered a rapid way to visualize our petabytes of instrument data and then automate processes such as tiering and deletion for optimal savings," says Jay Smestad, senior director of information technology at PacBio. "Now, the ability to automate workflows so we can further define this data at a more granular level and then feed it into analytics tools to help meet our scientists' needs is a game changer."

Komprise Smart Data Workflows are relevant across many sectors. Here's an example from the pharmaceutical industry:

  1. Search: Define and execute a custom query across on-prem, edge, and cloud data silos to find all data for Project X with Komprise Deep Analytics and the Komprise Global File Index.
  2. Execute & Enrich: Execute an external function on Project X data to look for a specific DNA sequence for a mutation and tag matching data as "Mutation XYZ".
  3. Cull & Mobilize: Move only the Project X data tagged "Mutation XYZ" to the cloud, using Komprise Deep Analytics Actions, for central processing.
  4. Manage Data Lifecycle: Once the analysis is complete, move the data to a lower storage tier for cost savings.

Other Smart Data Workflow use cases include:

  • Legal Divestiture — Find and tag all files related to a divestiture project, move sensitive data to an object-locked storage bucket, and move the rest to a writable bucket.
  • Autonomous Vehicles — Find crash test data related to abrupt stopping of a specific vehicle model and copy it to the cloud for further analysis. Execute an external function to identify and tag data with Reason = Abrupt Stop, and move only the relevant data to the cloud data lakehouse to reduce the time and cost of moving and analyzing unrelated data.

"Whether it's massive volumes of genomics data, surveillance data, IoT, GDPR, or user shares across the enterprise, Komprise Smart Data Workflows orchestrate the information lifecycle of this data in the cloud to efficiently find, enrich, and move the data you need for analytics projects," said Kumar Goswami, CEO of Komprise. "We are excited to move to this next phase of our product journey, making it much easier to manage and mobilize massive volumes of unstructured data for cost reduction, compliance, and business value."

About Komprise
Komprise is a provider of unstructured data management and mobility software that frees enterprises to easily analyze, mobilize, and monetize the right file and object data across clouds without shackling data to any vendor. With Komprise Intelligent Data Management, you can cut 70% of enterprise storage, backup, and cloud costs while making data easily available to cloud-based data lakes and analytics tools.
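The four-step pharmaceutical workflow described above (search, enrich, mobilize, tier down) can be sketched in pseudocode-style Python. Every name here – the record fields, the tier labels, the functions themselves – is a hypothetical stand-in for illustration, not the Komprise API; real workflows would issue queries against the Global File Index and invoke external functions rather than filter an in-memory list.

```python
def search(catalog, project):
    """Step 1 - Search: find all files belonging to a project across silos."""
    return [f for f in catalog if f["project"] == project]

def enrich(files, sequence, tag):
    """Step 2 - Execute & Enrich: tag files whose content contains the target sequence."""
    for f in files:
        if sequence in f["content"]:
            f["tags"].add(tag)
    return files

def mobilize(files, tag, target):
    """Step 3 - Cull & Mobilize: move only tagged files to the target location."""
    moved = [f for f in files if tag in f["tags"]]
    for f in moved:
        f["location"] = target
    return moved

def tier_down(files, tier="cold"):
    """Step 4 - Manage Data Lifecycle: demote analyzed files to a cheaper tier."""
    for f in files:
        f["tier"] = tier
    return files
```

Chaining the steps mirrors the workflow: `tier_down(mobilize(enrich(search(catalog, "X"), "GATTACA", "Mutation XYZ"), "Mutation XYZ", "cloud"))` finds Project X files, tags those containing the (hypothetical) sequence, moves only the tagged subset to the cloud, and finally drops the analyzed copies to a lower-cost tier, leaving unrelated data untouched.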
