DATA ARCHITECTURE

Dremio Continues to Reduce the Zone of Confusion Between Data Lakes and Data Warehouses with New Dart Initiative Release

Dremio | October 21, 2021

Dremio, the SQL Lakehouse Platform company, today achieved another milestone in closing the gap between cloud data lakes and cloud data warehouses. Today’s release marks the second delivery in the company’s Dart Initiative, which enables customers to run all mission-critical SQL workloads directly on the cloud data lake.

Dremio embarked on the Dart Initiative in June 2021 to help companies run a greater range of mission-critical BI workloads directly on the data lake. That first release delivered more than 2x faster performance and drastically improved resource efficiency compared with previous Dremio versions. This second Dart Initiative release introduces several further enhancements, including SQL expression processing that is more than 5x faster than in previous versions.

According to the 2020 Gartner® Market Guide for Analytics Query Accelerators report, “Analytics query accelerators seek to shrink the performance impact of the zone of confusion. Put another way, they are trying to move the ‘line of good enough’ to the point where the data lake can provide sufficient optimization on the data to make it suitable for an increasing percentage of workloads.”[1] With the Dart Initiative, Dremio seeks to leapfrog a “good enough” notion of data lakes and make them the clear and obvious choice for BI and analytics workloads in the enterprise.

“It’s clear that the data lake can already support BI workloads of the most mission-critical nature. Three of the Fortune Five companies that already run Dremio in production today are doing just that. We want to push the boundaries of what’s possible in the data lakehouse and deliver the best BI experience for our customers. To that end, the Dart Initiative has been chipping away at the Zone of Confusion between data lakes and warehouses in critical areas such as query performance and acceleration, SQL coverage, and transactionality.”

Tomer Shiran, founder and Chief Product Officer at Dremio

Here are some of the key innovations of the Dremio Dart Initiative Fall 2021 release.

Scale-out Metadata Collection and Storage

Achieving near-instantaneous query startup times has been out of reach for traditional query engines, which must perform a significant amount of work to parse, plan, and gather dataset metadata for each query before it can be executed. In contrast, Dremio enables interactive performance directly on data lake storage by drastically reducing the amount of computation required at runtime. Dremio’s ability to efficiently compute, store, and leverage metadata plays a major role in enabling this.

This Dart Initiative release delivers near real-time metadata refresh for datasets, so that users query the most current version of their data and see recent schema and data changes promptly. Dremio achieves this freshness by refactoring metadata processing into a parallel, executor-based process, with metadata now stored and managed in Apache Iceberg tables.

Parallelizing metadata processing across executors and adopting capabilities and best practices from Iceberg makes all metadata operations much faster and more scalable, which in turn brings users several benefits. Beyond fresher data, this approach delivers metadata refresh times up to 20x faster than previous versions of Dremio, and metadata refreshes are now governed by the same workload management capabilities as queries, such as engine routing, priority, and concurrency controls. As demonstrated in Figure 1, the performance gains grow as dataset size increases. Fresher data, in turn, leads to more accurate insights and business decisions for enterprises across a variety of use cases, including customer experience and loyalty, marketing campaign optimization, operational efficiency, and customer 360.
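The fan-out/fan-in pattern behind parallel metadata collection can be illustrated with a small Python sketch. It is not Dremio's implementation: thread-pool workers stand in for executors, each file's column values are passed in directly instead of being read from file footers, and the names `FileStats`, `collect_file_stats`, and `refresh_metadata` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: path, row count, and min/max of one column."""
    path: str
    row_count: int
    min_value: int
    max_value: int


def collect_file_stats(path: str, values: list) -> FileStats:
    # A real engine would read file footers; here the (non-empty) column values are given.
    return FileStats(path, len(values), min(values), max(values))


def refresh_metadata(files: dict, workers: int = 4) -> dict:
    """Gather per-file stats in parallel, then merge them into a dataset-level summary."""
    # Fan out: each worker thread plays the role of an executor handling one file.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stats = list(pool.map(lambda item: collect_file_stats(*item), files.items()))
    # Fan in: combine the per-file results into what a query planner would consult.
    return {
        "files": sorted(s.path for s in stats),
        "row_count": sum(s.row_count for s in stats),
        "min_value": min(s.min_value for s in stats),
        "max_value": max(s.max_value for s in stats),
    }


summary = refresh_metadata({"part-0.parquet": [3, 9, 4], "part-1.parquet": [1, 7]})
print(summary)
```

Because each file's statistics are independent, the per-file step parallelizes naturally; only the final merge is sequential. Dremio, per the release, persists the resulting metadata in Apache Iceberg tables, whereas this sketch simply returns a dictionary.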

Hardware-Optimized Query Processing

Dremio is an in-memory engine powered by Apache Arrow[2], an open source columnar standard for in-memory computing that Dremio co-created. Gandiva, a component of Arrow, is an LLVM-based toolkit that enables vectorized execution directly on in-memory Arrow buffers: it generates code to evaluate SQL expressions that fully leverages the pipelining and SIMD capabilities of modern CPUs. This Dart Initiative release accelerates Dremio's expression processing by more than 5x, providing a significant performance increase for end users.
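The core idea of compiling an expression once and then evaluating it over whole columns, rather than re-interpreting it row by row, can be shown with a toy Python sketch. It is an illustration under loud assumptions, not Gandiva's API: it uses plain Python lists instead of Arrow buffers, performs no LLVM code generation or SIMD, and the names `Field`, `Literal`, `BinOp`, and `compile_expr` are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class Field:
    name: str  # refers to a column in the batch


@dataclass(frozen=True)
class Literal:
    value: float  # constant broadcast across the batch


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "*", ">"
    left: "Expr"
    right: "Expr"


Expr = Union[Field, Literal, BinOp]
Batch = Dict[str, List[float]]
_OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt}


def compile_expr(expr: Expr) -> Callable[[Batch], list]:
    """Walk the expression tree once and return a function that evaluates a whole batch."""
    if isinstance(expr, Field):
        return lambda batch: batch[expr.name]
    if isinstance(expr, Literal):
        # Broadcast the constant to the batch length (taken from any column).
        return lambda batch: [expr.value] * len(next(iter(batch.values())))
    fn = _OPS[expr.op]
    left, right = compile_expr(expr.left), compile_expr(expr.right)
    # Column-at-a-time: each operator produces an entire output column in one pass.
    return lambda batch: [fn(a, b) for a, b in zip(left(batch), right(batch))]


# price * qty > 10, compiled once and then applied to a batch of rows
predicate = compile_expr(BinOp(">", BinOp("*", Field("price"), Field("qty")), Literal(10.0)))
batch = {"price": [2.0, 5.0, 1.5], "qty": [3.0, 4.0, 2.0]}
print(predicate(batch))  # [False, True, False]
```

In an engine like the one described, the "compiled" function would be native machine code operating on contiguous column buffers, which is where the pipelining and SIMD gains come from; the Python version only shows the shape of the approach.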

Expanded SQL Coverage and Data Lakehouse Support

The Summer 2021 Dart Initiative release enabled companies to run an even broader set of enterprise SQL workloads on Dremio by vastly expanding SQL coverage to include additional functions, operators, and SQL grammar constructs. The Fall 2021 release extends that coverage further with constructs such as PIVOT/UNPIVOT and filtered aggregates. Risk analysis in insurance, revenue maximization in travel and transportation, clinical trial improvement in pharma, and credit risk assessment in banking are among the many use cases that benefit from this expanded SQL coverage.
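For readers less familiar with these constructs, the Python sketch below spells out their semantics on a few in-memory rows. It models standard SQL behavior (a filtered aggregate such as `SUM(sales) FILTER (WHERE quarter = 'Q1')`, and a pivot that turns distinct values into columns) rather than Dremio's exact syntax, which may differ; the helper names `filtered_sum`, `pivot_sum`, and `unpivot` are hypothetical.

```python
from collections import defaultdict


def filtered_sum(rows, value_key, predicate):
    """SUM(value) FILTER (WHERE predicate): only qualifying rows contribute."""
    matched = [row[value_key] for row in rows if predicate(row)]
    return sum(matched) if matched else None  # SQL yields NULL when no row qualifies


def pivot_sum(rows, row_key, column_key, value_key):
    """PIVOT with SUM: each distinct column_key value becomes an output column."""
    cells = defaultdict(dict)
    for row in rows:
        bucket = cells[row[row_key]]
        bucket[row[column_key]] = bucket.get(row[column_key], 0) + row[value_key]
    columns = sorted({row[column_key] for row in rows})
    # Missing (row, column) combinations become None, mirroring SQL NULL.
    return columns, {key: [cells[key].get(c) for c in columns] for key in sorted(cells)}


def unpivot(columns, pivoted):
    """UNPIVOT: turn each non-NULL cell back into a (row key, column, value) row."""
    return [(key, c, v) for key, values in pivoted.items()
            for c, v in zip(columns, values) if v is not None]


rows = [
    {"year": 2020, "quarter": "Q1", "sales": 10},
    {"year": 2020, "quarter": "Q2", "sales": 15},
    {"year": 2021, "quarter": "Q1", "sales": 12},
    {"year": 2021, "quarter": "Q1", "sales": 3},
]
print(filtered_sum(rows, "sales", lambda r: r["quarter"] == "Q1"))  # 25
columns, pivoted = pivot_sum(rows, "year", "quarter", "sales")
print(columns, pivoted)  # ['Q1', 'Q2'] {2020: [10, 15], 2021: [15, None]}
print(unpivot(columns, pivoted))
```

Without these constructs, the same results typically require CASE expressions inside aggregates or several self-joins, which is why native support simplifies reporting queries.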

Beyond broadening the scope of SQL workloads, this Dart release also expands Dremio’s support for open source table formats. Table formats such as Apache Iceberg and Delta Lake enable companies to perform inserts, updates, and deletes with transactional consistency, as well as time travel, directly on data lake storage. Table formats have surged in popularity because these features were previously available only in data warehouses. With this release, companies can run interactive BI workloads on both of the leading lakehouse table formats, Apache Iceberg and Delta Lake.
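Time travel follows from the way table formats version a table: every commit produces a new snapshot, and older snapshots remain readable. The toy Python model below illustrates only that idea. It is not the Apache Iceberg or Delta Lake API, it omits the transactional machinery (atomic metadata swaps, concurrent-commit conflict detection) that real formats provide, and the `SnapshotTable` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotTable:
    """Toy snapshot log: each write appends a new immutable snapshot of the rows."""
    snapshots: list = field(default_factory=lambda: [()])  # snapshot 0 is the empty table

    def _commit(self, rows) -> int:
        self.snapshots.append(tuple(rows))
        return len(self.snapshots) - 1  # id of the snapshot just written

    def insert(self, *rows) -> int:
        return self._commit(self.snapshots[-1] + rows)

    def update(self, predicate, change) -> int:
        return self._commit(change(r) if predicate(r) else r for r in self.snapshots[-1])

    def delete(self, predicate) -> int:
        return self._commit(r for r in self.snapshots[-1] if not predicate(r))

    def read(self, snapshot_id=None) -> list:
        # Latest snapshot by default; pass an older id to "time travel".
        return list(self.snapshots[-1 if snapshot_id is None else snapshot_id])


table = SnapshotTable()
v1 = table.insert((1, "alpha"), (2, "beta"))
v2 = table.update(lambda r: r[0] == 2, lambda r: (2, "BETA"))
v3 = table.delete(lambda r: r[0] == 1)
print(table.read())    # [(2, 'BETA')]
print(table.read(v1))  # [(1, 'alpha'), (2, 'beta')]
```

Real table formats avoid copying every row on each commit by tracking which data files belong to each snapshot, but the read path is the same in spirit: a query pinned to an older snapshot id sees the table as it was then.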

Spotlight

Every organization has at least two kinds of data consumers: the first is the executive or business leader who needs quick insights on reliable metrics to make faster decisions; the second is the data scientist or analyst who looks for the stories and connections within the data that no one else can see. With so much activity in the big data space and new data touchpoints being measured every day, organizations will increasingly need data-driven individuals to make sense of it all.


Other News
BIG DATA MANAGEMENT

Huron to Acquire Healthcare Predictive Analytics Company Perception Health

Huron | December 23, 2021

Global professional services firm Huron today announced it has entered into an agreement to acquire Perception Health Inc., a healthcare predictive analytics company focused on bringing data sources together to illuminate opportunities for improved clinical and business decision-making. Huron’s deep healthcare expertise, technology and analytics capabilities, combined with Perception Health’s analytics, predictive models and data platform, will strengthen the firm’s ability to help providers uncover patterns of care to lower costs, improve patient outcomes and deliver a better healthcare experience.

“The healthcare industry is under immense pressure to deliver high-quality, individualized care. This acquisition allows Huron to offer providers, payors and research institutions data insights across the care continuum to make better decisions and proactively impact patient care and clinical outcomes.”

James H. Roth, chief executive officer of Huron

Since its founding in 2014, Perception Health has been providing the healthcare industry with predictive data insights and intelligence to illuminate opportunities for its clients to gain a competitive advantage. Perception Health’s robust intelligence platform enables providers, payors and research institutions to analyze network integrity, identify early disease risk factors and optimize patient care.

All Perception Health employees will join Huron, including Gregg Loughman, chief executive officer, and Tod Fetherling, co-founder and chief data scientist. “We are thrilled to join a values-led and people-focused organization that shares our vision for transforming healthcare,” said Gregg Loughman, chief executive officer of Perception Health. “Huron and Perception Health are strategically aligned and committed to helping our clients harness the power of curated data and analytics to make smarter decisions that profoundly impact patient outcomes, experience and cost of care.”

Perception Health will be included in Huron’s Healthcare operating segment. Terms of the acquisition, which is expected to close in December, were not disclosed.

ABOUT HURON

Huron is a global consultancy that collaborates with clients to drive strategic growth, ignite innovation and navigate constant change. Through a combination of strategy, expertise and creativity, we help clients accelerate operational, digital and cultural transformation, enabling the change they need to own their future. By embracing diverse perspectives, encouraging new ideas and challenging the status quo, we create sustainable results for the organizations we serve.


DATA ARCHITECTURE

Compass UOL and Furious Technologies announce partnership to provide data-driven pricing solutions

Compass UOL | December 24, 2021

Compass UOL, a global digital transformation company, and Furious Technologies, a North American revenue management and pricing optimization solutions provider, announced a strategic collaboration to provide online sellers in Latin America and the USA with an integrated digital commerce solution. The partnership will combine advanced cloud-based data science and Compass UOL's portfolio of next-generation digital platform services with Furious Technologies' artificial intelligence models for data-driven pricing. It will enable organizations to innovate, make better data-driven decisions, solve business challenges, and increase business value. The objective is to use insights and data generated by customers' digital interactions to help sellers optimize prices and increase average value per customer by dynamically recommending relevant products and purchase incentives.

Data-driven pricing, revenue management and higher margins

Typical IT departments deal with an overload of demands and daily problems, which makes it difficult to meet the business's innovation and expansion goals. Against that backdrop, Compass UOL and Furious Technologies believe that end-to-end customer journeys must be created in the cloud and optimized to ensure continuous growth. According to Alexis Rockenbach, CEO at Compass UOL, many companies are burdened by the need to increasingly use data to drive business decisions, solve supply chain issues and deal with rising material costs. As a result, they lack time and attention for customer touchpoints, and for sales and marketing overall. "Compass UOL and Furious Technologies combine experience, data science and proven methodologies to maximize revenue, deploy state-of-the-art technology, and mentor customer-oriented teams on business acceleration," concludes Rockenbach.

By combining their diverse US-based business teams, the companies will be able to drive significant business development acceleration, with the potential to serve hundreds of B2B and B2C sellers in Latin America and the United States in the short term and to expand to other regions in the future.

"Compass UOL and Furious' cloud and virtual structure offer customers a scalable solution to meet their immediate needs to both manage risk and combat disruption. This combination of flexibility and experience provides an optimal environment to perform and deliver business transformation solutions. This effort enables businesses to ensure that the entire online journey is set to improve service, revenue and exponentially expand customer reach."

Ashley J. Swartz, CEO at Furious Technologies


BIG DATA MANAGEMENT

SentinelOne Launches DataSet, a Revolutionary Live Enterprise Data Platform

SentinelOne | February 17, 2022

SentinelOne, an autonomous cybersecurity platform company, today announced the launch of DataSet, SentinelOne’s data analytics solution. Building on the acquisition of Scalyr, DataSet expands beyond cybersecurity use cases, delivering a limitless enterprise data platform for live data queries, analytics, insights, and retention.

SentinelOne’s Singularity XDR platform was purpose-built to autonomously defend against security threats by addressing cybersecurity as a data problem. Data sets power AI models that instantly determine whether behaviors are benign or malicious. Individual data points, automatically linked, deliver machine-made contextualized storylines across the enterprise for visibility and response. EDR and XDR hunting queries provide curated data sets for threat hunters to outperform adversaries. Every aspect of SentinelOne’s autonomous cybersecurity is underpinned by data expertise. Our journey in delivering market-leading autonomous cybersecurity spans processing petabytes of data, growing at an exponential scale and doing so in real time.

“For cybersecurity to be effective, it must make split-second autonomous decisions because every millisecond matters. The way SentinelOne solves cybersecurity with data inspired us to apply our expertise beyond cybersecurity to a wide range of enterprise use cases. Our enterprise customers have the same data needs as SentinelOne: the ability to understand and action live data sets at speed. We’re announcing DataSet because we believe every business benefits from the power of understanding and acting on its data. Instantaneous, easy to use, and efficient understanding of a data set is the key to making better business decisions.”

Tomer Weingarten, CEO, SentinelOne

DataSet is a flexible, cloud-native enterprise data platform built for all types of data, live or historical, at petabyte scale. By eliminating data schema requirements from the ingestion process and index limitations from querying, DataSet can process massive amounts of live data in real time, delivering log management, data analytics, and alerting with unparalleled speed, performance, and efficiency, all built on a security- and privacy-first foundation.

Entering the Data-Defined Era

“Distributed cloud infrastructure and containerized applications contribute to a vast amount of fast-moving data. The amount of data created in the next three years will be more than the data created over the past 30 years,” said Stephen Elliot, Group VP, Research IT, Cloud Operations, and DevOps at IDC. “The ability to cost-effectively analyze data at scale will become a necessity for every organization.”

Asana, Copart, TomTom and DoorDash selected DataSet to analyze all types of data, streaming and historical, across an unbounded time horizon. CTOs, CIOs, engineering, and IT operations teams select DataSet, replacing Elastic and Splunk, to harness the power of their data. Legacy data solutions are expensive, slow, and unable to scale at the real-time pace that business and technology demand. In the data-defined era, we believe enterprises that are able to leverage their data most effectively will win in their respective markets.

Market Adoption of DataSet

“With DataSet, our engineering, infrastructure and security teams have one single source of truth to make data-driven decisions. We no longer have to stitch context across teams and use cases,” said Joshua Danielson, Chief Information Security Officer at Copart. “DataSet enables us to act based on data, reduce time to detect and resolve anomalies, and maintain security posture.”

“Before DataSet, there was no central management of logs due to the diverse technologies at TomTom. Having to search multiple tools was holding us back, certainly during incidents,” said Carl Meert, Product Manager SRE and Observability at TomTom. “DataSet unifies all of our data from all sources. We are now much faster at detecting and responding to incidents.”

Experience DataSet

With the launch, SentinelOne has appointed Rahul Ravulur to lead DataSet. He brings more than 25 years of experience building and operating enterprise products at scale, most recently leading product at Splunk. Ravulur will lead the DataSet business to accelerate market traction with leading data-driven enterprises. “SentinelOne is taking a bold step to externalize its data expertise, to help all businesses unlock the power of their data,” said Ravulur. “With the launch of DataSet, we help organizations overcome the slow, costly legacy platforms that can’t handle the scalability requirements of tomorrow. DataSet is built for the future of data insights and action.”

About SentinelOne

SentinelOne’s cybersecurity solution encompasses AI-powered prevention, detection, response and hunting across endpoints, containers, cloud workloads, and IoT devices in a single autonomous XDR platform.


BIG DATA MANAGEMENT

Anblicks is now a Microsoft Gold Partner for Data Analytics Competency

Anblicks | March 07, 2022

Anblicks, a US-based cloud data analytics company, has achieved Microsoft Gold-certified competency for data analytics in the areas of Business Intelligence, Advanced Analytics, and Big Data. The Data Analytics competency is awarded to organizations that demonstrate technical capabilities in creating business intelligence solutions and show proficiency in connecting data sources, performing data transformations, and modeling and visualizing data. As a Microsoft Certified Gold Partner, Anblicks provides Azure-based data analytics services that cover the entire data lifecycle, from data discovery, aggregation, storage, and ETL to data warehouse modeling, business intelligence reporting, and advanced analytics.

The gold competency in data analytics continues Anblicks’ track record of earning certifications in the data domains that help customers generate powerful data insights, make data-driven decisions, create tailored experiences, reduce unnecessary costs, and generate revenue.

“We are committed to helping our customers in leveraging Microsoft Azure for building highly scalable data pipelines from data integration, data storage, data governance, data analytics to business intelligence. Microsoft's GOLD partner status will help us build trust with our customers.”

Munwar Shariff, Chief Technology Officer at Anblicks

About Anblicks

Anblicks is a cloud data analytics company that has been enabling customers to make data-driven decisions since 2004. Headquartered in Addison, Texas, Anblicks helps businesses accelerate their digital transformation journeys, paving the way for new and streamlined business across the globe. The company is committed to delivering excellence to customers in Data Analytics, CloudOps, and Modern Apps using state-of-the-art services, solutions, and accelerators.

