DATA ARCHITECTURE

Databricks Launches Data Lakehouse for Retail and Consumer Goods Customers

Databricks | January 14, 2022

Databricks, the Data and AI company and pioneer of the data lakehouse architecture, today announced the Databricks Lakehouse for Retail, the company's first industry-specific data lakehouse for retailers and consumer goods (CG) customers. With Databricks' Lakehouse for Retail, data teams get a centralized data and AI platform tailored to help solve the most critical data challenges that retailers, partners, and their suppliers are facing. Early adopters of Databricks' Lakehouse for Retail include industry-leading customers and partners like Walgreens, Columbia, H&M Group, Reckitt, Restaurant Brands International, 84.51° (a subsidiary of Kroger Co.), Co-Op Food, Gousto, Acosta and more.

"As the retail and healthcare industries continue to undergo transformative change, Walgreens has embraced a modern, collaborative data platform that provides a competitive edge to the business and, most importantly, equips our pharmacists and technicians with timely, accurate patient insights for better healthcare outcomes," said Luigi Guadagno, Vice President, Pharmacy and HealthCare Platform Technology at Walgreens. "With hundreds of millions of prescriptions processed by Walgreens each year, Databricks' Lakehouse for Retail allows us to unify all of this data and store it in one place for a full range of analytics and ML workloads. By eliminating complex and costly legacy data silos, we've enabled cross-domain collaboration with an intelligent, unified data platform that gives us the flexibility to adapt, scale and better serve our customers and patients."

"Databricks has always innovated on behalf of our customers and the vision of lakehouse helps solve many of the challenges retail organizations have told us they're facing," said Ali Ghodsi, CEO and Co-Founder at Databricks. "This is an important milestone on our journey to help organizations operate in real-time, deliver more accurate analysis, and leverage all of their customer data to uncover valuable insights. Lakehouse for Retail will empower data-driven collaboration and sharing across businesses and partners in the retail industry."

Databricks' Lakehouse for Retail delivers an open, flexible data platform, data collaboration and sharing, and a collection of powerful tools and partners for the retail and consumer goods industries. Designed to jumpstart the analytics process, new Lakehouse for Retail Solution Accelerators offer a blueprint of data analytics and machine learning use cases and best practices to save weeks or months of development time for an organization's data engineers and data scientists. Popular solution accelerators for Databricks' Lakehouse for Retail customers include:

  • Real-time Streaming Data Ingestion: Power real-time decisions critical to winning in omnichannel retail with point-of-sale, mobile application, inventory and fulfillment data.
  • Demand forecasting and time-series forecasting: Generate more accurate forecasts in less time with fine-grained demand forecasting to better predict demand for all items and stores.
  • ML-powered recommendation engines: Specific recommendation models for every stage of the buyer journey - including neural network, collaborative filtering, content-based recommendations and more - enable retailers to create a more personalized customer experience.
  • Customer Lifetime Value: Examine customer attrition, better predict behaviors of churn, and segment consumers by lifetime and value with a collection of customer analytics accelerators to help improve decisions on product development and personalized promotions.

Additionally, industry-leading Databricks partners like Deloitte and Tredence are driving lakehouse vision and value by delivering pre-built analytics solutions on the lakehouse platform that address real-time customer use cases. Tailor-made for the retail industry, featured partner solutions and platforms include:

  • Deloitte's Trellis solution accelerator for the retail industry is one of many examples of how Deloitte and client partners are adopting the Databricks Lakehouse architecture construct and platform to deliver end-to-end data and AI/ML capabilities in a simple, holistic, and cost-effective way. Trellis provides capabilities that solve retail clients' complex challenges around forecasting, replenishment, procurement, pricing, and promotion services. Deloitte has leveraged their deep industry and client expertise to build an integrated, secured, and multi-cloud ready "as-a-service" solution accelerator on top of Databricks' Lakehouse platform that can be rapidly customized as appropriate based on client's unique needs. Trellis has proven to be a game-changer for our joint clients as it allows them to focus on the critical shifts occurring both on the demand and supply side with the ability to assess recommendations, associated impact, and insights in real-time that result in significant improvement to both topline and bottom line numbers.
  • Tredence will meet the explosive enterprise Data, AI & ML demand and deliver real-time transformative industry value for their business by delivering solutions for Lakehouse for Retail. The partnership first launched the On-Shelf Availability Solution (OSA) accelerator in August 2021, combining Databricks' data processing capability and Tredence's AI/ML expertise to enable Retail, CPG & Manufacturers to solve their trillion dollar out-of-stock challenge. Now with Lakehouse for Retail, Tredence and Databricks will jointly expand the portfolio of industry solutions to address other customer challenges and drive global scale together.

About Databricks
Databricks is the data and AI company. More than 5,000 organizations worldwide — including Comcast, Condé Nast, H&M, and over 40% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world's toughest problems.

Spotlight

For the past several years, the IT industry has been re-ordered, driven by simultaneous developments in data, analytics, cloud, mobile, social media, and the Internet of Things. At the same time, business leaders have shifted their reliance upon gut instinct for decision making to new forms of business value by becoming data-driven, which is enabling them to reinvent professions and transform their industry with data. Together, these shifts, underpinned by a foundation of data-driven insights, are propelling companies into the next business era, where they interact with data to reason, adapt and continuously learn. Once again, IBM is leading the world to this new era of computing: the cognitive era. An era where all data is turned from obstacle to opportunity, and where we can tackle some of the most enduring systemic issues facing our planet in an iterative and rapid fashion. These issues range from cancer and climate change to an increasingly interconnected global economy.


Other News
BIG DATA MANAGEMENT

Tamr Introduces Tamr Enrich to Simplify and Improve the Data Mastering Process

Tamr, Inc. | May 21, 2022

Tamr, the leading cloud-native data mastering solution, today announced the introduction of Tamr Enrich, a set of enrichment services built natively into the data mastering process. Using Tamr’s patented human-guided machine learning, Tamr Enrich curates and actively manages external datasets and services, enabling customers to seamlessly embed trusted, high-quality external data insights to their data mastering pipelines for richer business.

“Companies go to great lengths, spending millions of dollars to attempt to derive business value from disparate data sources,” said Anthony Deighton, Tamr’s chief product officer. “We’re excited to offer a built-in solution that provides one-click simplicity and makes customers’ data cleaner and more complete.”

For example, when a business wants to integrate an application programming interface (API) for address validation and standardization, it requires significant investment to implement the capability, including hiring a vendor to find the source, build the integration, maintain the integration, manage the vendor and make updates. Tamr Enrich eliminates many of these obstacles, enabling clean, trusted data without complexity. Customers also benefit from Tamr’s unique ability to continuously add new enrichment sources versus existing solutions in the market today that offer a static set of services.

Tamr Enrich allows customers to unlock more value from mastered data faster to:

  • Improve match rates. Customers can realize a potential 2x improvement.
  • Automate more. Customers see model confidence improved by 20+%.
  • Simplify data enrichment. Services are fully managed by Tamr and delivered in one click.
  • Eliminate broken data. Tamr Enrich allows customers to identify contacts or companies with no valid contact information and accelerate time-to-insight.
  • Standardize values. Data is ready for use in analytics tools.
  • Expand insights. New attributes unlock new uses for existing data.

“We previously needed to manage multiple vendors, which was expensive operationally and added significant complexity to our data operations,” said Harveer Singh, Chief Data Architect at Western Union. “Tamr Enrich is a game-changer. It gives us a complete, integrated solution that enables Western Union to deliver a seamless digital experience to our customers. Tamr’s made it easier to maintain clean, curated data across the customer journey.”

About Tamr, Inc.
Tamr is a leading data mastering company, accelerating the business outcomes of the world’s largest organizations by powering analytic insights, boosting operational efficiency, and enhancing data operations. Tamr’s cloud-native solutions offer an effective alternative to traditional Master Data Management (MDM) tools, using machine learning to do the heavy lifting to consolidate, cleanse, and categorize data. Tamr is the foundation for modern DataOps at large organizations, including industry leaders like Toyota, Santander, and GSK. Backed by investors including NEA and Google Ventures, Tamr transforms how companies get value from their data.

BIG DATA MANAGEMENT

Cloudian Partners with WEKA to Deliver High-Performance, Exabyte-Scalable Storage for AI, Machine Learning and Other Advanced Analytics

Cloudian | January 20, 2022

Cloudian® today announced the integration of its HyperStore® object storage with the WEKA Data Platform for AI, providing high-performance, exabyte-scalable private cloud storage for processing iterative analytical workloads. The combined solution unifies and simplifies the data pipeline for performance-intensive workloads and accelerated DataOps, all easily managed under a single namespace. In addition, the new solution reduces the storage TCO associated with data analytics by a third, compared to traditional storage systems.

Advanced Analytics Workloads Create Data Storage Challenges
Organizations are consuming and creating more data than ever before, and many are applying AI, machine learning (ML) and other advanced analytics on these large data sets to make better decisions in real-time and unlock new revenue streams. These analytics workloads create and use massive data sets that pose significant storage challenges, most importantly the ability to manage the data growth and enable users to extract timely insights from that data. Traditional storage systems simply can’t handle the processing needs or the scalability required for iterative analytics workloads and introduce bottlenecks to productivity and data-driven decision making.

Cloudian-WEKA Next Generation Storage Platform
Together, Cloudian and WEKA enable organizations to overcome the challenges of accelerating and scaling their data pipelines while lowering data analytics storage costs. WEKA’s data platform, built on WekaFS, addresses the storage challenges posed by today’s enterprise AI workloads and other high-performance applications running on-premises, in the cloud or bursting between platforms. The joint solution offers the simplicity of NAS, the performance of SAN or DAS and the scale of object storage, along with accelerating every stage of the data pipeline from data ingestion to cleansing to modeled results.

Integrated through WEKA’s tiering function, Cloudian’s enterprise-grade, software-defined object storage provides the following key benefits:

  • High Performance – Run concurrent workloads while eliminating compute cluster bottlenecks and reducing processing times.
  • Exabyte Scalability – Grow deployments on demand, from terabytes to an exabyte without disruption, achieving the flexibility and elasticity of the public cloud within a private data center or hybrid cloud model.
  • Enterprise-grade Security – Protect data with encryption in flight and at rest, integrated firewall, RBAC/IAM and SAML access controls, and certification with the most rigorous regulatory requirements, such as Common Criteria, FIPS and SEC Rule 17a-4(f).
  • Resiliency – Achieve high data durability with the option to protect and distribute data using replication or erasure coding, thereby eliminating the need for a separate data backup process.
  • Multi-tenancy – Provision multiple users on shared infrastructure without compromising security.
  • Cost-effective – Save on storage costs, as the solution runs on standard x86 hardware with local NVMe SSDs.

“As organizations increasingly employ AI, ML and other advanced analytics to extract greater value from their data, they need a modern storage platform that enables fast, easy data processing and management,” said Jonathan Martin, president, WEKA. “The combination of the WEKA Data Platform and Cloudian object storage provides an ideal solution that can seamlessly and cost-effectively scale to meet growing demands.”

“When it comes to supporting advanced analytics applications, users shouldn’t have to make tradeoffs between storage performance and capacity,” said Jon Toor, chief marketing officer, Cloudian. “By eliminating any need to compromise, the integration of our HyperStore software with the WEKA Data Platform gives customers a storage foundation that enables them to fully leverage these applications so they can gain new insights from their data and drive greater business and operational success.”

About Cloudian
Cloudian is the most widely deployed independent provider of object storage. With a native S3 API, it brings the scalability and flexibility of public cloud storage into the data center while providing ransomware protection and reducing TCO by up to 70% compared to traditional SAN/NAS and public cloud. The geo-distributed architecture enables users to manage and protect object and file data across sites—on-premises and in the cloud—from a single platform.

BIG DATA MANAGEMENT

Alibaba Cloud Forms Partnership with Starburst To Bring The Analytics Engine For Data Mesh to Asia-Pacific Region

Starburst | December 22, 2021

After a year of record financing and year-over-year growth across sales channels, hiring and the global customer base, Starburst, the analytics anywhere company, is announcing a partnership with Alibaba Cloud, the digital technology and intelligence backbone of Alibaba Group, to deliver Starburst Enterprise to the Greater China market. Through this partnership, Alibaba Cloud is providing engineering to integrate Starburst Enterprise on Alibaba Cloud, as well as providing the sales, services, and support resources needed to deliver a seamless customer experience.

Starburst is currently available on major public and private cloud platforms, including AWS, Azure, GCP, Red Hat, and HPE, but this partnership is a key step in providing the analytics engine for data mesh to Alibaba Cloud customers in Greater China. According to IDC, over the past five years, the compound annual growth rate of China's public cloud market has reached 61.1%, which is significantly higher than the 23.8% growth in the United States. With 87.4% of Chinese firms using open-source technologies, making Chinese users the second most prolific group on GitHub after the United States, this expansion presents a huge market opportunity for Starburst to enable Alibaba Cloud customers to provide better data access through a data mesh architecture, deliver data as a product and empower business users to make more informed decisions.

"China, like the rest of the world, has been impacted by the digital pressure spurred on by the pandemic," said Dr Jia Yangqing, VP of Alibaba Group, Senior Fellow of the Computing Platform, Alibaba Cloud Intelligence. "As long-time users and supporters of the Trino project, we're very excited to leverage the open-source community and now this enterprise distribution of Starburst on Alibaba Cloud so that large enterprises in China can take advantage of its power to accelerate their digital transformation initiatives in the face of the challenges created by the pandemic."

With this partnership, Starburst is now uniquely available on nearly every major platform and can be seamlessly procured through their marketplaces. The Alibaba Cloud partnership comes on the heels of Starburst's recent release of Stargate, a new product offering that is intended to serve as a single point of access to data across borders, with fast query performance, while meeting data privacy and sovereignty requirements. Through this partnership, companies based in China and China-based subsidiaries of multinational companies can now easily leverage these powerful, global analytics capabilities to build architectures that reflect today's global nature of business.

"Global companies find that their enterprise data is increasingly spread across multiple clouds and disparate geographic regions. Starburst's vision is to unlock the analyst within us all and allow access to all of a company's data, no matter where it resides. So we naturally want to be everywhere our customers want to be. Alibaba Cloud is the perfect partner to help us extend our analytics engine for this global data mesh to the greater China market and facilitate a deeper level of hybrid analytics that hasn't previously been possible in this region," said Justin Borgman, CEO of Starburst.

After a year of explosive growth, Starburst is poised to continue its mission of bringing analytics anywhere. This partnership is a key step in achieving that mission. To learn more, please visit starburst.io.

About Starburst
Starburst is the analytics engine for the data mesh. We unlock the value of distributed data by making it fast and easy to access, no matter where it lives. Starburst queries data across any database, making it instantly actionable for data-driven organizations. With Starburst, teams can lower the total cost of their infrastructure and analytics investments, prevent vendor lock-in, and use the existing tools that work for their business. Trusted by companies like Comcast, FINRA, and Condé Nast, Starburst helps companies make better decisions faster on all data.

DATA SCIENCE

DataRobot Core Unveiled, Complete with Capabilities for the Expert Data Scientist

DataRobot | December 17, 2021

DataRobot today announced DataRobot Core, a comprehensive offering that broadens its AI Cloud platform for code-first data science experts. DataRobot also announced its latest platform release, extending the capabilities of AI Cloud for all users with broader and more sophisticated analytical capabilities for data scientists, enhanced decision intelligence, and new features to manage and scale operations in production.

The unprecedented demand for AI, combined with the complexity in delivering AI to production, has created significant delays in data science initiatives for all businesses at a time when AI has never been more vital to business outcomes: 87% of organizations continue to struggle with long deployment timelines, while data scientists spend at least 50% of their time on non-strategic model deployment. To scale quickly and remain agile, data science teams need the tools and product capabilities to deliver high-impact results, faster.

DataRobot Core brings together a complete portfolio of purpose-built capabilities that give data scientists ultimate flexibility in how they deliver AI to the business, enabling faster experimentation and rapid time to value, while making teams more efficient and effective at driving clear business impact from AI:

  • Platform: Unified environment with first-class, embedded and multilanguage notebook experience; Composable ML to seamlessly pivot between code-first and automated model generation; code-centric pipelines on top of Apache Spark; open API to enable programmatic access to the full AI Cloud platform; and built for the modern enterprise with support for the reliability, governance, compliance and scale needs across industries.
  • Resources: Extensive portfolio of accelerators, third-party integrations and libraries to expedite AI delivery and drive efficiency, along with evolving education resources to advance skills and enable data scientists to stay at the cutting edge.
  • Community: Shared knowledge and access to the unique expertise of the DataRobot team, industry experts and thousands of community members from DataRobot customers representing some of the largest and most successful AI implementations in the world.

DataRobot’s team of over 300 data scientists are pioneering efforts in AI, with applied expertise across more than a million active projects for customers across industries on a global scale. Leveraging DataRobot AI Cloud, full service direct mortgage lender Embrace Home Loans eliminated 43 million lines of code, freeing up their data scientists to build even more complex and strategic solutions.

“DataRobot has been transformational for our business,” said Keith Portman, Chief Analytics Officer at Embrace Home Loans. “DataRobot’s AI Cloud platform enabled us to double our return on marketing investment spend and maintain a notebook-first approach. Our data scientists can now build complex models with flexibility and seamless integration, gaining back hours of time.”

Alongside Core, the launch of DataRobot 7.3 introduces over 80 new features and capabilities designed for all users to enable AI-driven decisions across all lines of business, within a single platform. DataRobot 7.3 offers:

  • Expanded Support for Diverse Use Cases. Giving data science teams native, out-of-the-box flexibility across data types, users can now run anomaly detection with images and leverage the next generation of Text AI, as well as comprehensive tools, including Multimodal Clustering, Time-Series Segmented Modeling and Multilabel Classification.
  • Better, Faster Decisions with Decision Intelligence. Teams can rapidly deploy models that combine complex rules and business logic with post-process prediction scores with simple APIs, and build fully customized AI applications in a matter of minutes with no coding required.
  • Enhanced Performance Monitoring, Compliance and Regulatory Capabilities. Automated compliance documentation now extends to custom models built outside of DataRobot, streamlining regulation readiness for all users. With all models in production, users can easily evaluate and compare challenger models against live models, and clearly see if a model should be replaced in order to maintain peak performance for the business.

“For organizations today, translating data and AI into tangible outcomes is critical in order to remain competitive and thrive,” said Nenshad Bardoliwalla, Chief Product Officer at DataRobot. “DataRobot Core and 7.3 are designed to meet increasing demand and scale, and empower the largest number of AI creators, from code-centric data science teams to business analysts and decision makers, to experiment fast and collaborate effectively on the same platform. Together, these solutions provide the much-needed flexibility, speed and control that brings trustworthy AI solutions to life for every organization.”

In support of DataRobot Core, DataRobot is also announcing an expanded partnership with AtScale to deliver more comprehensive data access and feature modeling to customers. AtScale brings its semantic layer technology to DataRobot Core, simplifying connections from DataRobot to a broad range of cloud data platforms and providing a powerful modeling canvas for feature engineering. Together, DataRobot and AtScale deliver complete services for organizations to operationalize AI/ML workloads with support for a wide range of data platforms, protocols and visualization platforms.

About DataRobot
DataRobot AI Cloud is the next generation of AI. DataRobot's AI Cloud vision is to bring together all data types, all users, and all environments to deliver critical business insights for every organization. DataRobot is trusted by global customers across industries and verticals, including a third of the Fortune 50.
