BIG DATA MANAGEMENT

Alluxio Boosts AI/ML Support for Its Hybrid and Multi-Cloud Data Orchestration Platform

Alluxio | November 16, 2021

Data Orchestration Platform
Alluxio, the developer of open source data orchestration software for large-scale workloads, today announced the immediate availability of version 2.7 of its Data Orchestration Platform. This new release has led to 5x improved I/O efficiency for Machine Learning (ML) training at significantly lower cost by parallelizing data loading, data preprocessing and training pipelines. Alluxio 2.7 also provides enhanced performance insights and support for open table formats like Apache Hudi and Iceberg to more easily scale access to data lakes for faster Presto and Spark-based analytics.

“Alluxio 2.7 further strengthens Alluxio’s position as a key component for AI, Machine Learning, and deep learning in the cloud. With the age of growing datasets and increased computing power from CPUs and GPUs, machine learning and deep learning have become popular techniques for AI. The rise of these techniques advances the state-of-the-art for AI, but also exposes some challenges for the access to data and storage systems.”

Haoyuan Li, Founder and CEO, Alluxio

“We deployed Alluxio in a cluster of 1000 nodes to accelerate the data preprocessing of model training on our game AI platform. Alluxio has proven to be stable, scalable and manageable,” said Peng Chen, Engineer Manager in the big data team at Tencent. “As more and more big data and AI applications are containerized, Alluxio is becoming the top choice for large organizations as an intermediate layer to accelerate data analytics and model training.”

“Data teams with large-scale analytics and AI/ML computing frameworks are under increasing pressure to make a growing number of data sources more easily accessible, while also maintaining performance levels as data locality, network IO, and rising costs come into play,” said Mike Leone, Analyst, ESG. “Organizations want to use more affordable and scalable storage options like cloud object stores, but they want peace of mind knowing they don’t have to make costly application changes or experience new performance issues. Alluxio is helping organizations address these challenges by abstracting away storage details while bringing data closer to compute, especially in hybrid cloud and multi-cloud environments.”

Alluxio 2.7 Community and Enterprise Edition features new capabilities, including:

Alluxio and NVIDIA’s DALI for ML
NVIDIA’s Data Loading Library (DALI) is a commonly used Python library that supports CPU and GPU execution for data loading and preprocessing to accelerate deep learning. With release 2.7, the Alluxio platform has been optimized to work with DALI for Python-based ML applications that include a data loading and preprocessing step as a precursor to model training and inference. By accelerating the I/O-heavy stages and letting them run in parallel with the compute-intensive training that follows, end-to-end training on the Alluxio data platform achieves significant performance gains over traditional solutions. Unlike other solutions, which are suited to smaller data set sizes, this one scales out.
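The idea of overlapping data loading and preprocessing with training can be illustrated with a minimal, generic sketch. It uses only the Python standard library and does not call the Alluxio or DALI APIs; the function names, the batch contents, and the sleep-based stand-ins for I/O and compute are all hypothetical and chosen for illustration only.

```python
import queue
import threading
import time


def load_and_preprocess(num_batches, out_queue):
    """Producer: stands in for I/O-heavy loading plus preprocessing."""
    for i in range(num_batches):
        time.sleep(0.01)  # hypothetical stand-in for reading from storage
        out_queue.put([i * 2, i * 2 + 1])  # hypothetical preprocessed batch
    out_queue.put(None)  # sentinel: no more batches


def train(in_queue):
    """Consumer: stands in for compute-heavy training steps."""
    step_results = []
    while True:
        batch = in_queue.get()
        if batch is None:
            break
        time.sleep(0.01)  # hypothetical stand-in for a training step
        step_results.append(sum(batch))
    return step_results


def run_pipeline(num_batches=5, prefetch=2):
    """Run loading and training concurrently with a bounded prefetch queue."""
    batches = queue.Queue(maxsize=prefetch)
    producer = threading.Thread(
        target=load_and_preprocess, args=(num_batches, batches)
    )
    producer.start()
    results = train(batches)  # trains while the producer keeps loading
    producer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(num_batches=3))
```

The bounded queue lets the loader run a few batches ahead of the trainer, so a training step does not have to wait for the next read to finish. Real pipelines would replace the stand-ins with actual readers and model code.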

Data Loading at Scale
At the heart of Alluxio’s value proposition are data management capabilities that complement caching and the unification of disparate data sources. As the use of Alluxio has grown for compute and storage spanning multiple geographical locations, the software continues to evolve to keep scaling, now using a new technique for batching data management jobs. Batching jobs, performed by an embedded execution engine for tasks such as data loading, reduces the resource requirements for the management controller and lowers the cost of provisioned infrastructure.

Ease of Use on Kubernetes
Alluxio now supports a native Container Storage Interface (CSI) Driver for Kubernetes, as well as a Kubernetes operator for ML, making it easier than ever before to operate ML pipelines on the Alluxio platform in containerized environments. The Alluxio volume type is now natively available for Kubernetes environments. Agility and ease-of-use are a constant focus in this release.

Insight Driven Dynamic Cache Sizing for Presto
An intelligent new capability, called Shadow Cache, makes striking the balance between high performance and cost easy by dynamically delivering insights to measure the impact of cache size on response times. For multi-tenant Presto environments at scale, this new feature significantly reduces the management overhead with self-managing capabilities.
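The announcement does not describe how Shadow Cache is implemented. As a rough illustration of the general idea of measuring how cache size affects hit ratio, the sketch below replays an access trace against key-only LRU trackers of several hypothetical sizes. The class and function names are assumptions made for this example and are not part of Alluxio or Presto.

```python
from collections import OrderedDict


class ShadowLRU:
    """Tracks keys only (no data) to estimate hits for a hypothetical cache size."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = OrderedDict()
        self.hits = 0
        self.accesses = 0

    def access(self, key):
        self.accesses += 1
        if key in self.keys:
            self.hits += 1
            self.keys.move_to_end(key)  # mark as most recently used
        else:
            self.keys[key] = True
            if len(self.keys) > self.capacity:
                self.keys.popitem(last=False)  # evict least recently used

    def hit_ratio(self):
        return self.hits / self.accesses if self.accesses else 0.0


def estimate_hit_ratios(trace, capacities):
    """Replay one access trace against several hypothetical cache sizes."""
    shadows = {capacity: ShadowLRU(capacity) for capacity in capacities}
    for key in trace:
        for shadow in shadows.values():
            shadow.access(key)
    return {capacity: shadow.hit_ratio() for capacity, shadow in shadows.items()}


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "c"]
    print(estimate_hit_ratios(trace, [2, 3]))
```

In this toy trace, a cache of two entries never hits because the working set cycles through three keys, while a cache of three entries hits on every repeat access. Comparing the estimates for different sizes is the kind of insight an operator could use to pick a cache size before provisioning it.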

“Data platform teams utilize Alluxio to streamline data preprocessing and loading phases in a world where storage is separated from ML computation,” said Adit Madan, Senior Product Manager, Alluxio. “This simplicity enables maximum utilization of GPUs with frameworks such as Spark ML, TensorFlow and PyTorch. The Alluxio solution is available on multiple cloud platforms such as AWS, GCP, and Azure Cloud, and now also on Kubernetes in private data centers or public clouds.”

About Alluxio
Proven at global web scale in production for modern data services, Alluxio is the developer of open source data orchestration software for the cloud. Alluxio moves data closer to data analytics and machine learning compute frameworks in any cloud across clusters, regions, clouds and countries, providing memory-speed data access to files and objects. Intelligent data tiering and data management deliver consistent high performance to customers in financial services, high tech, retail and telecommunications. Alluxio is in production use today at eight out of the top ten internet companies. Venture-backed by Andreessen Horowitz, Seven Seas Partners and Volcanics Ventures, Alluxio was founded at UC Berkeley’s AMPLab by the creators of the Tachyon open source project.

Spotlight

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.


Other News
BIG DATA MANAGEMENT

IBM Watson Advertising Brings AI-Driven Weather Analytics to AWS Data Exchange

IBM Watson | January 10, 2022

IBM Watson Advertising today announced the availability of data from The Weather Company, an IBM Business, on AWS Data Exchange, an Amazon Web Services (AWS) platform. The AWS Data Exchange allows businesses to easily find and subscribe to third-party data in the cloud. Providing data from the world's most accurate weather forecaster, IBM Watson Advertising's Weather Analytics harnesses the relationship between weather and consumer behavior using artificial intelligence to extract deep insights to help businesses make more confident, data-driven and insightful enterprise decisions.

The weather datasets can help analyze how weather affects consumer purchasing across different categories such as pharmaceuticals, apparel, consumer packaged goods and indoor and outdoor activities. Local data by ZIP code including historical weather data, 15-day forecast weather data, and relative data such as hot, cold, windy and other conditions could also be used to help inform campaigns, supply chain and forecasting decisions. This data can even help surface unique or non-obvious relationships between weather and consumer behavior.

"We know that weather can impact nearly everything in daily life -- how we feel, what we do, even what we buy," said Sheri Bachstein, Chief Executive Officer at The Weather Company and General Manager of IBM Watson Advertising. "This expanded relationship with AWS gives more businesses access to the weather data that can drive consumer behavior and purchasing. We are committed to opening up our insights and technology to a broad set of organizations, and giving more companies access to what we know can be growth- and efficiency-driving data and tech."

Insights that show locations where weather could affect sales can help businesses drive revenue based on predictive consumer behavior. According to past IBM Watson Advertising research, data revealed that while chocolate candy bar sales generally go up in colder months across the U.S., sales can spike in the Southwest when a higher heat index is expected, and in the Northeast during muggy nights. In another example, while more bug spray is purchased during the summer months, foggy conditions in the Northwest can drive more sales while clear conditions can drive demand in central states.

AWS Data Exchange helps make it easy to find, subscribe to, and use third-party data from providers in the cloud. Subscribers can use the AWS console or APIs to load IBM Watson Advertising solutions into a wide variety of AWS analytics and machine learning services.

This is the latest example of how IBM is building together with ecosystem partners of all types to create solutions for developers to address the needs of the hybrid cloud era. IBM is committed to a $1 billion investment in its partner ecosystem over the next three years. This investment is already being utilized to support a coalition of enterprises that are helping customers migrate their mission-critical workloads using IBM's open hybrid cloud architecture.

About IBM Watson
Watson is IBM's AI technology for business, helping organizations to better predict and shape future outcomes, automate complex processes, and optimize employees' time. Watson has evolved from an IBM Research project, to experimentation, to a scaled, open set of products that run anywhere. With more than 40,000 client engagements, Watson is being applied by leading global brands across a variety of industries to transform how people work.


DATA VISUALIZATION

Next Generation of Tableau Cloud Brings Advanced Analytics and Automated Insights to Business Users

Tableau Cloud | May 17, 2022

Today, at the annual Tableau Conference, Salesforce introduced Tableau Cloud, the fastest and easiest way for customers to get the full value of Tableau at enterprise scale. The offering is the next generation of what was formerly known as Tableau Online and includes new innovations to boost productivity by delivering intelligent, powerful and easy-to-use analytical tools to help anyone uncover insights and confidently make data-driven decisions. An integral part of the Salesforce Customer 360, Tableau empowers customers to surface and gain actionable insights from all their trusted data, creating a single source of truth, accessible anytime, anywhere.

Market volatility and widespread supply chain disruptions make it increasingly challenging for companies to contain costs and keep their businesses moving forward. Data can help manage these complexities and changes. For example, connected supply chains and production lines generate a wealth of data and customers expect real-time visibility into when goods will arrive. A recent McKinsey study found a strong correlation between the success of an organization's planning and adoption of advanced analytics. Data-driven supply chain management offers new ways to avoid disruption and respond to unforeseen circumstances with speed and confidence.

Tableau Cloud provides the leading analytics platform to meet customers where they prefer to operate their business. In fact, 70 percent of new customers choose Tableau Cloud over an on-premise or hybrid solution to power their analytics. Tableau also continues to offer self-managed solutions and is committed to providing customers the flexible options they need.

"Speed, ease of use and flexibility have been key differentiators for Tableau and the reasons why customers rely on us to help transform their business through data-driven decision making and increased efficiency. With Tableau Cloud, we're making it easier for our customers to drive even more analytics success. Tableau Cloud helps our customers deliver the analytics they need to their users, while we ensure the highest levels of trust, availability and performance." -Francois Ajenstat, Chief Product Officer, Tableau at Salesforce.

As part of the launch, Tableau is working with Snowflake to provide an extended promotional trial which includes Tableau Cloud licenses for Snowflake customers and, subject to program requirements, Snowflake credits upon conversion to a Tableau Cloud customer.

New Tableau innovations deliver automated insights faster and more easily
Tableau leverages the leading natural language and augmented analytics capabilities to help everyone use data to drive meaningful decisions. Data Stories adds automated plain-language explanations to Tableau dashboards at scale, helping customers understand and interact with data faster. Automating the analysis, build and communication of insights from data in a modern, easy-to-understand story format eliminates the need to explain dashboards repeatedly, makes data more accessible to business users and helps increase analytics adoption across the enterprise.

Tableau is also expanding its Accelerators offering and the capabilities of the Tableau Exchange, a trusted hub of offerings that extend the Tableau Platform and help customers get faster time to value. Accelerators are ready-to-use, customizable dashboards that can be used across multiple industries, departments and enterprise applications to quickly deliver insights and value. Tableau now has more than 100 Accelerators on the Tableau Exchange, including those built by experts across the Tableau Partner Network, further expanding the unique use cases customers can apply. The Tableau Exchange also features a new in-product capability, enabling customers to explore and use any offering from the Tableau Exchange directly in the product without requiring a separate download. This keeps people in the flow and enables them to get the right solution when they need it.

New enterprise-ready capabilities to increase efficiency
Tableau is also introducing Advanced Management, which helps Tableau customers manage, secure and scale mission-critical analytics across the enterprise. Administrators can gain deep insight into adoption and performance, leverage advanced encryption capabilities to meet security requirements and gain increased capacity limits to ensure teams and individuals have access to relevant data. Examples include:

Customer-Managed Encryption Keys helps customers meet organizational compliance standards and add an additional layer of protection for their data.

Activity Log provides detailed event data to help administrators keep track of how individuals are using Tableau. It also enables permission auditing to better implement controls over an enterprise's deployment.

And with Admin Insights, data is retained for up to one year to help track dataset usage, license adoption and visualization load times.

"Data is critical to delivering on the promise of leveraging mRNA science to create a new generation of transformative medicines for patients. Security, governance, scalability and manageability are important components of our overall data analytics strategy and we're excited to see how Advanced Management will make it easier and faster to optimize our deployment." -Adam Mico, Principal, Data Visualization and Enablement at Moderna.

Bringing Einstein into Tableau delivers deeper insights across the Salesforce Customer 360
Tableau is also helping drive the Salesforce Customer 360 and empowering customers to fully leverage their data to gain actionable insights from their CRM data. Powered by Einstein Discovery's artificial intelligence (AI) and machine learning (ML) technology, Tableau is helping people with domain expertise make better decisions faster and with more confidence. For example, Model Builder enables business teams to collaboratively build and consume predictive models, using the Einstein Discovery engine, without having to leave their Tableau workflows.

Infusing Einstein Discovery into CRM Analytics, the advanced analytics solution for CRM users, will help customers surface actionable insights directly in the Salesforce workflow:

Einstein Discovery: Text Clustering leverages machine learning (ML) models to extract keywords from large text fields to quickly reveal hidden insights and improve decision making.

Einstein Discovery: Bias Detection for multiclass models expands the use-cases for multi-class models by rooting out bias by variable, preventing the need to re-train an entire model.

More information: Tableau's new capabilities will be available by the end of 2022.

About Tableau
Tableau helps people see and understand data. Tableau offers visual analytics with powerful AI, data management and collaboration. From individuals to organizations of all sizes, customers around the world love using Tableau's advanced analytics to fuel impactful, data-driven decisions. For more information, please visit www.tableau.com.

About Salesforce
Salesforce, the global CRM leader, empowers companies of every size and industry to digitally transform and create a 360° view of their customers.


DATA ARCHITECTURE

VAST Data Announces Newest Feature Releases

VAST Data | May 23, 2022

VAST Data, the data platform company for the AI-powered world, today announced the latest versions of Universal Storage, bringing enhanced enterprise security features, performance and scale to its flagship software offering. With an install base of multiple exabytes and an annual growth rate of 300%, customers are continually challenging VAST with new feature requests to power their data-intensive use cases. This release, in total, represents more than 30 new features that have been directly requested by VAST customers. A testament to VAST’s distinguished R&D team, the average turnaround of “feature request to code” is four months.

“Since our founding, we have always maintained a customer-first mindset, continuously adding new features and functionality at a rapid pace to solve their ever-growing and changing application needs. Ultimately, we work to foster a close collaboration with our customers whose use cases are our product’s North Star. This level of agility is never seen from legacy providers of data infrastructure. Our responsive approach not only helps our customers manage their data easier, it also creates long-term business partnerships with customers and the broader market that is good for VAST.”

Jeff Denworth, co-founder of VAST Data

VAST’s Universal Storage data platform provides customers with a cloud-native containerized storage architecture, and eliminates storage tiering to unleash insights on their massive reserves of data. Versions 4.2 and 4.3 expand on an already stellar feature set, while continuing to deliver increased functionality, scalability and security features, and still improving system performance. Notable features in the latest release include:

Enhanced Security
VAST expands protection against ransomware attacks with Object Locks. Customers can set policies on buckets and objects to make them immutable, preventing users and applications from deleting or modifying an object before its expiry. Admins can also use S3 bucket policies to define permissions, enabling secure identity and access management.

Now generally available in Universal Storage, Indestructible Snapshots provide an additional layer of protection that safeguards immutable snapshots and policies from sophisticated external or internal attackers.

Compliance with Federal Information Processing Standards (FIPS) 140-2, using validated cryptographic libraries for encryption at rest.

Flexible Cloud Data Management
One platform that integrates S3 bucket management for integrated file and object storage. Customers can easily share data between file and object storage protocols. Universal Storage is the only platform that provides this functionality, giving customers the best of both worlds. Check out this blog post and deep-dive demo.

Further improvements to VAST’s centralized Uplink Cloud Management system include integration with Zendesk, providing customers with a smooth and intuitive support experience to create, track and manage their support tickets. For more information about VAST’s Uplink Cloud Management service, check out this blog post and deep-dive demo.

Enhanced Performance for Secure Protocols
Support for NFS4 over RDMA, delivering a performance boost for NFS4. By extending NFS4 over RDMA, VAST is increasing speed while providing customers with an enhanced security blanket via a Kerberized connection. VAST is the only vendor to accelerate NFS4 with RDMA, making it possible to power high-performance high-scale HPC, AI, media and analytics workloads with a simple and secure client interface.

About VAST Data
VAST Data delivers the data platform at the heart of the AI-powered world, accelerating time-to-insight for workload-intensive applications. The performance, scalability, ease of use and cost efficiencies of VAST’s software help enterprise organizations overcome the historic barriers to building all-flash data centers.
Launched in 2019, VAST is the fastest-selling data infrastructure startup in history.


BIG DATA MANAGEMENT

SentinelOne Launches DataSet, a Revolutionary Live Enterprise Data Platform

SentinelOne | February 17, 2022

SentinelOne, an autonomous cybersecurity platform company, today announced the launch of DataSet, SentinelOne’s data analytics solution. Building upon the acquisition of Scalyr, DataSet expands beyond cybersecurity use cases, delivering a limitless enterprise data platform for live data queries, analytics, insights, and retention.

SentinelOne’s Singularity XDR platform was purpose-built to autonomously defend against security threats by addressing cybersecurity as a data problem. Data sets power AI models which instantly determine if behaviors are benign or malicious. Individual data points, automatically linked, deliver machine-made contextualized storylines across the enterprise for visibility and response. EDR and XDR hunting queries provide curated data sets for threat hunters to outperform adversaries. Every aspect of SentinelOne’s autonomous cybersecurity is underpinned by data expertise. Our journey in delivering market-leading autonomous cybersecurity spans processing petabytes of data, growing at an exponential scale and doing so in real time.

“For cybersecurity to be effective, it must make split-second autonomous decisions because every millisecond matters. The way SentinelOne solves cybersecurity with data inspired us to apply our expertise beyond cybersecurity to a wide range of enterprise use cases. Our enterprise customers have the same data needs as SentinelOne - the ability to understand and action live data sets at speed. We’re announcing DataSet because we believe every business benefits from the power of understanding and acting on its data. Instantaneous, easy to use, and efficient understanding of a data set is the key to making better business decisions.”

Tomer Weingarten, CEO, SentinelOne

DataSet is a cloud-native flexible enterprise data platform built for all types of data – live or historical, at petabyte scale. By eliminating data schema requirements from the ingestion process and index limitations from querying, DataSet can process massive amounts of live data in real time, delivering log management, data analytics, and alerting with unparalleled speed, performance, and efficiency - built on a security and privacy-first foundation.

Entering the Data-Defined Era
“Distributed cloud infrastructure and containerized applications contribute to a vast amount of fast-moving data. The amount of data created in the next three years will be more than the data created over the past 30 years,” said Stephen Elliot, Group VP, Research IT, Cloud Operations, and DevOps at IDC. “The ability to cost-effectively analyze data at scale will become a necessity for every organization.”

Asana, Copart, TomTom and DoorDash selected DataSet to analyze all types of data from an unbounded time horizon – streaming and historical. CTOs, CIOs, engineering, and IT operations teams select DataSet, replacing Elastic and Splunk, to harness the power of their data. Legacy data solutions are expensive, slow, and unable to scale at the real-time pace business and technology demands. In the data-defined era, we believe enterprises who are able to leverage their data most effectively will win in their respective markets.

Market Adoption of DataSet
“With DataSet, our engineering, infrastructure and security teams have one single source of truth to make data-driven decisions. We no longer have to stitch context across teams and use cases,” said Joshua Danielson, Chief Information Security Officer at Copart. “DataSet enables us to act based on data, reduce time to detect and resolve anomalies, and maintain security posture.”

“Before DataSet, there was no central management of logs due to the diverse technologies at TomTom. Having to search multiple tools was holding us back, certainly during incidents,” said Carl Meert, Product Manager SRE and Observability at TomTom. “DataSet unifies all of our data from all sources. We are now much faster at detecting and responding to incidents.”

Experience DataSet
With the launch, SentinelOne has appointed Rahul Ravulur to lead DataSet. He brings more than 25 years of experience in building and operating enterprise products at scale, most recently leading product at Splunk. Ravulur will lead the DataSet business to accelerate market traction with leading data-driven enterprises.

“SentinelOne is taking a bold step to externalize its data expertise - to help all businesses unlock the power of their data,” said Ravulur. “With the launch of DataSet, we help organizations overcome the slow, costly legacy platforms that can’t handle the scalability requirements of tomorrow. DataSet is built for the future of data insights and action.”

About SentinelOne
SentinelOne’s cybersecurity solution encompasses AI-powered prevention, detection, response and hunting across endpoints, containers, cloud workloads, and IoT devices in a single autonomous XDR platform.


