"Making Big Data Processing Simple with Spark," Matei Zaharia

As data volumes grow, we need programming tools for parallel applications that are as easy to use and versatile as those for single machines. The Spark project started at UC Berkeley to meet these goals. Spark is based on two main ideas. First, it has a language-integrated API in Python, Java, Scala and R, based on functional programming, that makes it easy to build applications out of functions to run on a cluster. Second, it offers a general engine that can support streaming, batch, and interactive computations, as well as advanced analytics such as machine learning, and lets users combine them in one program. Since its release in 2010, Spark has become a highly active open source project, with over 900 contributors and a broad set of built-in libraries. This talk will cover the main ideas behind the Spark programming model, and recent additions to the project…
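To illustrate what building an application out of functions can look like, here is a minimal word-count sketch in plain Python. It runs on a single machine and does not use Spark itself; the sample `lines` data and the helper `add_pair` are hypothetical names introduced only for this example. In PySpark, the analogous steps on an RDD would be `flatMap`, `map`, and `reduceByKey`, with the engine distributing the work across a cluster.

```python
from functools import reduce

# Hypothetical sample input; a real job would read from distributed storage.
lines = [
    "spark makes big data simple",
    "big data needs simple tools",
]

# "Flat-map" step: split every line into words and flatten into one stream.
words = (word for line in lines for word in line.split())

# "Map" step: turn each word into a (word, 1) pair.
pairs = ((word, 1) for word in words)


def add_pair(counts, pair):
    """Fold one (word, n) pair into the running dictionary of counts."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts


# "Reduce-by-key" step: sum the counts for each distinct word.
word_counts = reduce(add_pair, pairs, {})

print(word_counts)
```

Each step is an ordinary function applied to a collection, which is the style the talk abstract describes; the sketch leaves out everything that makes Spark a cluster engine, such as partitioning and fault tolerance.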

Spotlight

EdgeLeap

EdgeLeap is a data science company providing consultancy and data analysis services to the life science and healthcare industries. The EdgeLeap team helps its clients keep up with the quick pace of developments in the data science field and bridge the gap between data generation and data-driven decision making. As an agile partner for clinicians and food and pharma R&D teams, we deliver tailored solutions to help our clients leverage their investments in data generation and stay ahead in an increasingly data-driven world.

OTHER VIDEOS

Project Management: Data analysis and project success | What is data analysis in projects?

video | October 8, 2021

Project data analytics uses past and current project data to enable effective decisions on project delivery. All projects create data, and this can be used to help project managers make choices about current and future projects. Project data analysis is all about being inquisitive and asking questions to find information, but we must also interpret data in a meaningful way....

Power of industrial data analytics resides in data transformation

video | June 1, 2021

The power of data analytics in an industrial context resides in how you transform and prepare your data. Let Wizata Data Scientist Jeremy Lambert explain why and how you can get more insights from your data with a properly prepared dataset....

InsightSquared Revenue Analytics & Forecasting

video | May 21, 2021

InsightSquared delivers full funnel revenue analytics and predictive forecasting for your entire customer journey. Finally complete visibility into your revenue processes—the visibility you need to understand where, why and how you win. It's your blueprint to more revenue....

NodeGraph — The smarter, all-in-one data intelligence stack

video | April 12, 2021

Choose NodeGraph — the automated solution for all your data & analytics tools — collecting data governance, data quality, data catalog and data lineage functionalities all under one roof....
