Build a model to detect credit card fraud using thousands of features and billions of transactions. Intelligently recommend millions of products to millions of users. Estimate financial risk through simulations of portfolios that include millions of instruments. Easily manipulate data from thousands of human genomes to detect genetic associations with disease. These are tasks that simply could not be accomplished five or ten years ago. When people say that we live in an age of “big data,” they mean that we have tools for collecting, storing, and processing information at a scale previously unheard of. Sitting behind these capabilities is an ecosystem of open source software that can leverage clusters of commodity computers to chug through massive amounts of data.