Data and containers and the keys to success

September 05, 2018 / Paul Curtis

In the beginning, workloads, tools, and requirements for big data were simple because big data wasn’t really all that big. When we hit 5 TB of data, however, things got complicated. Large data sets weren’t well suited to traditional storage like NAS, and large sequential reads of terabytes of data didn’t work well with traditional shared storage. As big data evolved, analytics tools graduated from custom code written for frameworks like MapReduce, Hive, and Pig to tools like Spark, Python, and TensorFlow, which made analysis easier. With these newer tools came additional requirements that traditional big data storage couldn’t handle, including millions of files, read-write access, and random access for updates. The only constant was the data itself.