ML Powers Discovery In GE’s 500 PB Lake
September 25, 2018 / Alex Woodie
Like most Fortune 50 firms, General Electric relies on an abundance of computer systems to power its enterprise. And like most firms that size, syncing up and aligning the data emitted by those different systems is a major challenge. But thanks to an innovative data discovery tool powered by machine learning, GE found an answer.

GE's Hadoop-based data lake contains 500 PB of data that originated from about 120 different systems, according to Diwakar Goel, the VP and Chief Data Officer of GE Digital and Finance. The data is sourced from a variety of ERP packages, accounting systems, and other applications, such as Ariba, Concur, and Salesforce.com. Even LinkedIn and Twitter data makes it into the lake for downstream sentiment analysis.