Logical Architectures for Big Data Analytics

| February 17, 2017

If you check the reference architectures for big data analytics proposed by Forrester and Gartner, or ask your colleagues building big data analytics platforms for their companies (typically under the ‘enterprise data lake’ tag), they will all tell you that modern analytics need a plurality of systems: one or several Hadoop clusters, in-memory processing systems, streaming tools, NoSQL databases, analytical appliances and operational data stores, among others (see Figure 1 for an example architecture).

Spotlight

ThoughtFocus

ThoughtFocus is a US-based, privately held consulting, software engineering and business process management firm with offices in the US, India and the Philippines. It helps clients in the financial services, manufacturing, education, aerospace and technology industries with their key business and technology challenges.

OTHER ARTICLES

Understanding Big Data and Artificial Intelligence

Article | June 18, 2021

Data is an important asset. Data leads to innovation, and organizations compete to lead these innovations on a global scale. Today, every business requires data and insights to stay relevant in the market. Big Data has a huge impact on the way organizations conduct their businesses, and it is used in industries as varied as travel, healthcare, manufacturing, and government. Whether organizations need to determine their audience, understand what clients want, or forecast the needs of customers and clients, AI and big data analysis are vital to every decision-making scenario. When companies process the collected data accurately, they get the results that lead them toward their goals.

The term Big Data has been around since the 1990s. By the time we could fully comprehend it, Big Data had already amassed a huge amount of stored data. Analyzed properly, this data would reveal valuable insights into the industry to which it belonged. IT professionals and computer scientists realized that going through all of the data and analyzing it was too big a task for humans to undertake. When artificial intelligence (AI) algorithms came into the picture, analyzing the accumulated data and deriving insights became feasible. The use of AI in Big Data is fundamental for organizations to get the results they want.

According to Northeastern University, the amount of data in the world was 4.4 zettabytes in 2013. By 2020, it had risen to 44 zettabytes. With this amount of data produced globally, the information is invaluable to enterprises, which can now leverage AI algorithms to process it and, in turn, understand and influence customer behavior. By 2018, over 50% of countries had adopted Big Data. Let us look at what Big Data is, how Big Data and AI converge, and the impact of AI on big data analytics.
Understanding Big Data

In simple words, Big Data is a term that comprises every tool and process that helps people use and manage vast sets of data. According to Gartner, Big Data consists of “high-volume and/or high-variety information assets that demand cost-effective, innovative forms of information processing to enable enhanced insight, decision-making, and process automation.” The concept of Big Data was created to capture trends, preferences, and user behavior in one place, called the data lake. Big Data can help enterprises analyze and understand their customers’ motivations and come up with ideas for new offerings. Big Data covers the methods of extracting, analyzing, and dealing with data sets that are too complicated for traditional data processing systems. Analyzing a large amount of data requires a system designed to stretch its extraction and analysis capability. Data is everywhere, and this stockpile of data can yield insights and business analytics about the industry the data set belongs to. AI algorithms, therefore, are written to take advantage of large and complex data.

Importance of Big Data

Data is an integral part of understanding customer demographics and motivations. When customers interact with technology, actively or passively, those actions create new data. What contributes most to this data creation is what they carry with them every day: their smartphones. Their cameras, credit cards, and purchased products all contribute to their growing data profiles. A correctly done analysis can tell a lot about a customer’s behavior patterns, personality, and life events. Companies can use this information to rethink their strategies, improve their products, and create targeted marketing campaigns, which would ultimately lead them to their target customer. Industry experts have discussed Big Data and its impact on businesses for years.
Only in recent years, however, has it become possible to calculate that impact. Algorithms and software can now analyze large datasets quickly and efficiently. The forty-four zettabytes of data will only quadruple in the coming years, and collecting and analyzing this data will help companies derive the AI insights that aid them in generating profits and staying future-ready. Organizations have been using Big Data for a long time. Here’s how those organizations are using Big Data to drive success:

Answering customer questions

Using big data and analytics, companies can learn the following things:
• What do customers want?
• What are they missing out on?
• Who are their best and most loyal customers?
• Why do people choose different products?

Every day, as organizations gather more information, they gain more insight into sales and marketing. With this data, they can optimize their campaigns to suit customers’ needs. By learning from customers’ online habits and analyzing them correctly, companies can send personalized promotional emails, which may prompt the target audience to convert into full-time customers.

Making confident decisions

As companies grow, they all need to make complex decisions. With in-depth analysis of marketplace knowledge, the industry, and customers, Big Data can help you make confident choices. Big Data gives you a complete overview of everything you need to know. With its help, you can launch a marketing campaign or a new product, or make a focused decision to generate the highest ROI. Once you add machine learning and AI to the mix, your Big Data collections can feed neural networks that help your AI suggest useful company changes.

Optimizing and Understanding Business Processes

Cloud computing and machine learning help you stay ahead by identifying opportunities in your company’s practices.
Big Data analytics can tell you whether your email strategy is working even when your social media marketing isn’t gaining you any following. You can also check which parts of your company culture have the right impact and produce the desired turnover. The existing evidence helps you make quick decisions and ensures you spend more of your budget on things that help your business grow.

Convergence of Big Data and AI

Big Data and Artificial Intelligence have a synergistic relationship. Data powers AI: the constantly evolving data sets that make up Big Data allow machine learning applications to learn and acquire new skills, which is what they were built to do. Big Data’s role in AI is to supply algorithms with the information essential for developing and improving features and pattern-recognition capabilities. AI and machine learning work with data that has been cleansed of duplicate and unnecessary records. This clean, high-quality big data is then used to create and train intelligent AI algorithms, neural networks, and predictive models. AI applications rarely stop working and learning: once the initial training (on already-collected data) is done, they adjust their work as the data changes, which makes continuous data collection necessary. When businesses use this technology, AI helps them apply Big Data analytics by making advanced tools accessible, so users can gain insights that would otherwise stay hidden in the huge amount of data. Once firms and businesses get a hold on using AI and Big Data together, they can provide decision-makers with a clear understanding of the factors that affect their businesses.

Impact of AI on Big Data Analytics

AI supports users across the Big Data cycle, including the aggregation, storage, and retrieval of diverse data types from different data sources. This includes data management, context management, decision management, action management, and risk management.
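The cleansing step mentioned above can be sketched in a few lines. This is a minimal, illustrative example, not any specific vendor's pipeline: the record fields and function name are made up for the sake of the sketch.

```python
# Minimal sketch of cleansing raw records before they feed a model:
# drop exact duplicates and rows with missing (None) fields.

def clean_records(records):
    """Return records with duplicates and incomplete rows removed."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))      # canonical form for dedup
        if key in seen:
            continue                          # skip duplicate record
        if any(v is None for v in rec.values()):
            continue                          # skip incomplete record
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [
    {"customer": "a", "spend": 120},
    {"customer": "a", "spend": 120},          # exact duplicate
    {"customer": "b", "spend": None},         # missing value
    {"customer": "c", "spend": 75},
]
clean = clean_records(raw)                    # only "a" and "c" survive
```

In a real platform this step would also handle near-duplicates, type mismatches, and outliers, but the principle is the same: only clean, consistent records reach training.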
Big Data can alert users to problems, help find new solutions, and surface ideas about new prospects. With the stream of information coming in, it can be difficult to determine what is important and what isn’t. This is where AI and machine learning come in: they can identify unusual patterns in processes, help in the analysis, and suggest further steps to be taken. They can also learn how users interact with analytics, and pick up subtle differences in meaning or context-specific nuances to better understand numeric data sources. AI can also caution users about anomalies, unforeseen data patterns, monitoring events, and threats found in system logs or social networking data.

Applications of Big Data and Artificial Intelligence

Having established how AI and Big Data work together, let us look at how some sectors benefit from their synergy:

Banking and financial sectors

The banking and financial sectors apply these technologies to monitor financial market activity. These institutions also use AI to keep an eye on illegal trading activity. Trading data analytics support high-frequency trading, trading-based decision making, risk analysis, and predictive analysis. The technologies are also used for fraud warning and detection, archival and analysis of audit trails, enterprise credit reporting, customer data transformation, and more.

Healthcare

AI has simplified health data prescriptions and health analysis, letting healthcare providers benefit from the large data pool. Hospitals use millions of collected data points to enable doctors to practice evidence-based medicine, and AI can track chronic diseases faster.

Manufacturing and supply chain

AI and Big Data have sharpened manufacturing, production management, supply chain management and analysis, and customer satisfaction techniques. Product quality improves as a result, along with energy efficiency, reliability, and profits.
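The anomaly flagging described above is often built from very simple statistical rules before more sophisticated models are layered on. Below is a toy sketch, assuming a z-score threshold over a numeric metric stream; the function name, data, and threshold are illustrative only.

```python
# Toy anomaly flagging: mark values that sit more than `threshold`
# standard deviations away from the mean of the stream.

from statistics import mean, stdev

def flag_anomalies(values, threshold=3.0):
    """Return indices of values further than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [i for i, v in enumerate(values)
            if sigma > 0 and abs(v - mu) / sigma > threshold]

# Steady traffic with one spike at index 5:
traffic = [100, 102, 98, 101, 99, 500, 100, 103]
anomalies = flag_anomalies(traffic, threshold=2.0)   # flags the spike
```

Production systems would use streaming windows and learned models rather than a fixed global threshold, but the core idea, scoring each observation against what is "normal", is the same.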
Governments

Governments worldwide use AI applications such as facial recognition, vehicle recognition for traffic management, population demographics, financial classification, energy exploration, environmental conservation, criminal investigation, and more. Other sectors that use AI include retail, entertainment, and education.

Conclusion

According to Gartner’s predictions, artificial intelligence will replace one in five workers by 2022. Firms and businesses can no longer afford to avoid using artificial intelligence and Big Data in their day-to-day operations. Investments in AI and Big Data analysis will be beneficial for everyone. Data sets will keep growing, and their applications and the investment in them will grow over time, while the relevance of manual human analysis will continue to decrease. AI-driven machine learning is the future of business technology development: it will automate data analysis and find new insights that were previously impossible to reach by processing data manually. With machine learning, AI, and Big Data, we can redraw the way we approach everything else.

Frequently Asked Questions

Why does big data affect artificial intelligence?
Big Data and AI customize business processes and produce decisions better suited to individual needs and expectations. This improves the efficiency of both processes and decisions. Data has the potential to give insights into a variety of predicted behaviors and incidents.

Is AI or big data better?
AI becomes better as it is fed more and more information, and that information is gathered from Big Data, which helps companies understand their customers better. On the other hand, Big Data is useless if there is no AI to analyze it: humans are not capable of analyzing data on that scale.

Is AI used in big data?
Yes. When the gathered Big Data is to be analyzed, AI steps in to do the job. Big Data makes use of AI.

What is the future of AI in big data?
AI’s ability to work so well with data analytics is the primary reason AI and Big Data now seem inseparable. AI, machine learning, and deep learning learn from every data input and use those inputs to generate new rules for future business analytics.

Read More

MiPasa project and IBM Blockchain team on open data platform to support Covid-19 response

Article | April 1, 2020

Powerful technologies and expertise can help provide better data and help people better understand their situation. As the world contends with the ongoing coronavirus outbreak, officials battling the pandemic need tools and valid information at scale to help foster a greater sense of security for the public. As technologists, we have been heartened by the prevalence of projects such as Call for Code, hackathons and other attempts by our colleagues to rapidly create tools that might be able to help stem the crisis. But for these tools to work, they need data from sources they can validate. For example, reopening the world’s economy will likely require not only testing millions of people, but also being able to map who tested positive, where people can and can’t go and who is at exceptionally high risk of exposure and must be quarantined again.

Read More

Why Data Science Needs DataOps

Article | March 31, 2020

DataOps helps reduce the time data scientists spend preparing data for use in applications; such tasks consume roughly 80% of their time today. We’re still hopeful that the digital transformation will provide the insights businesses need from big data. As a data scientist, you’re probably aware of the growing pressure from companies to extract meaningful insights from data and find the stories needed for impact. No matter how in-demand data science is in the employment numbers, equal pressure is rising for data scientists to deliver business value, and no wonder: we’re approaching the age where data science and AI draw a line in the sand between the companies that remain competitive and the ones that collapse. One answer to this pressure is the rise of DataOps. Let’s take a look at what it is and how it could provide a path for data scientists to give businesses what they’ve been after.

Read More

Evolution of capabilities of Data Platforms & data ecosystem

Article | October 27, 2020

Data Platforms and frameworks have been constantly evolving. At one point we were excited by Hadoop (for almost 10 years, in fact); then came Snowflake, or as I say the Snowflake Blizzard (which managed to launch the biggest software IPO in history); and then Google (which solves problems and serves use cases in a way that few companies can match).

The end of the data warehouse

Once upon a time, life was simple; or at least, the basic approach to Business Intelligence was fairly easy to describe: a process of collecting information from systems, building a repository of consistent data, and bolting on one or more reporting and visualisation tools which presented information to users. Data used to be managed in expensive, slow, inaccessible SQL data warehouses, and SQL systems were notorious for their lack of scalability. Their demise is coming from a few technological advances, one of which is the ubiquitous, and growing, Hadoop. On April 1, 2006, Apache Hadoop was unleashed upon Silicon Valley. Inspired by Google, Hadoop’s primary purpose was to improve the flexibility and scalability of data processing by splitting the work into smaller functions that run on commodity hardware. Hadoop’s intent was to replace enterprise data warehouses based on SQL. Unfortunately, a technology used by Google may not be the best solution for everyone else. It’s not that others are incompetent: Google solves problems and serves use cases in a way that few companies can match. Google has been running massive-scale applications such as its eponymous search engine, YouTube and the Ads platform. The technologies and infrastructure that make these geographically distributed offerings perform at scale are what make various components of Google Cloud Platform enterprise-ready and well-featured. Google has shown leadership in developing innovations that have been made available to the open-source community and are being used extensively by other public cloud vendors and Gartner clients.
Examples of these include the Kubernetes container management framework, the TensorFlow machine learning platform and the Apache Beam data processing programming model. GCP also uses open-source offerings in its cloud while treating third-party data and analytics providers as first-class citizens and providing unified billing for its customers; examples of the latter include DataStax, Redis Labs, InfluxData, MongoDB, Elastic, Neo4j and Confluent. Silicon Valley tried to make Hadoop work, but the technology was extremely complicated and nearly impossible to use efficiently. Hadoop’s lack of speed was compounded by its focus on unstructured data: you had to be a “flip-flop wearing” data scientist to truly make use of it, because unstructured datasets are very difficult to query and analyze without deep knowledge of computer science. At one point, Gartner estimated that 70% of Hadoop deployments would not achieve their goals of cost savings and revenue growth, mainly due to insufficient skills and technical integration difficulties. And seventy percent seems like an understatement.

Data storage through the years: from GFS to the Snowflake blizzard

Developing in parallel with Hadoop’s journey was that of Marcin Zukowski, co-founder and CEO of Vectorwise. Marcin took the data warehouse in another direction: the world of advanced vector processing. Despite being almost unheard of among the general public, Snowflake was actually founded back in 2012. Snowflake is not a consumer tech firm like Netflix or Uber; it’s business-to-business only, which may explain its high valuation, since enterprise companies are often seen as a more “stable” investment. In short, Snowflake helps businesses manage data that’s stored on the cloud. The firm’s motto is “mobilising the world’s data”, because it allows big companies to make better use of their vast data stores.
Marcin and his teammates rethought the data warehouse by leveraging the elasticity of the public cloud in an unexpected way: separating storage and compute. Their message was this: don’t pay for a data warehouse you don’t need. Only pay for the storage you need, and add capacity as you go. This is considered one of Snowflake’s key innovations: separating storage (where the data is held) from computing (the act of querying). By offering this service before Google, Amazon, and Microsoft had equivalent products of their own, Snowflake was able to attract customers, and build market share in the data warehousing space. Naming the company after a discredited database concept was very brave. For those of us not in the details of the Snowflake schema, it is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake shape. … When it is completely normalized along all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle. Needless to say, the “snowflake” schema is as far from Hadoop’s design philosophy as technically possible. While Silicon Valley was headed toward a dead end, Snowflake captured an entire cloud data market.
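The snowflake schema described above can be illustrated with a toy example: a fact table in the middle holding keys and a measure, with each dimension level normalized out into its own lookup table. All table and field names here are invented for the illustration; a real warehouse would express this in SQL.

```python
# Toy snowflake schema: dimensions normalized level by level,
# with the fact table in the middle referencing them by key.

regions   = {1: {"region": "EMEA"}}
countries = {10: {"country": "France", "region_id": 1}}
stores    = {100: {"store": "Paris-01", "country_id": 10}}

# Fact table: foreign keys plus a measure.
sales = [
    {"store_id": 100, "amount": 250.0},
    {"store_id": 100, "amount": 125.0},
]

def sales_by_region(rows):
    """Aggregate fact rows by walking out through the normalized dimensions."""
    totals = {}
    for row in rows:
        store = stores[row["store_id"]]
        country = countries[store["country_id"]]
        region = regions[country["region_id"]]["region"]
        totals[region] = totals.get(region, 0.0) + row["amount"]
    return totals

by_region = sales_by_region(sales)
```

Each query has to join through every dimension level, which is exactly the multi-branch "snowflake" shape the schema's entity-relationship diagram takes, and exactly the opposite of Hadoop's schema-light design philosophy.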

Read More
