Top 10 Big Data Tools in 2019

KARTIK SINGH | March 18, 2019

The amount of data produced by humans has exploded to unheard-of levels, with nearly 2.5 quintillion bytes of data created daily. With advances in the Internet of Things and mobile technology, data has become a central interest for most organizations. More important than simply collecting it, though, is the need to properly analyze and interpret the data that is being gathered. Most businesses collect data from a variety of sources, and each data stream provides signals that ideally come together to form useful insights. Getting the most out of your data, however, depends on having the right tools to clean it, prepare it, merge it and analyze it properly.

Spotlight

PredictiveHire

PredictiveHire is a team of data scientists, artificial intelligence experts, IO psychologists and recruitment industry veterans who’ve joined together on a mission to change the way candidate assessment is done, by leveraging the power of smart technology. We believe all people decisions should be based on data and analytics – not gut feeling. With the help of technology, we want to level the candidate playing field and help reduce unconscious biases in the hiring process.

OTHER ARTICLES

How to Overcome Challenges in Adopting Data Analytics

Article | April 20, 2020

Achieving organizational success and making data-driven decisions in 2020 requires embracing tech tools like data analytics, and simply collecting, storing and analysing data isn't enough. Real, measurable, data-driven growth and development come with the establishment of a data-driven company culture. In this type of culture, the company actively uses data resources as a primary asset to make smart decisions and ensure future growth. Despite the rapid growth of analytic solutions, a recent Gartner survey revealed that almost 75% of organizations thought their analytics maturity had not reached a level that optimized business outcomes. Just like with any endeavor, your organization must have a planned strategy to achieve its analytical goals. Let's explore ways of overcoming common blockers, and elements used in successful analytics adoption strategies.

Table of Contents:
- AMM: Analytic Maturity Model
- What are the blockers to achieving strategy-driven analytics?
- What are the adoption strategies to achieve analytics success?
- Conclusion

AMM: Analytic Maturity Model

The Analytic Maturity Model (AMM) evaluates the analytic maturity of an organization. The model identifies the five stages an organization travels through to reach optimization. Organizations must implement the right tools, engage their team in proper training, and provide the management support necessary to generate predictable outcomes with their analytics. Based on the maturity of these processes, the AMM divides organizations into five maturity levels:
- Organizations that can build reports.
- Organizations that can build and deploy models.
- Organizations that have repeatable processes for building and deploying analytics.
- Organizations that have consistent enterprise-wide processes for analytics.
- Enterprises whose analytics is strategy driven.

READ MORE: EFFECTIVE STRATEGIES TO DEMOCRATIZE DATA SCIENCE IN YOUR ORGANIZATION

What are the blockers to achieving strategy-driven analytics?
- Missing an analytics strategy
- Analytics is not for everyone
- Data quality presents unique challenges
- Siloed data
- Changing the culture

What are the adoption strategies to achieve analytics success?

• Have you got a plan to achieve analytics success?

The strategy begins with business intelligence and moves toward advanced analytics. The approach differs based on the AMM level. The plan may address the strategy for a single year, or it may span three or more years, and it should ideally have milestones for what the team will do. Forming an analytics strategy can be expensive and time-consuming at the outset. While organizations are encouraged to seek projects that can generate quick wins, the truth is that it may be months before any actionable results are available. During this period, the management team is frantically diverting resources from other high-profile projects. If funds are tight, this situation alone may cause friction, and it may not be apparent to everyone how the changes are expected to help. Here are the elements of a successful analytics strategy:

• Keep the focus tied to tangible business outcomes

The strategy must support business goals first. With as few words as possible, your plan should outline what you intend to achieve, how to complete it, and a target date for completion of the plan. Companies may fail at this step because they mistake implementing a tool for having a strategy. To keep it relevant, tie it to customer-focused goals. The strategy must dig below the surface with the questions that it asks.
Instead of asking surface questions such as "How can we save money?", ask, "How can we improve the quality of the outcomes for our customers?" or "What would improve the productivity of each worker?" These questions are more specific and will get the results the business wants. You may need to use actual business cases from your organization to think through the questions.

• Select modern, multi-purpose tools

The organization should be looking for an enterprise tool that supports integrating data from various databases, spreadsheets, or even external web-based sources. Typically, organizations may have their data stored across multiple databases such as Salesforce, Oracle, and even Microsoft Access. The organization can move ahead more quickly when access to the relevant data is in a single repository. With the data combined, analysts have a single place to find reports and dashboards. The interface needs to be robust enough to show the data from multiple points of view. It should also allow future enhancements, such as when the organization makes the jump into data science.

Incorta's data analytics platform simplifies and processes data to provide meaningful information at speed that helps make informed decisions. "Incorta is special in that it allows business users to ask the same complex and meaningful questions of their data that typically require many IT people and data scientists to get the answers they need to improve their line of business. At the digital pace of business today, that can mean millions of dollars for business leaders in finance, supply chain or even marketing. Speed is a key differentiator for Incorta in that rarely has anyone been able to query billions of rows of data in seconds for a line of business owner."
- Tara Ryan, CMO, Incorta

Technology implementations take time. That should not stop you from starting in small areas of the company to look for quick wins. Typically, the customer-facing processes have areas where it is easier to collect data and show opportunities for improvement.

• Ensure staff readiness

If your current organization is not data literate, then you will need resources who understand how to analyze and use data for process improvement. It is possible to make data available and still have workers not realize what they can do with it. The senior leadership may also need training about how to use data and what data analytics makes possible.

• Start Small to Control Costs and Show Potential

If the leadership team questions the expense, consider doing a proof of concept that focuses on the tools and data being integrated quickly and efficiently to show measurable success. The business may favor specific projects or initiatives to move the company forward over long-term enterprise transformations (Bean & Davenport, 2019). Keeping the project goals precise and directed helps control costs and improve the business. As said earlier, the strategy needs to answer deeper business questions. Consider other ways to introduce analytics into the business. Use initiatives that target smaller areas of the company to build competencies. Provide an analytics sandbox with access to tools and training to encourage non-analytics workers (or citizen data scientists) to play with the data. One company formed a SWAT team including individuals from across the organization; the smaller team, with varied domain experience, was better able to drive results.
There are also other approaches to use – the key is to show immediate and desirable results that align with organizational goals.

• Treating poor data quality

What can you do about poor data quality at your company? Several solutions that can help to improve productivity and reduce the financial impact of poor data quality in your organization include:

• Create a team to set the proper objectives

Create a team that owns the data quality process. This is important to prove to yourself and to anyone with whom you are conversing about data that you are serious about data quality. The size of the team is not as important as having members from the parts of the organization that have the right impact on, and knowledge of, the process. When the team is set, make sure that they create a set of goals and objectives for data quality. To gauge performance, you need a set of metrics to measure it. After you create the proper team to govern your data quality, ensure that the team focuses on the data you need first. Everyone knows the rules of "good data in, good data out" and "bad data in, bad data out." To put this to work, make sure that your team knows the relevant business questions that are in progress across various data projects, so that they focus on the data that supports those questions.

• Focus on the data you need now as the highest priority

Once you do that, you can look at the potential data quality issues associated with each of the relevant downstream business questions and put the proper processes and data quality routines in place to ensure that poor data quality has a low probability of continuing to affect that data. As you decide which data to focus on, remember that the key for innovators across industries is that the size of the data isn’t the most critical factor — having the right data is (Wessel, 2016).

• Automate the process of data quality when data volumes grow too large

When data volumes become unwieldy and quality becomes difficult to manage manually, automate the process. Many data quality tools on the market do a good job of removing the manual effort from the process: open source options include Talend and DataCleaner, and commercial products include offerings from DataFlux, Informatica, Alteryx and Software AG (a minimal sketch of such an automated check appears at the end of this section). As you search for the right tool for you and your team, be aware that although the tools help with organization and automation, the right processes and knowledge of your company's data are paramount to success.

• Make the process of data quality repeatable

Remember that the process is not a one-time activity; it needs regular care and feeding. While good data quality can save you a lot of time, energy, and money downstream, it does take time, investment, and practice to do well. As you improve the quality of your data and the processes around that quality, you will want to look for other opportunities to avoid data quality mishaps.

• Beware of data that lives in separate databases

When data is stored in different databases, there can be issues with different terms being used for the same subject. The good news is that if you have followed the former solutions, you should have more time to invest in looking for the best cases. As always, look for the opportunities with the biggest bang for the buck first.
You don't want to be answering questions from the steering committee about why you are looking for differences between "HR" and "Hr" if you haven't solved bigger issues like knowing the difference between "Human Resources" and "Resources," for example.

• De-siloing data

The solution to removing data silos typically isn't some neatly packaged, off-the-shelf product. Attempts to quickly create a data lake by simply pouring all the siloed data together can result in an unusable mess that turns into more of a data swamp. This is a process that must be done carefully to avoid confusion, liability, and error. Try to identify high-value opportunities and find the various data stores required to execute those projects. Working with various business groups to find business problems that are well suited to data science solutions, and then gathering the necessary data from the various data stores, can lead to high-visibility successes. As value is proved from joining disparate data sources together to create new insights, it will be easier to get buy-in from upper levels to invest time and money into consolidating key data stores. In the first efforts, getting data from different areas may be akin to pulling teeth, but as with most things in life, the more you do it, the easier it gets. Once the wheels get moving on a few of these integration projects, make wide-scale integration the new focus. Many organizations at this stage appoint a Chief Analytics Officer (CAO) who helps increase collaboration between the IT and business units, ensuring their priorities are aligned. As you work to integrate the data, make sure that you don't inadvertently create a new "analytics silo." The final aim here is an integrated platform for your enterprise data.

• Education is essential

When nearly 45% of workers generally prefer the status quo over innovation, how do you encourage an organization to move forward? If the workers are not engaged or see the program as merely the latest management trend, it may be tricky to convince them. Larger organizations may have a culture that is slow to change due to their size or outside forces.

"There's also a culture shift required - moving from experience and knee-jerk reactions to immersion and exploration of rich insights and situational awareness."
- Walter Storm, Chief Data Scientist, Lockheed Martin

One company spent a year talking about an approved analytics tool before moving forward. The employees had time to consider the change and to understand the new skill sets needed. Once the entire team embraced the change, the organization moved forward swiftly to convert existing data and reports into the new tool. In the end, the corporation is more successful, and the employees are still in alignment with the corporate strategy. If using data to support decisions is a foreign concept to the organization, it's a smart idea to ensure the managers and workers have similar training. This training may involve everything from basic data literacy to selecting the right data for management presentations. However, it cannot stop at training; leaders must then ask for data that supports the conclusions used to make critical decisions across the business. These methods make it easier to sell the idea and keep the organization's analytics strategy moving forward. Once senior leadership uses data to make decisions, everyone else will follow their lead. It is that simple.
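As referenced in the automation point above, here is a minimal sketch of the kind of automated data-quality check a team might script before adopting a dedicated tool. It assumes a pandas-based workflow; the file name, column names and rules are illustrative assumptions, not part of the original article.

```python
# Minimal sketch of an automated data-quality check using pandas.
# The file name, column names and rules below are illustrative assumptions.
import pandas as pd

RULES = {
    "employee_id": {"required": True, "unique": True},
    "department":  {"required": True},
    "salary":      {"required": True, "min": 0},
}

def run_quality_checks(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable data-quality findings."""
    findings = []
    for column, rule in RULES.items():
        if column not in df.columns:
            findings.append(f"missing column: {column}")
            continue
        if rule.get("required") and df[column].isna().any():
            findings.append(f"{column}: {df[column].isna().sum()} missing values")
        if rule.get("unique") and df[column].duplicated().any():
            findings.append(f"{column}: duplicate values found")
        if "min" in rule and (df[column] < rule["min"]).any():
            findings.append(f"{column}: values below {rule['min']}")
    return findings

if __name__ == "__main__":
    employees = pd.read_csv("employees.csv")  # assumed input extract
    for finding in run_quality_checks(employees):
        print(finding)
```

Scheduling a script like this to run whenever new extracts arrive is one way to make the quality process repeatable, which is the point of the "regular care and feeding" advice above.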
Conclusion

The analytics maturity model serves as a useful framework for understanding where your organization currently stands regarding strategy, progress, and skill sets. Advancing along the various levels of the model will become increasingly imperative as early adopters of advanced analytics gain a competitive edge in their respective industries. Delay or failure to design and incorporate a clearly defined analytics strategy into an organization’s existing plan will likely result in a significant missed opportunity.

READ MORE: BIG DATA ANALYTICS STRATEGIES ARE MATURING QUICKLY IN HEALTHCARE


Natural Language Desiderata: Understanding, explaining and interpreting a model.

Article | May 3, 2021

Clear conceptualization, taxonomies, categories, criteria and properties when solving complex, real-life, contextualized problems are non-negotiable, a "must" to unveil the hidden potential of NLP and its impact on the transparency of a model. It is common knowledge that many authors and researchers in the field of natural language processing (NLP) and machine learning (ML) are prone to use explainability and interpretability interchangeably, which from the start constitutes a fallacy. They do not mean the same thing, even when looking for a definition from different perspectives.

A formal definition of what explanation, explainable and explainability mean can be traced to social science, psychology, hermeneutics, philosophy, physics and biology. In The Nature of Explanation, Craik (1967:7) states that "explanations are not purely subjective things; they win general approval or have to be withdrawn in the face of evidence or criticism." Moreover, the power of explanation means the power of insight and anticipation, and the question of why one explanation is satisfactory involves a prior question of why any explanation at all should be satisfactory, or, in machine learning terminology, how a model is performant in different contextual situations. Beyond their utilitarian value (that impulse to resolve a problem whether or not, in the end, there is a practical application, one that will be verified or disproved in the course of time), explanations should be "meaningful". We come across explanations every day; perhaps the most common are reason-giving ones.

Before advancing into the realm of ExNLP, it is crucial to conceptualize what constitutes an explanation. Miller (2017) considered explanations to be "social interactions between the explainer and explainee", therefore the social context has a significant impact on the actual content of an explanation. Explanations, in general terms, seek to answer the "why" type of question: there is a need for justification. According to Bengtsson (2003), "we will accept an explanation when we feel satisfied that the explanans reaches what we already hold to be true of the explanandum" (the explanandum being a statement that describes the phenomenon to be explained - a description, not the phenomenon itself - and the explanans at least two sets of statements used for the purpose of elucidating the phenomenon).

In discourse theory (my approach), it is important to highlight, first and foremost, that there is a correlation between understanding and explanation. Both are articulated although they belong to different paradigmatic fields. This dichotomous pair is perceived as a duality, which represents an irreducible form of intelligibility. When there are observable external facts subject to empirical validation, systematicity and subordination to hypothetic procedures, then we can say that we explain. An explanation is inscribed in the analytical domain, the realm of rules, laws and structures. When we explain, we display propositions and meaning. But we do not explain in a vacuum. The contextual situation permeates the content of an explanation; in other words, explanation is an epistemic activity: it can only relate things described or conceptualized in a certain way. Explanations are answers to questions of the form "why fact", which most authors agree upon. Understanding, by contrast, can mean a number of things in different contexts.
According to Ricoeur, "understanding precedes, accompanies and swathes an explanation, and an explanation analytically develops understanding." Following this line of thought, when we understand, we grasp or perceive the chain of partial senses as a whole in a single act of synthesis. Originally belonging to the field of the so-called human sciences, understanding refers to a circular process and is directed at the intentional unit of discourse, whereas an explanation is oriented to the analytical structure of a discourse.

Now, to ground any discussion of what interpretation is, it is crucial to highlight that the concept of interpretation opposes the concept of explanation. They cannot be used interchangeably. If considered as a unit, they compose what is called une combinaison éprouvée (a contrasted dichotomy). Besides, in dissecting both definitions we will see that the agent that performs the explanation differs from the one that produces the interpretation. At present there is the challenge of defining—and evaluating—what constitutes a quality interpretation. Linguistically speaking, "interpretation" is the complete process that encompasses understanding and explanation. It is true that there is more than one way to interpret an explanation (and, by extension, an explanation of a prediction), but it is also true that there is a limited number of possible explanations, if not a unique one, since they are contextualized. And it is also true that an interpretation must not only be plausible, but more plausible than another interpretation. Of course there are certain criteria to resolve this conflict, and proving that an interpretation is more plausible, based on an explanation or on knowledge, may be related to the logic of validation rather than to the logic of subjective probability.

Narrowing it down

How are these concepts transferred from theory to praxis? What is the importance of the "interpretability" of an explainable model? What do we call a "good" explainable model? What constitutes a "good explanation"? These are some of the many questions that researchers from both academia and industry are still trying to answer. In the realm of machine learning, current approaches conceptualize interpretation in a rather ad hoc manner, motivated by practical use cases and applications. Some suggest model interpretability as a remedy, but only a few are able to articulate precisely what interpretability means or why it is important. What is more, most in the research community and industry use this term as a synonym of explainability, which it certainly is not. They are not overlapping terms. Needless to say, in most cases technical descriptions of interpretable models are diverse and occasionally discordant.

A model is better interpretable than another model if its decisions are easier for a human to comprehend than decisions from the other model (Molnar, 2021). For a model to be interpretable (interpretability being a quality of the model), the information conferred by an interpretation may be useful. Thus, one purpose of interpretations may be to convey useful information of any kind. In Molnar's words, "the higher the interpretability of a machine learning model, the easier it is for someone to comprehend why certain decisions or predictions have been made." I will make an observation here and add "the higher the interpretability of an explainable machine learning model". Luo et al.
(2021) define interpretability as "the ability [of a model] to explain or to present [its predictions] in understandable terms to a human." Notice that in this definition the authors include "understanding" as part of the definition, giving the idea of completeness. Thus the triadic closure explanation-understanding-interpretation is fulfilled, in which the explainer and the interpretant (the agents) belong to different instances, and where interpretation allows the extraction and formation of additional knowledge captured by the explainable model. Now, are models inherently interpretable? Well, it is more a matter of selecting the method of achieving interpretability: (a) interpreting existing models via post-hoc techniques, or (b) designing inherently interpretable models, which claim to provide more faithful interpretations than post-hoc interpretation of black-box models. The difference also lies in the agency, like I said before, and in how, in one case, interpretation may affect the explanation process, that is, the model's inner workings, or merely include natural language explanations of learned representations or models.
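To make the distinction between post-hoc techniques and inherently interpretable models concrete, here is a minimal sketch (my own illustration, not from the article) using scikit-learn: a random forest is explained after the fact with permutation importance, while a logistic regression exposes its coefficients directly. The dataset and model choices are assumptions made purely for illustration.

```python
# Illustrative sketch (assumption, not from the article): post-hoc explanation
# of a black-box model vs. an inherently interpretable model in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# (a) Black-box model, explained post hoc via permutation importance.
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
print("Post-hoc (permutation) importances:")
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"  {X.columns[i]}: {result.importances_mean[i]:.3f}")

# (b) Inherently interpretable model: the coefficients are the interpretation.
linear = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("Coefficients of the interpretable model:")
for name, coef in sorted(zip(X.columns, linear.coef_[0]),
                         key=lambda t: abs(t[1]), reverse=True)[:5]:
    print(f"  {name}: {coef:+.3f}")
```

Neither printout is an "explanation" in the discourse-theoretic sense discussed above; both are interpretability artifacts that a human interpreter must still turn into understanding.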


Splunk: Big Data, Big Opportunity

Article | March 21, 2020

Splunk extracts insights from big data. It is growing rapidly, it has a large total addressable market, and it has tremendous momentum from its exposure to industry megatrends (i.e. the cloud, big data, the "internet of things," and security). Further, its strategy of continuous innovation is being validated as the company wins very large deals. Investors should not be distracted by a temporary slowdown in revenue growth, as the company has wisely transitioned to a subscription model. This article reviews the business, its strategy, its valuation (the sell-off is overdone) and the risks. We conclude with our thoughts on investing.


What is the Difference Between Business Intelligence, Data Warehousing and Data Analytics

Article | March 16, 2020

In the age of Big Data, you'll hear a lot of terms tossed around. Three of the most commonly used are "business intelligence," "data warehousing" and "data analytics." You may wonder, however, what distinguishes these three concepts from each other, so let's take a look. What differentiates business intelligence from the other two on the list is the idea of presentation. Business intelligence is primarily about how you take the insights you've developed from the use of analytics and turn them into action. To put it simply, business intelligence is the final product: the yummy cooked food that comes out of the frying pan when everything is done. In the flow of things, business intelligence interacts heavily with data warehousing and analytics systems. Information can be fed into analytics packages from warehouses. It then comes out of the analytics software and is routed back into storage and also into BI. Once the BI products have been created, information may yet again be fed back into data storage and warehousing.
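As a rough illustration of that flow, the sketch below uses SQLite as a stand-in warehouse: data is read for analysis, the derived summary is routed back into storage, and a final step plays the role of the BI layer that presents it. The table names, columns and figures are assumptions for illustration only, not from the article.

```python
# Minimal sketch of the warehouse -> analytics -> storage/BI flow described above.
# An in-memory SQLite database stands in for the warehouse; all names are assumed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("north", 80.0), ("south", 200.0)])

# Analytics step: read from the warehouse and compute an aggregate.
rows = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region").fetchall()

# Route the derived result back into storage so BI tools can reuse it.
conn.execute("CREATE TABLE sales_summary (region TEXT, total REAL, orders INTEGER)")
conn.executemany("INSERT INTO sales_summary VALUES (?, ?, ?)", rows)
conn.commit()

# BI step: present the stored insight (a dashboard would render this instead).
for region, total, orders in conn.execute("SELECT * FROM sales_summary"):
    print(f"{region}: {total:.2f} across {orders} orders")
conn.close()
```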

