Article | September 2, 2021
Companies collect and store massive amounts of data in the search for the “Holy Grail”. One crucial component is the discovery and application of novel approaches that give a more complete picture of the datasets than the one provided by the local (sometimes global) event-based analytic strategy that currently dominates a given field.
Bringing qualitative data to life is essential, since it gives management decisions context and nuance. An NLP perspective for uncovering word-based themes across documents facilitates the exploration and exploitation of qualitative data, which are often hard to “identify” in a global setting. NLP can be used to perform different analyses, including the mapping of drivers.
Broadly speaking, drivers are factors that cause change and affect institutions, policies and management decision making. To be more precise, a “driver” is a force that has a material impact on a specific activity or entity, which is contextually dependent, and which affects the financial market at a specific time (Litterio, 2018). Major drivers often lie outside the immediate institutional environment, such as elections or regional upheavals, or are non-institutional factors such as Covid or climate change. In Total global strategy: Managing for worldwide competitive advantage, Yip (1992) develops a framework based on a set of four industry globalization drivers, which highlights the conditions for a company to become more global while also reflecting differentials in the competitive environment. In The lexicons: NLP in the design of Market Drivers Lexicon in Spanish, I proposed a categorization into micro and macro drivers and temporality, and a distinction among social, political, economic and technological drivers. Considering the “big picture” and “digging” beyond the usual sectors and timeframes is key to state-of-the-art findings.
Working with qualitative data.
There is certainly no single “recipe” for applying NLP strategies. When a MetaQuant team is looking for drivers, different pipelines can be used to analyse any sort of textual data, from social media and reviews to focus group notes, blog comments and transcripts, to name just a few.
Generally, when the source is textual data, it is preferable to avoid manual tasks on the part of the analyst, though depending on the domain, content, cultural variables, etc., they might sometimes be required. If qualitative data is the core, the preferred format is .csv, because its plain-text nature typically handles written responses better. Once the data has been collected and exported, the next step is some pre-processing. The basics include normalisation, morphosyntactic analysis, sentence structure analysis, tokenization, lexicalization and contextualization. The aim is simply to simplify the data to make analysis easier.
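As a rough illustration of the first pre-processing steps, the sketch below reads written responses from a .csv export, normalises them and tokenizes them using only the Python standard library. The column name "response", the sample rows and the tiny stop-word list are assumptions made for the example, not part of any real dataset, and stripping accents is only one possible normalisation choice (it may not suit every language or lexicon).

```python
import csv
import io
import re
import unicodedata

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on", "for"}


def normalise(text):
    """Lower-case the text, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", without_accents).strip().lower()


def tokenize(text):
    """Split normalised text into alphanumeric tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", normalise(text))
    return [token for token in tokens if token not in STOP_WORDS]


def load_responses(csv_file, column="response"):
    """Return one token list per non-empty row of the given CSV column."""
    reader = csv.DictReader(csv_file)
    return [tokenize(row[column]) for row in reader if row.get(column)]


# Hypothetical export; in practice this would be open("responses.csv", newline="").
SAMPLE_CSV = """id,response
1,"Inflation fears are rising, and rates follow."
2,"The élection result moved the markets."
"""

docs = load_responses(io.StringIO(SAMPLE_CSV))
for doc in docs:
    print(doc)
```

Morphosyntactic and sentence-level analysis are left out here; they normally need a dedicated NLP library and depend on the language of the corpus.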
Topic modelling refers to the task of identifying the words of the main topics that best describe a document or a corpus. LDA (Latent Dirichlet Allocation) is one of the most powerful algorithms, with excellent implementations in Python's Gensim package.
The challenge is how to extract good-quality topics that are clear and meaningful. This depends mostly on the nature of the text pre-processing, the strategy for finding the optimal number of topics, and the creation of the lexicon(s) and the corpora. We can say that a topic is defined or construed around its most representative keywords. But are keywords enough? There are some other factors to consider, such as:
1. The variety of topics included in the corpora.
2. The choice of topic modelling algorithm.
3. The number of topics fed to the algorithm.
4. The algorithm's tuning parameters.
As you have probably noticed, finding “the needle in the haystack” is not that easy. Only those who can use NLP creatively will have the advantage of positioning themselves for global success.
Article | July 13, 2021
We are living in the age of Big Data, and data has become the heart and the most valuable asset of businesses across industry verticals. In today's hyper-competitive market, data is a major contributor to business intelligence and brand equity. Effective data management is therefore key to accelerating business success, and it requires organizations to ensure that the data they use is accurate and reliable. With the advent of AI, businesses can now leverage machine learning to predict outcomes from historical data. This is called predictive analytics. With predictive analytics, organizations can predict anything from customer turnover to equipment maintenance needs. Moreover, the data that is acquired through predictive analytics is of high quality and very accurate. Let us take a look at how AI enables accurate data prediction and helps businesses equip themselves for the digital future.
Article | March 30, 2020
Quantum mechanics wrote its chapter in the history of the early 20th century. With its regular binary computing twin going out of style, quantum mechanics led quantum computing to be the new belle of the ball! While the memory used in a classical computer encodes binary ‘bits’ (one and zero), quantum computers use qubits (quantum bits). A qubit is not confined to two states but can also exist in superposition, i.e., a qubit can be 0, 1, or both 1 and 0 at the same time.
Article | February 10, 2020
We are a species invested in predicting the future as if our lives depended on it. Indeed, good predictions of where wolves might lurk were once a matter of survival. Even as civilization made us physically safer, prediction has remained a mainstay of culture, from the haruspices of ancient Rome inspecting animal entrails to business analysts dissecting a wealth of transactions to foretell future sales. With these caveats in mind, I predict that in 2020 (and the decade ahead) we will struggle if we unquestioningly adopt artificial intelligence (AI) in predictive analytics, founded on an unjustified overconfidence in the almost mythical power of AI's mathematical foundations. This is another form of the disease of technochauvinism I discussed in a previous article.