Article | December 21, 2020
Machine Learning (ML) has taken great strides over the past few years, establishing its place in data analytics. In particular, ML has become a cornerstone of data science, alongside data wrangling and data visualization, among other facets of the field. Yet we still observe many organizations hesitating to allocate a budget for it in their data pipelines. The data engineer role attracts lots of attention, but few companies leverage the machine learning expert/engineer. Could it be that ML can add value to other enterprises too? Let's find out by clarifying certain concepts.
What Machine Learning is
So that we are all on the same page, let's look at a down-to-earth definition of ML that you can use in a company meeting, a report, or even an email to a colleague who isn't in this field. Investopedia defines ML as "the concept that a computer program can learn and adapt to new data without human intervention." In other words, if your machine (be it a computer, a smartphone, or even a smart device) can learn on its own using some specialized software, then it falls under the ML umbrella. It's important to note that ML is also a stand-alone field of research, predating most AI systems, even if the two are linked, as we'll see later on.
How Machine Learning is different from Statistics
It's also important to note that ML is different from Statistics, even if some people like to view the former as an extension of the latter. The fundamental difference, which many people aren't yet aware of, is that ML is data-driven while Statistics is, for the most part, model-driven. Most Stats-based inferences are made by assuming a particular distribution in the data, or a particular form for the interactions of different variables, and then making predictions based on our mathematical models of these distributions. ML may employ distributions in some niche cases, but for the most part it looks at the data as-is, without making strong assumptions about how it is distributed.
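The data-driven versus model-driven contrast can be made concrete with a minimal Python sketch on a toy dataset (y = x², invented for illustration). An ordinary least-squares line stands in for the model-driven approach, because it assumes a linear relationship up front. k-nearest-neighbors regression stands in for the data-driven one, because it simply averages nearby observations. The function names `fit_linear` and `fit_knn` are hypothetical. Note that k-NN is not entirely assumption-free either, since it presumes that nearby inputs have similar outputs.

```python
def fit_linear(xs, ys):
    """Model-driven (illustrative): assume y = a + b*x and estimate a, b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_knn(xs, ys, k=2):
    """Data-driven (illustrative): no functional form, just average the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


# Toy data following y = x**2 (made up for this example).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

linear = fit_linear(xs, ys)
knn = fit_knn(xs, ys, k=2)
print("linear model at x=2.5:", linear(2.5))  # the assumed straight line cannot bend
print("k-NN at x=2.5:", knn(2.5))             # averages the points at x=2 and x=3
```

Here the straight line predicts the overall mean (4.0) at x = 2.5. The neighbor average (6.5) lands closer to the true value of 6.25. On other data the assumption-driven model may well win; the point is only how differently the two approaches use the data.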
Machine Learning’s role in data science work
Let’s now get to the crux of the matter and explore how ML can add significant value to a data science pipeline. First of all, ML can potentially offer better predictions than most Stats models, as measured by accuracy, F1 score, and similar metrics. Also, ML models can work alongside existing models to form model ensembles that tackle problems more effectively. Additionally, if transparency is important to the project stakeholders, there are ML-based options that offer some insight into which variables matter most for making predictions on the data at hand. Moreover, ML models are more parametrized, meaning that you can tweak them more, adapting them to the data you have and making them more robust (i.e., reliable). Finally, you can learn ML without needing a Math degree or any other formal training. Such training may still prove useful if you wish to delve deeper into the topic and develop your own models. This innovation potential is a significant aspect of ML, since it's not as easy to develop new models in Stats (unless you are an experienced Statistics researcher) or even in AI. Besides, there is a variety of "heuristics" within the ML family of algorithms that facilitate your data science work, regardless of which predictive model you end up using.
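Two of the points above can be illustrated with a short, standard-library-only Python sketch: scoring predictions with accuracy and F1, and combining models into an ensemble. The labels and the three models' predictions are made up. They were deliberately constructed so that each model errs on different examples. Simple majority voting is only one of several ensembling strategies; others include averaging predicted probabilities and stacking.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def majority_vote(*predictions):
    """Combine several models' predictions by taking the most common label per example."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy labels and predictions from three hypothetical models.
y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
model_a = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
model_b = [1, 0, 0, 1, 0, 1, 0, 1, 1, 0]
model_c = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]
ensemble = majority_vote(model_a, model_b, model_c)

for name, preds in [("A", model_a), ("B", model_b), ("C", model_c), ("ensemble", ensemble)]:
    print(f"{name}: accuracy={accuracy(y_true, preds):.2f}, F1={f1_score(y_true, preds):.2f}")
```

Each model reaches 0.80 accuracy on its own, while the vote recovers every label. That perfect score is an artifact of the hand-picked, non-overlapping errors, not something to expect on real data.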
Machine Learning and AI
Many people conflate ML with AI these days. This confusion arises partly because many ML models involve artificial neural networks (ANNs), which are the most modern manifestation of AI. Also, many AI systems are employed in ML tasks, so they are referred to as ML systems, since AI can be a bit generic as a term. However, not all ML algorithms are AI-related, nor do all AI algorithms fall under the ML umbrella. This distinction matters because certain limitations of AI systems (e.g., the need for lots and lots of data) don't apply to most ML models, while AI systems tend to be more time-consuming and resource-heavy than the average ML one. There are several ML algorithms you can use without breaking the bank to derive value from your data. Then, if you find that you need something better in terms of accuracy, you can explore AI-based ones. Keep in mind, however, that some ML models (e.g., Decision Trees, Random Forests, etc.) offer some transparency, while the vast majority of AI ones are black boxes.
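To illustrate the transparency point, the sketch below learns a decision stump (a depth-one decision tree) and prints the learned rule in plain language. It is a deliberately simplified stand-in. Real decision-tree implementations typically choose splits with impurity measures such as Gini or entropy and grow deeper trees, whereas this one just counts training errors. The feature names (`monthly_spend`, `support_tickets`), the data, and the `fit_stump` function are all hypothetical.

```python
def fit_stump(rows, labels, feature_names):
    """Search every (feature, threshold, orientation) split and keep the one
    with the fewest training errors. Returns a human-readable rule and its error count."""
    best = None  # (errors, feature name, threshold, label predicted above threshold)
    for j, name in enumerate(feature_names):
        for threshold in sorted({row[j] for row in rows}):
            for above in (0, 1):
                preds = [above if row[j] > threshold else 1 - above for row in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, name, threshold, above)
    errors, name, threshold, above = best
    rule = f"if {name} > {threshold}: predict {above} else predict {1 - above}"
    return rule, errors


# Hypothetical customers: [monthly_spend, support_tickets] and a binary label.
rows = [[2, 5], [3, 1], [8, 4], [9, 2], [1, 3], [7, 6]]
labels = [0, 0, 1, 1, 0, 1]
rule, errors = fit_stump(rows, labels, ["monthly_spend", "support_tickets"])
print(rule, f"({errors} training errors)")
```

The output is a rule a stakeholder can read directly ("if monthly_spend > 3: predict 1 else predict 0"). Random Forests lose some of this readability because they average many such trees, but they can still report which features their splits rely on.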
Learning more about the topic
Naturally, it's hard to do this topic justice in a single article. It is so vast that someone could write a book on it! That's what I did earlier this year, through the Technics Publications publishing house. You can learn more about this topic via the book, titled Julia for Machine Learning (Julia is a modern programming language used in data science, among other fields, and it's popular among various technical professionals). Feel free to check it out and explore how you can use ML in your work. Cheers!
THEORY AND STRATEGIES
Article | December 21, 2020
We discursive creatures are construed within a meaningful, bounded communicative environment, namely context(s), and not in a vacuum.
Context(s) co-occur in different scenarios, that is, in mundane talk as well as in academic discourse, where the goal of natural language communication is mutual intelligibility and hence the negotiation of meaning. Discursive research focuses on the context-sensitive use of the linguistic code and its social practice in particular settings, such as medical talk, courtroom interactions, and financial/economic and political discourse. This focus may restrict its validity when subscribing to a theoretical framework and its propositions regarding application. The same issue is reflected in artificial intelligence approaches to context(s), such as the development of context-sensitive parsers, context-sensitive translation machines and context-sensitive information systems, where the validity of an argument and its propositions is at stake.
Context is at the heart of pragmatics; better still, context is the anchor of any pragmatic theory: sociopragmatics, discourse analysis and ethnomethodological conversation analysis. Academic disciplines such as linguistics, philosophy, anthropology, psychology and literary theory have also studied various aspects of context phenomena. Yet the concept of context has remained fuzzy or is generally left undefined. It seems that the denotation of the word [context] has become murkier as its uses have been extended in many directions.
Context and/or contexts? To be integrated “felicitously” into the pragmatic construct, the definition of context needs some delimitation. Depending on the frame of research, context is delimited to the global surroundings of the phenomenon under investigation. For instance, if these surroundings are extra-linguistic in nature, it is called the socio-cultural context. If they comprise features of a speech situation, it is called the linguistic context. If they refer to cognitive material, that is, a mental representation, it is called the cognitive context. Context is a transcendental notion that plays a key role in interpretation.
Language is no longer considered a set of decontextualized sentences. Instead, language is seen as embedded in larger activities through which it becomes meaningful. In a dynamic outlook on communication, the acts of speaking and interpreting build contexts and at the same time constrain the building of such contexts. (Speaking generates a form of discourse, for instance conversational discourse, a lecture or a speech.) In Heritage’s terminology, “the production of talk is doubly contextual” (Heritage 1984: 242). An utterance relies upon the existing context for its production and interpretation, and it is, in its own right, an event that shapes a new context for the action that will follow. A linguistic context can be decontextualized at a local level, and it can be recontextualized at a global level. There is intra-discursive recontextualization anchored to local decontextualization, and there is interdiscursive recontextualization anchored to global recontextualization. “A given context not only 'legislates' the interpretation of indexical elements; indexical elements can also mold the background of the context” (Ochs, 1990). In the case of recontextualization in a particular scenario, it is valid to ask “what do you mean?” or “how do you mean?”. Referring to both context and meaning helps clarify matters when the communicative status of an utterance is contested, and at the same time provides a frame for recontextualization.
A linguistic context is intrinsically linked to a social context and to a subcategory of the latter, the socio-cultural context. The social context can be considered unmarked, hence a default context, whereas a socio-cultural context can be conceived as a marked type of context in which specific variables are interpreted in a particular way. Culture provides us, the participants, with a filter mechanism that allows us to interpret a social context in accordance with particular socio-cultural constraints and requirements. Besides, the socially constitutive qualities of context are unavoidable, since each interaction updates the existing context and prepares new ground for subsequent interaction.
Now, how are these conceptualizations and views reflected in NLP? Most of the research work has focused on the linguistic context, that is, on word-level surroundings and lexical meaning. One example is an approach that produces sense embeddings for the lexical meanings within a lexical knowledge base; these embeddings lie in a space comparable to that of contextualized word vectors.
Contextualized word embeddings have been used effectively across several tasks in Natural Language Processing, as they have proved to carry useful semantic information. The task of associating a word in context with the most suitable meaning from a predefined sense inventory is better known as Word Sense Disambiguation (Navigli, 2009). Linguistically speaking, “context encompasses the total linguistic and non-linguistic background of a text” (Crystal, 1991). Notice that the nature of context(s) is clearly crucial when reconstructing the meaning of a text. Therefore, “meaning-in-context should be regarded as a probabilistic weighting, of the list of potential meanings available to the user of the language.” The so-called disambiguating role of context should be taken with a pinch of salt.
The main reason language models such as BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019) and SBERT (Reimers and Gurevych, 2019) have proved beneficial in most NLP tasks is that contextualized word embeddings encode the semantics defined by their input context. In the same vein, a novel method for contextualized sense representations has recently been introduced: SensEmBERT (Scarlini et al., 2020), which computes sense representations that can be applied directly to disambiguation.
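Placing sense embeddings in the same space as contextualized word vectors can be sketched as a nearest-neighbor lookup. First, embed the target word in its context. Then compare that vector with each candidate sense vector by cosine similarity and pick the closest. The three-dimensional vectors below are invented toy values, not output from BERT or SensEmBERT. The softmax with a hypothetical `temperature` parameter is an added illustration of the "probabilistic weighting" view of meaning-in-context quoted earlier. It is not part of any specific published method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def disambiguate(context_vec, sense_vecs, temperature=0.1):
    """Return the sense whose vector is closest to the contextualized word vector,
    plus a softmax weighting over all candidate senses (temperature is hypothetical)."""
    scores = {sense: cosine(context_vec, vec) for sense, vec in sense_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {sense: math.exp((s - top) / temperature) for sense, s in scores.items()}
    total = sum(exps.values())
    weights = {sense: e / total for sense, e in exps.items()}
    return max(weights, key=weights.get), weights


# Toy sense inventory for "bank" and a toy contextualized vector for
# "bank" in "I deposited money at the bank" (all values made up).
senses = {
    "bank (financial institution)": [0.9, 0.1, 0.0],
    "bank (sloping land by a river)": [0.1, 0.9, 0.2],
}
context = [0.8, 0.2, 0.1]
best, weights = disambiguate(context, senses)
print(best)
for sense, w in weights.items():
    print(f"  {sense}: {w:.3f}")
```

Lowering `temperature` concentrates the weighting on the top sense. Raising it spreads probability across the candidate senses, which treats context as a soft rather than decisive disambiguator.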
Still, there is a long way to go in context(s) research. The linguistic context is just one of the necessary conditions for a sentence to be embedded in “a” context. For interpretation to take place, two things are needed. The first is well-formed sentences and constructions, that is, linguistic strings that must be grammatical but may be constrained by cognitive sentence-processability and pragmatic relevance. The second is particular linguistic-context and social-context configurations that make their production and interpretation meaningful.
Article | December 21, 2020
In recent years, more industries have adopted data analytics as they realize how important it is, and the hotel industry has not been left behind.
This is because the hospitality industry is data-rich, and the key to maintaining a competitive advantage has come down to ‘how hotels manage and analyze this data’.
With the changes taking place in the hospitality industry, data analysis can help you gain meaningful insights that can redefine the way hotels conduct business.
Article | December 21, 2020
Homeless policy needs to join the big data revolution. A data tsunami is transforming our world. Ninety percent of existing data was created in the last two years, and Silicon Valley is leveraging it with powerful analytics to create self-driving cars and to revolutionize business decision-making in ways that drive innovation and efficiency.
Unfortunately, this revolution has yet to help the homeless. It is not due to a lack of data. Sacramento alone maintains data on half a million service interactions with more than 65,000 homeless individuals. California is considering integrating the data from its 44 continuums of care to create a richer pool of data. Additionally, researchers are uncovering troves of relevant information in educational and social service databases.
These data, however, are only useful if they are aggressively mined for insights, looking for problems to solve and successful practices to replicate. That is where California falls short.