Data Science Lessons from The Bachelorette

July 18, 2016

Another week of The Bachelorette, another man (or two) sent home by JoJo, and another night of volatile tweets from Bachelorette fans. So what's new? This week, for your enjoyment, the DataScience team draws novel connections between shallow reality television and deep data analysis.

Spotlight

Market Equations

Market Equations is a results-oriented Research and Business Analytics Outsourcing Company, headquartered in India, offering high-quality Online Media Research & Monitoring services to organizations worldwide. We provide traditional and social media intelligence by offering our clients insight into their external environment and exposure to various media channels, including broadcast, print and online media. Market Equations India offers unique Qualitative and Quantitative Online Media Research, Monitoring and Measurement services to help organizations listen to what their investors, consumers, competitors and employees are communicating in this medium…

OTHER ARTICLES

Data Analytics Convergence: Business Intelligence (BI) Meets Machine Learning (ML)

Article | July 29, 2020

Headquartered in London, England, BP (NYSE: BP) is a multinational oil and gas company. Operating since 1909, the organization provides its customers with fuel for transportation, energy for heat and light, lubricants to keep engines moving, and petrochemical products. Business intelligence has always been a key enabler for improving decision-making in large enterprises, from the early days of spreadsheet software, to building enterprise data warehouses that house large sets of enterprise data, to the more recent practice of mining those datasets to unearth hidden relationships. One underlying theme throughout this evolution has been the delegation to human beings of the crucial task of finding the notable relationships between objects of interest. What BI technology has been doing, in other words, is making it possible (and often easy) to find the needle in the proverbial haystack, provided you somehow know in which part of the barn it is likely to be. It is a validatory technology as opposed to a predictive one. When the data is huge in variety, volume, and dimensionality (a.k.a. Big Data), and/or the relationships between datasets go beyond the first-order linear relationships amenable to human intuition, the strategy of relying solely on humans to do the essential thinking about the datasets, and using machines only for crucial but dumb data infrastructure tasks, becomes totally inadequate. The remedy follows directly from this characterization of the problem: find ways to use machines beyond menial tasks and offload some or most of the cognitive work from humans to machines. Does this mean that all the technology and associated practices developed over the decades in the BI space are no longer useful in the Big Data age? Not at all.
On the contrary, they are more useful than ever: whereas in the past humans were in the driving seat, controlling the demand for the diligently acquired and curated datasets, machines are now taking up that important role, unleashing many different ways of using the data and finding obscure, non-intuitive relationships that elude humans. Moreover, machines bring unprecedented speed and processing scalability to the game, at a level that would be either prohibitively expensive or outright impossible with a human workforce. Companies have to recognize both the enormous potential of new automated, predictive analytics technologies such as machine learning and how to incorporate those technologies successfully into the data analysis and processing fabric of their existing infrastructure. It is this marrying of relatively old, stable technologies (data mining, data warehousing, enterprise data models, etc.) with the new automated predictive technologies that has the potential to deliver the benefits so often hyped by those with a vested interest in presenting new tools and applications as the answer to all data analytics problems. To see this in the context of predictive analytics, let's consider machine learning (ML) technology. The easiest way to understand machine learning is to look at the simplest ML algorithm: linear regression. ML technology builds on the basic interpolation idea of regression and extends it using sophisticated mathematical techniques that are not necessarily obvious to casual users. For example, some ML algorithms extend the linear regression approach to model non-linear (i.e., higher-order) relationships between the dependent and independent variables in a dataset via clever mathematical transformations (a.k.a. kernel methods) that express those non-linear relationships in a linear form, making them suitable for a linear algorithm.
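
The idea of making a non-linear relationship tractable for a linear algorithm can be illustrated with a small Python sketch. It is not a kernel method proper: it applies an explicit feature transformation (squaring the input), which is the simpler idea that kernel methods generalize by working with such transformations implicitly. The data, the function names `fit_line` and `mean_squared_error`, and the choice of a quadratic relationship are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared gap between the fitted line and the observed values."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Illustrative data with a purely quadratic relationship: y = 2 * x**2 + 1.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [2 * x ** 2 + 1 for x in xs]

# A straight line fitted to the raw inputs cannot capture the curve.
raw_fit = fit_line(xs, ys)
raw_error = mean_squared_error(xs, ys, *raw_fit)

# Transform the input first; the same linear algorithm now fits well.
zs = [x ** 2 for x in xs]
transformed_fit = fit_line(zs, ys)
transformed_error = mean_squared_error(zs, ys, *transformed_fit)

print("raw fit:", raw_fit, "error:", raw_error)
print("transformed fit:", transformed_fit, "error:", transformed_error)
```

The algorithm is identical in both runs; only the representation of the input changes. That is why the choice of transformation, and the data it is applied to, matters so much.
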
Be it a simple linear algorithm or its more sophisticated kernel-method variation, an ML algorithm has no context for the data it processes. This is both a strength and a weakness. It is a strength because the same algorithms can process many different kinds of data, allowing us to reuse all the work that went into developing them across different business contexts. It is a weakness because, since the algorithms lack any contextual understanding of the data, the perennial computer science truth of garbage in, garbage out manifests itself unceremoniously: ML models have to be fed the "right" kind of data to draw out correct insights that explain the inner relationships in the data being processed. ML technology provides an impressive set of sophisticated data analysis and modeling algorithms that can uncover very intricate relationships in the datasets they process, together with the ability to use these methods in an automated, and hence massively distributed and scalable, way. Its Achilles' heel, however, is its heavy dependence on the data it is fed. The best analytic methods are useless, as far as drawing useful insights is concerned, if they are applied to the wrong kind of data. More seriously, the use of advanced analytical technology can give users a false sense of confidence in the results those methods produce, making the whole undertaking not just useless but actually dangerous. We can address this fundamental weakness of ML technology by deploying its advanced, raw algorithmic processing capabilities in conjunction with existing data analytics technology, so that the contextual data relationships and key domain knowledge coming from the existing BI estate (data mining efforts, data warehouses, enterprise data models, business rules, etc.) are used to feed the ML analytics pipeline.
This approach combines the superior algorithmic processing capabilities of the new ML technology with the enterprise knowledge accumulated through BI efforts, and allows companies to build on their existing data analytics investments while transitioning to the incoming advanced technologies. This, I believe, is effectively a win-win situation and will be key to the success of any company involved in data analytics efforts.
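
As a rough sketch of what feeding BI knowledge into an ML pipeline could look like, the following Python snippet applies business rules to a set of records before they reach a learning step. Everything in it is hypothetical and invented for illustration: the record fields, the two rules, and the `passes_business_rules` helper. A simple revenue-per-unit average stands in for a real model.

```python
# Hypothetical order records, e.g. exported from a data warehouse.
orders = [
    {"units": 10, "revenue": 100.0},
    {"units": 20, "revenue": 200.0},
    {"units": -5, "revenue": 50.0},   # negative quantity: a data entry error
    {"units": 30, "revenue": 300.0},
    {"units": 15, "revenue": 0.0},    # test order that was never billed
]

# Hypothetical business rules, standing in for domain knowledge held in BI.
BUSINESS_RULES = [
    lambda order: order["units"] > 0,
    lambda order: order["revenue"] > 0,
]


def passes_business_rules(order):
    """Return True only if the record satisfies every business rule."""
    return all(rule(order) for rule in BUSINESS_RULES)


def revenue_per_unit(records):
    """Toy stand-in for a model: total revenue divided by total units."""
    total_units = sum(r["units"] for r in records)
    total_revenue = sum(r["revenue"] for r in records)
    return total_revenue / total_units


# The estimator itself has no context, so it happily consumes bad records.
naive_estimate = revenue_per_unit(orders)

# Filtering with domain rules first gives the estimator the "right" data.
clean_orders = [o for o in orders if passes_business_rules(o)]
informed_estimate = revenue_per_unit(clean_orders)

print("naive estimate:", naive_estimate)
print("informed estimate:", informed_estimate)
```

The estimator is unchanged between the two runs; the difference in output comes entirely from the domain knowledge applied to its input.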


Thinking Like a Data Scientist

Article | December 23, 2020

Introduction

Nowadays, everyone with some technical expertise and a data science bootcamp under their belt calls themselves a data scientist. Also, most managers don't know enough about the field to distinguish an actual data scientist from a make-believe one: someone who calls themselves a data science professional today but may work as a cab driver next year. As data science is a very responsible field dealing with complex problems that require serious attention and work, the data scientist role has never been more significant. So, perhaps instead of arguing about which programming language or which all-in-one solution is the best, we should focus on something more fundamental: the thinking process of a data scientist.

The challenges of the Data Science professional

Any data science professional, regardless of his specialization, faces certain challenges in his day-to-day work. The most important of these involve decisions about how he goes about his work. He may have planned to use a particular model for his predictions, only to find that the model does not yield adequate performance (e.g., accuracy that is not high enough or a computational cost that is too high, among other issues). What should he do then? Also, it could be that the data doesn't have a strong enough signal, and last time I checked, there wasn't a fool-proof method in any data science programming library that provided a clear-cut view on this matter. These are calls that the data scientist has to make, shouldering all the responsibility that goes with them.

Why Data Science automation often fails

Then there is the matter of automating data science tasks. Although the idea sounds promising, it's probably the most challenging task in a data science pipeline. It's not infeasible, but it takes a lot of work and a lot of expertise that's usually impossible to find in a single data scientist.
Often, you need to combine the work of data engineers, software developers, data scientists, and even data modelers. Since most organizations don't have all that expertise or don't know how to manage it effectively, automation doesn't happen as they envision, and a large part of the data science pipeline ends up being done manually.

The Data Science mindset overall

The data science mindset is the thinking process of the data scientist, the operating system of her mind. Without it, she can't do her work properly in the large variety of circumstances she may find herself in. It's her mindset that organizes her know-how and helps her find solutions to the complex problems she encounters, whether it is wrangling data, building and testing a model, or deploying the model in the cloud. This mindset is her strategic potential, the think tank within, which enables her to make the tough calls she often needs to make for data science projects to move forward.

Specific aspects of the Data Science mindset

Of course, the data science mindset is more than a general notion. It involves specific components, such as specialized know-how, tools that are compatible with each other and relevant to the task at hand, a deep understanding of the methodologies used in data science work, problem-solving skills, and, most importantly, communication abilities. The latter involve the data scientist both expressing himself clearly and understanding what the stakeholders need and expect of him. Naturally, the data science mindset also includes organizational skills (project management), the ability to work well with other professionals (even those not directly related to data science), and the ability to come up with creative approaches to the problem at hand.

The Data Science process

The data science process/pipeline is a distillation of data science work into a comprehensible form.
It's particularly useful for understanding the various stages of a data science project and helps you plan accordingly. You can view one version of it in Fig. 1 below. If the data science mindset is one's ability to navigate the data science landscape, the data science process is a map of that landscape. It's not 100% accurate, but it is good enough to help you gain perspective if you feel overwhelmed or need a better grip on the bigger picture.

Learning more about the topic

Naturally, it's impossible to exhaust this topic in a single article (or even a series of articles). The material I've gathered on it can fill a book! If you are interested in such a book, feel free to check out the one I put together a few years back; it's called Data Science Mindset, Methodologies, and Misconceptions, and it's geared towards data scientists, data science learners, and people involved in data science work in some way (e.g., project leaders or data analysts). Check it out when you have a moment. Cheers!


Soft Skills in Data Science

Article | April 29, 2021

We live in a world convulsed by new technologies, and we are witnessing more and more processes being automated so that they are executed as well as, or even better than, they would be by a human, all in the name of efficiency and effectiveness. In this context the world of work is becoming increasingly competitive: to remain employable, we need to learn to adapt our knowledge and skills to new technologies. With the spread of e-learning platforms and the tutorials available on the internet, acquiring new knowledge is within everyone's reach. For this reason, we need to differentiate ourselves from other professionals whose hard skills are similar to ours, and this is precisely where Soft Skills play a very important role.

What are Soft Skills?

Soft Skills are a combination of individual social skills, communication skills, personality traits, attitudes, social intelligence and emotional intelligence that facilitate relationships with others, making us more effective when interacting with other people. We could say that Soft Skills are the human interface that allows us to adapt to different working environments and industries. They are powerful tools for personal and professional growth.

Why are Soft Skills key in our professional growth?

Nowadays, standing out in the world of work is increasingly difficult, whether you are part of a corporation or work independently, because of the intense competition in the labor market. That is why we must develop certain skills and attitudes that help us to function properly and meet professional demands successfully. Soft Skills are the point of differentiation that allows us to be selected for a position.
The reason is very simple: we could be applying for a position and competing with people who are equally or even more qualified than us at a technical level, but achieving the collaborative objectives of the company requires more than just the technical and rational part. Ways of communicating, values, ethics, and personality traits are also highly valued, since they help to drive organizations through high-performance teams that achieve their objectives. The Soft Skills we have trained throughout our lives make us unique, because it is unlikely that two people have the same combination of Soft Skills, trained in the same way. That makes us more competitive for job opportunities where many candidates may have the same Hard Skills, but where our Soft Skills make us stand out and keep advancing in our professional career.

How to sharpen our Soft Skills?

To perform in any job we need to interact with other people, even if we work independently or remotely, so we must have the skills that allow us to connect successfully with our teammates and stakeholders. Since Soft Skills are human skills, we can say that we have them pre-installed, and the way to start using them (installing them) is through the experiences we undergo every day. Imagine being able to communicate assertively in your work environment and in your personal life, and mastering the tools installed in you to improve interpersonal relationships within your work teams and reduce conflict. This would allow you to foster a healthy working environment and lead any team in any environment in a strategic and effective way.
Think of Soft Skills as a set of apps that are ready to be used (like a toolbox): depending on the experiences that arise in our personal and/or professional lives, we choose which of these applications to use to achieve our goals. Every time we open one of these applications, we give it the opportunity to collect data that allows it to personalize its insights according to our needs and to fine-tune its effectiveness with each use. One of the best ways to train our Soft Skills is by leaving our comfort zone, because that allows us to 'install' more and more Soft Skills. Another way to refine them is by participating in activities that involve people we do not know, ideally people from other cultures, because the exchange of experiences and knowledge benefits both parties and makes the training of our Soft Skills even more valuable. Some examples of activities that will enhance your Soft Skills:
• Participate in competitions (e.g., hackathons).
• Found or lead a community that shares your interests and organizes small or large projects.
• Organize a study group aimed at carrying out a technical or business project, in order to engage with professionals from various fields or industries.
• Find resources and experts to help you. There are Soft Skills trainers who know useful techniques and tips to develop and sharpen your skills.
• Participate in volunteer activities. You will meet new people with whom to put your Soft Skills into action.

These activities will train and sharpen your leadership, teamwork, delegation, interpersonal communication, persuasion, and similar skills. These are skills that we have little opportunity to train while we are students or when we have just started working after finishing our studies, yet they are required in the labor market if we are to keep climbing in our professional career.

Why do Soft Skills matter in the Data Science universe?
A consequence of the use of Artificial Intelligence and Data Science is that many of the jobs we know today will be automated, which is a matter of concern for many professionals who see their careers in danger. The good news is that Soft Skills will be the main protagonists in many of the new jobs of the future, as John Thompson explains in his book "Building Analytics Teams". In other words, it is precisely our human skills that will make us more employable in the future. They will be highly sought after because, according to the experts, machines will not be able to match us in this field, and that is why training our Soft Skills becomes a priority: they will allow us to be the key players of the future. On the other hand, Data Science is an interdisciplinary field where Soft Skills such as cooperation and communication are essential to achieving the goals set. Denis Rothman, author of the book "Transformers for Natural Language Processing", mentioned in an interview that I conducted that human quality is the most important thing for him when choosing the members of his work team. These are some considerations to take into account to generate a culture of cooperation:
• People work harder and need less supervision when they control their own work and have more freedom to choose how to do it. When they work as a team, they show greater motivation, their sense of pride increases and productivity reaches higher levels.
• Solid teams that seek quality and excellence correct themselves; that is, they identify problems and correct them very quickly. Thus, they gain work experience and increase their performance.
• Forming a solid and efficient work team requires patience. You need to give the team time before you see results.
They will have to establish procedures to complete tasks, handle administrative functions and work together efficiently; they will even have to adapt to their own decisions and accept the consequences.
• A manager or team leader must respect the team-building process without expecting immediate results. The group will have to go through a learning process, and this will take longer in some groups than in others.

Another key component of achieving high levels of cooperation is fluid communication among team members and stakeholders. For instance, defining the communication channels and the contact points in the different teams involved helps ensure a constant flow of communication during the life cycle of a Data Science project. One of the most critical moments is the presentation of the results to the stakeholders. In some cases the results of a project are disregarded not so much because the expected results were not achieved, but because the way they are presented is not meaningful to the stakeholders. In most cases this is due to communication barriers that arise from using language (terminology) that is common in the technical world but not in the business world.

After taking a tour of the world of Soft Skills, we can conclude that Soft Skills are like superpowers waiting for the opportunity to be put into action, to make you a superhero or superheroine. Whether you keep climbing in your professional career depends on you: on how much you use these superpowers but, above all, on your ability to refine them and make them available to the work team you are part of. Don't wait any longer and start discovering your potential: start training your Soft Skills! If you want to know more about Soft Skills, I invite you to visit The Soft Skills Show.


Forward-thinking Business And The Implications Of Big Data

Article | March 23, 2020

Big data is a modern phenomenon transforming businesses of today. Organisations hold vast swathes of data, from historic and current orders to detailed insights about supply chain operations. This information, combined with external data such as market intelligence and even weather patterns, can provide businesses with a foundation on which to base their planning and decision-making. Business intelligence and analytical solutions pull valuable insights from huge datasets. From workforce optimisation to cost management, access to big data and the tools that manage and evaluate it allows firms to streamline key parts of their business. Adopters of modern solutions are seeing vast improvements in all areas of the company.

