Five Common Hadoopable Problems

Apache Hadoop has evolved into the standard platform for data storage and analysis. Large, successful companies are increasingly adopting Hadoop to perform powerful analyses of their ever-growing business data. Two key aspects of Hadoop have driven its rapid adoption by companies hungry for better insight into the data they collect:

• Hadoop can store data of any type and from any source, inexpensively and at very large scale.
• Hadoop enables sophisticated analysis of even very large data sets, easily and quickly.

However, Hadoop concepts are unfamiliar to many people with a background in traditional database and data warehousing systems, and its business value is often underappreciated.
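Part of what makes Hadoop unfamiliar to people with a database background is its processing model. Classic Hadoop jobs follow the MapReduce pattern: a map step turns each input record into key/value pairs, the framework groups those pairs by key (the shuffle), and a reduce step combines each group into a result. The pure-Python sketch below only simulates these three phases on a single machine to illustrate the idea; it does not use Hadoop's own Java API, and the word-count task and sample records are illustrative assumptions. On a real cluster, many mappers and reducers run in parallel across nodes and read their input from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence (locally, not distributed)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


# Illustrative input records; in Hadoop these would be lines of files in HDFS.
records = ["Hadoop stores data", "Hadoop analyzes data at scale"]
print(word_count(records))
```

Because each mapper only sees its own records and each reducer only sees one key's values, the same logic can be spread across many machines without changing it; that independence is what lets the model scale.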

Spotlight

RM Dayton Analytics

RM Dayton Analytics, a CIO Review Top 20 Most Promising Solution Provider 2016, is trusted by Fortune 100 companies, as well as industry disruptors and other admired Global 2000 brands, to find the sharpest minds that embrace technology and enhance the customer experience.

OTHER ARTICLES
DATA SCIENCE

How Machine Learning Can Take Data Science to a Whole New Level

Article | December 21, 2020

Introduction

Machine Learning (ML) has taken great strides over the past few years, establishing its place in data analytics. In particular, ML has become a cornerstone of data science, alongside data wrangling and data visualization, among other facets of the field. Yet many organizations are still hesitant to allocate budget for it in their data pipelines. The data engineer role seems to attract lots of attention, but few companies leverage the machine learning expert/engineer. Could ML add value to other enterprises, too? Let's find out by clarifying certain concepts.

What Machine Learning is

So that we are all on the same page, let's look at a down-to-earth definition of ML that you can use in a company meeting, a report, or an email to a colleague who isn't in this field. Investopedia defines ML as "the concept that a computer program can learn and adapt to new data without human intervention." In other words, if your machine (be it a computer, a smartphone, or even a smart device) can learn on its own, using some specialized software, then it falls under the ML umbrella. Note that ML is also a stand-alone field of research, predating most AI systems, even if the two are linked, as we'll see later on.

How Machine Learning is different from Statistics

It's also important to note that ML is different from Statistics, even if some people view the former as an extension of the latter. There is a fundamental difference that most people aren't aware of: ML is data-driven, while Statistics is, for the most part, model-driven. This means that most Stats-based inferences are made by assuming a particular distribution in the data, or particular interactions between variables, and making predictions based on our mathematical models of these distributions.
ML may employ distributions in some niche cases, but for the most part, it looks at the data as-is, without making distributional assumptions about it.

Machine Learning’s role in data science work

Let’s now get to the crux of the matter and explore how ML can add significant value to a data science pipeline. First of all, ML can potentially offer better predictions than most Stats models in terms of accuracy, F1 score, and similar metrics. ML models can also work alongside existing models to form model ensembles that tackle problems more effectively. Additionally, if transparency is important to the project stakeholders, there are ML-based options that offer some insight into which variables in the data matter most for making predictions. Moreover, ML models are typically more parametrized, meaning that you can tweak an ML model more, adapting it to the data you have and making it more robust (i.e., reliable). Finally, you can learn ML without a Math degree or any other formal training. Such training may prove useful, however, if you wish to delve deeper into the topic and develop your own models. This innovation potential is a significant aspect of ML, since it's not as easy to develop new models in Stats (unless you are an experienced Statistics researcher) or even in AI. Besides, the ML family includes various "heuristics" that facilitate your data science work, regardless of which predictive model you end up using.

Machine Learning and AI

Many people conflate ML with AI these days. This confusion arises partly because many ML models involve artificial neural networks (ANNs), which are the most modern manifestation of AI. Also, many AI systems are employed in ML tasks, so they are referred to as ML systems, since AI can be a bit generic as a term. However, not all ML algorithms are AI-related, nor are all AI algorithms under the ML umbrella.
This distinction matters because certain limitations of AI systems (e.g., the need for lots and lots of data) don't apply to most ML models, while AI systems tend to be more time-consuming and resource-heavy than the average ML one. There are several ML algorithms you can use to derive value from your data without breaking the bank. Then, if you find that you need something better in terms of accuracy, you can explore AI-based ones. Keep in mind, however, that some ML models (e.g., Decision Trees, Random Forests) offer some transparency, while the vast majority of AI ones are black boxes.

Learning more about the topic

Naturally, it's hard to do this topic justice in a single article. It is so vast that someone could write a book on it! That's what I did earlier this year, through the Technics Publications publishing house. You can learn more about this topic via this book, titled Julia for Machine Learning (Julia is a modern programming language used in data science, among other fields, and it's popular among various technical professionals). Feel free to check it out and explore how you can use ML in your work. Cheers!
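Two ideas from the section on ML's role in data science work are easy to make concrete: scoring a classifier with the F1 score (the harmonic mean of precision and recall) and combining several models into an ensemble. The Python sketch below assumes a binary classification task and uses made-up prediction vectors for three hypothetical models; the simple majority-vote ensemble is one common ensembling technique, offered as an illustration rather than as the author's method.

```python
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_vote(*predictions):
    """Ensemble: for each sample, pick the label most models agree on."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Made-up labels and predictions, purely for illustration.
y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1, 1, 0]
model_b = [1, 1, 1, 0, 0, 1, 0, 0]
model_c = [0, 0, 1, 1, 1, 1, 0, 0]

ensemble = majority_vote(model_a, model_b, model_c)
for name, preds in [("A", model_a), ("B", model_b), ("C", model_c), ("ensemble", ensemble)]:
    print(name, round(f1_score(y_true, preds), 3))
```

The vectors are deliberately contrived so that each model errs on different samples, which is the situation in which voting helps most; when models make correlated mistakes, an ensemble gains little over its best member.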

BIG DATA MANAGEMENT

How Should Data Science Teams Deal with Operational Tasks?

Article | December 21, 2020

Introduction

There are many articles explaining advanced methods in AI, Machine Learning, or Reinforcement Learning. Yet in real life, data scientists often have to deal with smaller, operational tasks that are not necessarily at the edge of science, such as writing simple SQL queries to generate lists of email addresses to target in CRM campaigns. In theory, these tasks should be assigned to someone better suited, such as a Business Analyst or Data Analyst, but the company does not always have people dedicated specifically to those tasks, especially if it’s a smaller organization. In some cases, these activities consume so much of our time that we don’t have much left for the work that matters, and we might end up doing a less-than-optimal job at both.

That said, how should we deal with those tasks? On the one hand, not only do we usually dislike doing operational tasks, but they are also a poor use of an expensive professional’s time. On the other hand, someone has to do them, and not everyone has the necessary SQL knowledge. Let’s look at some ways you can deal with them in order to optimize your team’s time.

Reduce

The first and most obvious way of doing fewer operational tasks is simply refusing to do them. I know it sounds harsh, and it might be impractical depending on your company and its hierarchy, but it’s worth trying in some cases. By “refusing”, I mean questioning whether the task is really necessary and trying to find better ways of doing it. Let’s say that every month you have to prepare 3 different reports, for different areas, that contain similar information. You have managed to automate the SQL queries, but you still have to double-check the results and occasionally add or remove some information at the user’s request, or change something in the chart layout.
In this example, you could check whether all 3 reports are necessary, or whether you could merge them into a single report that you send to all 3 users. In any case, think of ways to reduce the time needed for those tasks or, ideally, to stop performing them altogether.

Empower

Sometimes it pays to take the time to empower your users to perform some of those tasks themselves. If a specific team generates most of the operational requests, try encouraging them to use no-code tools, framing it so that they feel they will be more autonomous. You can either use existing solutions or develop them in-house (this could be a great learning opportunity to develop your data scientists’ app-building skills).

Automate

If a task is one you can’t get rid of and can’t delegate, try to automate it as much as possible. For reports, try to migrate them to a data visualization tool such as Tableau or Google Data Studio and connect them to your database. If it’s related to ad hoc requests, try to make your SQL queries as flexible as possible, with variable dates and names, so that you don’t have to rewrite them every time.

Organize

Especially when you are a manager, you have to prioritize, so that you and your team don’t drown in endless operational tasks. To do this, set aside one or two days a week for that kind of work, and don’t look at it during the remaining 3–4 days. To achieve this, you will have to adapt your workload by following the previous steps, and also manage expectations by accounting for this smaller number of work hours when setting deadlines. This also means explaining the paradigm shift to your internal clients, so they can adapt to the new deadlines. This step might require some internal politics, negotiating with your superiors and with other departments.
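The Automate step recommends SQL queries with variable dates and names. A minimal sketch of that idea, assuming Python's standard-library `sqlite3` module and an in-memory database: the `customers` table, its columns, the sample rows, and the `email_list` helper are all hypothetical stand-ins for a real CRM schema. The query is written once, and each ad hoc request only changes the bound parameters, which also avoids pasting values into SQL strings (a common source of SQL injection).

```python
import sqlite3
from datetime import date

# One reusable query; :country, :start and :end are named placeholders.
QUERY = """
    SELECT email
    FROM customers
    WHERE country = :country
      AND signup_date BETWEEN :start AND :end
    ORDER BY email
"""


def email_list(conn, country, start, end):
    """Return the target emails for one campaign; only the parameters change."""
    rows = conn.execute(QUERY, {"country": country,
                                "start": start.isoformat(),
                                "end": end.isoformat()})
    return [email for (email,) in rows]


# Hypothetical demo data (ISO date strings compare correctly as text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, country TEXT, signup_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("ana@example.com", "PT", "2020-11-03"),
     ("ben@example.com", "UK", "2020-11-15"),
     ("carla@example.com", "PT", "2020-12-01"),
     ("dina@example.com", "PT", "2020-10-20")],
)

# Two different requests served by the same query.
print(email_list(conn, "PT", date(2020, 11, 1), date(2020, 12, 31)))
print(email_list(conn, "UK", date(2020, 11, 1), date(2020, 11, 30)))
```

In practice the same pattern carries over to other database drivers, although the placeholder syntax (`:name`, `?`, `%s`) differs between them.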
Conclusion

Once you have mapped all your operational activities, start by eliminating as many of them as possible from your pipeline: first by getting rid of unnecessary activities for good, then by delegating others to the teams that request them. Then, whatever is left for you to do, automate and organize, to make sure you are making time for the relevant work your team has to do. This way you make sure that expensive employees’ time is well spent, maximizing the company’s profit.

DATA SCIENCE

Soft Skills in Data Science

Article | December 21, 2020

We live in a world convulsed by new technologies, and we are witnessing more and more processes being automated so that they can be executed with the same skill as a human, or even with better results, all in order to be more efficient and effective. In this context, the world of work is becoming increasingly competitive, because to remain employable we need to learn how to adapt our knowledge and skills to new technologies. With the spread of e-learning platforms and tutorials available on the internet, acquiring new knowledge is within everyone's reach. For this reason, we need to differentiate ourselves from other professionals who have hard skills similar to ours, and this is precisely where Soft Skills play a very important role.

What are Soft Skills?

Soft Skills are a combination of social skills, communication skills, personality traits, attitudes, social intelligence, and emotional intelligence, which facilitate relationships with others and make us more effective when interacting with other people. We could say that Soft Skills are the human interface that allows us to adapt to different working environments and industries. They are powerful tools for personal and professional growth.

Why are Soft Skills key in our professional growth?

Nowadays, standing out in the world of work is increasingly difficult, whether you are part of a corporation or work independently, due to the intense competition in the labor market. That is why we must develop skills and attitudes that help us function properly and successfully meet professional demands. Soft Skills are the point of differentiation that allows us to be selected for a position.
The reason is very simple: we could be applying for a position and competing with people who are equally or even more qualified than us at a technical level, but achieving the company's collaborative objectives requires more than the technical and rational part. The way we communicate, our values and ethics, and our personality traits are also highly valued, since they help drive organizations through high-performance teams that achieve their objectives. The Soft Skills we have trained throughout our lives make us unique, because it is unlikely that two people have the same combination of Soft Skills trained in a similar way. That makes us more competitive for job opportunities where many candidates may have the same Hard Skills, but where our Soft Skills will make us stand out and keep advancing in our professional career.

How to sharpen our Soft Skills?

To perform in any job, we need to interact with other people, even if we work independently or remotely, so we must have the skills that allow us to connect successfully with our teammates and stakeholders. Since Soft Skills are human skills, we can say that they come pre-installed, and the way to start using them (installing them) is through the experiences we go through every day. Imagine being able to communicate assertively in your work environment and in your personal life, and mastering the tools installed in you to improve your interpersonal relationships within your work teams and reduce conflict. This would allow you to foster a healthy working environment and to lead any team in any environment in a strategic and effective way.
Think of Soft Skills as a set of apps that are ready to be used (like a toolbox): depending on the situations we face in our personal and/or professional lives, we choose which of these apps to use to achieve our goals. Every time we open one of these apps, we give it the opportunity to collect data that allows it to personalize its insights to our needs and to fine-tune its effectiveness each time we use it. One of the best ways to train our Soft Skills is by leaving our comfort zone, because that allows us to 'install' more and more Soft Skills. Another way to refine them is by taking part in activities that involve people we do not know, ideally people from other cultures, because this produces an exchange of experiences and knowledge that benefits both parties and makes the training of our Soft Skills even more valuable. Some examples of activities that will enhance your Soft Skills:

• Participate in competitions (e.g., Hackathons).
• Found or lead a community that shares your interests and organizes small or large projects.
• Organize a study group aimed at carrying out a technical or business project, in order to engage with professionals from various fields or industries.
• Find resources and experts to help you. There are Soft Skills trainers who know useful techniques and tips to develop and sharpen your skills.
• Participate in volunteer activities. You will meet new people with whom to put your Soft Skills into action.

These activities will train and sharpen your leadership, teamwork, delegation, interpersonal communication, and persuasion skills, among others. These are skills that we have fewer opportunities to train while we are students or when we have just started working after finishing our studies, yet they are required in the labor market to keep climbing in our professional career.

Why do Soft Skills matter in the Data Science universe?
One consequence of the use of Artificial Intelligence and Data Science is that many of the jobs we know today will be automated, and this concerns many professionals who see their careers in danger. The good news is that in many of the new jobs of the future, Soft Skills will be the main protagonists, as John Thompson explains in his book "Building Analytics Teams". In other words, it is precisely our human skills that will make us more employable in the future, and they will be in high demand because, according to what the experts envision, machines will not be able to match us in this field. That is why training our Soft Skills becomes a priority: they will allow us to be the key players of the future. On the other hand, Data Science is an interdisciplinary field where Soft Skills such as cooperation and communication are essential to achieving the goals set. Denis Rothman, author of the book "Transformers for Natural Language Processing", mentioned in an interview that I conducted that human quality is the most important thing for him when choosing the members of his work team. These are some considerations to take into account to create a culture of cooperation:

• People work harder and need less supervision when they control their own work and have more freedom to choose how to do it. When they work as a team, they show greater motivation, their sense of pride increases, and productivity reaches higher levels.
• Solid teams that seek quality and excellence correct themselves; that is, they identify problems and correct them very quickly. Thus, they gain work experience and increase their performance.
• Forming a solid and efficient work team requires patience. You need to give the team time to show results.
The members will have to establish procedures to complete tasks, handle administrative functions, and work together efficiently; they will even have to live with their own decisions and accept their consequences.
• A manager or team leader must acknowledge that team building is a process and not expect immediate results. The group will have to go through a learning process, and this will take longer in some groups than in others.

Another key component of achieving high levels of cooperation is fluid communication among team members and stakeholders. For instance, defining the communication channels and the contact points in the different teams involved helps ensure a constant flow of communication during the life cycle of a Data Science project. One of the most critical moments is the presentation of results to stakeholders. In some cases, the results of a project are not taken into consideration, not so much because the expected results were not achieved, but because the way they are presented is not meaningful to the stakeholders. In most cases, this is due to communication barriers caused by terminology that is used in the technical world but not in the business world.

After this tour of the world of Soft Skills, we can conclude that Soft Skills are like superpowers waiting for the opportunity to be put into action, to make you a superhero or superheroine. Climbing in your professional career depends on you: on how much you use these superpowers, but above all on your ability to refine them and make them available to the team you are part of. Don't wait any longer; start discovering your potential and training your Soft Skills! If you want to know more about Soft Skills, I invite you to visit The Soft Skills Show.


What is the Difference Between Business Intelligence, Data Warehousing and Data Analytics?

Article | December 21, 2020

In the age of Big Data, you’ll hear a lot of terms tossed around. Three of the most commonly used are “business intelligence,” “data warehousing,” and “data analytics.” You may wonder, however, what distinguishes these three concepts from each other, so let’s take a look.

What differentiates business intelligence from the other two on the list is the idea of presentation. Business intelligence is primarily about how you take the insights you’ve developed through analytics and turn them into action. BI tools include items like … To put it simply, business intelligence is the final product: the yummy cooked food that comes out of the frying pan when everything is done.

In the flow of things, business intelligence interacts heavily with data warehousing and analytics systems. Information can be fed into analytics packages from warehouses. It then comes out of the analytics software and is routed back into storage and also into BI. Once the BI products have been created, information may yet again be fed back into data storage and warehousing.
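The flow described above (warehouse to analytics, analytics back to storage and on to BI) can be sketched in a few functions. This is a toy illustration, assuming Python's standard-library `sqlite3` as a stand-in for a warehouse; the `sales` and `revenue_by_region` tables, the sample rows, and all function names are hypothetical, and a real setup would use dedicated warehouse, analytics, and BI tools.

```python
import sqlite3


def load_warehouse():
    """Warehouse: a hypothetical sales table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("North", 120.0), ("South", 80.0),
                      ("North", 30.0), ("East", 50.0)])
    return conn


def analyze(conn):
    """Analytics: read raw rows out of the warehouse and total revenue per region."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def store_insights(conn, totals):
    """Route the analytics output back into storage so other tools can reuse it."""
    conn.execute("CREATE TABLE IF NOT EXISTS revenue_by_region (region TEXT, total REAL)")
    conn.executemany("INSERT INTO revenue_by_region VALUES (?, ?)", totals.items())


def bi_report(totals):
    """BI: present the insight as a ranked summary a decision-maker can act on."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{region}: {total:.2f}" for region, total in ranked]


conn = load_warehouse()
totals = analyze(conn)
store_insights(conn, totals)
for line in bi_report(totals):
    print(line)
```

Only `bi_report` is concerned with presentation, which mirrors the article's point that BI is the final, action-oriented product built on top of storage and analysis.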


