Article | March 21, 2020
Splunk extracts insights from big data. It is growing rapidly, it has a large total addressable market, and it has tremendous momentum from its exposure to industry megatrends (i.e. the cloud, big data, the "internet of things," and security). Further, its strategy of continuous innovation is being validated as the company wins very large deals. Investors should not be distracted by a temporary slowdown in revenue growth, as the company has wisely transitioned to a subscription model. This article reviews the business, its strategy, valuation the sell-off is overdone and risks. We conclude with our thoughts on investing.
Article | December 23, 2020
Nowadays, everyone with some technical expertise and a data science bootcamp under their belt calls themselves a data scientist. Also, most managers don't know enough about the field to distinguish an actual data scientist from a make-believe one someone who calls themselves a data science professional today but may work as a cab driver next year. As data science is a very responsible field dealing with complex problems that require serious attention and work, the data scientist role has never been more significant. So, perhaps instead of arguing about which programming language or which all-in-one solution is the best one, we should focus on something more fundamental. More specifically, the thinking process of a data scientist.
The challenges of the Data Science professional
Any data science professional, regardless of his specialization, faces certain challenges in his day-to-day work. The most important of these involves decisions regarding how he goes about his work. He may have planned to use a particular model for his predictions or that model may not yield adequate performance (e.g., not high enough accuracy or too high computational cost, among other issues). What should he do then? Also, it could be that the data doesn't have a strong enough signal, and last time I checked, there wasn't a fool-proof method on any data science programming library that provided a clear-cut view on this matter. These are calls that the data scientist has to make and shoulder all the responsibility that goes with them.
Why Data Science automation often fails
Then there is the matter of automation of data science tasks. Although the idea sounds promising, it's probably the most challenging task in a data science pipeline. It's not unfeasible, but it takes a lot of work and a lot of expertise that's usually impossible to find in a single data scientist. Often, you need to combine the work of data engineers, software developers, data scientists, and even data modelers. Since most organizations don't have all that expertise or don't know how to manage it effectively, automation doesn't happen as they envision, resulting in a large part of the data science pipeline needing to be done manually.
The Data Science mindset overall
The data science mindset is the thinking process of the data scientist, the operating system of her mind. Without it, she can't do her work properly, in the large variety of circumstances she may find herself in. It's her mindset that organizes her know-how and helps her find solutions to the complex problems she encounters, whether it is wrangling data, building and testing a model or deploying the model on the cloud. This mindset is her strategy potential, the think tank within, which enables her to make the tough calls she often needs to make for the data science projects to move forward.
Specific aspects of the Data Science mindset
Of course, the data science mindset is more than a general thing. It involves specific components, such as specialized know-how, tools that are compatible with each other and relevant to the task at hand, a deep understanding of the methodologies used in data science work, problem-solving skills, and most importantly, communication abilities. The latter involves both the data scientist expressing himself clearly and also him understanding what the stakeholders need and expect of him. Naturally, the data science mindset also includes organizational skills (project management), the ability to work well with other professionals (even those not directly related to data science), and the ability to come up with creative approaches to the problem at hand.
The Data Science process
The data science process/pipeline is a distillation of data science work in a comprehensible manner. It's particularly useful for understanding the various stages of a data science project and help plan accordingly. You can view one version of it in Fig. 1 below. If the data science mindset is one's ability to navigate the data science landscape, the data science process is a map of that landscape. It's not 100% accurate but good enough to help you gain perspective if you feel overwhelmed or need to get a better grip on the bigger picture.
Learning more about the topic
Naturally, it's impossible to exhaust this topic in a single article (or even a series of articles). The material I've gathered on it can fill a book! If you are interested in such a book, feel free to check out the one I put together a few years back; it's called Data Science Mindset, Methodologies, and Misconceptions and it's geared both towards data scientist, data science learners, and people involved in data science work in some way (e.g. project leaders or data analysts). Check it out when you have a moment. Cheers!
Article | February 23, 2020
Does the success of companies like Google depend on that of the algorithms or that of data? Today’s fascination with artificial intelligence (AI) reflects both our appetite for data and our excitement about the new opportunities in machine learning. Amalio Telenti, Chief Data Scientist and Head of Computational Biology at Vir Biotechnology Inc. argue that newcomers to the field of data science are blinded by the shiny object of magical algorithms and that they forget the critical infrastructures that are needed to create and to manage data in the first place.Data management and infrastructures are the little ugly duckling of data science but they are necessary for a successful program and therefore need to be built with purpose. This requires careful consideration of strategies for data capture, storage of raw and processed data and instruments for retrieval. Beyond the virtues of analysis, there are also the benefits of facilitated retrieval. While there are many solutions for visualization of corporate or industrial data, there is still a need for flexible retrieval tools in the form of search engines that query the diverse sources and forms of data and information that are generated at a given company or institution.
Article | February 11, 2020
Whilst there are many people that associate AI with sci-fi novels and films, its reputation as an antagonist to fictional dystopic worlds is now becoming a thing of the past, as the technology becomes more and more integrated into our everyday lives.AI technologies have become increasingly more present in our daily lives, not just with Alexa’s in the home, but also throughout businesses everywhere, disrupting a variety of different industries with often tremendous results. The technology has helped to streamline even the most mundane of tasks whilst having a breath-taking impact on a company’s efficiency and productivity.However, AI has not only transformed administrative processes and freed up more time for companies, it has also contributed to some ground-breaking moments in business, being a must-have for many in order to keep up with the competition.