Cultivating Data Scientists—from the Ground Up

In today’s digitally driven world, enterprises have gotten pretty good at the business of capturing and storing enormous amounts of data. Now they are challenged to unearth the business value that is buried in all of that data by using advanced analytics techniques.

For many organizations, the hard part is figuring out how to move forward in a strategic manner. They have the data, they have business analysts, and they have the IT infrastructure. Now they are asking:  What’s possible with all this data? How do we get started down the path to advanced analytics and machine learning? How do we cultivate the expertise we need?

This really isn’t a technology problem. This is a people problem.  At a technology level, powerful, easy-to-use analytical platforms and solutions are now more accessible than ever before. The real organizational challenge is one of cultivating the expertise and experience needed to capitalize on data analytics. This challenge is exacerbated by the speed at which analytics tools and platforms are evolving. Even if you have bona fide data scientists on your team, they will need help keeping up with the latest methods.

Furthermore, the transition to a more data-analytics-driven organization is not just a shift in expertise, but in the way you ask and answer important business questions.  The questions that your data science program answers will result from your organization’s unique mix of business challenges and opportunities, data sources and strategic goals.  A key role of the advanced analytics team is to facilitate the iterative process of identifying valuable questions and then assessing whether they can be usefully answered given the available data, expertise and organizational workflows.

To have the best chance of success, your data science team will need a core group of members whose sole responsibility is to accelerate the adoption of the data science methodology across the organization.  There are a couple of reasons to make data science a central function, rather than sprinkling data scientists throughout the organization.

First, data science is a team sport.  As we just described, the goal of the team is to iterate through different potential business questions to settle on key problems to solve, and the strategies that will be needed to solve them.  This requires technical understanding of both data analytics and IT infrastructure, and it requires business domain knowledge and an appreciation for how real business value can be created for the organization.  This is already potentially three different individuals.  Furthermore, data science is not one thing: It is a constellation of different techniques, all of which have domains of specific applicability.  Each project is likely to require more than one data science skillset.  For example, to create a system to recommend products to customers, you will need to apply expertise about recommendation systems, but you will also likely need to use natural language processing to extract key product attributes from text that describes the product.  This may require input from two different experts. In our experience, almost every advanced analytics project requires multiple skillsets.

Second, from a resource allocation perspective, you don’t want to have to hire multiple unicorn data scientists, each with a deep understanding of all the analytics tools, for every small project in the company.  Instead, you want to have a data science “SWAT Team,” with members who can apply their expertise wherever it’s needed and then move on to the next project.  This also creates a more dynamic environment for these data scientists, so that they don’t become bored working on a single project all the time.

An additional benefit of building a core team dedicated to data science is that they can act as evangelists across the organization, teaching others key skills and articulating clearly the value of the work they have done. We have seen data science evangelism become a critical success factor in many organizations.

At this stage you are probably asking yourself, “Where do I get the talented folks to staff this SWAT team?” The first place to look is your current analytics and reporting organization. Here is where to look within it:

Technical-minded business managers: You will find that some of your business leaders are more technical-minded.  Perhaps they began their career with a technical degree.  Maybe they just have a knack for, or an interest in technology. These technical business folks have a passion for newly available technology and will be great at nudging the discussion of data and models toward real business value.

Data Engineers and Extract-Transform-Load (ETL) Developers: A lot of the real work that data scientists do is centered on cleaning and organizing data, work that the ETL folks have been doing for years.

Business Intelligence Analysts: Your BI analysts have likewise spent years developing reports, metrics and queries to expose business value.

There are two major gaps for analysts making the leap to data science.  First, they are accustomed to creating reports with SQL-based tools from structured data that has already been organized into a well-defined schema and has reasonably well-understood relationships.  Data science creates value by moving beyond structured data with a single representation, to diverse, often unstructured data.  This requires a mindset shift.

The second gap for traditional analysts is actually a consequence of the first: Modeling data and extracting key attributes (“features” in data science parlance) requires new tools, such as text mining, natural language processing and machine learning, along with a new programming language such as Python in place of the old, familiar SQL.
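To make the idea of “features” a little more concrete, here is a minimal sketch of turning unstructured product text into numeric attributes with plain Python. The product descriptions, the stop-word list and the function names are all hypothetical and chosen only for illustration; a real project would more likely lean on a dedicated text-processing library, and simple word counts are just one of many possible feature types.

```python
import re
from collections import Counter

# Hypothetical product descriptions; in practice these would come from your own catalog.
DESCRIPTIONS = [
    "Lightweight waterproof hiking jacket with hood",
    "Waterproof trail running shoes, lightweight",
    "Cotton hooded sweatshirt",
]

# A tiny, illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"with", "and", "the", "a", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({token for doc in docs for token in tokenize(doc)})


def to_feature_vector(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


vocabulary = build_vocabulary(DESCRIPTIONS)
features = [to_feature_vector(d, vocabulary) for d in DESCRIPTIONS]

# Each row is one product; each column is one word from the vocabulary.
for description, row in zip(DESCRIPTIONS, features):
    print(row, "<-", description)
```

Each description becomes a row of word counts, which is the kind of input that a machine learning model can work with. An analyst used to SQL can think of it as building a very wide table whose columns are discovered from the data rather than defined up front in a schema.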

The good news is that the number of training tools, courses and technologies for making data science accessible to analytics practitioners is exploding.  Here at Intel, we are dedicated to the democratization of data science, advanced analytics and artificial intelligence, and we have created a number of ways to help educate and inform the analytics community.  Here are a few examples:

  • Intel® Nervana™ AI Academy puts forth the most recent, open-sourced and optimized frameworks with tutorials. It’s supported by the very engineers working on the supporting technology and backed with a community effort where support and training are provided through the portal, via partnership with institutions like Coursera, and in person at area meetups and events.
  • The team behind the Academy is selecting a growing number of schools to support the Intel® Student Developer Program. Continuing into 2017, Intel will collaborate with school programs to hold workshops, and nurture curriculum and project development.
  • Intel’s recently announced partnership with Kaggle will bring about incredible opportunities to develop and showcase data science skills in the context of worthwhile efforts extending from Intel’s AI for Good initiatives.

Armed with new skills, your data science team should start by defining and solving one key business question. In this step, your goal is to come up with a simple problem that you already understand well, and that relies on data you have ready access to. The idea is to work through the process of assembling and cleaning data, and then applying analytical tools to it, to get to a new result.  If you understand the problem well already, then you will be prepared to interpret the results of the data analytics process. It is critical to define a business performance metric that you will improve with this work.  Be sure to measure an irrefutable baseline value for this metric before you implement any new analysis.  You will need the before-and-after comparison to help you tell the story.
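As a rough illustration of the before-and-after comparison, the sketch below computes a baseline and the relative change for a single metric. The metric (a weekly conversion rate), the numbers and the function name are all made up for the example; substitute whatever performance metric your team has agreed on.

```python
from statistics import mean

# Hypothetical weekly conversion rates in percent; replace with your own metric.
baseline_weeks = [2.1, 2.0, 2.3, 2.2]  # measured before any new analysis was applied
after_weeks = [2.6, 2.5, 2.7, 2.4]     # measured after the new analysis was put to use


def relative_change(before, after):
    """Return the percent change in the mean of the metric from before to after."""
    base = mean(before)
    return (mean(after) - base) / base * 100.0


baseline = mean(baseline_weeks)
improved = mean(after_weeks)
change = relative_change(baseline_weeks, after_weeks)

print(f"baseline={baseline:.2f}  after={improved:.2f}  change={change:+.1f}%")
```

A simple comparison of averages like this does not prove that the new analysis caused the change, but recording the baseline before you start is what makes any later comparison credible.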

Ultimately, as your data science efforts gain traction and your organization’s hunger for powerful analytical results grows, you will want to hire new talent.  In an upcoming blog we will look at the current education system for data science students with the goal of helping you define what skills you can look for in recently matriculated students, and how you can keep them on a learning path once they are on your team. We’ll also dive a little deeper into Intel’s efforts to help educate the community.

In the meantime, the keys to success for your nascent data science team are to:

  • Assemble a team from existing resources
  • Give them some opportunities to expand their skillsets
  • Answer a very specific business question with new data
  • Evangelize your results across the organization

We hope to see you in the next edition of this series, where we dive deeper into the education and training resources available for advancing your career as a data scientist.