Managing the Changing IT Landscape: Creating Data Scientists
The men’s college basketball tournament is a wild ride from start to finish, a notoriously unpredictable contest. I’m especially excited to watch the outcome this year because there’s money riding on it. Not a bet, but a cash prize for the analytics model that most accurately predicts the winning teams during the tournament. I’m not referring to Warren Buffett’s prize either, but rather to March Machine Learning Mania, a competition sponsored by Intel on the Kaggle* platform with a $15K award. A previous blog describes the predictive analytics competition in more detail.
Unlike other bracket prediction contests or office pools that reward an individual’s predictions (dare I say guesses) for the outcome of the actual tournament, the March Machine Learning Mania contest focuses on creating the best analytics model that most accurately predicts the outcome of the tournament, regardless of who is playing. The goal here is to put science and technology behind the predictions to help make those guesses more accurate in the future.
Cutting-edge data science
I’m intrigued by Kaggle, a global community of data scientists that solves data problems in a competitive framework. Predictive analytics is by definition an inexact science. Based on the idea that there are multiple ways to develop a predictive model, Kaggle uses crowdsourcing and gamification to inspire “players” to work on specific problems. Players are scored and ranked throughout the competition, so they have the opportunity to continuously improve their models. Ultimately, the winning player is the one who develops the most effective model. Various levels of reward serve as incentives.
Back in 2012, Harvard Business Review* magazine called the data scientist the sexiest job of the twenty-first century. As companies seek to implement predictive analytics and big data solutions across their business, IT organizations are finding that there is a skill set shortage for data scientists. Competitions like March Machine Learning Mania serve to stimulate these skills, which will be critical for future business applications.
The Kaggle community draws members from more than 100 countries and 200 universities who hone their skills playing in sponsored competitions. They are computer scientists, statisticians, data scientists, mathematicians, and physicists. Newbies are ranked by Kaggle as novices (just getting started with competitions). The largest group is classified as “Kagglers” (active competition participants). Top players achieve the rank of Masters by consistently providing stellar competition results. Like a top-ranking basketball player, these guys have skills.
Kaggle offers companies a way to solve problems with analytics, especially organizations without in-house data science resources. But large companies are using the Kaggle platform to solve business problems as well. For example, Allstate is currently running the Purchase Prediction Challenge, and Walmart is running a Store Sales Forecasting competition as a recruitment tool for filling its data scientist positions.
Predictive analytics meets men’s basketball
I’m excited about how the March Machine Learning Mania competition will turn out and what the results can tell us about the Kaggle platform as a model for effective IT-based data analytics for business.
Here’s what we know so far: As of March 15, the last day to submit for the first stage of the March Machine Learning Mania competition, Kaggle had received 1,866 submissions. Some 255 teams with 342 players developed analytics models to predict the NCAA tournament outcome based on historical data from the last five tournaments. A dynamic leaderboard ranking listed the final top 10 teams based on how effectively their submissions predicted historical tournament outcomes. Congratulations to these awesome teams!
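For readers curious how a leaderboard like this might score submissions, here is a minimal sketch. It assumes each submission is a list of predicted win probabilities, one per game, and that submissions are ranked by log loss, a common metric for probabilistic predictions. The metric, the team names, and the numbers are all assumptions made for illustration; none of them come from the competition itself.

```python
import math


def log_loss(predictions, outcomes, eps=1e-15):
    """Mean log loss of predicted win probabilities against 0/1 outcomes.

    Lower is better. Assumed metric, for illustration only.
    """
    if not outcomes or len(predictions) != len(outcomes):
        raise ValueError("need equal-length, non-empty predictions and outcomes")
    total = 0.0
    for p, y in zip(predictions, outcomes):
        # Clip so a confident wrong guess gives a large, finite penalty.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)


def rank_submissions(submissions, outcomes):
    """Return (name, score) pairs sorted from best (lowest loss) to worst."""
    scores = {name: log_loss(preds, outcomes) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical results: 1 means the first-listed team won the game.
outcomes = [1, 0, 1, 1]

# Hypothetical submissions: predicted probability that the first-listed team wins.
submissions = {
    "team_a": [0.9, 0.2, 0.7, 0.6],  # mostly right and fairly confident
    "team_b": [0.5, 0.5, 0.5, 0.5],  # coin flips on every game
    "team_c": [0.1, 0.9, 0.4, 0.3],  # confident but mostly wrong
}

leaderboard = rank_submissions(submissions, outcomes)
for place, (name, score) in enumerate(leaderboard, start=1):
    print(f"{place}. {name}: {score:.4f}")
```

Under a metric like this, a model that is confidently wrong scores worse than one that simply flips a coin, which is why it rewards well-calibrated predictions rather than bold guesses.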
The second half of the game
This week kicked off the second stage of the competition: using the models to predict the outcome of the 2014 tournament. The solution file from the first stage, plus results data for the 2014 regular season, was provided to players so they could refine their submissions. New entries (not in the first-stage competition) were also allowed. Kaggle accepted entries from March 17 to March 19. Teams will be ranked against the tournament results as the tournament progresses. I expect to see the leaderboards change as winning teams progress through the brackets.
Tune into the Big Data Dance
This year, the NCAA tournament is more than fast plays, surprising upsets, and Cinderella teams. It’s a chance to see data analytics in action. While you’re watching the real games during the “Big Dance,” I’ll be right there with you, watching Kaggle’s “Big Data Dance” to see how predictive analytics play out on the basketball court.
What’s your take on crowdsourcing big data analytics? I welcome your comments along with more intuitive predictions on the NCAA tournament.