Automating Big Data Analytics

Big data analytics can find your most profitable customers, worst employees and best suppliers. Management is enthusiastic about getting some value out of this new technology. But wait. Do you really understand what you’ve learned? Even if you do, have you considered how to operationalize what you’ve learned?

Michael Schrage, in “Big Data’s Dangerous New Era of Discrimination” at the Harvard Business Review blog, writes about some of the potential pitfalls of making ‘discriminating’ decisions based on analytics. Companies could identify less profitable customer segments and work to make them more profitable, but it might be easier to simply give those customers an incentive to go somewhere else.

Be careful: governments can take a dim view of unequal or unfair treatment of customers. Consider the example of St George’s Hospital Medical School. With years of historical admissions decisions to draw on, the school decided to use a model to automate the first round of its admissions process. Ultimately, it was discovered that the model was biased against women and non-European applicants.

So what do you do? One recommendation is to think carefully about what your customers, employees or suppliers would think if they found out how the decisions were made. Another is to keep humans in the loop. Before operationalizing a model, plan for people to review some portion of its decisions. (If the model makes decisions in real time, the review can be done offline.) A review provides an ongoing check on the model’s accuracy, and the reviewed cases can serve as training data for updating the model when appropriate. A key point is to review a random sample of all decisions, not only the favorable ones. Take the medical school example. If only the applicants who made it past the first round were reviewed, you would never see the applicants who should have made it to the second round but were rejected by the model.
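To make the sampling point concrete, here is a minimal sketch in Python of one way to pull a review sample. It is an illustration under stated assumptions, not a description of any particular company's process: the record fields (`id`, `model_decision`, `human_decision`), the function names and the 20% review fraction are all hypothetical. The important detail is that the sample is drawn from every decision the model made, accepted and rejected alike.

```python
import random


def sample_for_review(decisions, fraction=0.1, seed=None):
    """Pick a uniform random subset of ALL model decisions for human review.

    `decisions` is a list of records (hypothetical format: dicts with a
    "model_decision" field). Sampling ignores the model's decision, so
    rejected cases are as likely to be reviewed as accepted ones.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not decisions:
        return []
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    k = max(1, round(len(decisions) * fraction))
    return rng.sample(decisions, k)


def agreement_rate(reviewed):
    """Share of reviewed cases where the human agreed with the model."""
    if not reviewed:
        return 0.0
    agree = sum(
        1 for r in reviewed if r["model_decision"] == r["human_decision"]
    )
    return agree / len(reviewed)


# Hypothetical data: 100 applicants, half rejected by the model.
applicants = [
    {"id": i, "model_decision": "accept" if i % 2 == 0 else "reject"}
    for i in range(100)
]

# Send a random 20% of all decisions to human reviewers.
to_review = sample_for_review(applicants, fraction=0.2, seed=42)
```

Once reviewers have filled in a `human_decision` for each sampled record, `agreement_rate(to_review)` gives a simple running check on the model, and the disagreements are natural candidates for the next training set.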

As big data analytics is used to automate more decisions, remember to keep the human in the loop.

Does anyone have any other suggestions for ways to monitor models?

Michael Cavaretta is a Data Scientist and Manager at Ford Motor Company. He is a leader for the Predictive Analytics group in Research and Advanced Engineering.
