Chances are, you’ve already checked your email several times today. And you’ve probably sent 10 replies, or more. Email is ubiquitous in the business world—so much so that it is becoming a productivity sponge, soaking up employees’ time and energy. At Intel, it’s no different, and our CIO wanted to reduce email usage at Intel. The problem was that we had hundreds of unprocessed, unanalyzed email log files—we had no way to show the CIO what Intel’s email usage really was, let alone understand how to reduce it.
To solve this problem, we created an email analytics solution in just two months. The solution is based entirely on the Apache Hadoop* ecosystem: Cloudera Distribution of Apache Hadoop plus Hive*, Pig*, Impala*, and Sqoop*. It includes front-end reporting with visualizations and dashboards that provide access to eight metrics. Using the email analytics solution, we took the first step in driving down email usage at Intel: establishing a baseline and bringing Intel’s email usage to light for the first time.
Aside from the technical details of structuring data from the email servers and integrating that data with the tools, several other aspects of the project were interesting.
Maintaining employee privacy was a high priority. Intel is strongly committed to Privacy by Design, a framework that takes privacy into account and builds in protections at each phase of a product or service development process. Therefore, our first step, even before doing a proof of concept, was to work with Intel’s Human Resources and Legal departments to develop a Privacy Plan. For example, the plan stipulated that we could analyze only email header information; we could not look at email content or the subject line, and we could not disclose employee ID numbers. Header information includes the sender’s and receiver’s names, the date, the file size, the email server name, and a few other bits of data.
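To make the header-only rule concrete, here is a minimal Python sketch of an allow-list filter. It is an illustration only, not the code we ran in production: the field names (`sender`, `recipient`, `size_bytes`, and so on) and the sample entry are hypothetical, since the real log schema is not described in this post.

```python
# Hypothetical field names; the post does not describe the real log schema.
ALLOWED_FIELDS = ("sender", "recipient", "date", "size_bytes", "server")


def keep_allowed_fields(entry):
    """Return a new dict holding only the allow-listed header fields."""
    return {name: entry[name] for name in ALLOWED_FIELDS if name in entry}


# Made-up log entry used purely for illustration.
raw_entry = {
    "sender": "Alice Example",
    "recipient": "Bob Example",
    "date": "2014-03-01T09:15:00",
    "size_bytes": 20480,
    "server": "mail01",
    "subject": "Quarterly numbers",  # excluded by the privacy plan
    "employee_id": "00012345",       # excluded by the privacy plan
}

header_only = keep_allowed_fields(raw_entry)
```

An allow-list (keep only what is named) is a safer default here than a block-list (drop what is named), because any field that is unexpectedly present in a log is discarded rather than passed through.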
Intel is a global corporation, with offices all over the world. Privacy laws differ by region, so we had to take that into account during the project. For example, European privacy regulations are quite different from those in the United States. Europe accounts for only about 20 percent of Intel’s email traffic, and investigating the regulations would have taken more time than it was worth, so we decided to exclude European email from our analysis.
Data aggregation was another way we protected privacy. We could not disclose data for any group of fewer than 100 employees, so we excluded small offices from the study.
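The aggregation rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production job: the function name, the office names, and the headcounts are made up, and the sketch assumes that each email has already been mapped to an office and that a per-office headcount is available.

```python
from collections import Counter

MIN_GROUP_SIZE = 100  # threshold from the privacy plan


def reportable_counts(office_per_email, headcount, min_group_size=MIN_GROUP_SIZE):
    """Count emails per office, dropping offices below the headcount threshold."""
    counts = Counter(office_per_email)
    return {
        office: total
        for office, total in counts.items()
        if headcount.get(office, 0) >= min_group_size
    }


# Made-up data: one office label per email, plus employees per office.
emails = ["Large Office", "Large Office", "Small Office"]
headcount = {"Large Office": 250, "Small Office": 40}

report = reportable_counts(emails, headcount)
```

In this sketch, an office with no known headcount is treated as having zero employees, so it is suppressed rather than reported by accident.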
From a purely logistical standpoint, the project required creative approaches to staffing. During the proof of concept, we had no funding, because the project was not on the list of budgeted projects. I volunteered and pulled in a few other volunteers. We thought, “Hey, this is a good thing for Intel, and it’s a great chance to learn new technology.” Our email messaging architect was in India, so scheduling meetings across geographies and time zones was sometimes challenging.
Despite these technical and logistical hurdles, the results from our project had business value. We obtained funding for going to production, and email analytics jobs are now running every day. I have presented the project and its results at a few internal roadshows and in several internal venues, such as our Big Data forum and Analytics forum. It has generated quite a bit of interest, with application developers asking questions about privacy and the insights we gained.
Speaking of insights, we intend to expand the project. We want to connect email and social media, and further explore email analytics, such as studying which job roles use more email than others. We’re hoping that by further exploring the impact of email on business, we can better co-locate infrastructure to improve performance and also work with business groups and teams to help them reduce email usage.
I’d be interested in hearing from other IT professionals about the impact of email on business, email analytics, and other related topics. Share your expertise and experiences by leaving a comment below. Or, if you have a question about our email study and future plans, I’d be happy to answer it. Join the conversation!