When I heard that Andy Grove died, I immediately thought back to the one direct interaction I had with him. Early in my Intel career, I worked in a research group as a systems administrator running an e-mail gateway to CSNET, a network that was a key milestone in the development of the global Internet. We set up the connection so that our researchers could exchange e-mail with other corporate and academic researchers. At an internal review session describing our projects, Andy Grove asked me, "So what are people doing with it?" I couldn't give him a good answer to what was a perfectly valid and logical question, and, as anyone who knows anything about Andy Grove can imagine, that was an uncomfortable position to be in. But the question pushed me to think about e-mail analytics. What we implemented then, primitive as it was compared to how we do it today, could address Andy's question.
In the early days of Internet connectivity through CSNET, e-mail was the major form of communication. Our traffic started in early 1987 at a few hundred e-mail messages a week. Over the first eight years of connectivity, Internet e-mail traffic grew exponentially, and that growth made the traffic harder to analyze. Our initial analysis listed the top senders and receivers, both to and from the Internet, by total messages and total bytes. We also flagged very large messages going in and out.
At first, a simple Perl script could handle all of our Internet e-mail analysis. As the number of messages grew dramatically, we began generating summaries daily and then collating them into a weekly report. Now, when Intel analyzes its internal e-mail, Big Data techniques are needed to deal with the sheer scale of the data available (our analysis infrastructure shown below). Privacy is another difference from the early Internet days. We once simply analyzed mail logs and listed top users. Now, with worldwide privacy requirements, analysis of mail logs and other usage logs that contain Personally Identifiable Information (PII) must be carefully managed to ensure compliance with approved privacy plans and policies.
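The early analysis described above (top senders and receivers by message count and by bytes, large-message flags, and daily summaries rolled up into a weekly report) can be sketched in a few lines of Python. This is not the original Perl script. It assumes mail-log records have already been parsed into `(sender, recipient, size_in_bytes)` tuples, and the 1 MB threshold and the names `Summary`, `summarize_day`, and `collate_week` are hypothetical choices for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical cutoff for a "very large" message; the original threshold is not stated.
LARGE_MESSAGE_BYTES = 1_000_000


@dataclass
class Summary:
    """Per-address message and byte totals, plus a list of large messages."""
    sent_msgs: Counter = field(default_factory=Counter)
    sent_bytes: Counter = field(default_factory=Counter)
    recv_msgs: Counter = field(default_factory=Counter)
    recv_bytes: Counter = field(default_factory=Counter)
    large: list = field(default_factory=list)

    def merge(self, other: "Summary") -> "Summary":
        # Counter.update adds counts rather than replacing them.
        self.sent_msgs.update(other.sent_msgs)
        self.sent_bytes.update(other.sent_bytes)
        self.recv_msgs.update(other.recv_msgs)
        self.recv_bytes.update(other.recv_bytes)
        self.large.extend(other.large)
        return self


def summarize_day(records: list[tuple[str, str, int]]) -> Summary:
    """Build one day's summary from assumed (sender, recipient, size) records."""
    s = Summary()
    for sender, recipient, size in records:
        s.sent_msgs[sender] += 1
        s.sent_bytes[sender] += size
        s.recv_msgs[recipient] += 1
        s.recv_bytes[recipient] += size
        if size >= LARGE_MESSAGE_BYTES:
            s.large.append((sender, recipient, size))
    return s


def collate_week(daily: list[Summary]) -> Summary:
    """Combine daily summaries into a single weekly summary."""
    week = Summary()
    for day in daily:
        week.merge(day)
    return week


if __name__ == "__main__":
    day1 = summarize_day([
        ("user1@example.com", "user2@example.edu", 2_000),
        ("user1@example.com", "user3@example.org", 1_500_000),
    ])
    day2 = summarize_day([("user4@example.com", "user2@example.edu", 500)])
    week = collate_week([day1, day2])
    print("Top senders by messages:", week.sent_msgs.most_common(3))
    print("Top senders by bytes:", week.sent_bytes.most_common(3))
    print("Large messages:", week.large)
```

Keeping each day's totals in `Counter` objects means the weekly rollup is just a sum of daily results, which mirrors the daily-then-weekly collation described in the post. A real deployment would also need to handle the PII in those addresses according to the applicable privacy policies.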
Looking back at Andy Grove's question: what were people doing with e-mail? Back then, mail-based discussion lists, a primitive predecessor to today's social media, generated much of the traffic. In Intel's e-mail world today, messages from automated processes and software make up a significant portion of the volume. That finding, and others from applying Big Data analytic techniques to e-mail, point to ways to reduce the e-mail burden on Intel employees, and I am sure that other organizations could benefit from a similar approach.
For more information, see the following: