Maximizing the ROI of Cybersecurity Data

Intel IT is continuously improving cybersecurity at Intel, as described in the IT@Intel white paper, “Transforming Intel’s Security Posture with Innovations in Data Intelligence.” Our new Cyber Intelligence Platform (CIP) ingests data from hundreds of sources into Splunk Enterprise*, creating a 10-PB data lake. We source data from firewalls, network connections, endpoint devices, and other enterprise tools to build a context-rich set of data. To achieve the highest return on investment (ROI) from our CIP, we launched a Cybersecurity Data Insights solution that measures the value of each data source.

Cybersecurity Data Insights uses Splunk’s built-in audit logs, query language, and visualization tools to show how data flows in, is processed, and then surfaces in query results, either as a notable event or in one of many reports or dashboards. Beyond helping us justify the business value of a particular sourcetype (Splunk’s label for the kind of incoming data), Cybersecurity Data Insights also helps us monitor data integrity (is the time stamp valid?), data availability (is the expected data there, and if not, why?), and data relationships (if this data changes, what else is affected?).
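The post describes the integrity and availability checks only at a high level. As a rough illustration, not the actual Cybersecurity Data Insights logic, both questions can be expressed over exported event metadata. The only Splunk concepts assumed are the default `_time` (parsed event time stamp) and `_indextime` (indexing time) fields; the `Event` record, the thresholds, and sourcetype names such as `fw:traffic` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    sourcetype: str
    event_time: datetime  # time stamp parsed from the raw event (Splunk's _time)
    index_time: datetime  # when the indexer wrote the event (Splunk's _indextime)


def invalid_timestamps(events, max_future=timedelta(minutes=5), max_lag=timedelta(days=1)):
    """Integrity: flag events stamped too far in the future or indexed long after they occurred.

    Both thresholds are hypothetical defaults, not values from the original post.
    """
    bad = []
    for e in events:
        lag = e.index_time - e.event_time
        if lag < -max_future or lag > max_lag:
            bad.append(e)
    return bad


def missing_sourcetypes(events, expected, now, max_silence=timedelta(hours=1)):
    """Availability: expected sourcetypes with no events indexed within max_silence of now."""
    last_seen = {}
    for e in events:
        if e.sourcetype not in last_seen or e.index_time > last_seen[e.sourcetype]:
            last_seen[e.sourcetype] = e.index_time
    return sorted(st for st in expected
                  if st not in last_seen or now - last_seen[st] > max_silence)


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    events = [
        Event("WinEventLog:Security", now - timedelta(minutes=2), now - timedelta(minutes=1)),
        Event("fw:traffic", now + timedelta(hours=2), now - timedelta(minutes=1)),  # skewed clock
    ]
    print([e.sourcetype for e in invalid_timestamps(events)])  # ['fw:traffic']
    print(missing_sourcetypes(events, {"WinEventLog:Security", "fw:traffic", "proxy:web"}, now))  # ['proxy:web']
```

Comparing `_time` against `_indextime` catches both device clock skew and delayed forwarding; the "if not, why?" part of availability still needs a human or a separate check on the collection pipeline.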

The Challenge – Not All Data Is Valuable

In general, we extract value from data by searching and reporting on it. For example, we can examine firewall traffic to see how much data is moving from one place to another. Or we can detect that a laptop is reaching out to questionable sites on the internet at odd hours, which may mean it has a virus. For our Splunk Enterprise Security* users, gaining such insights is much more complicated than simply typing “show me the viruses.”

First, the user must formulate a query using Splunk’s query language on the Splunk Enterprise Security Search Head (ESSH), which then applies search macros, event types, and tags to generate an expanded query. The expanded query is then sent to the Splunk indexers, which make up the data lake. The indexers search through the raw data and/or data models for relevant events and send the data back to the ESSH, which adds reference data, such as the last time a laptop was patched. Then Splunk Enterprise Security presents the query results to the user.
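To make the expansion step concrete, here is a toy model of what happens to search macros and event types on the ESSH. It is not Splunk’s implementation: real definitions live in Splunk configuration (`macros.conf` and `eventtypes.conf`), tags and macros with arguments are not handled, and every macro and event type name below is hypothetical. The point it illustrates is that the sourcetypes a search actually touches often appear only after expansion.

```python
import re

# Hypothetical knowledge objects; in Splunk these come from macros.conf and eventtypes.conf.
MACROS = {
    "auth_sources": '(sourcetype="WinEventLog:Security" OR sourcetype="ms:aad:signin")',
    "failed_logon": "`auth_sources` EventCode=4625",
}
EVENTTYPES = {
    "odd_hours_web": 'sourcetype="proxy:web" (date_hour<6 OR date_hour>22)',
}

MACRO_RE = re.compile(r"`([A-Za-z0-9_]+)`")              # matches `macro_name`
EVENTTYPE_RE = re.compile(r"eventtype=([A-Za-z0-9_]+)")  # matches eventtype=name


def expand(query, max_passes=10):
    """Substitute macros and event types until the query stops changing.

    Nested macros resolve on later passes; an unknown name raises KeyError.
    """
    for _ in range(max_passes):
        expanded = MACRO_RE.sub(lambda m: MACROS[m.group(1)], query)
        expanded = EVENTTYPE_RE.sub(lambda m: "(" + EVENTTYPES[m.group(1)] + ")", expanded)
        if expanded == query:
            return expanded
        query = expanded
    raise ValueError("expansion did not converge (recursive macro?)")


if __name__ == "__main__":
    print(expand("search `failed_logon` | stats count by user"))
```

The user typed only `failed_logon`, yet the expanded query names two sourcetypes; that hidden dependency is what makes attribution hard from the original query alone.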

Although users can run queries and get results readily, those results don’t usually contain the information we need to trace a result back to the data sourcetype(s) that produced it. It’s sort of a black box, and that is a problem: we need to know which sourcetypes are being used, and how often, to justify continued or additional investment.

The Solution – Visualizing the Data’s Value

Our Cybersecurity Data Insights solution helps solve this problem. We created a set of metadata and tracking mechanisms that go beyond Splunk’s out-of-the-box mapping and tracking capabilities. Cybersecurity Data Insights uses these extra components to collect information from the expanded query and the data models and to determine which sourcetypes were used in each search. We then display this information with Splunk’s native visualization tools.

For example, in the following diagram, the data sourcetypes (WinEventLog Security* and Microsoft: Authentication) are listed on the left. The Indexing portion represents the data lake, and the Data Model portion is where all the data is combined. The right side shows all the queries, reports, and dashboards that use data from each sourcetype. In this example, many queries and reports use Microsoft: Authentication and WinEventLog Security data, yet many of their messages are barely used, and only a handful feed the data models. We therefore filter out the low-value messages, so only a subset is ingested into the Splunk Enterprise data lake. This lowers our costs without negatively affecting our cybersecurity mission.
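A simplified sketch of the attribution step follows, assuming the expanded queries are already available as strings (for example, collected from audit logs). It pulls the sourcetypes named directly in each query plus those reached indirectly through a data model, then counts how many searches, reports, and dashboards depend on each sourcetype. The regular expressions, the `DATAMODEL_FEEDS` mapping, and the sourcetype names are hypothetical; a real deployment would derive the data model mapping from the data model definitions themselves.

```python
import re
from collections import Counter, defaultdict

# Matches sourcetype=foo or sourcetype="foo" in an expanded query.
SOURCETYPE_RE = re.compile(r'sourcetype="?([^"\s)]+)"?')
# Matches datamodel=Name (as in "| tstats ... from datamodel=Name"); stops at "." for datasets.
DATAMODEL_RE = re.compile(r'datamodel="?([A-Za-z_]+)')

# Hypothetical mapping from each data model to the sourcetypes that populate it.
DATAMODEL_FEEDS = {
    "Authentication": {"WinEventLog:Security", "ms:aad:signin"},
    "Web": {"proxy:web"},
}


def sourcetypes_used(expanded_query):
    """Sourcetypes a query reads, either directly or through a data model."""
    used = set(SOURCETYPE_RE.findall(expanded_query))
    for dm in DATAMODEL_RE.findall(expanded_query):
        used |= DATAMODEL_FEEDS.get(dm, set())
    return used


def usage_report(searches):
    """searches maps a search, report, or dashboard name to its expanded query.

    Returns (number of consumers per sourcetype, names of consumers per sourcetype).
    """
    counts = Counter()
    consumers = defaultdict(set)
    for name, query in searches.items():
        for st in sourcetypes_used(query):
            counts[st] += 1
            consumers[st].add(name)
    return counts, consumers
```

The `consumers` mapping also answers the data-relationship question: if a sourcetype changes, `consumers[st]` lists the searches, reports, and dashboards that are affected. The patterns cover only `sourcetype=` terms and `datamodel=` references, so other syntaxes (wildcards, the `datamodel` command) would need extra handling.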

It is important to note, however, that calculating ROI is not simply a matter of counting how many searches use a particular sourcetype, although that count is a good indicator that the data is being used. Some data may not be used daily, but its availability may be critical during a serious incident. Another consideration is that generating this additional metadata is computationally intensive and requires running extra searches. Our underlying infrastructure, based on Intel® Xeon® Platinum processors and Intel® SSD storage, is up to the task.
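One way to combine usage counts with the incident-readiness caveat when filtering messages is an allowlist that keeps a message type if enough searches reference it or if it is flagged as incident-critical. This is a hypothetical sketch, not Intel’s rule: it keys on Windows `EventCode` values, the `min_refs` threshold and the critical set are assumptions, and codes referenced only inside data model definitions would have to be resolved separately.

```python
import re
from collections import Counter

EVENTCODE_RE = re.compile(r"EventCode=(\d+)")


def referenced_codes(expanded_queries):
    """Count how many queries reference each EventCode (each query counted once per code)."""
    counts = Counter()
    for q in expanded_queries:
        for code in set(EVENTCODE_RE.findall(q)):
            counts[code] += 1
    return counts


def ingest_allowlist(observed_codes, expanded_queries, incident_critical, min_refs=1):
    """Keep observed codes referenced by at least min_refs queries, plus every incident-critical code."""
    refs = referenced_codes(expanded_queries)
    keep = {c for c in observed_codes if refs[c] >= min_refs}
    keep |= set(incident_critical)  # kept even if no search uses them today
    return sorted(keep, key=int)


if __name__ == "__main__":
    queries = [
        'search sourcetype="WinEventLog:Security" EventCode=4625',
        "search EventCode=4624 OR EventCode=4625",
    ]
    observed = {"4624", "4625", "4634", "4672", "5156"}
    print(ingest_allowlist(observed, queries, incident_critical={"1102"}))  # ['1102', '4624', '4625']
```

Event code 1102 (the Windows Security log was cleared) illustrates the second condition: it may never appear in a routine search, yet missing it during an investigation could be costly.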


Using Intel IT’s new CIP is a transformational way to tackle cybersecurity operations. We were struggling under the weight of a growing number of legacy cybersecurity tools and incredibly large volumes of data. The amount of data will likely continue to grow as we support more users, add more use cases, and run more queries. But with our Cybersecurity Data Insights solution, the results are no longer coming from a black box. Now, we continuously manage our data portfolio and measure the ROI of our data investment.

Read the white paper, “Transforming Intel’s Security Posture with Innovations in Data Intelligence” for more information.

To learn more about how you can deploy your own Cyber Intelligence Platform, download the solution brief.


About Jerome Swanson

Jerome Swanson, Data Scientist at Intel, started his career in data analysis after earning a bachelor's degree in physics, while working as a spacecraft engineer for EchoStar Corporation. There, he helped develop a telemetry analysis tool and additional visualizations, and he became familiar with the log analysis software Splunk. Jerome later received the Splunk Revolution Award. He expanded his experience in the cyber threat analytics space as a Splunk expert and joined Intel in 2018 as the lead for Splunk Enterprise Security. In his free time, Jerome enjoys hiking in Oregon's beautiful wild places.