Finding Relationships in Data to Drive Business Results

Picture this: You’re on Facebook (or the social network of your choice) and the algorithm recommends a friend for you: someone like your best friend’s sister’s co-worker, Bob Smith. And you actually do know Bob from that party you went to one time. How did Facebook know to make that friend suggestion? How did it know you both enjoy similar movies? Most likely by using graph analytics, a set of techniques for sifting through large amounts of data to uncover meaningful relationships in a network. Analyzing the patterns in these graphs helps data scientists draw conclusions about the relationships they represent.
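Real recommendation engines are far more complex, and their details are not public. One simple graph heuristic, though, is "friends of friends": rank the people you are not yet connected to by how many mutual friends you share. The sketch below assumes a tiny, hypothetical friendship graph stored as a dictionary of sets; the names and the `suggest_friends` helper are illustrative only.

```python
from collections import Counter

def suggest_friends(graph, user, top_n=3):
    """Rank people not yet connected to `user` by their number of mutual friends."""
    friends = graph.get(user, set())
    mutual_counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone they are already friends with.
            if candidate != user and candidate not in friends:
                mutual_counts[candidate] += 1
    return mutual_counts.most_common(top_n)

# Hypothetical undirected friendship graph: each person maps to their friends.
friendships = {
    "you": {"best_friend", "alice", "carol"},
    "best_friend": {"you", "sister"},
    "sister": {"best_friend", "bob"},
    "alice": {"you", "bob"},
    "carol": {"you", "bob"},
    "bob": {"sister", "alice", "carol"},
}

print(suggest_friends(friendships, "you"))  # [('bob', 2), ('sister', 1)]
```

Here Bob rises to the top because two of your friends also know him, even though you have no direct edge to him.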

Let’s Take a Step Back

Graphs are mathematical structures used to model relationships, and they consist of Nodes and Edges. Nodes are things like people, businesses, places, devices, bank accounts, or any data point we might want to track. Edges are the lines that connect the Nodes, and they can represent anything from the number of phone calls to payment details to likes on social media. Edges can also carry the strength of a connection (I call you many times a week; you buy a lot of merchandise from an online retailer) as well as its direction. An edge can be one-directional (I follow you on Twitter), two-directional (we follow each other on Twitter), or non-directional (a taxonomy of the animal kingdom). Graphs become interesting when we investigate the Edges between Nodes and the weights on those Edges. When analytical techniques are applied to the graphs, patterns such as sub-communities on social networks begin to emerge.
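To make Nodes, Edges, weights, and direction concrete, here is a minimal sketch of a weighted graph stored as an adjacency dictionary (node to {neighbor: weight}). The `Graph` class and the sample nodes are hypothetical illustrations, not any particular library's API. A two-directional edge is modeled as two one-directional edges, and a non-directional edge is stored in both directions with the same weight.

```python
class Graph:
    """Weighted graph stored as adjacency dicts: node -> {neighbor: weight}."""

    def __init__(self):
        self.adj = {}

    def add_node(self, node):
        self.adj.setdefault(node, {})

    def add_edge(self, a, b, weight=1.0, directed=True):
        """Add an edge from a to b; if not directed, also add b to a."""
        self.add_node(a)
        self.add_node(b)
        self.adj[a][b] = weight
        if not directed:
            self.adj[b][a] = weight

    def weight(self, a, b):
        """Return the weight of edge a -> b, or None if there is no such edge."""
        return self.adj.get(a, {}).get(b)

    def neighbors(self, node):
        return set(self.adj.get(node, {}))

g = Graph()
g.add_edge("me", "celebrity")                       # one-directional: I follow them
g.add_edge("ann", "ben")                            # two-directional: a mutual follow
g.add_edge("ben", "ann")                            #   is two one-directional edges
g.add_edge("me", "mom", weight=12, directed=False)  # weighted: 12 calls a week
```

Storing neighbors in a dictionary keeps edge lookups fast and lets each edge carry its own weight, which matters once analysis starts looking at connection strength rather than just connectivity.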

Most enterprises today understand the business imperative to capture data, and they’re gathering terabytes of it. Some of the largest cloud vendors are dealing with petabytes of new data every day. Many advanced enterprises are already running some kind of predictive analytics internally: if a customer bought our widget on this date, when can we expect them to buy another? This kind of relational, "one-to-one" and "one-to-many" structured data analysis offers insight into purchasing behavior, and it can now be done with fairly simple tools. But what happens when we start combining that data with sensor data, product search histories, and social media activity? This is where graph analytics, which captures the "many-to-many" nature of connections, helps clarify the mappings between sparse data sets. Understanding the relationships (or patterns) between different data sets can help data scientists form a more complete picture of the question they are seeking to answer.
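As a small illustration of the "many-to-many" point, the sketch below merges (customer, product) records from two hypothetical sources, purchases and product searches, into one bipartite graph, then links customers who touch the same product. The record data and helper names are assumptions made up for the example. Note that the link between `c2` and `c3` only appears once both sources are combined.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (customer, product) records from two separate data sources.
purchases = [("c1", "widget"), ("c2", "widget"), ("c2", "gadget")]
searches = [("c3", "gadget"), ("c1", "sprocket"), ("c3", "sprocket")]

def build_bipartite(*sources):
    """Merge records into customer->products and product->customers maps."""
    by_customer = defaultdict(set)
    by_product = defaultdict(set)
    for source in sources:
        for customer, product in source:
            by_customer[customer].add(product)
            by_product[product].add(customer)
    return by_customer, by_product

def customer_links(by_product):
    """Project onto customers: count the products each pair of customers shares."""
    links = defaultdict(int)
    for customers in by_product.values():
        for a, b in combinations(sorted(customers), 2):
            links[(a, b)] += 1
    return dict(links)

_, purchase_only = build_bipartite(purchases)
_, combined = build_bipartite(purchases, searches)
print(customer_links(purchase_only))  # {('c1', 'c2'): 1}
print(customer_links(combined))       # also links c2-c3 and c1-c3
```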

Enterprise Success with Graph Analytics

Graph analytics can be used with great success in the enterprise; one widespread example is Google’s PageRank algorithm, which ranks web pages by the links pointing to them. The most obvious commercial use of these analytics in social networks is targeted advertising. Graph analytics can identify a set of influencers within a social network, enabling enterprises to customize advertising and promotions for a specific audience. In the healthcare industry, graph analytics makes it possible to identify potential fraud cases and to analyze genes and proteins. In the financial sector, graph analytics can expose fraud and insider trading. Graph analytics can also help identify hacking attempts by examining network connectivity and transaction patterns. The industries where graph analytics can be applied are nearly limitless, including manufacturing (supply chain analysis), oil and gas (optimizing energy production), healthcare (patient outcomes), financial services (money laundering), and many more. In the future, we anticipate that a significant portion of machine learning will involve analyzing sparse data with graph analytics, combined with today’s dense computation for recognition and classification.
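For readers curious how PageRank-style ranking works, here is a textbook power-iteration sketch; it is not Google's production implementation. Each node repeatedly passes a damped share of its score along its outgoing links, and nodes with no outgoing links spread their score evenly. Applied to a hypothetical "follows" graph, the most-followed account ends up with the highest score, which is one simple way to surface influencers. The damping factor of 0.85 is the commonly cited default, used here as an assumption.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a dict: node -> list of nodes it links to."""
    nodes = set(graph) | {dst for dsts in graph.values() for dst in dsts}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src, dsts in graph.items():
            if dsts:
                # Each linked node receives an equal share of src's damped score.
                share = damping * rank[src] / len(dsts)
                for dst in dsts:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical "follows" graph: each account maps to the accounts it follows.
follows = {
    "a": ["influencer"],
    "b": ["influencer"],
    "c": ["influencer", "a"],
    "influencer": ["a"],
}
ranks = pagerank(follows)
print(max(ranks, key=ranks.get))  # influencer
```

The scores always sum to 1, so they can be read as the long-run share of attention a random follower-hopping visitor would spend on each account.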

A more technically deep graph analytics operation, graph isomorphism, illustrates decades of research in mathematics and computer science. Two graphs are isomorphic if the nodes of one can be relabeled so that its edges match the other’s exactly; the graph isomorphism test checks whether two graphs have the same basic shape, no matter how different they may appear when drawn from different positions. Sub-graph isomorphism checks whether some region of one graph is isomorphic to another graph. Why is isomorphism interesting? It is used to compare structures in fields such as genomics and chemistry, and even in the automated design and placement of circuits. A more common application is studying social networks: identifying cliques and using them to tailor recommendations and experiences. Isomorphism is also a basis of artificial intelligence, since it is part of the pattern matching problem.
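The definition of isomorphism can be checked directly by brute force: try every way of relabeling one graph's nodes onto the other's and see whether the edge sets match. The sketch below does exactly that for small undirected graphs given as edge lists (isolated nodes are ignored). It takes factorial time, so it only illustrates the idea; practical tools use much smarter algorithms such as VF2, which is what NetworkX's isomorphism module is based on. The helper names are hypothetical.

```python
from itertools import permutations

def _normalize(edges):
    """Represent an undirected edge list as a set of frozensets."""
    return {frozenset(e) for e in edges}

def are_isomorphic(edges_a, edges_b):
    """Brute-force check: try every relabeling of A's nodes onto B's nodes."""
    a, b = _normalize(edges_a), _normalize(edges_b)
    nodes_a = list({v for e in a for v in e})
    nodes_b = list({v for e in b for v in e})
    # Quick rejections: node and edge counts must match.
    if len(nodes_a) != len(nodes_b) or len(a) != len(b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # Relabel every edge of A and compare with B's edge set.
        if {frozenset(mapping[v] for v in e) for e in a} == b:
            return True
    return False

square = [(1, 2), (2, 3), (3, 4), (4, 1)]
crossed = [(1, 3), (3, 2), (2, 4), (4, 1)]  # the same 4-cycle, drawn with crossing lines
star = [(0, 1), (0, 2), (0, 3)]
path = [(0, 1), (1, 2), (2, 3)]

print(are_isomorphic(square, crossed))  # True
print(are_isomorphic(star, path))       # False: same counts, different shape
```

The star-versus-path case shows why simple counts are not enough: both have four nodes and three edges, but no relabeling turns one into the other.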

The HIVE Program

It’s clear that graph analytics is an expanding field with many potential applications. The Defense Advanced Research Projects Agency (DARPA) recently announced the HIVE program to advance graph analytics. The goal is to build a graph analytics processor that can process streaming graphs 1,000X better than today’s best-in-class technology, measured in performance-per-Watt. The target is to realize this 1,000X gain in a demonstration platform by mid-2021, enabling large-scale sparse data analysis in real time.

Intel has been selected as one architecture path for the HIVE program. We’ll be working to design a set of controllers for memory and network operations, coupled with fully programmable compute engines optimized for data off-load, to handle graph analytics and similar problems. The primary group leading the investigation is part of the Data Center Group’s (DCG) Innovation, Pathfinding and Architecture Group, led by VP Dhiraj Mallick.

Dhiraj explains, “We [Intel] anticipate that Graph Analytics will be a significant portion of Artificial Intelligence. It allows data scientists to answer their questions by studying the relationship between different datasets.” He adds, “Graph Analytics will run on sparse data sets. In order to process these large graphs in a timely manner, we need to rethink the architecture, and the HIVE program allows us to deliver on this need.”

Intel Federal was instrumental in partnering on the proposal, and the work is being carried out across Intel, including the Platform Engineering Group (PEG) and Intel Labs. While PEG will work with the HIVE team on controllers and feature integration into mainstream products, investigations into algorithms and software impacts will be led by the Intel Labs Software and Systems Research group, headed by Intel Fellow Rich Uhlig.

According to Dr. Uhlig, “We know from computing history that when you deliver orders-of-magnitude leaps in performance, algorithms can suddenly begin to deliver significantly more meaningful and valuable insights into massive datasets. We think that this project will get us there.”

We’re excited to partner with DARPA on this program to advance the field of graph analytics. Intel has a strong portfolio for analytics and artificial intelligence, and accelerating graph analytics will allow us to help businesses reach insights even faster. To learn more about the DARPA HIVE program, visit http://graphchallenge.mit.edu/darpa-hive, and to learn more about Intel’s work in analytics and machine learning, visit www.intel.com/analytics and www.intel.com/machinelearning.