Can We Outrun/Outcompute the Data Avalanche? (Part 2)


What is “Data Intensive Computing”? Abstractly, think of the quest to make better decisions from big data as a filtering or funneling process: taking a huge amount of data and filtering and transforming it down to the right piece(s) of information you need to make what amounts to a one-bit decision: yes or no, buy or sell, live or die. Data Intensive Computing is this end-to-end process.

I like the phrase “Data Intensive Computing” because it better describes the problem as being data limited rather than compute limited. Data intensive computing is about efficiently taking vast amounts of structured or unstructured data through a series of processing steps that transform raw data into intelligence, knowledge, experience, and ultimately better decision making. In today’s world, we ideally want this filtering process to occur in sub-second, or real, time.

To do this in an efficient and scalable way we have to focus on the data itself: transforming and filtering it as quickly as possible, minimizing the movement of data while it is large, and transporting it only in some compressed or size-reduced form.
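As a rough sketch of this funnel, the pipeline below streams a large (hypothetical) data source through filter, transform, and aggregate steps, never materializing the full dataset, and reduces it to a single one-bit decision. The names `sensor_readings` and the thresholds are illustrative assumptions, not anything from a real system:

```python
THRESHOLD = 100.0

def sensor_readings():
    # Stand-in for a large raw data source; a generator avoids
    # holding the whole dataset in memory at once.
    yield from (0.5 * i for i in range(1_000_000))

def decide(readings, threshold=THRESHOLD):
    # The funnel: filter -> transform -> aggregate -> one-bit decision.
    relevant = (r for r in readings if r > threshold)  # filter out noise
    squared = (r * r for r in relevant)                # transform
    total = sum(squared)                               # reduce to one number
    return total > 1e6                                 # yes or no

print(decide(sensor_readings()))
```

The point of the generator chain is that each stage shrinks or reshapes the data in place as it flows, rather than copying the raw data between stages.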

Here are some key attributes that should be present in Data Intensive Computing architectures:

  • Fewer, more powerful discrete components (CPU, GPU, etc…)
  • Utilize discrete components that have high I/O communication bandwidth
  • Consolidation of application workflows around data (move processing to data, not data to processing)
  • Minimizing time-stealing data movement (especially when data is large)
  • Centralize rendering of data for visual analysis (move rendering to data, not data to rendering)
  • Delivering visual (size-reduced) representations of data to users, rather than the data itself
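The last few attributes share one idea: do the work where the data lives and ship only a compact result. A minimal sketch, assuming a hypothetical server-side dataset and a simple JSON summary as the "size-reduced representation":

```python
import json
import statistics

def summarize(values):
    # Reduce a large dataset to a handful of descriptive numbers --
    # the only thing that needs to cross the wire.
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }

raw = list(range(10_000))              # stand-in for data held next to the compute
payload = json.dumps(summarize(raw))   # what is actually sent to the user
print(len(payload), "bytes instead of", len(json.dumps(raw)), "bytes")
```

In a real architecture the "summary" might be a rendered image or an aggregate query result, but the design choice is the same: the reduction happens at the data, and only the reduced form moves.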

My next and last blog post on this topic will be less abstract and more about real-world “data intensive” applications.