Can We Outrun/Outcompute the Data Avalanche?

These days it seems I’m overrun with data, not just in my professional life but in my personal life too, and the stress of it is eroding my quality of life. Like Sisyphus, I wake each day and go to battle with it, knowing the data grows every second of every day; by the end of the day I’ve made no overall headway. I battle with massive data because I know there’s a benefit to be gained: new insight, a fresh way of thinking that can spawn a breakthrough, a more accurate computer model for better predictions, ultimately better decision making. I’m motivated by the potential but fatigued by how ineffective I am at coping with the growing data volume. Sound familiar?

The data growth we’ve experienced in the last decade has changed the world. The tools and methodologies for processing data grow increasingly inefficient as data size grows, and will soon become ineffective altogether. I believe a big change is necessary if we’re going to outrun this data avalanche: innovation that departs from the compute-centric approach of the last decade.

I know large research centers operating at petascale today can already generate more data than they can effectively store and manage. This pressures researchers to visualize and analyze the data in real time, forgoing any ability to archive their work or do future analysis without re-computing the solution.

Let’s try to think differently. The point of petascale computing, cloud, or virtualization is the data or information being produced, or more importantly, the eventual insight to be gained. If we can’t feed the processors fast enough or effectively analyze the results, then we’ve missed the mark. Keep in mind we’re talking about a data size problem that is far outpacing Moore’s law.
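To get a feel for what “outpacing Moore’s law” means over a decade, here is a toy sketch. The doubling periods are illustrative assumptions only (compute doubling every 24 months at a Moore’s-law pace, data volume doubling every 12 months), not figures from any measured dataset:

```python
# Toy sketch: exponential growth under two assumed doubling periods.
# These rates are illustrative assumptions, not measured figures.

def growth_factor(years: float, doubling_months: float) -> float:
    """Return the total growth factor after `years` given a doubling period."""
    return 2 ** (years * 12 / doubling_months)

compute_factor = growth_factor(10, 24)  # compute: 2**5  = 32x in a decade
data_factor = growth_factor(10, 12)     # data:    2**10 = 1024x in a decade
gap = data_factor / compute_factor      # data outgrows compute by 32x

print(f"compute: {compute_factor:.0f}x, data: {data_factor:.0f}x, gap: {gap:.0f}x")
```

Even under these rough assumptions, the gap itself compounds: every extra year the data side pulls further ahead, which is why simply waiting for faster processors doesn’t close it.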

This end-to-end processing problem is a “Data Intensive Computing” problem, one we should focus on sooner rather than later.

More to come...