Big Data, Hadoop, Real-Time Analytics, and In-Memory – It’s Anybody & Everybody’s Game

It’s clear that the two biggest buzzwords of 2012 are “Hadoop” and “Big Data” (maybe even buzzier than “cloud”). I keep hearing that they are taking over the world, and that relational databases are so “yesterday.”

I have no disagreement with the explosion of unstructured data. I also think that open source technology like Hadoop is enabling unprecedented levels of innovation. I just think it’s quite a leap to conclude that all of the various existing analytic tools are dead, and that no one is doing predictive or real-time analytics today. Likewise, the death of scale-up configurations is greatly exaggerated.

First, I’d like to offer the notion that real-time analytics frequently rely on in-memory processing for response time. Today, and for the near term, that often implies larger SMP-type configurations such as those employed by SAP HANA. Many think that the evolution to next-generation NVRAM server memory technology will redefine server configurations as we know them today, particularly if it succeeds in competing with DRAM and enables much larger memory configurations at much lower price points. This could revolutionize how servers that handle data are configured, both in the amount of memory and in its non-volatility.

Second, precisely because Hadoop is open source, it makes sense that existing analytics suppliers are incorporating many of its key features into their products, as Greenplum has done with MapReduce. Further, key players like Oracle now offer Big Data appliances embracing both Hadoop and NoSQL, separately or in conjunction with other offerings. Even IBM, with arguably the best analytics portfolio in existence today, offers InfoSphere BigInsights Enterprise Edition, which delivers Hadoop (with HDFS and MapReduce) integrated with popular offerings such as Netezza and DB2. Predictive analytics also exist from companies like SAS, in addition to Bayesian modeling from specialty providers.
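For readers who haven’t looked under the hood, MapReduce itself is a simple programming model: a “map” step emits key/value pairs from raw input, and a “reduce” step aggregates the values for each key. Here is a minimal, hedged sketch of the classic word-count example in plain Python — illustrating only the model, not any vendor’s actual implementation or API:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum the counts per word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Toy input standing in for files spread across an HDFS cluster.
docs = ["big data is big", "hadoop handles big data"]
word_counts = reduce_phase(map_phase(docs))
print(word_counts["big"])  # 3
```

In a real Hadoop deployment, the framework runs many map tasks in parallel across the cluster, shuffles the intermediate pairs by key, and runs the reducers near the data — the two functions above are the only parts the developer typically writes.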

Now, on one level this is capitalism at its best: integrating open source with existing (for a price) products while taking full advantage of open innovation. On another level, it is an acknowledgement that unless you are a pure internet company (à la Facebook, Google, YouTube), you have a variety of data that may also originate in the physical world, and you occasionally need to connect to actual transactional data. It also acknowledges that predictive analytics exist today, and that the capability can be adapted and applied to unstructured data, while adding installation, management, security, and support, and connecting to other warehouses and databases.

I think it’s extremely premature to categorize the Hadoop world as separate from everything that has gone before it. It is naïve to believe that previously developed expertise has no value, or that existing suppliers won’t evolve to integrate and combine best-of-breed solutions incorporating all data types. Likewise, configurations will continue to evolve with new capabilities.

I am personally thrilled to see the excitement and energy that is driving new data types, real time and predictive analytics, and automation.  I can only hope that credit card fraud detection becomes much more sophisticated!

Have something to say or share?  Feel free to find me on Twitter @panist or comment below.