Will Hadoop replace traditional Data Warehousing?

If you’ve read my previous blog, you know that I’m a big fan of using Google Trends to track the popularity of terms. Comparing the terms “data warehouse” and “hadoop” shows the latter handily beating the former since mid-2010. But both of them are crushed in a comparison to “database”; Hadoop still has a long way to go before it completely supplants older data storage technologies.

And for the short term, why should it? I work for a Fortune 100 company, and like most large enterprises, we have a considerable number of traditional business intelligence systems, with dozens of data warehouses and even more data marts for specific applications. The cost of converting all of these systems to Hadoop would be hard to justify.

For small and medium-sized businesses (SMBs) and start-ups, it might be worthwhile to move directly to Hadoop (and other big data technologies), since they don’t bear the cost of converting legacy infrastructure. I’ll have more to say about SMBs and Big Data in a future blog post.

Looking at the medium-to-long term (3-5 years), the situation changes completely. James Kobielus, writing in 2011 while at Forrester, said, “Enterprises are moving rapidly toward the EDW as the hub for all advanced analytics.” While there is no denying the considerable hype around Big Data, I think the focus on using analytics to derive value from data is real, and it is accelerating this trend.

Vendors are recognizing the opportunity for Hadoop to be the basis of the next generation of data warehousing: Cloudera and MapR have recently released Hadoop-based systems to support enterprise search. Both are trumpeting the cost differential versus traditional data warehousing systems, listing Hadoop at about $1,000 per terabyte compared to $20,000 or more for traditional systems.
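To put those per-terabyte figures in perspective, here is a minimal Python sketch of the arithmetic. It assumes the $1,000 figure refers to Hadoop and the $20,000 figure (a lower bound, given the "or more") refers to traditional warehousing; the 50 TB volume and the function names are hypothetical, chosen only for illustration.

```python
# Per-terabyte figures quoted above; the warehouse number is a lower bound.
HADOOP_COST_PER_TB = 1_000       # USD per terabyte (assumed to be the Hadoop figure)
WAREHOUSE_COST_PER_TB = 20_000   # USD per terabyte (assumed traditional figure)


def storage_cost(terabytes: float, cost_per_tb: float) -> float:
    """Return the total cost in USD of storing `terabytes` at `cost_per_tb`."""
    return terabytes * cost_per_tb


def cost_ratio() -> float:
    """Return how many times more the traditional system costs per terabyte."""
    return WAREHOUSE_COST_PER_TB / HADOOP_COST_PER_TB


if __name__ == "__main__":
    volume_tb = 50  # hypothetical data volume, not a figure from the vendors
    hadoop = storage_cost(volume_tb, HADOOP_COST_PER_TB)
    warehouse = storage_cost(volume_tb, WAREHOUSE_COST_PER_TB)
    print(f"Hadoop:    ${hadoop:,.0f}")
    print(f"Warehouse: ${warehouse:,.0f} or more")
    print(f"Ratio:     at least {cost_ratio():.0f}x")
```

This only compares the quoted list prices; it leaves out conversion, staffing, and operating costs, which are exactly what makes migrating existing systems hard to justify.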

So, what does this all mean? I think this supports the idea that, particularly when starting a new Big Data effort with a new data source, companies should capture all the data. Spending a lot of time deciding which data to store and exhaustively curating it up front just doesn’t make much sense. The key to getting value from data is not just storing it but finding patterns and acting on them. The sooner you apply analytics to the data, the better.

Michael Cavaretta is a Data Scientist and Manager at Ford Motor Company. He is a leader for the Predictive Analytics group in Research and Advanced Engineering.
