Some Big Data Needs Big Memory (and Big Processing)

In my last post, I talked about Oracle's Big Data appliance and the broad database landscape. As promised, this week I want to discuss the data management side of the story, moving on to in-memory analytics.

Much has been written about Oracle's Exadata appliance since its introduction in 2009, and I won't cover that ground here. But what makes Exadata interesting in the context of this post is the way Oracle has positioned it at the center of its data management universe.

If you've got Big Data problems, then the starting point for that universe is the previously mentioned Oracle appliance, which can feed the needles from the Hadoop haystack directly into Exadata at full 40Gb/s InfiniBand line speed.

Once those needles are in Exadata, you can perform data warehousing queries against that data to your heart's content. Thanks to Oracle's use of E5-series Xeon processor-based servers throughout the parallel cluster of storage cells in Exadata, those queries are likely to run pretty quickly.
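To make the idea of spreading query work across a parallel set of storage cells concrete, here is a minimal Python sketch. It is my own generic illustration of parallel scan-and-filter, not Exadata's actual Smart Scan machinery, and the data and function names are hypothetical: each simulated "cell" filters its own slice of a table in parallel, and only the matching rows flow back to be combined.

```python
# Conceptual sketch only: parallel filtering across simulated "storage cells",
# loosely analogous to pushing scan work out to a cluster of cells.
# All names and data here are hypothetical illustrations.
from concurrent.futures import ProcessPoolExecutor

def cell_scan(rows, min_amount):
    """Each simulated cell filters its own slice and returns only the matches."""
    return [r for r in rows if r["amount"] >= min_amount]

def parallel_query(cells, min_amount):
    """Fan the scan out to every cell, then combine the surviving rows."""
    with ProcessPoolExecutor() as pool:
        results = pool.map(cell_scan, cells, [min_amount] * len(cells))
    return [row for partial in results for row in partial]

if __name__ == "__main__":
    # Three "cells", each holding a slice of the table.
    cells = [
        [{"order_id": 1, "amount": 50}, {"order_id": 2, "amount": 900}],
        [{"order_id": 3, "amount": 120}, {"order_id": 4, "amount": 75}],
        [{"order_id": 5, "amount": 3000}],
    ]
    print(parallel_query(cells, min_amount=100))
```

The point of the sketch is simply that the filtering happens where the data lives, so only a small fraction of the rows ever has to travel back to the query coordinator.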

But what if even Exadata can't run the queries quickly enough?  What if you need near-instantaneous response time against large datasets with complex queries and heaps of simultaneous users?

That's where the other major new product announcement from OpenWorld comes in: Exalytics.  This appliance combines a four-socket E7-series Xeon processor-based server, configured with a full terabyte of DRAM, with new Oracle-authored software for in-memory analytics.  That software derives from the integration of the classic TimesTen in-memory database engine, which until now was focused primarily on OLTP, with the Oracle BI engine (formerly known as Essbase), along with a sprinkling of columnar orientation and the associated columnar deduplication.
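For readers unfamiliar with the columnar piece, here is a rough Python sketch of what column orientation with value deduplication (dictionary encoding, one common approach) looks like. This is a generic illustration of the technique under my own assumptions, not a description of TimesTen's internal format.

```python
# Generic illustration of columnar storage with dictionary encoding
# (one common form of "columnar deduplication"). Not TimesTen internals.

rows = [
    {"region": "EMEA", "product": "Widget", "units": 10},
    {"region": "EMEA", "product": "Gadget", "units": 3},
    {"region": "APAC", "product": "Widget", "units": 7},
    {"region": "EMEA", "product": "Widget", "units": 2},
]

def to_columnar(rows, columns):
    """Store each column separately; repeated values collapse into a dictionary."""
    encoded = {}
    for col in columns:
        values = [r[col] for r in rows]
        dictionary = sorted(set(values))               # each distinct value kept once
        codes = [dictionary.index(v) for v in values]  # rows become small integer codes
        encoded[col] = {"dictionary": dictionary, "codes": codes}
    return encoded

table = to_columnar(rows, ["region", "product", "units"])

# Scanning one column touches only that column's codes -- the essence of
# why columnar layouts speed up analytic queries over wide tables.
region = table["region"]
emea_rows = [i for i, c in enumerate(region["codes"])
             if region["dictionary"][c] == "EMEA"]
print(emea_rows)  # [0, 1, 3]
```

Because analytic queries typically touch a handful of columns across millions of rows, storing each column contiguously and deduplicated both shrinks the memory footprint and keeps the scan cache-friendly.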

If you've watched the development of SAP's HANA in-memory database appliance over the last year, then you know what all of that means: HANA is no longer the only enterprise-grade in-memory analytics option out there.

Competition is a good thing, especially when the competitors run on a common underlying platform, as these two (HANA and Exalytics) do.

I found one aspect of Exalytics very intriguing.  Larry Ellison called the feature 'Adaptive Heuristic In-Memory Analytics'.  Thanks to its intimate linkage with Exadata, an Exalytics appliance can handle working sets that exceed the physical memory capacity of an Exalytics box.  Queries won't run as fast when the working set exceeds physical memory, since physical I/O to the Exadata machine is needed to fetch the overflow. However, that fetching happens at a very high data rate, thanks to the high-speed link between Exalytics and Exadata and the parallel nature of Exadata's I/O subsystem.
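Conceptually, that adaptive behavior resembles a cache that keeps the hottest data in local DRAM and falls back to a fast backing store when the working set overflows. The sketch below is a simplified illustration of that general pattern under my own assumptions, not Oracle's actual algorithm.

```python
# Simplified illustration of an in-memory tier spilling to a fast backing
# store when the working set exceeds memory. Not Oracle's actual mechanism.
from collections import OrderedDict

class SpillingCache:
    def __init__(self, capacity, backing_store):
        self.capacity = capacity             # how many items fit "in memory"
        self.memory = OrderedDict()          # LRU order: most recent at the end
        self.backing_store = backing_store   # slower tier, e.g. the Exadata side

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)     # fast path: already in DRAM
            return self.memory[key]
        value = self.backing_store[key]      # slow path: fetch the overflow
        self.memory[key] = value
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # evict the least recently used item
        return value

# Hypothetical usage: a "table" larger than the in-memory tier.
backing = {f"row{i}": i * i for i in range(10)}
cache = SpillingCache(capacity=3, backing_store=backing)
print(cache.get("row7"), cache.get("row2"), cache.get("row7"))
```

The interesting engineering question, of course, is how quickly the slow path runs; the claim here is that the Exalytics-to-Exadata link is fast enough that the spill case degrades gracefully rather than falling off a cliff.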

If Oracle delivers on these OpenWorld promises, its customers will have a one-stop-shopping option for a highly integrated set of tools: Big Data analysis, high-speed extract and load into the Exadata engine for massively parallel data warehousing (and OLTP), and near-real-time in-memory analytics on Exalytics.

If it all works as described, I think it will be very impressive. However, Oracle isn't the only game in town. Their competitors aren't standing still. Next time, I'll continue the discussion with my take on the recent announcements of IBM and Microsoft.

What do you think of Oracle's announcements?  Do you see an application for them in your shop?  Are you concerned at all about the appliance-only nature of the delivery mechanism for these technologies?  Respond to this post and let's get the conversation started!

P.S. - You may have read some of the coverage about Oracle's announcement of the new SPARC T4 processor and the SPARC SuperCluster T4-4 platform that's based on it.  To me, the most interesting thing about the SuperCluster product is the way that it takes advantage of Xeon processor-based subsystem elements.

The use of T4 processors on the database nodes is secondary. The important thing is that both the ZFS storage server and the storage cells in the SuperCluster are based on E5-series Xeon processors.  The performance gains that Oracle describes for the SuperCluster are due almost entirely to the system architecture improvements pioneered by the Exadata approach, and to the processing acceleration delivered by the Xeon processor-based subsystems, not to the T4 processor itself.

I hope Oracle publishes a head-to-head comparison of the performance and price/performance of an Exadata system vs. a SuperCluster system.  I'm confident that the all-Xeon Exadata system would win on every conceivable measure. I don't expect such a result to be published, do you?