I recently returned from Santa Clara and the Feb 26 – 28 Strata Conference. I always enjoy Strata: the conference does a good job of balancing the contributions of longtime industry leaders, academics, and up-and-coming companies bursting with fresh ideas. There’s a lively give-and-take dynamic, quite different from business-based trade shows where the focus is just on sales.
At this year’s Strata, announcements about new distributions of Apache Hadoop* came fast and furious. In fact, there was so much activity around Hadoop that the conference was like a land rush – the Wild West of Big Data.
There were a number of major kickoffs.
- EMC Greenplum launched a new Hadoop distribution called Pivotal HD* that operates in conjunction with HAWQ*, a new analytics and SQL processing tool. With optimized data availability and processing speeds, these new Greenplum releases look like a shot across the bow of Impala*, the new SQL query engine from Cloudera.
- HP announced HP ArcSight/Hadoop Integration Utility*, a platform that speeds the processing of raw security data at Big Data scale to provide a more complete view into events and behavior patterns, and to identify security attack trends more quickly.
- Hortonworks announced integration of its Hadoop distribution with Windows Azure*, which brings Hadoop to the Microsoft cloud environment.
- And of course, the Intel® Distribution for Apache Hadoop* was also announced, with integration for SAP HANA, the in-memory database.
In this flurry of announcements, how does one distribution stand out from the others? In the case of Intel, it’s all about a comprehensive platform that features baked-in optimization for Hadoop, from Intel® Xeon® processors at the heart of the enterprise data center to embedded Intel® Atom™ processor-powered devices and sensors capturing data out in the wild. With a single, massively integrated, edge-to-edge platform, Intel delivers manageability and performance that scales to the horizons of Big Data. With AES-NI encryption and decryption securing data transmission with virtually no performance hit, and 10GbE networking performance, the Intel Distribution for Apache Hadoop extends Big Data analytics across multiple large data centers and across distributed physical geographies. The Intel platform is built from the silicon up for performance and security.
Another key differentiator for the Intel Distribution for Apache Hadoop is our continuing commitment to Open Source development; case in point, the Rhino project. After that flurry of announcements of new Hadoop distros, you can’t blame companies that have devoted themselves to Open Source and Hadoop for years for wondering about some of the players suddenly getting into the game. Like I said, it’s the Wild West of Big Data, and everyone is dreaming of striking it rich. But will other players remain true to the code, to the Open Source project at the heart of Hadoop?
For some, that remains to be seen. However, Intel pledges to continue to work with and give back to the Open Source community. We’ve been a top contributor to Open Source for over 15 years, and our distribution of Hadoop doesn’t change that commitment. We won’t branch the code, and we’ll continue to advocate for open standards.
That’s another way that we stand tall and set ourselves apart from the competition.
At the Intel booth, we had several technical discussions with our staff engineers, including:
- Overview of Intel® Distribution for Apache Hadoop*
- SAP HANA and Intel® Distribution Interoperability
- Project Rhino for Security
- Project Panthera for Analytic SQL Engine on Apache Hadoop*
- Intel® Active Tuner
- HiveQL with our partner Simba
- Intel® GraphBuilder for Apache Hadoop*
Heck, I even got in on the action with Tim’s Video.
To learn more about the Intel Distribution for Apache Hadoop or to get the 90-day trial, visit http://hadoop.intel.com.
For more big data and analytics updates, follow Tim on Twitter @TimIntel.