Ritu Kama is the Director of Product Management for Big Data at Intel. She has over 15 years of experience building software solutions for enterprises. She has led Engineering, QA and Solution Delivery organizations within the Datacenter Software Division for Security and Identity products. Last year she led product and program management for Intel's Distribution of Hadoop and Big Data solutions. Prior to joining Intel, she led technical and architecture teams at IBM and Ascom. She has an MBA from the University of Chicago and a Bachelor's degree in Computer Science.
SQL is still the predominant mechanism for querying data in storage systems such as data warehouses and databases. More and more enterprises are moving to Hadoop because its architecture lets them store and process enormously large volumes of data. Making SQL available on Hadoop opens all this data to a much larger audience.
Intel announced Project Gryphon, an initiative that allows developers to deploy SQL applications on top of Hadoop. This project is a natural extension of Project Panthera, which supports SQL features on Hadoop (see https://github.com/intel-hadoop/project-panthera-ase). Project Gryphon is our open source effort to provide efficient support for standard SQL features on Hadoop. Gryphon will be implemented in three phases and will be integrated into the Intel® Distribution for Apache Hadoop* (IDH).
The current approaches for enabling SQL on Hadoop either provide incomplete coverage of SQL or sacrifice openness for performance. The Hive query language accepts only a small subset of SQL commands, and Hive data warehouses aren't optimized for low-latency queries. However, enterprises that use Hadoop typically require open-source solutions that deliver real-time performance for SQL queries.
Gryphon provides full access to the SQL92 revision of SQL for online analytical processing applications, with a back-end based on Project Panthera. It offers low-latency queries on HBase along with a more efficient storage engine, and it combines HBase with Hive query optimizations to provide real-time SQL.
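To illustrate the kind of SQL92 construct at stake, here is a minimal sketch of a correlated subquery, loosely in the style of an analytical query. It runs against an in-memory SQLite database purely so the example is self-contained. SQLite is a stand-in here, not Gryphon, Hive or HBase, and the table, column and function names are hypothetical.

```python
import sqlite3


def small_quantity_orders(rows):
    """Return (part, quantity) rows whose quantity is below the average for that part.

    The table and column names are illustrative only; SQLite stands in for
    whatever SQL92-capable engine would sit on top of Hadoop.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lineitem (part TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO lineitem VALUES (?, ?)", rows)

    # Correlated subquery: the inner SELECT refers to the outer row (l.part).
    # This is standard SQL92, but limited SQL dialects often lack it.
    query = """
        SELECT l.part, l.quantity
        FROM lineitem AS l
        WHERE l.quantity < (
            SELECT AVG(l2.quantity)
            FROM lineitem AS l2
            WHERE l2.part = l.part
        )
        ORDER BY l.part, l.quantity
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_rows = [("bolt", 10), ("bolt", 30), ("nut", 5), ("nut", 5), ("nut", 20)]
print(small_quantity_orders(sample_rows))
```

With the sample data, the query keeps only the rows that fall below their own part's average quantity. Expressing this without a subquery typically means rewriting it by hand as a join against a pre-aggregated table, which is the kind of manual work that full SQL92 coverage is meant to avoid.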
Gryphon will be implemented in three phases:
Phase I of Project Gryphon is available in an alpha version as of June 2013. It uses Hive as the back-end to obtain 100 percent compliance with TPC-H queries. Phase I offers improved performance by using HBase to return real-time responses to user queries.
Phase II will be available as a proof of concept in October 2013 and will have all of the capabilities of Phase I. It will also add performance gains from container caching, query caching and job-execution optimizations. Phase II will be supported on Hadoop 2.x.x by integrating YARN into the distribution.
Phase III will be available as a proof of concept in December 2013 and will have all of the capabilities of Phase II. In addition, it will include specific enhancements to provide real-time performance for data stored on Hadoop and HBase, along with complete compliance with TPC-H.
IDH is an open platform. It is based on the open source distribution of Hadoop and is additionally optimized for Intel hardware such as Xeon processors, solid-state drives and 10GbE networks. Additional benefits of this distribution include a 20x performance improvement, partner analytics, automatic configuration and encryption. To learn more, please check out hadoop.intel.com/resources.
Over the next few weeks and months, I will provide more updates on Gryphon and on the other areas where Intel is supporting the Hadoop open source community, including the key problems we are working to solve.