Extending DRAM with BucketCache in HBase: learn to love low-latency Intel SSDs


DRAM, and in particular DRAM managed by Java (the Java heap), can be exhausted by large HBase database instances, especially in the Big Data realm where large datasets are sourced via Hadoop's HDFS. Apache* HBase, the NoSQL database of the Hadoop ecosystem, has a few ways to extend DRAM. One of them is the L1 cache, a DRAM-based cache implemented as a Java LRU map. Your next step should be to learn more about the BlockCache API and how to implement something called the BucketCache. Start by orienting yourself with the blog post Block Cache 101 to get a feel for the BlockCache API. From there you will find an actual showdown and testing report on the various solutions in Nick's follow-on blog, The BlockCache Showdown. You should also familiarize yourself with the BucketCache API, which can run on-heap in Java (DRAM), off-heap (also in DRAM), or in a file-based mode (on a block device such as a PCIe NVMe-based SSD, a SATA SSD, or an HDD), via the HBase Reference Guide: https://hbase.apache.org/book.html.
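To make the file-based mode concrete, here is a minimal hbase-site.xml sketch that points the BucketCache at a file on an SSD. The path and the cache size are assumptions for illustration; size the cache for your own drive and consult the HBase Reference Guide for your version's exact property set.

```xml
<!-- Sketch: BucketCache in file mode (hbase-site.xml).
     /mnt/nvme/bucketcache and the 64 GB size are illustrative assumptions. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <!-- "offheap" keeps the cache in DRAM outside the Java heap;
       "file:<path>" stores it on a block device such as an NVMe SSD -->
  <value>file:/mnt/nvme/bucketcache</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <!-- cache size in MB: 65536 MB = 64 GB -->
  <value>65536</value>
</property>
```

Switching the ioengine value between offheap and file:&lt;path&gt; is how you move the same L2 cache from DRAM onto an SSD, which is the comparison explored below.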

So now that you have some orientation and an HBase developer's view of the different modes of implementing the BlockCache, and you have learned that larger-data deployments benefit from the BucketCache implementation, let's look at Intel's storage comparison data for BucketCache in file mode. Intel's Software Labs did a study that used three types of storage media as the BucketCache storage. This allows terabytes of storage to be HBase-accessible, depending on your architectural goals, and clearly the capacity of low-latency SSDs far exceeds, say, the 256 GB of DRAM on a standard data-serving server. HBase is flexible enough to let you tier these caches as your needs require. Intel tested a hard disk drive (HDD), an Intel SATA SSD, and the Intel SSD Data Center Family for PCIe P3700 drive, in each of three threading load scenarios: 10, 25, and 50 threads of activity. What they found was the following overall performance for the BucketCache in file mode (using a standard filesystem, not a DRAM-based tmpfs):
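As a rough sketch, a threaded client load like the one described (10, 25, and 50 threads) can be generated with a benchmark harness such as YCSB. The binding name, table, and workload below are assumptions for illustration, not the exact harness or parameters Intel used:

```shell
# Hypothetical YCSB runs against an HBase cluster at the three thread counts.
# Adjust the binding (hbase12 here), table, and workload for your setup.
for t in 10 25 50; do
  bin/ycsb run hbase12 -P workloads/workloadc \
    -p table=usertable -p columnfamily=family \
    -threads "$t" > "run-${t}-threads.log"
done
```

Repeating the same workload while only swapping the BucketCache backing store (HDD, SATA SSD, PCIe SSD) is what isolates the storage medium as the variable being measured.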

The PCIe SSD outperformed the HDD by 81x to 189x.

The PCIe SSD even beat the SATA SSD, with performance gains ranging from 4x to 11x, which is pretty impressive: an SSD of a new interface type (namely NVMe) beating another SSD by this margin while running database software.

We encourage you to read the entire report to see what larger cache footprints on lower-latency SSDs can do to extend your database through the BlockCache API in HBase.