Tags and Visibility Labels – Per-KV Security

Anoop Sam John – Senior Software Engineer in the Big Data Platform Engineering group at Intel and HBase Committer. Previously a Platform Engineer at Huawei Technologies, working with Big Data and Cloud technologies.

Ramkrishna S Vasudevan – Senior Software Engineer in the Big Data Platform Engineering group at Intel, HBase Committer and PMC member. Previously a Platform Engineer at Huawei Technologies, working with Big Data and Cloud technologies.

KeyValue Tags

Many use cases demand that metadata be attached to every KV stored. Such use cases include per-cell ACLs and visibility expressions, which provide cell-level security capabilities like those of Accumulo.

Adding the metadata along with the data part of the KV would be very complex and inefficient. 

Our team at Intel worked on adding tags to cells: arbitrary metadata that can be stored with each cell. Each tag can be thought of as a piece of key-value metadata, where the key is a tag identifier (the type) and the value is the actual metadata.

Support for tags is now integrated into HBase trunk and will be available in the next major release of HBase, 0.98 (see HBASE-8496).

We have introduced a new HFile version (V3) so that tags can be stored along with KVs.

The tag portion of the KV looks like this:

<tag length><type><tag value>

Each tag has a type (1 byte), and the tag length is stored as a short, which limits the size of the tag value.
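To make the layout concrete, here is a minimal Java sketch that writes and reads a single tag as <tag length><type><tag value>. TagLayoutSketch is a hypothetical helper, not HBase's Tag class, and it assumes that the 2-byte length field counts the type byte plus the value bytes and is written big-endian; check the HBase source before relying on the exact byte layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper illustrating the <tag length><type><tag value> layout.
// Assumption: the 2-byte length field counts the type byte plus the value bytes.
final class TagLayoutSketch {
    static final int LENGTH_FIELD_SIZE = 2; // tag length stored as a short
    static final int TYPE_SIZE = 1;         // tag type stored as a single byte

    // Serializes one tag as <tag length><type><tag value>.
    static byte[] encode(byte type, byte[] value) {
        int tagLength = TYPE_SIZE + value.length;
        if (tagLength > Short.MAX_VALUE) {
            throw new IllegalArgumentException("tag too long: " + tagLength);
        }
        // ByteBuffer is big-endian by default.
        ByteBuffer buf = ByteBuffer.allocate(LENGTH_FIELD_SIZE + tagLength);
        buf.putShort((short) tagLength);
        buf.put(type);
        buf.put(value);
        return buf.array();
    }

    // Reads the type byte that follows the length field.
    static byte decodeType(byte[] tag) {
        return tag[LENGTH_FIELD_SIZE];
    }

    // Reads the value bytes, using the length field to find where they end.
    static byte[] decodeValue(byte[] tag) {
        int tagLength = ByteBuffer.wrap(tag).getShort();
        int start = LENGTH_FIELD_SIZE + TYPE_SIZE;
        return Arrays.copyOfRange(tag, start, LENGTH_FIELD_SIZE + tagLength);
    }

    public static void main(String[] args) {
        byte[] tag = encode((byte) 1, "secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(tag.length); // 2 + 1 + 6 = 9
        System.out.println(new String(decodeValue(tag), StandardCharsets.UTF_8)); // secret
    }
}
```

Because the length prefix comes first, a reader can skip over a tag it does not understand without interpreting its type or value.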

To use the tag feature, set the HFile version to 3 in the hbase-site.xml file:

<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>

Tags can be added to KVs while writing data to HBase. Use the following APIs in Put to add KVs with tags:

Put#add(byte[] family, byte[] qualifier, byte[] value, Tag[] tag)

Put#add(byte[] family, byte[] qualifier, long ts, byte[] value, Tag[] tag)

As mentioned above, every tag consists of a type byte and the tag data. Type values 100-127 are reserved for internal use by HBase. Every KV can have zero or more tags.
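With the real client, tags are passed as a Tag[] to the Put#add overloads listed above. The class below is a hypothetical, HBase-free stand-in that only models the two rules in this paragraph: a cell may carry zero or more tags, and types 100-127 belong to HBase. The reserved-range check is an application-level convention assumed here, not a check HBase is documented to perform on the client.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical client-side model of a tagged cell; not the HBase Put/Tag API.
final class TaggedCellSketch {
    // Tag types 100-127 are reserved for HBase internal use.
    static final int RESERVED_MIN = 100;
    static final int RESERVED_MAX = 127;

    // One (type, value) pair of per-cell metadata.
    static final class SimpleTag {
        final byte type;
        final byte[] value;

        SimpleTag(byte type, byte[] value) {
            this.type = type;
            this.value = value;
        }
    }

    // A cell starts with zero tags.
    private final List<SimpleTag> tags = new ArrayList<>();

    static boolean isReserved(byte type) {
        return type >= RESERVED_MIN && type <= RESERVED_MAX;
    }

    // Application-level guard (assumption): refuse user tags in the reserved range.
    TaggedCellSketch addTag(byte type, byte[] value) {
        if (isReserved(type)) {
            throw new IllegalArgumentException("tag type " + type + " is reserved for HBase");
        }
        tags.add(new SimpleTag(type, value));
        return this;
    }

    List<SimpleTag> tags() {
        return Collections.unmodifiableList(tags);
    }
}
```

Keeping application tag types out of the reserved range avoids clashing with tags that HBase itself may attach, such as those used by its security features.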

Dictionary-based tag compression is also available to encode tags that repeat across adjacent KVs. In HFiles, dictionary-based tag compression is applied only when a data block encoding algorithm is enabled. The option can be set using the following HColumnDescriptor API:

HColumnDescriptor#setCompressTags(boolean compressTags)

By default, tag compression is turned on for every column family (CF).
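The idea behind dictionary-based tag compression can be sketched as follows: the first time a tag is seen it is written out in full and assigned a dictionary slot, and every later repeat is written as just the slot index. This is a simplified illustration that uses strings for readability; HBase's actual dictionary implementation and on-disk encoding differ, and TagDictionarySketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of dictionary-based compression for repeating tags.
// Hypothetical sketch; not HBase's actual dictionary or on-disk format.
final class TagDictionarySketch {
    // A token is either a literal tag (first sighting) or a reference to a dictionary slot.
    static final class Token {
        final int index;      // dictionary slot, or -1 for a literal
        final String literal; // tag contents when index == -1

        Token(int index, String literal) {
            this.index = index;
            this.literal = literal;
        }

        boolean isLiteral() {
            return index < 0;
        }
    }

    static List<Token> encode(List<String> tagsInKvOrder) {
        Map<String, Integer> dictionary = new HashMap<>();
        List<Token> out = new ArrayList<>();
        for (String tag : tagsInKvOrder) {
            Integer slot = dictionary.get(tag);
            if (slot != null) {
                out.add(new Token(slot, null));         // repeat: emit only the small index
            } else {
                dictionary.put(tag, dictionary.size()); // first sighting: assign the next slot
                out.add(new Token(-1, tag));            // and emit the literal once
            }
        }
        return out;
    }

    static List<String> decode(List<Token> tokens) {
        List<String> dictionary = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            if (t.isLiteral()) {
                dictionary.add(t.literal); // decoder rebuilds the same dictionary in the same order
                out.add(t.literal);
            } else {
                out.add(dictionary.get(t.index));
            }
        }
        return out;
    }
}
```

Because the decoder rebuilds the dictionary in the same order as the encoder, no dictionary has to be stored with the data in this scheme; the savings come from adjacent KVs carrying identical tags, such as the same visibility expression.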

Tags written to the WAL can also be compressed using the dictionary-based technique. This is controlled by the following property:

<property>
  <name>hbase.regionserver.wal.tags.enablecompression</name>
  <value>true</value>
</property>

This property defaults to true. Tag compression in the WAL is active only when the WAL compression feature (hbase.regionserver.wal.enablecompression) is also enabled.
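The interaction between the two WAL settings can be expressed as a small predicate. This sketch uses java.util.Properties as a stand-in for hbase-site.xml rather than HBase's Configuration class, and assumes the property names hbase.regionserver.wal.enablecompression (WAL compression, off unless explicitly enabled) and hbase.regionserver.wal.tags.enablecompression (tag compression, default true).

```java
import java.util.Properties;

// Sketch of how the two WAL settings combine; Properties stands in for hbase-site.xml.
// Not HBase's Configuration class; property names and defaults are assumptions as stated above.
final class WalTagCompressionSketch {
    static final String WAL_COMPRESSION = "hbase.regionserver.wal.enablecompression";
    static final String WAL_TAG_COMPRESSION = "hbase.regionserver.wal.tags.enablecompression";

    static boolean tagsCompressedInWal(Properties conf) {
        // WAL compression is assumed off unless explicitly enabled.
        boolean walCompression = Boolean.parseBoolean(conf.getProperty(WAL_COMPRESSION, "false"));
        // Tag compression defaults to true, but only takes effect when WAL compression is on.
        boolean tagCompression = Boolean.parseBoolean(conf.getProperty(WAL_TAG_COMPRESSION, "true"));
        return walCompression && tagCompression;
    }
}
```

In other words, leaving the tag property at its default is not enough on its own: WAL compression must be switched on for WAL tags to be compressed.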

Visibility Labels

We have introduced a new coprocessor, the VisibilityController, which can be used on its own or in conjunction with HBase’s AccessController (responsible for ACL handling). The VisibilityController determines, based on the label metadata stored in the cell tags and the labels associated with a given subject, whether the user is authorized to view the cell. The maximal set of labels granted to a user is managed with the new shell commands getauths, setauths and clearauths, and is stored in a new HBase system table. Accumulo users will find these shell commands familiar.

When storing or mutating a cell, the HBase user can now add visibility expressions using a backwards-compatible extension to the HBase API. (By backwards compatible, we mean that older servers will simply ignore the new cell metadata rather than throwing an exception or failing.)

Mutation#setCellVisibility(new CellVisibility(String labelExpression));

A visibility expression can contain labels joined with the logical operators ‘&’, ‘|’ and ‘!’. Parentheses, ‘(’ and ‘)’, can be used to specify precedence. For example, consider the label set {confidential, secret, topsecret, probationary}, where the first three are sensitivity classifications and the last indicates whether an employee is on probation. If a cell is stored with this visibility expression:

( secret | topsecret ) & !probationary

Then any user associated with the secret or topsecret label will be able to view the cell, as long as the user is not also associated with the probationary label. Furthermore, any user associated only with the confidential label, whether probationary or not, will not see the cell or even know of its existence. Accumulo users will also find HBase visibility expressions familiar, while HBase provides a superset of Accumulo's boolean operators.
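The evaluation rules above can be sketched as a small recursive-descent evaluator that checks an expression against a user's label set. VisibilityExpressionSketch is hypothetical and is not HBase's own expression parser; in particular, it assumes that ‘!’ binds tightest and ‘&’ binds tighter than ‘|’, and that labels consist of letters, digits and ‘_’. HBase's real grammar may differ, so parenthesizing, as in the example, is the safe choice.

```java
import java.util.Set;

// Minimal recursive-descent evaluator for visibility expressions such as
// "( secret | topsecret ) & !probationary". Hypothetical; not HBase's parser.
// Assumed precedence: '!' binds tightest, then '&', then '|'.
final class VisibilityExpressionSketch {
    private final String expr;
    private final Set<String> userLabels;
    private int pos = 0;

    private VisibilityExpressionSketch(String expr, Set<String> userLabels) {
        this.expr = expr;
        this.userLabels = userLabels;
    }

    // True if a user holding exactly userLabels may see a cell with this expression.
    static boolean isVisible(String expression, Set<String> userLabels) {
        VisibilityExpressionSketch p = new VisibilityExpressionSketch(expression, userLabels);
        boolean result = p.parseOr();
        p.skipSpaces();
        if (p.pos != p.expr.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return result;
    }

    private boolean parseOr() {
        boolean value = parseAnd();
        while (peek() == '|') {
            pos++;
            boolean rhs = parseAnd(); // always parse the right side so its tokens are consumed
            value = value || rhs;
        }
        return value;
    }

    private boolean parseAnd() {
        boolean value = parseNot();
        while (peek() == '&') {
            pos++;
            boolean rhs = parseNot();
            value = value && rhs;
        }
        return value;
    }

    // Handles '!', parenthesized sub-expressions, and bare labels.
    private boolean parseNot() {
        char c = peek();
        if (c == '!') {
            pos++;
            return !parseNot();
        }
        if (c == '(') {
            pos++;
            boolean inner = parseOr();
            if (peek() != ')') {
                throw new IllegalArgumentException("missing ')' at " + pos);
            }
            pos++;
            return inner;
        }
        // A bare label is satisfied when the user holds it.
        return userLabels.contains(parseLabel());
    }

    private String parseLabel() {
        skipSpaces();
        int start = pos;
        while (pos < expr.length()
                && (Character.isLetterOrDigit(expr.charAt(pos)) || expr.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a label at " + pos);
        }
        return expr.substring(start, pos);
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        skipSpaces();
        return pos < expr.length() ? expr.charAt(pos) : '\0';
    }

    private void skipSpaces() {
        while (pos < expr.length() && Character.isWhitespace(expr.charAt(pos))) {
            pos++;
        }
    }
}
```

With the expression from the example, a user holding only secret sees the cell, a user holding topsecret and probationary does not, and a user holding only confidential does not.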

We build the user’s label set in the RPC context when a request is first received by the HBase RegionServer. How users are associated with labels is pluggable. The default plugin passes through labels specified in Authorizations added to the Get or Scan. This will also be familiar to Accumulo users.

Get#setAuthorizations(new Authorizations(String,…));

Scan#setAuthorizations(new Authorizations(String,…));

Authorizations not in the maximal set of labels granted to the user are dropped. From this point, visibility expression processing is very fast, using set operations.
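The label-set construction described above amounts to a set intersection: take whatever labels the pluggable source supplies (by default, those requested via Authorizations) and keep only those in the user's maximal granted set. LabelSource, PASS_THROUGH and EffectiveLabelsSketch are hypothetical names for illustration, not HBase classes.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of building a request's effective label set; not HBase's API.
final class EffectiveLabelsSketch {
    // Pluggable source of labels for a request (hypothetical interface).
    interface LabelSource {
        Set<String> labelsFor(String user, Set<String> requested);
    }

    // Default-style plugin: simply passes through the labels the client requested.
    static final LabelSource PASS_THROUGH = (user, requested) -> requested;

    static Set<String> effectiveLabels(LabelSource source, String user,
                                       Set<String> requested, Set<String> granted) {
        Set<String> result = new LinkedHashSet<>(source.labelsFor(user, requested));
        result.retainAll(granted); // drop anything outside the user's maximal label set
        return Collections.unmodifiableSet(result);
    }
}
```

Once this effective set is fixed for the request, checking each cell only needs set-membership lookups, which is what keeps per-cell evaluation cheap.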

In the future we envision additional plugins which may interrogate an external source when building the effective label set for a user, for example LDAP or Active Directory. Consider our earlier example. Perhaps the sensitivity classifications are attached when cells are stored into HBase, but the probationary label, determined by the user’s employment status, is provided by an external authorization service.

Anoop & Ramkrishna