Enhancing Hadoop* Security with Cloudera

Big data is exciting: big data applications can help businesses of all sizes grow and improve. Big data analytics is moving toward real-time operational intelligence that is starting to support "live" decision making. But lost amid all the excitement about the potential of big data are the very real security challenges. Big data offers businesses a range of benefits, but it also brings privacy and legal risks. It may contain sensitive financial data, intellectual property, and personally identifiable information such as the names, addresses, and Social Security numbers of customers and employees.

Security and Apache Hadoop

Hadoop is an open-source framework designed to help organizations gain analytic insights and operational efficiencies by processing very large data sets in parallel across many standard, low-cost, high-speed nodes. The resulting flexibility, performance, and scalability are unprecedented.

Standards for data security, data governance, data management, and data protection must be applied to Hadoop and big data analytics. The more data you have, the more important it is to protect it, given how sensitive the data can be and how much damage can occur if it falls into the wrong hands.

Some of the security challenges of using Hadoop include:

  • Infrastructure security must be maintained within the Hadoop distributed programming framework.
  • Data privacy must be preserved in data mining and analysis. Encryption of data at rest and data in transit is not enabled by default in Hadoop.
  • Data storage and transaction logs must be kept secure.
  • Data integrity must be preserved through both input filtering and end-point validation.
  • Authentication is needed for users, for applications on all types of clients (including mobile device apps and web consoles), and for system processes.
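To make the data-integrity point concrete, here is a minimal Python sketch of input filtering followed by end-point validation. It is an illustration only, not a feature of Hadoop or Cloudera Enterprise: the record schema, the regular expressions, and the shared key `INTEGRITY_KEY` are hypothetical. The sender drops malformed rows, then computes an HMAC-SHA256 tag over the accepted rows; the receiving end point recomputes the tag and rejects the batch if the two differ. A keyed HMAC is used instead of a plain hash so that someone who alters the data cannot produce a matching tag without the key.

```python
import hashlib
import hmac
import re
from dataclasses import dataclass

# Hypothetical schema: employee IDs like "E123456" and SSN-formatted strings.
EMPLOYEE_ID = re.compile(r"E\d{6}")
SSN = re.compile(r"\d{3}-\d{2}-\d{4}")

# Hypothetical shared secret; in practice it would come from a key manager.
INTEGRITY_KEY = b"example-integrity-key"


@dataclass(frozen=True)
class Record:
    employee_id: str
    ssn: str


def filter_input(raw_rows):
    """Input filtering: accept only rows that match the expected schema."""
    accepted, rejected = [], []
    for row in raw_rows:
        if (
            len(row) == 2
            and EMPLOYEE_ID.fullmatch(row[0])
            and SSN.fullmatch(row[1])
        ):
            accepted.append(Record(row[0], row[1]))
        else:
            rejected.append(row)
    return accepted, rejected


def integrity_tag(records, key=INTEGRITY_KEY):
    """Compute an HMAC-SHA256 tag over the records, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for r in records:
        mac.update(f"{r.employee_id}\t{r.ssn}\n".encode("utf-8"))
    return mac.hexdigest()


def validate_at_endpoint(records, expected_tag, key=INTEGRITY_KEY):
    """End-point validation: recompute the tag where the data lands."""
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(integrity_tag(records, key), expected_tag)


if __name__ == "__main__":
    rows = [
        ("E123456", "123-45-6789"),
        ("bad id", "123-45-6789"),
        ("E654321", "not-an-ssn"),
    ]
    good, bad = filter_input(rows)
    tag = integrity_tag(good)
    print(len(good), len(bad))                  # 1 2
    print(validate_at_endpoint(good, tag))      # True
    tampered = [Record("E123456", "123-45-6780")]
    print(validate_at_endpoint(tampered, tag))  # False
```

Filtering at ingest keeps malformed or malicious input out of the cluster, while the tag check at the destination catches corruption or tampering that happens after filtering.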

What Intel IT Is Doing about Hadoop Security

As documented in our recent white paper, “Big Data: Securing Intel IT’s Apache Hadoop* Platform,” we have deployed Cloudera Enterprise* software, developed a Hadoop security strategy, and implemented best practices to help keep sensitive data secure.

We defined our data security strategy up front as part of the roadmap for integrating Hadoop into our enterprise information environment. Careful planning, thorough testing, proactive communications, and phased implementation helped us deliver the first big data platform within Intel certified to host data classified as Intel top secret. Here are the steps we took:

  1. We started by securing the perimeter and guarding access to the cluster itself, using LDAP, firewalls, and Kerberos* authentication.
  2. Next, we added fine-grained authorization and role-based access controls defining what users and applications can do with data.
  3. Then we established data integrity and provenance, identifying where data came from and how it is being used.
  4. Finally, we used encryption, tokenization, and data masking to protect the data in the cluster from unauthorized viewing.
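Steps 2 and 4 can be illustrated together with a minimal Python sketch of a role-based read path: each role is granted a set of columns it may see in clear text, Social Security numbers outside that set are masked, and other restricted columns are replaced with tokens. This is not Intel IT's implementation or any product's API; a production cluster would enforce such policies centrally rather than in application code. The roles, column names, and `TOKEN_KEY` are hypothetical. Tokenization here uses a deterministic keyed hash (HMAC-SHA256), which is one of several approaches; vault-based tokenization, which stores random tokens in a lookup table, is a common alternative.

```python
import hashlib
import hmac

# Hypothetical role-based policy: columns each role may read in clear text.
ROLE_PERMISSIONS = {
    "hr_admin": {"name", "ssn", "salary"},
    "analyst": {"salary"},
}

# Hypothetical tokenization key; in practice it would be held by a key manager.
TOKEN_KEY = b"example-token-key"


def mask_ssn(ssn: str) -> str:
    """Data masking: hide all but the last four digits."""
    return "***-**-" + ssn[-4:]


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Deterministic keyed-hash tokenization.

    Equal inputs map to equal tokens, so tokenized columns can still be
    joined or counted, but the token does not reveal the original value
    to anyone who lacks the key.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def read_row(row: dict, role: str) -> dict:
    """Return the view of a row that the given role is authorized to see."""
    allowed = ROLE_PERMISSIONS.get(role)
    if allowed is None:
        # Deny by default: unknown roles get no access at all.
        raise PermissionError(f"unknown role: {role!r}")
    view = {}
    for column, value in row.items():
        if column in allowed:
            view[column] = value                 # authorized: clear text
        elif column == "ssn":
            view[column] = mask_ssn(value)       # restricted PII: masked
        else:
            view[column] = tokenize(str(value))  # other restricted data: tokenized
    return view


if __name__ == "__main__":
    row = {"name": "Alice", "ssn": "123-45-6789", "salary": 100000}
    print(read_row(row, "hr_admin"))  # full row in clear text
    print(read_row(row, "analyst"))   # salary clear, ssn masked, name tokenized
```

Deterministic tokens preserve joins and counts across tables, at the cost of revealing which rows share a value; random vault-based tokens avoid that leak but require a lookup service to detokenize.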

These steps addressed the immediate data security needs and closed existing Hadoop security gaps.

Data Security Strategy

Intel IT’s Commitment to Data Security

Intel IT is committed to protecting Intel’s intellectual property and the personally identifiable information of customers and employees. We have established a method for balancing risk and productivity to achieve the appropriate level of business innovation, agility, efficiency, and risk tolerance.

Intel IT has benefited from using a Hadoop distribution built on an open source core rather than on proprietary software. Using open source software components helps minimize total cost of ownership. Starting with a small test project helped us keep the focus on the design and scalability of the security solution.

In the era of big data, security is critical as we process and analyze massive amounts of data. Securing that data starts with understanding it, the security policies that apply to it, and how those policies must be enforced.

Read the IT@Intel white paper “Big Data: Securing Intel IT’s Apache Hadoop* Platform” for more information, and then begin securing your own big data.