Big Data: The Right Tools to Optimize Performance

If you have deployed Apache Hadoop*, and you are already analyzing unstructured data, you may have Big Data corralled. But is your Hadoop environment optimized for maximum throughput?

Hadoop is a powerful tool for storing and processing massive data sets. But fine-tuning Hadoop clusters for maximum performance can be difficult to accomplish from within Hadoop. If you want to analyze your data processing to make sure your clusters are set up correctly and performing optimally, you’ll need a different set of tools.

Distributed Hadoop environments can be challenging to fine-tune because the framework automatically handles data partitioning, load balancing, fault tolerance, and other low-level operations. Intel recently introduced two open-source tools, HiBench and HiTune, to help optimize Hadoop clusters for faster analytics.

HiTune monitors the key performance metrics on each server in a Hadoop cluster, then aggregates and correlates these low-level indicators with the high-level data flow model. From this output, you can gain deep insight into the dynamic interactions between different tasks and stages, and quickly pinpoint performance bottlenecks, application hotspots, and hardware problems that slow data processing.
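To make the idea of correlating low-level metrics with the data flow more concrete, here is a minimal Python sketch. It is not HiTune's code or API. The sample readings, the stage intervals, and the function names `correlate` and `hottest` are all hypothetical, chosen only to illustrate how per-server measurements might be grouped by processing stage to surface a hotspot.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-server samples: (node, timestamp in seconds, CPU utilization %).
samples = [
    ("node1", 0, 35.0), ("node1", 10, 92.0), ("node1", 20, 95.0),
    ("node2", 0, 30.0), ("node2", 10, 41.0), ("node2", 20, 38.0),
]

# Hypothetical data flow stages: (stage, node, start second, end second).
stages = [
    ("map", "node1", 0, 10), ("shuffle", "node1", 10, 30),
    ("map", "node2", 0, 10), ("shuffle", "node2", 10, 30),
]


def correlate(samples, stages):
    """Average the samples that fall inside each (stage, node) interval."""
    buckets = defaultdict(list)
    for node, timestamp, cpu in samples:
        for stage, stage_node, start, end in stages:
            if node == stage_node and start <= timestamp < end:
                buckets[(stage, node)].append(cpu)
    return {key: mean(values) for key, values in buckets.items()}


def hottest(averages):
    """Return the (stage, node) pair with the highest average utilization."""
    return max(averages.items(), key=lambda item: item[1])


if __name__ == "__main__":
    (stage, node), cpu = hottest(correlate(samples, stages))
    print(f"Busiest: {stage} on {node} at {cpu:.1f}% average CPU")
```

With these made-up inputs, the shuffle stage on node1 stands out, which is the kind of stage-level view that raw per-server counters alone do not give you.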

To further verify your clusters are set up correctly for maximum performance, perform benchmark testing on the system using HiBench, which allows you to measure, validate, and compare performance of Hadoop clusters accurately and consistently across diverse workloads. You can measure cluster performance for specific, common tasks, such as sorting and word counting, or for more comprehensive real-world applications, such as web searching, machine learning, and data analytics. If you’d like to learn more about the HiBench benchmarking suite for large-scale analysis, download this paper by Intel’s Jason Dai.
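Comparing benchmark runs usually comes down to a few simple calculations. The Python sketch below is an illustration only and does not reflect HiBench's actual report format or code. The function names and the example figures in the comments are assumptions made for this post.

```python
def throughput_mb_per_s(input_bytes, duration_s):
    """Data processed per second, in megabytes (10**6 bytes)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return input_bytes / duration_s / 1_000_000


def speedup(baseline_s, tuned_s):
    """How many times faster the tuned run is than the baseline run."""
    if baseline_s <= 0 or tuned_s <= 0:
        raise ValueError("durations must be positive")
    return baseline_s / tuned_s


if __name__ == "__main__":
    # Hypothetical figures: a 2 GB sort that took 100 s before tuning and 80 s after.
    print(throughput_mb_per_s(2_000_000_000, 100))
    print(speedup(100, 80))
```

Running the same workload with the same input size before and after a configuration change keeps the comparison consistent, which is the point of using a fixed benchmark suite.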

The HiBench and HiTune tools let you get under the hood with Hadoop: they give you deep insight into the dynamic interactions between hardware and software in massively distributed environments, so you can fine-tune performance. The result? Faster, more efficient data analysis and higher value from your Hadoop cluster. For a product overview, read here.

Intel will continue to support the extension and refinement of HiTune and HiBench, and is working with other leading vendors and standards bodies to distill expertise with Hadoop technologies into reference architectures, tuning guides, and best-practice recommendations. Big Data is growing fast, and with Hadoop and associated technologies, we now have the tools to manage it and get value out of it.

Tim Allen

About Tim Allen

Tim is a strategic marketing manager for Intel with specific responsibilities related to the cloud, big data, analytics, datacenter appliances, and RISC migration. Tim has 20+ years of industry experience, including work as a systems analyst, developer, system administrator, enterprise systems trainer, product marketing engineer, and marketing program manager. Prior to Intel, Tim worked at Tektronix, IBM, Intersolv, Sequent, and Con-Way Logistics. Tim holds a BSEE in computer engineering from BYU, PMP certification, and an MBA in finance from the University of Portland. Specialties include PMP, MCSE, CNA, HP-UX, AIX, Shell, Perl, and C++.