IDH with Oracle Big Data Connectors (Part 2)

Ritu Kama is the Director of Product Management for Big Data at Intel. She has over 15 years of experience building software solutions for enterprises. She led Engineering, QA, and Solution Delivery organizations within the Datacenter Software Division for Security and Identity products, and last year she took on Product and Program Management responsibilities for Intel's Distribution of Hadoop and Big Data solutions. Prior to joining Intel, she led technical and architecture teams at IBM and Ascom. She holds an MBA from the University of Chicago and a Bachelor's degree in Computer Science.

This is the second part of a two-part series. If you missed the first part, you can find it here.

In this second post on the Oracle Big Data Connectors (BDCs), I would like to walk through an example of the Oracle SQL Connector for HDFS (OSCH) in action.

Step 1: Install OSCH

OSCH can be downloaded here, and the installation guide for OSCH is located here. It is important that the system running Oracle can access your Hadoop cluster.
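A quick way to confirm that connectivity (a sketch, assuming the Hadoop client is already installed and configured on the Oracle host) is to list an HDFS directory from that machine:

  • $ hadoop fs -ls /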

First, log on to the system that is running Oracle as the oracle user. Ensure that HIVE_HOME is set. From here we can begin the install.
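If HIVE_HOME is not already set, export it before continuing; the path below is only an assumption and should point to wherever Hive is installed on your system:

  • $ export HIVE_HOME=/usr/lib/hive
  • $ echo $HIVE_HOME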

  • $ cd
  • $ mkdir osch
  • $ export OSCH_HOME=~/osch/

Now unzip the OSCH package into the osch directory.
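For example (the archive name and location below are assumptions; use the actual file you downloaded):

  • $ unzip ~/orahdfs-<version>.zip -d ~/osch

Depending on the archive layout, the files may land in a versioned subdirectory such as ~/osch/orahdfs-<version>; if so, either move its contents up into ~/osch or point OSCH_HOME at that subdirectory.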

  • $ cd osch
  • $ ./bin/hdfs_stream

You should see the following output: Usage: hdfs_stream locationFile

  • $ hadoop fs -mkdir /user/oracle/osch/exttab
  • $ sqlplus
  • SQL> connect / as sysdba
  • SQL> create or replace directory osch_bin_path as '/home/oracle/osch/bin';
  • SQL> create user hdfsuser identified by welcome1;
  • SQL> grant create session, create table to hdfsuser;
  • SQL> grant execute on SYS.UTL_FILE to hdfsuser;
  • SQL> grant read, execute on directory osch_bin_path to hdfsuser;
  • SQL> create or replace directory exttab_default_directory as '/home/oracle/osch/exttab';
  • SQL> grant read, write, execute on directory exttab_default_directory to hdfsuser;
  • SQL> exit;
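As an optional sanity check, you can log in as the new user and confirm that the directory objects are visible before moving on:

  • $ sqlplus hdfsuser/welcome1
  • SQL> select directory_name, directory_path from all_directories;
  • SQL> exit;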

Step 2: Create an External Table

Put the data files included in this blog post onto HDFS:

  • $ hadoop fs -mkdir /user/oracle/osch/data/
  • $ hadoop fs -put part* /user/oracle/osch/data/
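A quick listing (optional) confirms the files made it into HDFS:

  • $ hadoop fs -ls /user/oracle/osch/data/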

Next, configure the database:

  • $ sqlplus
  • SQL> connect / as sysdba
  • SQL> create or replace directory oracletest_dir as '/home/oracle/osch/logs';
  • SQL> grant read, write on directory oracletest_dir to hdfsuser;
  • SQL> exit;

Now create the file osch.xml.
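A minimal osch.xml along these lines should work. The connection string, column names, and field terminator below are placeholders (assumptions about your environment) that you will need to adapt — in particular, be sure to change the connection string to point at your database — while the table name, data path, default directory, and user match the objects created earlier in this walkthrough:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>oracle.hadoop.exttab.tableName</name>
    <value>TEXT_ORA_EXT_TAB</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.sourceType</name>
    <value>text</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.dataPaths</name>
    <value>/user/oracle/osch/data</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.locationFileCount</name>
    <value>4</value>
  </property>
  <property>
    <!-- assumption: comma-delimited source files -->
    <name>oracle.hadoop.exttab.fieldTerminator</name>
    <value>,</value>
  </property>
  <property>
    <!-- placeholder column list; cust_id is the column queried in Step 3 -->
    <name>oracle.hadoop.exttab.columnNames</name>
    <value>CUST_ID,CUST_NAME</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.defaultDirectory</name>
    <value>EXTTAB_DEFAULT_DIRECTORY</value>
  </property>
  <property>
    <!-- placeholder; change this to the connection string for your database -->
    <name>oracle.hadoop.connection.url</name>
    <value>jdbc:oracle:thin:@//dbhost:1521/orcl</value>
  </property>
  <property>
    <name>oracle.hadoop.connection.user</name>
    <value>HDFSUSER</value>
  </property>
</configuration>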

Explanation of properties:

  • tableName – name of the external table to be created in Oracle
  • sourceType – type of source data (text is default for HDFS files, hive is used for Hive tables)
  • dataPaths – where source data resides
  • url – connect string to target database
  • user – database user to perform operation as
  • locationFileCount – number of location files (files that point to data in HDFS for Oracle to read); more files can increase parallelism
  • fieldTerminator – what the fields are delimited by in the source file
  • columnNames – names of the columns to be used in the created Oracle external table
  • defaultDirectory – where the location files will reside

Now, to create the external table, run the following:

  • $ hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
        oracle.hadoop.exttab.ExternalTable \
        -conf <location of osch.xml>/osch.xml \
        -createTable

Step 3: Verify That the External Table Was Created Successfully

  • $ sqlplus hdfsuser/welcome1
  • SQL> describe text_ora_ext_tab;
  • SQL> select count(*) from text_ora_ext_tab;
  • SQL> select cust_id from text_ora_ext_tab where rownum < 10;

What you are actually doing is querying an Oracle external table that overlays data files stored in HDFS. The results of your queries are obtained by streaming the data out of the Hadoop cluster at query time. Pretty incredible!