Monthly Archives: January 2015

Oracle Data Integrator (ODI) Adapter for Hadoop


Apache Hadoop is designed:

  • To handle and process data from data sources that are typically non-RDBMS
  • To handle data volumes that are typically beyond what is handled by relational databases

The ODI Application Adapter for Hadoop enables data integration developers to integrate and transform data easily within Hadoop using ODI. Typical processing in Hadoop includes validation and transformation using MapReduce jobs. Designing and implementing a MapReduce job requires expert-level programming skills. With ODI and the Adapter for Hadoop, however, you do not need to write MapReduce jobs yourself: ODI uses Hive and HiveQL to implement them.
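For example, a validation-and-transformation step that would otherwise need a hand-written MapReduce job can be expressed in a few lines of HiveQL; Hive compiles the query into MapReduce jobs behind the scenes. The table and column names below are hypothetical:

```sql
-- Hypothetical staging and target tables; Hive compiles this into MapReduce
INSERT OVERWRITE TABLE sales_clean
SELECT upper(trim(cust_name))          AS cust_name,
       cast(amount AS DECIMAL(10,2))   AS amount,
       to_date(order_ts)               AS order_date
FROM   sales_staging
WHERE  amount IS NOT NULL              -- basic validation
  AND  cast(amount AS DECIMAL(10,2)) > 0;
```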

When implementing a big data scenario, the first step is to load data into Hadoop. The data source is typically the local file system, HDFS, Hive tables, or external Hive tables.
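Conceptually, the load that IKM File to Hive (Load Data) performs can be sketched in HiveQL. The paths, table name, and schema here are hypothetical:

```sql
-- External table over files already sitting in HDFS (schema is hypothetical)
CREATE EXTERNAL TABLE IF NOT EXISTS sales_staging (
  cust_name STRING,
  amount    STRING,
  order_ts  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/odi/staging/sales';

-- Or load a file from the local file system into the Hive table
LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE sales_staging;
```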

Knowledge Modules?

ODI provides the following KMs for use with Hadoop:

  • IKM File to Hive (Load Data)
  • IKM Hive Control Append
  • IKM Hive Transform
  • IKM File-Hive to Oracle (OLH)
  • CKM Hive
  • RKM Hive

To set up an integration project, import the above KMs into your ODI project.

Setting up the Topology?

  1. Define a File data source (HDFS or local files outside of HDFS)
  2. Define a Hive data source
  3. Set up the ODI Agent to execute Hadoop jobs
  4. Configure ODI Studio to execute Hadoop jobs on the Local Agent

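For steps 3 and 4, the agent's JVM must be able to see the Hive and Hadoop client libraries. A minimal sketch, assuming a standalone agent and typical installation paths (all paths and environment variable names below are assumptions; check your ODI version's documentation for the exact properties):

```shell
# Hypothetical installation paths -- adjust for your environment
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive

# Make the Hive JDBC driver and Hadoop client jars visible to the ODI agent
# before starting it (variable name may differ by ODI release)
export ODI_ADDITIONAL_CLASSPATH="$HIVE_HOME/lib/*:$HADOOP_HOME/share/hadoop/common/*"
```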

Oracle Data Integrator (ODI) Variables


A variable is an object that stores a single value, which can be a string, a number, or a date.

A variable can be created as a global variable (prefixed with GLOBAL) or within a project (prefixed with the project code). You refer to variables using the # prefix, e.g. #MY_VAR, #MY_PROJECT_CODE.MY_VAR, #GLOBAL.MY_VAR.
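For example, a variable can be referenced anywhere ODI substitutes text, such as in an interface filter. The table, column, and variable names below are hypothetical:

```sql
-- Fragment of an interface filter: ODI substitutes #GLOBAL.LAST_LOAD_DATE
-- with the variable's value before sending the statement to the database
CUST.LAST_UPDATE > to_date('#GLOBAL.LAST_LOAD_DATE', 'YYYY-MM-DD')
```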

Creating a Variable?

  • Data Type: Alphanumeric, Date, Numeric, Text
  • Default Value: the initial value of the variable (optional)
  • Action:
  1. Non-persistent: the value of the variable is kept in memory for the whole session
  2. Last value: ODI stores the latest value held by the variable in its repository
  3. Historize: ODI keeps the history of all values held by the variable, which is useful for debugging purposes
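A variable's value is typically refreshed with a SQL query executed against a logical schema. A hypothetical "Last value" variable tracking the most recent load date might use a refresh query such as:

```sql
-- Hypothetical refresh query for a variable named LAST_LOAD_DATE;
-- the single value returned becomes the variable's new value
SELECT MAX(load_date) FROM dwh_load_log;
```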

Using Variables?

  • Using Variables in Package
    • Declare a variable step
    • Refresh a variable
    • Set the value of a variable
    • Increment a variable
    • Evaluate a variable for conditional branching
  • Using Variables in Interfaces
  • Using Variables in Object Properties
    • Physical names of files and tables (Resource field in the data store), or their location (the Physical Schema's schema in the topology)
    • Physical Schema
    • Data Server URL
  • Using Variables in Procedures
  • Using Variables within Variables
  • Using Variables in the Resource Name of a Datastore
    • When the file that needs to be loaded into a DWH is dynamic, you can use a variable in the resource name of the datastore, e.g. a variable such as #DWH.MY_VAR, where DWH is the project code
    • Declare the variable so it can be used in the resource property, set the variable to a value, and refresh the value in the package
  • Passing a Variable to a Scenario
  • Generating a Scenario for a Variable
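Putting the dynamic resource name use case together, the flow can be sketched as follows (all names are hypothetical): a package step refreshes the variable, and the datastore's Resource field references it.

```sql
-- Step 1 (package): refresh query for a hypothetical variable #DWH.FILE_NAME,
-- building today's file name (Oracle syntax shown)
SELECT 'sales_' || to_char(sysdate, 'YYYYMMDD') || '.csv' FROM dual;

-- Step 2 (model): set the datastore's Resource field to
--   #DWH.FILE_NAME
-- ODI substitutes the refreshed value at execution time.
```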