Oozie on Qubole Clusters

This is a step-by-step guide to setting up Oozie to work with a Qubole cluster.

This guide uses a Qubole Hadoop 2 (2.6) cluster and Oozie 4.1.0, and illustrates two configurations:

  1. Oozie is hosted on the cluster master node using a node-bootstrap script.
  2. Oozie is hosted on a separate EC2 instance that is in the same subnet as the Qubole cluster.

1 Oozie on Cluster Master

In this configuration, the Oozie server is hosted on the Qubole cluster master node.

Installation Guide

  • Set up the node bootstrap script to use the script qbol_nm_oozie_v3.sh (previously qbol_nm_oozie_v2.sh).
  • Create a security group in EC2 with port 11000 open. This is the port for the Oozie UI.
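
As one possible way to create such a security group, the sketch below uses the AWS CLI. The group name, VPC ID, group ID, and CIDR range are hypothetical placeholders, not values from this guide. The script only prints the commands unless DRY_RUN is set to 0, so nothing is changed by accident.

```shell
#!/bin/sh
# Sketch: open the Oozie UI port (11000) in a new EC2 security group.
# All identifiers below are hypothetical placeholders; substitute your own.
VPC_ID="vpc-0123456789abcdef0"
GROUP_NAME="oozie-ui"
ALLOWED_CIDR="10.0.0.0/16"
OOZIE_PORT=11000
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run aws ec2 create-security-group \
    --group-name "$GROUP_NAME" \
    --description "Oozie UI access" \
    --vpc-id "$VPC_ID"

# Replace GROUP_ID with the GroupId returned by the previous command.
GROUP_ID="sg-0123456789abcdef0"
run aws ec2 authorize-security-group-ingress \
    --group-id "$GROUP_ID" \
    --protocol tcp \
    --port "$OOZIE_PORT" \
    --cidr "$ALLOWED_CIDR"
```

Restricting the CIDR range to the addresses that actually need the UI is usually preferable to opening the port to everyone.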

How to run an Oozie Job

  • Upload the Oozie job folder that contains job.properties and workflow.xml to S3. In the job.properties file:
    • The namenode port is 9000 and the job tracker port is 8031 (YARN).
    • The namenode and job tracker addresses should use the node's internal IP, such as 10.0.0.x.
    • oozie.wf.application.path should point to the HDFS folder that the job folder is copied to by the hdfs put operation.
  • Run the Oozie job via a Shell Command in Analyze:
 
hadoop dfs -get s3://jkuang/test/no-op
hadoop dfs -put no-op /user/jove/
/usr/local/oozie/bin/oozie job -oozie http://${master node internal IP}:11000/oozie -config no-op/job.properties -run
  • Check the Oozie job status via the UI at http://${your-master-node}:11000/oozie.
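
To make the job.properties settings listed above concrete, here is a minimal sketch that writes such a file for the no-op job. The master IP 10.0.0.10 is a hypothetical example, and the property names nameNode and jobTracker are an assumption that follows the convention of the stock Oozie examples; your workflow.xml may reference different names.

```shell
#!/bin/sh
# Sketch: generate a job.properties file for the no-op job.
# MASTER_IP is a hypothetical internal IP; replace it with your master node's.
MASTER_IP="10.0.0.10"
JOB_DIR="no-op"

mkdir -p "$JOB_DIR"

# \${nameNode} is escaped so that Oozie, not the shell, resolves it.
cat > "$JOB_DIR/job.properties" <<EOF
nameNode=hdfs://${MASTER_IP}:9000
jobTracker=${MASTER_IP}:8031
oozie.wf.application.path=\${nameNode}/user/jove/no-op
EOF

cat "$JOB_DIR/job.properties"
```

The application path matches the /user/jove/ destination used in the hdfs put command above, so the file must be created before the folder is copied to HDFS.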

2 Oozie on EC2

In this configuration, the Oozie server is hosted on an EC2 instance in the same subnet as the Qubole cluster nodes.

Installation Guide

Install Java on the Oozie server:

yum install java-1.7.0-openjdk-devel.x86_64

 

Install Maven on the Oozie server:

  1. Get Maven
    wget http://mirror.cc.columbia.edu/pub/software/apache/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
  2. Install Maven
    tar xzf apache-maven-3.0.5-bin.tar.gz -C /usr/local
    cd /usr/local
    sudo ln -s apache-maven-3.0.5 maven
  3. Set up Command Shell
    echo 'export M2_HOME=/usr/local/maven' > /etc/profile.d/maven.sh
    echo 'export PATH=${M2_HOME}/bin:${PATH}' >> /etc/profile.d/maven.sh
    source /etc/profile.d/maven.sh

Install Oozie on the Oozie server:

  1. Get and Build Oozie
    wget http://archive.apache.org/dist/oozie/4.1.0/oozie-4.1.0.tar.gz
    tar xzf oozie-4.1.0.tar.gz 
    mv oozie-4.1.0 /usr/local/

    cd /usr/local/oozie-4.1.0
    sed -i "s:<hadoop.version>1.1.1</hadoop.version>:<hadoop.version>2.3.0</hadoop.version>:" /usr/local/oozie-4.1.0/pom.xml
    export JAVA_HOME="/usr/lib/jvm/java-1.7.0/"
    mvn clean package assembly:single -P hadoop-2 -DskipTests
  2. Install Oozie
    cd /usr/local/oozie-4.1.0/distro/target
    tar xzf oozie-4.1.0-distro.tar.gz
    mkdir /usr/local/oozie
    mv /usr/local/oozie-4.1.0/distro/target/oozie-4.1.0/* /usr/local/oozie/

    cd /usr/local/oozie/
    mkdir /usr/local/oozie/libext
    cp -R /usr/local/oozie-4.1.0/hadooplibs/hadoop2/target/hadooplibs/hadooplib-2.3.0.oozie-4.1.0/* /usr/local/oozie/libext
  3. Get ExtJs
    cd /usr/local/oozie/libext
    wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip

  4. Prepare the WAR and database
    sh /usr/local/oozie/bin/oozie-setup.sh prepare-war
    sh /usr/local/oozie/bin/ooziedb.sh create -sqlfile oozie.sql -run
  5. Run Oozie server
    sh /usr/local/oozie/bin/oozied.sh start
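
Once the server has been started, it can be useful to confirm that it is responding. The sketch below assumes it runs on the Oozie server itself, that the server listens on the default port 11000, and that the CLI lives at the install path used above. The helper name is_normal is illustrative only.

```shell
#!/bin/sh
# Sketch: check whether the Oozie server reports a healthy status.
OOZIE_BIN="/usr/local/oozie/bin/oozie"
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Succeeds when the given status text reports NORMAL system mode.
is_normal() {
    printf '%s\n' "$1" | grep -q 'System mode: NORMAL'
}

if [ -x "$OOZIE_BIN" ]; then
    # Capture both stdout and stderr so connection errors are shown too.
    status="$("$OOZIE_BIN" admin -oozie "$OOZIE_URL" -status 2>&1)"
    if is_normal "$status"; then
        echo "Oozie is up at $OOZIE_URL"
    else
        echo "Unexpected Oozie status: $status"
    fi
else
    echo "Oozie CLI not found at $OOZIE_BIN"
fi
```

If the check fails right after startup, the server may still be initializing; retrying after a short wait is a reasonable first step before inspecting the Oozie logs.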