How To: Running Mahout


This article discusses how to run Mahout jobs in Qubole.

About Mahout:

Apache Mahout is a library of scalable machine-learning algorithms, many of which run as MapReduce jobs on Hadoop.

How To:

  • The Mahout JARs are placed in the Qubole S3 bucket "paid-qubole", and a node bootstrap script is used at boot time to install the JARs on the cluster.
    • Patch: We have hand-patched the Mahout JARs, adding an extra set of class files under org/apache/hadoop/mapreduce/lib/input/. We patched the Mahout library only because we did not want to patch our Hadoop JARs, which would otherwise have been the ideal way to fix this.
  • The bootstrap script has to be copied to the cluster's default S3 location so that it runs when the cluster is provisioned at boot time. The bootstrap file should look like:

# Stage Mahout and the sample data on the cluster's local disk
mkdir -p /media/ephemeral0/mahout
cd /media/ephemeral0/mahout
# Fetch the patched distribution and sample data from the paid-qubole bucket
hadoop dfs -get s3://paid-qubole/mahout0.9/mahout-distribution-0.9.tar.gz .
hadoop dfs -get s3://paid-qubole/mahout0.9/data.tar.gz .
tar -xvf mahout-distribution-0.9.tar.gz
tar -xvf data.tar.gz
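The extraction steps above can be exercised locally before putting them in a bootstrap script. The sketch below uses a throwaway directory and a locally created tarball standing in for the S3 download (all paths here are hypothetical, not the real cluster layout):

```shell
# Local dry run of the bootstrap extraction, in a temporary directory
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Stand-in for mahout-distribution-0.9.tar.gz fetched from S3:
# a minimal tree containing just the bin/mahout launcher
mkdir -p mahout-distribution-0.9/bin
printf '#!/bin/sh\n' > mahout-distribution-0.9/bin/mahout
chmod +x mahout-distribution-0.9/bin/mahout
tar -czf mahout-distribution-0.9.tar.gz mahout-distribution-0.9
rm -r mahout-distribution-0.9

# Same extraction step the bootstrap runs
tar -xvf mahout-distribution-0.9.tar.gz

# The launcher should now be in place, as it would be on the cluster
ls mahout-distribution-0.9/bin/mahout
```

If the `ls` succeeds, the same `tar -xvf` invocation will unpack the real distribution on the cluster.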

  • Run a sample job (as a Shell Command) as below, replacing --output with a bucket you have write access to:

/media/ephemeral0/mahout/mahout-distribution-0.9/bin/mahout recommenditembased \
  --input s3://paid-qubole/mahout0.9/sampledata/myrating.csv \
  --output s3:// \
  --tempDir /tmp/abc6 \
  --usersFile s3://paid-qubole/mahout0.9/sampledata/ \
  -s SIMILARITY_COSINE
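For reference, recommenditembased reads its --input as comma-separated userID,itemID,rating lines with numeric IDs, and --usersFile as one userID per line. A minimal sketch of building such files locally (the values are made up for illustration and are not the contents of the sample data in paid-qubole):

```shell
# Tiny ratings file in the userID,itemID,rating format the recommender expects
cat > myrating.csv <<'EOF'
1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.0
EOF

# --usersFile limits recommendations to the listed user IDs, one per line
printf '1\n2\n' > users.txt

# Sanity-check the ratings file
wc -l < myrating.csv
```

Upload files like these to your own bucket with `hadoop dfs -put` if you want to run the recommender against your own data instead of the sample set.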

