How To: Run an R Script Stored on S3 from the Qubole Analyze Page

To run an R script from an S3 location:

  1. Go to the Qubole Analyze UI page.
  2. Select "Spark Command".
  3. Under it, select "Command Line".
  4. Run the following command:
    /usr/lib/spark/bin/spark-submit s3://<your_bucket>/<script_name>
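
For example, assuming the script is named analysis.R and sits in a bucket named my-bucket (both names are purely illustrative placeholders), the command would look like:

    # Submit the R script to Spark directly from its S3 location
    /usr/lib/spark/bin/spark-submit s3://my-bucket/analysis.R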

If the script depends on additional files (for example, a helper file such as functions.R that the main script sources), those can be made available on the cluster as follows:
hadoop dfs -get <s3_location_of_functions.R>

Then run Spark -> Command Line -> "/usr/lib/spark/bin/spark-submit <location_on_local_directory>"
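
As a sketch of this two-step approach (the bucket and file names below are assumptions for the example), the commands might look like:

    # Copy the helper file and the main script from S3 to the local working directory
    hadoop dfs -get s3://my-bucket/functions.R .
    hadoop dfs -get s3://my-bucket/analysis.R .

    # Submit the locally staged script; it can now source("functions.R") from the same directory
    /usr/lib/spark/bin/spark-submit ./analysis.R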

This can also be done via a node bootstrap script or a workflow.

Node Bootstrap Reference:
http://docs.qubole.com/en/latest/user-guide/clusters/run-scripts-cluster.html
http://docs.qubole.com/en/latest/user-guide/clusters/node-bootstrap.html
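
As an illustration of the node bootstrap approach (the directory, bucket, and file names below are assumptions for the example, not product defaults), the bootstrap script could stage the R files on each node when the cluster starts:

    # Hypothetical node bootstrap lines: copy R files from S3 to a local directory on every node
    mkdir -p /usr/lib/r-scripts
    hadoop dfs -get s3://my-bucket/functions.R /usr/lib/r-scripts/
    hadoop dfs -get s3://my-bucket/analysis.R /usr/lib/r-scripts/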

Workflow Reference:
http://docs.qubole.com/en/latest/user-guide/features/analyze/compose-workflow.html

For a workflow, the first command would be a Shell command such as:
hadoop dfs -get <s3_location_of_functions.R>

The second command would be a Spark Command Line submit that specifies the script's location in the local directory.
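
Putting the two workflow steps together (again with illustrative bucket and file names, and assuming both steps run in the same working directory on the cluster), the workflow would contain:

    # Step 1: Shell command - fetch the R files from S3 to the local directory
    hadoop dfs -get s3://my-bucket/functions.R .
    hadoop dfs -get s3://my-bucket/analysis.R .

    # Step 2: Spark command (Command Line) - submit the locally staged script
    /usr/lib/spark/bin/spark-submit ./analysis.R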



