Adding a specific jar/package to be used with PySpark

The node_bootstrap script can be used to install a package on the cluster nodes, both when the package can be installed with 'pip install' and when it cannot.
More information on the node_bootstrap script: http://docs.qubole.com/en/latest/user-guide/clusters/run-scripts-cluster.html#examples-node-scripts

If pip works:
Add the following line to the node_bootstrap script:
pip install <pkgname>

If pip doesn't work, the node_bootstrap script can still be used. Add commands that download and set up the package manually:

wget <link to download the package, for example from https://pypi.python.org/pypi/geojson/>
<other setup steps for this package here>

Once the above is done and the cluster has been restarted, the package should be available to use in PySpark.
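
After the restart, a quick check from a PySpark notebook or shell can confirm that the package is importable. The following is only a minimal sketch, not an official Qubole procedure: `is_installed` is a hypothetical helper name, and `geojson` is used purely because it is the example package mentioned above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found by this Python interpreter."""
    # find_spec returns None when no module with that name is on the import path.
    return importlib.util.find_spec(module_name) is not None


# Check on the driver; replace "geojson" with the package you installed.
print("geojson installed:", is_installed("geojson"))
```

This only checks the node where the code runs. Assuming a SparkContext is available as `sc`, the same helper could be run on the executors with something like `sc.parallelize(range(4)).map(lambda _: is_installed("geojson")).collect()`, which should return a list of True values if the bootstrap step succeeded on the nodes that ran the tasks.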
