Reference: Looker with Hive 1.2

This is a guide to enabling JDBC connections to a Qubole cluster through HiveServer2. It includes setting the default execution engine to Tez, and binds the cluster to an EIP so that it can be statically referenced in Looker.

Prerequisites:

  • Admin access to a Qubole account on https://api.qubole.com with customer-managed S3 and EC2 settings.
  • Default Hive version in Qubole set to 1.2. If uncertain, please contact Qubole support to verify this.
  • An available Hive 1.2 or Spark 1.5+ cluster in Qubole to complete the configuration.
  • A preconfigured VPC with a public subnet in AWS. Ensure that port 10008 is open into and out of this subnet on any existing network ACLs you have.
  • Access in AWS to create a new security group.
  • Access in AWS to create an Elastic IP.
  • AWS S3 privileges to upload and modify files.

Step-by-step guide:

  1. Log in to AWS and allocate a new EIP (Elastic IP address). Choose EC2-VPC as the Network Platform. Make a note of the Allocation ID.
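
     If you prefer the AWS CLI, the same allocation can be done with the command below (a sketch; the region is an example, substitute your own). The Allocation ID appears in the output as "AllocationId":

     aws ec2 allocate-address --domain vpc --region us-east-1
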
  2. Log in to the AWS VPC Management console and create a new security group (SG) in which you can define an open port for connecting to the cluster. Ensure your new SG uses the same VPC as your cluster.

  3. Open port 10008 in your new SG to your source (i.e. where you'll be connecting from). Note: if your Looker instance is hosted by Looker, the list of IPs that you should whitelist is at this URL: https://looker.com/docs/setup-and-management/enabling-secure-db
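
     Steps 2 and 3 can also be done from the AWS CLI. This is a sketch: the group name, description, and the VPC and CIDR placeholders must be replaced with your own values:

     aws ec2 create-security-group --group-name <sg-name> --description "HiveServer2 JDBC access" --vpc-id <your-vpc-id>
     aws ec2 authorize-security-group-ingress --group-id <new-sg-id> --protocol tcp --port 10008 --cidr <your-source-cidr>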

  4. Edit the bootstrap script (see attached) and set these parameters according to your environment:

     EIP_ALLOCATION_ID (from step 1)
     MY_AWS_ROLE (just the name, not the full ARN)
     EC2_URL (default is set to https://ec2.us-east-1.amazonaws.com/)
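
     For orientation, the EIP-binding portion of such a bootstrap typically looks like the minimal sketch below. This is illustrative only, not the attached script: the metadata lookup and associate-address call are assumptions about how the binding works, and the real script also restricts the binding to the master node and starts the Thrift server.

     #!/bin/bash
     # Parameter from step 4 (placeholder value)
     EIP_ALLOCATION_ID="eipalloc-xxxxxxxx"
     # Look up this node's instance ID from the EC2 instance metadata service
     INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
     # Associate the EIP with this node so it can be statically referenced
     aws ec2 associate-address --instance-id "$INSTANCE_ID" --allocation-id "$EIP_ALLOCATION_ID"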

  5. Upload the edited node_bootstrap_jdbc.sh file to the following location in S3:

     s3://<your-bucket>/scripts/hadoop/node_bootstrap_jdbc.sh

     Or, using the AWS CLI:

     aws s3 cp --content-type "text/plain" node_bootstrap_jdbc.sh s3://<your-bucket>/scripts/hadoop/node_bootstrap_jdbc.sh


  6. Ensure that the file's content type is "text/plain". The AWS CLI equivalent to check this is:

     aws s3api get-object --bucket <bucket-name> --key scripts/hadoop/node_bootstrap_jdbc.sh /tmp/test.out

     The output should look similar to the following:

     {
         "AcceptRanges": "bytes",
         "ContentType": "text/plain",
         "LastModified": "Fri, 27 May 2016 18:48:56 GMT",
         "ContentLength": 4518,
         "ETag": "\"d8e41f773bd390ea85d8c235f377fbcb\"",
         "Metadata": {}
     }
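
     If the content type is wrong, one way to correct it is to copy the object over itself with replaced metadata:

     aws s3 cp s3://<your-bucket>/scripts/hadoop/node_bootstrap_jdbc.sh s3://<your-bucket>/scripts/hadoop/node_bootstrap_jdbc.sh --content-type "text/plain" --metadata-directive REPLACE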


  7. Edit your cluster in Qubole, and ensure that it points to the node bootstrap file that you uploaded (s3://<your-bucket>/scripts/hadoop/node_bootstrap_jdbc.sh).

  8. Add the following Hadoop overrides to your cluster. These settings make the NameNode and ResourceManager listen on all network interfaces, so they remain reachable through the EIP:

    dfs.namenode.rpc-bind-host=0.0.0.0
    dfs.namenode.http-bind-host=0.0.0.0
    dfs.namenode.https-bind-host=0.0.0.0
    yarn.resourcemanager.bind-host=0.0.0.0

  9. Point your cluster to the VPC referred to in the prerequisites.

  10. Ensure that you add the new persistent security group (created in steps 2 and 3) to your cluster.

  11. Start your cluster. Note that it may take up to a minute after the cluster comes up for the bootstrap to finish starting the Thrift server and binding to the EIP.

  12. Navigate to the EC2 Management console and click Elastic IPs in the left nav. You should see that the EIP created previously has been bound to your Qubole cluster.
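
      The equivalent check from the AWS CLI (substitute the Allocation ID from step 1) should show an AssociationId and the instance ID of your cluster's master node:

      aws ec2 describe-addresses --allocation-ids <allocation-id>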

  13. Log in to Looker and navigate to Admin > Database: Connections.

  14. Create a new Apache Hive2 connection. For the hostname, use the public DNS associated with the EIP from step 1, and set the port to 10008.
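
      To verify the connection outside of Looker, any HiveServer2 JDBC client can be pointed at the same host and port. For example, with beeline (the hostname is a placeholder):

      beeline -u "jdbc:hive2://<eip-public-dns>:10008/default"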

  15. Configuration is now complete.

NOTE: The bootstrap will take a few minutes to complete, so you will need to allow the script time to bind the cluster's master node to the EIP before attempting to connect to it in Looker.
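
To check that the Thrift server is listening before you connect from Looker, you can probe the port from the machine you'll be connecting from (this assumes netcat is available there):

    nc -zv <eip-public-dns> 10008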
