Qubole supports clusters within a VPC with private subnets.
A VPC with a public subnet is analogous to Scenario 1 of the Amazon VPC architecture docs:
A VPC with private and public subnets is akin to Scenario 2 of the Amazon VPC architecture docs:
Running clusters in a private subnet is fully covered in the Qubole documentation.
This KB article provides a summary of the required steps.
Creating the VPC in AWS (by customer)
- To create such a VPC, provision a VPC with two subnets, one private and one public. The private subnet is where Qubole clusters will be launched. See the attached diagram.
- Routes in the private subnet: all outbound traffic (0.0.0.0/0) should go to the NAT gateway set up in the VPC
- Routes in the public subnet: all outbound traffic (0.0.0.0/0) should go to the internet gateway that the VPC wizard sets up automatically
- Amazon also automatically opens routes to allow communication between all hosts in the private and public subnets
- We recommend using a NAT gateway instead of a customer-managed NAT instance, for higher reliability
- Create a VPC endpoint to allow direct access to the S3 object store in the VPC's region, and attach it to the private subnet
- Create only the bastion security group:
- This allows inbound SSH access from the Qubole tunnel server in qubole/us-east-1 (ask Qubole Support for the tunnel server IP)
- Port 7000 should be opened outbound to the private subnet
- Allow outbound to everywhere (default Amazon behaviour)
- EnableDNSHostnames should be yes for the VPC
- Bring up the bastion host in the public subnet, using the Amazon community AMI labelled "qubole-pv-bastion-ami", whose SSH config (/etc/ssh/sshd_config) has been altered to enable gateway ports (GatewayPorts yes)
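
The routing and bastion security group requirements above can be written down as plain data and sanity-checked before anything is applied in AWS. The following is a minimal sketch, not Qubole's tooling. The rule dictionaries follow the `IpPermissions` shape used by the AWS EC2 API. The tunnel server IP, the private subnet CIDR, the use of TCP port 22 for SSH, and the `destination`/`target` route layout are all assumptions made for illustration.

```python
import ipaddress

# Placeholder values for illustration only (not real Qubole or AWS values).
TUNNEL_SERVER_IP = "203.0.113.10"    # hypothetical; ask Qubole Support for the real IP
PRIVATE_SUBNET_CIDR = "10.0.1.0/24"  # hypothetical private subnet range


def bastion_ingress(tunnel_ip):
    """Inbound rule: SSH (assumed TCP 22) from the Qubole tunnel server only."""
    cidr = str(ipaddress.ip_network(f"{tunnel_ip}/32"))
    return [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": cidr}]}]


def bastion_egress(private_cidr):
    """Outbound rules: port 7000 to the private subnet, plus allow-all."""
    cidr = str(ipaddress.ip_network(private_cidr))  # raises ValueError if malformed
    return [
        {"IpProtocol": "tcp", "FromPort": 7000, "ToPort": 7000,
         "IpRanges": [{"CidrIp": cidr}]},
        # "-1" means all protocols; this mirrors the default allow-all egress rule.
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]


def default_route_ok(routes, target_prefix):
    """True if a 0.0.0.0/0 route points at a target ID with the given prefix.

    `routes` is a hypothetical list of {"destination": ..., "target": ...} dicts.
    NAT gateway IDs start with "nat-" and internet gateway IDs with "igw-".
    """
    return any(
        r["destination"] == "0.0.0.0/0" and r["target"].startswith(target_prefix)
        for r in routes
    )
```

For example, `default_route_ok(private_routes, "nat-")` should hold for the private subnet's route table and `default_route_ok(public_routes, "igw-")` for the public one. The explicit port 7000 rule is redundant while the allow-all egress rule is in place; it is listed separately so the requirement survives if outbound access is later restricted.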
Qubole side changes (request to Qubole Support)
- Make sure Qubole’s public SSH key is present on the bastion node; otherwise, SSH authentication from our tiers will not work
- Update the cluster configuration to register the tunnel server used for the bastion server setup
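
Before asking Qubole Support to register the tunnel server, it can help to confirm that the public key really is installed on the bastion node. The sketch below compares a public key line against the contents of an `authorized_keys` file by key type and base64 blob, ignoring comments and simple option prefixes. The function names are hypothetical, and the parsing is deliberately simple: it does not handle options that contain quoted spaces.

```python
def key_blob(line):
    """Return (key_type, base64_blob) from a public key line, or None."""
    fields = line.split()
    # Skip leading options (e.g. "no-pty") until a key type token is found.
    for i, field in enumerate(fields[:-1]):
        if field.startswith(("ssh-", "ecdsa-", "sk-")):
            return field, fields[i + 1]
    return None


def key_present(authorized_keys_text, public_key_line):
    """True if the given public key appears in the authorized_keys content."""
    wanted = key_blob(public_key_line)
    if wanted is None:
        return False
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no key
        if key_blob(line) == wanted:
            return True
    return False
```

A typical use would be to read the bastion user's `~/.ssh/authorized_keys` into a string and pass it in together with the key line supplied by Qubole.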