Qubole Release Notes 25-Oct-2016

Release Version: 38.0.0

For details of what has changed in this version of QDS, see What is New, New Beta Features, and List of Changes and Bug Fixes in this Release.


What is New


AWS KMS Client-side Encryption

AWS KMS client-side encryption is now supported on the S3A filesystem.

LVM-based Storage Capacity Upscaling

Hadoop 2 and Spark clusters now support LVM-based storage capacity upscaling. See node configuration and EBS upscaling for more information.

Enhancements in Presto

These are the enhancements in Presto:

  • Presto cluster auto-scaling logs now show when a node was probably removed due to spot loss.
  • Qubole has introduced a new UI checkbox as part of Hadoop Overrides on the Presto Cluster UI to enable Rubix.
  • Qubole now supports Avro format in Presto 0.142. Avro tables with schema defined in table properties are supported.
  • A new autoscaling property has been introduced to change the target latency of jobs.
  • Qubole has added support for datatype varchar(x) in Presto 0.142.
  • Going forward, Qubole will release trial versions of Presto in sync with versions from the open source community. Trial versions will not contain all cloud-specific features available in a production version. Qubole suggests running these versions for development purposes only. Over time, Qubole will upgrade selected trial versions to be production ready. Please contact help@qubole.com for more information.
  • Presto 0.150 is available as a trial version. Contact help@qubole.com for more information.


UI Enhancements


These are the UI enhancements:

  • Qubole has added password strength feedback on the Sign Up, Change Password, and Forgot Password pages.
  • Notebooks are arranged by cluster ID and name. Adding a new notebook automatically opens it.


New Beta Features

Streamlined Spark Default Configuration

Spark default configuration parameters are streamlined. Now most of them are shown in the cluster settings and Analyze page. Number of cores has been set to half (minimum 2) of its previous default configuration. This feature is available for beta access. Contact help@qubole.com to enable this feature for the Qubole account.


List of Changes and Bug Fixes in this Release

AIRFLOW


New  QBOL-5843: Added support for fetching logs from the account's default location (defloc) in case the logs are not found on the Airflow cluster master.
Change  QBOL-5261: A default datastore (MySQL) is now available inside the Airflow cluster. This removes the need to explicitly set up RDS to launch an Airflow cluster.
Change  QBOL-5590: Refactored the uploading of Airflow logs to S3 to improve overall network and CPU utilization.
Change  QBOL-5816: To keep the Airflow UI response time low, Qubole has moved its static assets to a CDN. As a result, pages load 70-80% faster.

 


AWS CLUSTER MANAGEMENT


Fix  ACM-511: Qubole uses the AWS user data file to initialize cluster nodes. AWS has a maximum size limit of 16 KB for this file. All configuration customizations by customers, such as Hadoop overrides and fair scheduler configuration, are passed to clusters using the user data file. In some cases, the 16 KB limit was restrictive for customers and prevented them from specifying all required configurations.

This fix removes all size limits by using S3 to pass the user data file from Qubole tiers to cluster nodes during startup.
Fix  ACM-598: Fixed an issue where clusters would not start because of hitting the maximum concurrent command limit.
Fix  ACM-691: Fixed an issue where virtualenv was not honoring the Python version at the account level.


HADOOP


Fix  HAD-636: Fixed the integer overflow in BytesWritable (port of HADOOP-11901).
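
These notes do not describe the overflow itself. As a generic illustration of this class of bug, and not the Hadoop code, the sketch below simulates 32-bit signed arithmetic while growing a buffer capacity. The 1.5x growth factor and all function names are assumptions chosen for the example.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit signed integer


def to_int32(value: int) -> int:
    """Wrap a Python integer the way a 32-bit signed integer wraps."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def grow_capacity_int32(size: int) -> int:
    """Compute size * 3 / 2 with the intermediate product held in 32 bits."""
    product = to_int32(size * 3)
    # Truncate toward zero, as 32-bit integer division does in Java or C.
    return -(-product // 2) if product < 0 else product // 2


def grow_capacity_safe(size: int) -> int:
    """Multiply in wider arithmetic, then clamp to the 32-bit maximum."""
    return min(INT_MAX, size * 3 // 2)


size = 1_000_000_000
print(grow_capacity_int32(size))  # negative: the product wrapped around
print(grow_capacity_safe(size))   # 1.5x the size, within 32-bit range
```

The safe variant shows the usual remedy for this kind of bug: do the multiplication in a wider type and clamp the result.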


HADOOP-2


New  HADTWO-565: AWS KMS client-side encryption is now supported on the S3A filesystem.
Fix  HADTWO-594: Fixed copying so that, when the input path is a directory, only files from that directory (and not from a common prefix) are copied.
Fix  HADTWO-597: Fixed copying of a single file.
Fix  HADTWO-614: Changed the maximum number of connections supported by S3A to 100.
Fix  HADTWO-617: Fixed an issue where the cluster had an extremely high number of pending containers due to preemption and rescheduling of containers.
Fix  HADTWO-621: Fixed an issue where the underlying Hadoop job was not killed when the Hive job was killed.
Change  HADTWO-490: Hadoop 2 and Spark clusters now support LVM-based storage capacity upscaling.
Change  HADTWO-590: Added infrastructure to check the spot loss notification periodically on spot nodes.
Change  HADTWO-613: Added support for aggressive downscaling in Hadoop 2.


HBASE


Fix  HBAS-186: Fixed the issue tracked in open source JIRA HBASE-13329.


HIVE 0.13


Change  HIVE-1567: When many commands are scheduled at the same time on the cluster master, SSH connection problems can occur. The hive.enable_ssh_retry flag enables SSH retries when executing the Hive command.


HIVE 1.2


Fix  HIVE-1468: Fixed the broken constant folding in case of UDF (missing).
Change  HIVE-1567: When many commands are scheduled at the same time on the cluster master, SSH connection problems can occur. The hive.enable_ssh_retry flag enables SSH retries when executing the Hive command.


PRESTO 0.119


Fix  PRES-791: Fixed failures that occurred when queries were submitted via the Ruby client to a Presto cluster that was starting up.
Change  PRES-687: Added a log message that indicates spot loss to the cluster auto-scaling logs.
Example log message: 2016-09-16T11:11:17.724Z INFO CMRefresherService RUNNING com.qubole.service.CMRefresherService Node:i-9b3446aa moved to REMOVED state, probably due to spot loss
Change  PRES-737: Rubix in Presto can now be enabled by selecting the Enable Rubix checkbox in the Hadoop Overrides on the cluster UI page. Selecting it sets up Presto to use Rubix without the need to update the configuration explicitly.
Change  PRES-784: ascm.bds.target-latency can now be used to change the target latency (default of 1 minute) for jobs. Increasing this value makes auto-scaling less aggressive.

 


PRESTO 0.142


Fix  PRES-679: Added Avro format support in Presto v0.142. Avro tables with schema defined in table properties are supported.
Fix  PRES-776: Qubole has added support for datatype varchar(x).
Fix  PRES-791: Fixed failures that occurred when queries were submitted via the Ruby client to a Presto cluster that was starting up.
Fix  PRES-825: This fix ensures that no work is scheduled on the master. In some cases, Presto scheduled work on the master even after being configured not to do so.
Change  PRES-687: Added a log message that indicates spot loss to the cluster auto-scaling logs.
Example log message: 2016-09-16T11:11:17.724Z INFO CMRefresherService RUNNING com.qubole.service.CMRefresherService Node:i-9b3446aa moved to REMOVED state, probably due to spot loss
Change  PRES-737: Rubix in Presto can now be enabled by selecting the Enable Rubix checkbox in the Hadoop Overrides on the cluster UI page. Selecting it sets up Presto to use Rubix without the need to update the configuration explicitly.
Change  PRES-784: ascm.bds.target-latency can now be used to change the target latency (default of 1 minute) for jobs. Increasing this value makes auto-scaling less aggressive.
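
The spot-loss message added by PRES-687 can be picked out of the auto-scaling logs with a small script. The following is a minimal sketch that assumes every such line follows the single example shown in these notes; the pattern and the function name parse_spot_loss are illustrative and not part of QDS.

```python
import re
from typing import Optional

# Pattern modeled on the one example line in these notes; real lines may vary.
SPOT_LOSS_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+INFO\s+.*?Node:(?P<node_id>i-[0-9a-f]+) "
    r"moved to REMOVED state, probably due to spot loss$"
)


def parse_spot_loss(line: str) -> Optional[dict]:
    """Return the timestamp and node ID if the line reports a probable spot loss."""
    match = SPOT_LOSS_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "node_id": match.group("node_id"),
    }


example = (
    "2016-09-16T11:11:17.724Z INFO CMRefresherService RUNNING "
    "com.qubole.service.CMRefresherService Node:i-9b3446aa moved to REMOVED state, "
    "probably due to spot loss"
)
print(parse_spot_loss(example))
```

Lines that do not match the pattern return None, so the function can be applied to a whole log file and the results filtered.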

 

QDS


New  QBOL-3852: To make QDS more secure for its users, Qubole will add password strength enforcement (a zxcvbn score of 2 or higher), password expiry (6 months), and password archiving (the last 3 passwords).
Fix  UI-3403: Fixed an issue with the scheduler where the user was sometimes unable to save or edit a scheduled job with a Hive dependency.
Change  QBOL-5693: In the command template API, a new field called default_value has been added within the input_vars parameter. When a command template is run through the API, the default values are used for any variables whose values are not specified.
Change  QBOL-5740: You can see the Storage Role ARN and External ID in the View Account API for an IAM Roles based account.
Change  UI-3691: Creating a new notebook now opens it automatically.
Change  UI-4215: Added password strength feedback on the Sign Up, Change Password, and Forgot Password pages.
Change  UI-4336: The list of notebooks on My notebook is now arranged by the assigned cluster ID and name.
Change  UI-4436: QDS UI header enhancements
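
The default_value behavior described in QBOL-5693 amounts to a merge of supplied values over defaults. The sketch below illustrates that merge only; it does not call the QDS API. The "name" key, the function name resolve_input_vars, and the error raised when neither a value nor a default exists are assumptions made for the example.

```python
from typing import Dict, List


def resolve_input_vars(template_vars: List[dict], supplied: Dict[str, str]) -> Dict[str, str]:
    """Use the supplied value for each variable, falling back to its default_value."""
    resolved = {}
    for var in template_vars:
        name = var["name"]  # assumed key for the variable name
        if name in supplied:
            resolved[name] = supplied[name]
        elif "default_value" in var:
            resolved[name] = var["default_value"]
        else:
            # Assumed behavior: a variable with no value and no default is an error.
            raise ValueError(f"no value or default_value for input variable {name!r}")
    return resolved


input_vars = [
    {"name": "table", "default_value": "events"},
    {"name": "limit", "default_value": "100"},
]
print(resolve_input_vars(input_vars, {"limit": "10"}))
```

Here "limit" takes the supplied value and "table" falls back to its default.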


SPARK


Change  SPAR-1137: Spark default configuration parameters are streamlined. Now most of them are shown in the cluster settings and Analyze page. Number of cores has been set to half (minimum 2) of its previous default configuration.
This feature is available for beta access. Contact help@qubole.com to enable this feature for the Qubole account.
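
The core-count rule in SPAR-1137 is simple arithmetic. The sketch below works it through for a few values, assuming that "half" means integer division; the rounding behavior and the function name streamlined_cores are assumptions, not confirmed by these notes.

```python
def streamlined_cores(previous_default: int) -> int:
    """Half of the previous default core count, but never fewer than 2."""
    # Integer division is an assumption about how odd values are rounded.
    return max(2, previous_default // 2)


for previous in (2, 3, 4, 8, 16):
    print(previous, "->", streamlined_cores(previous))
```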


TEZ


Fix  QTEZ-70: Fixed an issue in Hive that caused a NullPointerException with the 3-way Tez merge join. Related open source JIRA: HIVE-12563.


ZEPPELIN/NOTEBOOKS


Change  ZEP-439: Notebooks can now also be linked to GitHub through a .git link.

 


List of Hotfixes Since 29th September 2016


None

 

List of Hotfixes After 25th October 2016

 

Fix  ACM-703: Qubole has removed an AWS API call that was made while running a command on a running cluster. This helps commands that failed because of AWS API limits and makes commands run a little faster.
Fix  ACM-727: Qubole has upgraded the kernel in Qubole AMIs to address ALAS-2016-757.

Fix  QBOL-5900: Fixed an issue related to Notebook folders when only the bucket name is used as the account's default location.

Fix  UI-4014: Qubole has introduced folders in Notebooks. This feature is available for beta access. Contact help@qubole.com to enable this feature for the account.

 
