Qubole Release Notes for QDS Version R46 16-Aug-2017

Release Version: 46.45.2

For details of what has changed in this version of QDS, see What is New, New Beta Features, and List of Changes and Bug Fixes in this Release.

Heads-up on the General Availability of Alerts, Insights, and Recommendations (AIR)

QDS now supports AIR for Data Discovery and Data Model. Both will become generally available (GA) after this release.

Data Discovery:

Data Discovery helps analysts get information about their data model intelligently and intuitively. The Analyze query composer in the QDS UI will contain these two new features:

  • Intelligent Auto-suggestion and Completion.
  • A Preview tab that will contain the following information: Usage Insights, Statistics, and Sample Data Preview.

Data Model:

AIR for Data Model provides Insights (for All Tables as well as Hot Tables) and Recommendations (Partition, Sorting and Data Format). Both Insights and Recommendations are available in the Usage page on the QDS UI.
Currently, Insights and Recommendations support only Hive. AIR in SparkSQL and Presto will be supported in the near future.

 

What is New

New in AWS Cluster Management
These are the new enhancements in AWS clusters in this QDS release:

  • Qubole clusters, data stores and custom metastores support non-default SSH port and user on the Bastion node. This feature is generally available with this release.
  • The Node Bootstrap Logs are also available in the cluster UI as part of the Nodes table for a running cluster. In the cluster UI, below the active/running cluster, the number of nodes on the cluster is displayed against Nodes. Click the number to see the Nodes table. For more information, see the documentation. You can also see the list of deleted clusters in the Clusters UI page. 

  • Terminating a cluster warns users about running commands on the cluster.


New Data-at-rest Encryption Configuration in the s3a File System

QDS now supports SSE-KMS and SSE-C in the s3a File System. To use them, set these properties: 

  • fs.s3a.server-side-encryption-algorithm: It is disabled by default. It can be enabled with these supported values: AES256 (for SSE-S3), SSE-KMS, and SSE-C.
  • fs.s3a.server-side-encryption.key: It specifies the encryption key to use if fs.s3a.server-side-encryption-algorithm has been set to SSE-KMS or SSE-C. In the case of SSE-C, the encryption key value must be the Base64 encoded key. If you are using SSE-KMS and leave this property empty, the default S3 KMS key is used. Otherwise, set this property to the specific KMS key ID.
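The rules in the two bullets above can be sketched as a small helper that builds the property map. This is only an illustration: the function name s3a_encryption_props is hypothetical, and how the resulting properties are supplied to a cluster or job is not covered here. The property names and supported values come from this section.

```python
import base64

# Property names and supported values as listed in the release notes.
ALGORITHM_PROP = "fs.s3a.server-side-encryption-algorithm"
KEY_PROP = "fs.s3a.server-side-encryption.key"
SUPPORTED_ALGORITHMS = {"AES256", "SSE-KMS", "SSE-C"}


def s3a_encryption_props(algorithm, key=None):
    """Hypothetical helper: build the s3a encryption properties.

    - AES256 (SSE-S3): no key property is set.
    - SSE-KMS: key is an optional KMS key ID (str); if omitted, the
      property is left unset so the default S3 KMS key is used.
    - SSE-C: key is the raw key as bytes; it is Base64 encoded here.
    """
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError("unsupported algorithm: %r" % (algorithm,))

    props = {ALGORITHM_PROP: algorithm}
    if algorithm == "SSE-C":
        if not key:
            raise ValueError("SSE-C requires an encryption key")
        props[KEY_PROP] = base64.b64encode(key).decode("ascii")
    elif algorithm == "SSE-KMS" and key:
        props[KEY_PROP] = key
    return props


if __name__ == "__main__":
    # Placeholder key material for illustration only.
    for name, value in s3a_encryption_props("SSE-C", b"\x01" * 32).items():
        print("%s=%s" % (name, value))
```

Leaving the key property out for SSE-KMS mirrors the behavior described above, where an empty value falls back to the default S3 KMS key.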

For more information, see the documentation.

Improvement in Handling Eventual Consistency Errors
Handling of eventual consistency errors during FileOutputCommitter.commitJob when appending Parquet files has been significantly improved. The change also improves commit speed. To use this feature, set:

  • mapreduce.use.parallelmergepaths to true in Hadoop 2. Set the FileOutputCommitter version to 1. For more information, see the documentation.
  • spark.hadoop.mapreduce.use.parallelmergepaths to true as part of the Spark job configuration. This setting defaults to false. Set the FileOutputCommitter version to 1. For more information, see the documentation.
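As a rough sketch, the settings above could be assembled into spark-submit style arguments as follows. The helper names are hypothetical. The committer version is set here through the standard Hadoop property mapreduce.fileoutputcommitter.algorithm.version, which these notes do not name, so treat that as an assumption and check the documentation for your cluster.

```python
def parallel_merge_conf(for_spark=True):
    """Hypothetical helper: settings that enable parallel merge paths.

    Spark passes Hadoop properties through when they carry the
    "spark.hadoop." prefix, so the same two keys are reused.
    """
    prefix = "spark.hadoop." if for_spark else ""
    return {
        # From the release notes; defaults to false.
        prefix + "mapreduce.use.parallelmergepaths": "true",
        # Assumption: standard Hadoop property selecting committer version 1.
        prefix + "mapreduce.fileoutputcommitter.algorithm.version": "1",
    }


def to_spark_submit_args(conf):
    """Render a property map as a flat list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", "%s=%s" % (key, value)]
    return args


if __name__ == "__main__":
    print(" ".join(to_spark_submit_args(parallel_merge_conf())))
```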

New Enhancements in Hive
This is a new enhancement in Hive in this QDS release:

  • New accounts will be created with the Hive 2.1.1-based metastore.

New Enhancements in Notebooks
These are the new enhancements in Notebooks in this QDS release:

  • When a web-socket connection is lost, the notebook reconnects to the web-socket, so refreshing the page to get connected again is no longer required.
  • In the Spark cluster UI page, Zeppelin Interpreter Mode has been renamed as Notebook Interpreter Mode. For more information, see the documentation.

 

New Enhancements in Spark

With this release, Spark 2.1.1 is the latest version supported on Qubole Spark and it is reflected as 2.1 latest (2.1.1) on a Spark cluster UI. For more information, see the documentation.

 

UI Enhancements

This QDS release has the following UI enhancement:

  • As part of the Analyze query composer autocomplete feature, the UI detects and suggests table names based on the column names provided in the query/command.

 

New Beta Features

QDS Supports Presto 0.180
QDS supports Presto 0.180 as the beta version on Presto clusters. Starting with this release, the Presto 0.174 beta version is no longer supported. For more information, see the documentation.

 

Qubole Supports Package Management on Spark Clusters
The QDS UI supports creating an environment to easily install and upgrade R and Python packages on Spark clusters. It also supports seamless installation/upgrade of packages on a running Spark cluster. This feature will be available for beta access about a week after this release. To get this feature enabled on your QDS account, create a ticket with Qubole Support.

 

List of Changes and Bug Fixes in this Release

AIRFLOW

Enhancements

Change   AIR-11: QDS supports Airflow clusters working with PostgreSQL data sources.


AWS CLUSTER MANAGEMENT

New Features

New   MW-646: Qubole clusters, data stores and custom metastores support non-default SSH port and user on the Bastion node. This feature is generally available with this release.
New   UI-5345: The Node Bootstrap Logs are also available in the cluster UI as part of the Nodes table for a running cluster. In the cluster UI, below the active/running cluster, the number of nodes on the cluster is displayed against Nodes. Click the number to see the Nodes table.
New   UI-6213: Terminating a cluster warns users about running commands on the cluster.


Bug Fixes

Fix   ACM-1185: In certain scenarios during upscaling, the specified spot node percentage might not be honored. So, there is an option that can be enabled to strictly maintain the spot percentage. Contact Qubole Support to enable this option on your QDS account.

Fix   ACM-1270: The issue with disk capacity calculations in Datadog alerts has been fixed.
Fix   ACM-1374: The UI now shows an error if the node bootstrap file for a cluster could not be retrieved.


Enhancements

Change   UI-4279: In the Clusters page, a View Deleted Clusters link has been added in the bottom-left of the page. Clicking the link displays the list of deleted clusters.
Change   UI-5386: An option to create a new HBase cluster has been removed from the Clusters UI page.


HADOOP

Enhancements

Change   HAD-616: Copying of jars/files/archives is now retried on failure when a job is submitted.

 

HADOOP 2

Bug Fixes

Fix   HADTWO-903: Fixed an issue where concurrent access to job history files through the history server failed, leading to job failures in some cases.
Fix   HADTWO-932: Fixed an issue due to which the Shell command did not produce any output.
Fix   HADTWO-985: Upgraded the aws-java-sdk version used in S3A File System from 1.10.77 to 1.11.158.


Enhancements

Change   HADTWO-845: Handling of eventual consistency errors during FileOutputCommitter.commitJob has been significantly improved. The change also improves commit speed. To use this feature, set:

  • mapreduce.use.parallelmergepaths to true in Hadoop 2.
  • spark.hadoop.mapreduce.use.parallelmergepaths to true as part of the Spark job configuration. This setting defaults to false. Note that it works only when FileOutputCommitter version 1 is used. It is a no-op with FileOutputCommitter version 2.

Change   HADTWO-971: QDS now supports SSE-KMS and SSE-C in the s3a File System. To use them, set these properties:

  • fs.s3a.server-side-encryption-algorithm: It is disabled by default. It can be enabled with these supported values: AES256 (for SSE-S3), SSE-KMS, and SSE-C.
  • fs.s3a.server-side-encryption.key: It specifies the encryption key to use if fs.s3a.server-side-encryption-algorithm has been set to SSE-KMS or SSE-C. In the case of SSE-C, the encryption key value must be the Base64 encoded key. If you are using SSE-KMS and leave this property empty, the default S3 KMS key is used. Otherwise, set this property to the specific KMS key ID.

Change   HADTWO-976: The maximum number of nodes that can be downscaled simultaneously is now controlled by mapred.hustler.downscaling.nodes.max.request. Its default value is 500.
Change   HADTWO-1008: Automatic detection of a bucket endpoint in the s3a file system can now be enabled through a configuration option. To enable it, set fs.s3a.bucket.endpoint-detection.enable to true. By default it is disabled.
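For illustration, the two options above can be written out as a standard Hadoop configuration fragment. This is a sketch only: the helper name is hypothetical, the value 100 is an arbitrary example rather than a recommendation, and the way overrides are actually supplied to a QDS cluster may differ from a plain XML file.

```python
import xml.etree.ElementTree as ET


def hadoop_properties_xml(props):
    """Hypothetical helper: render properties in Hadoop's
    <configuration><property>...</property></configuration> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Property names are from the release notes; the values are examples.
overrides = {
    # Default is 500; 100 is an arbitrary illustrative cap.
    "mapred.hustler.downscaling.nodes.max.request": "100",
    # Disabled by default; "true" turns endpoint detection on.
    "fs.s3a.bucket.endpoint-detection.enable": "true",
}
xml_text = hadoop_properties_xml(overrides)

if __name__ == "__main__":
    print(xml_text)
```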


HIVE 1.2

Enhancements

Change   HIVE-2254: New accounts will be created with the Hive 2.1.1-based metastore.
Change   HIVE-2268: The cluster is now terminated if HiveServer2 startup fails.
Change   HIVE-2367: Compaction for Hive ACID transactions has been turned off.

 

PRESTO

Bug Fixes

Fix   PRES-1089 and PRES-1116: Sensitive information like AWS keys is hidden in the Query Tracker page.
Fix   PRES-1094: Tasks are no longer scheduled on the master for bucketed tables either.
Fix   PRES-1118: Fixed the configuration overrides issue with Presto 0.174.
Fix   PRES-1129: Fixed running metadata commands, such as desc, on views.
Fix   PRES-1190: The autoscaling module in Presto 0.157 and 0.180 has been updated to retry when node termination fails during downscaling. Otherwise, the cluster could end up with more nodes than the configured maximum size.
Fix   PRES-927: Fixed a bug in Presto auto-scaling where more nodes were getting added than the configured maximum nodes.


Enhancements

Change   PRES-1100: Reading tables is supported with the s3a scheme via RubiX in Presto.
Change   PRES-1132: Presto 0.180 is now available as a beta version.
Change   PRES-1140: The Presto 0.174 beta version is no longer supported. Use Presto 0.180 as the beta version.


QDS

Bug Fixes

Fix   MW-1040: While downloading results from the Analyze UI, the correct error is shown when the number of files to process exceeds the QDS maximum limit.
Fix   UI-5643: The UI fields related to a data store in a VPC are now visible.
Fix   UI-5938: As part of the Analyze query composer autocomplete feature, the UI detects and suggests table names based on the column names provided in the query/command. This is part of the Data Discovery feature. To get it enabled on your QDS account, create a ticket with Qubole Support.

 

SPARK

New Features

New   SPAR-1731: With this release, Spark 2.1.1 is the latest version supported on Qubole Spark and it is reflected as 2.1 latest (2.1.1) on a Spark cluster UI. Spark 2.1 latest (2.1.1) has been automatically upgraded from Spark 2.1.0.

 

Backports from Open Source Spark

Change   SPAR-1820: SPARK-19872 has been backported to Spark 2.1.0: UnicodeDecodeError in PySpark on sc.textFile read with a repartition.

 

Bug Fixes

Fix   SPAR-1562: The issue in which a 504 Gateway Time-out occurred while accessing the Spark Application UI has been fixed.
Fix   SPAR-1691: The Spark SQL optimization for scanning partitions on Hive tables backed by Parquet/ORC files with many partitions has been ported to Spark 2.1.x versions.
Fix   SPAR-1809: Fixed the object store path issue that occurred when creating a table using Spark.

Enhancements

Change   SPAR-1675: Spark application startup time has been significantly reduced by avoiding localization of Spark binaries, by not waiting for executors to register before marking the Spark application as ready, and so on. This feature is not enabled by default. Create a ticket with Qubole Support to get it enabled on your QDS account.


TEZ
Bug Fixes

Fix   QTEZ-173: Fixed the task log URL for running jobs that was reported as broken.


ZEPPELIN/NOTEBOOKS

New Features

New   ZEP-477: Package Management Environment version v.0 is available in the QDS UI > Control Panel as a beta version. Contact Qubole Support to get this feature enabled on your QDS account. 
New   ZEP-926: When a web-socket connection is lost, the notebook reconnects to the web-socket, so refreshing the page to get connected again is no longer required.

Bug Fixes

Fix   UI-6230: A typo has been fixed in the Dashboards homepage.
Fix   ZEP-1252: Flickering issues with notebooks that led to data/code loss have been fixed.


Enhancements

Change   UI-3352: The filter in the Notebook UI has been enhanced with additional fields.
Change   UI-6233: In the Spark cluster UI page, Zeppelin Interpreter Mode has been renamed as Notebook Interpreter Mode.


List of Hotfixes Since 12th July 2017


Fix   HADTWO-963: Fixed the connection reset exception in seek flow in S3AFileSystem.
Fix   HADTWO-1018: Fixes for multi-object delete in the s3a file system:

  • Added the option fs.s3a.multiobjectdelete.enable, which can be used to disable multi-object delete. By default, multi-object delete is enabled.
  • Added the option fs.s3a.multiobjectdelete.batch.size to configure the number of keys to delete at once. By default, its value is 1000.
  • Added the option fs.s3a.multiobjectdelete.batch.retries to configure the number of retries of a multi-object delete request if MultiObjectDeleteException is encountered. By default, the number of retries is 5.
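To make the batch size and retry options concrete, here is a minimal sketch of batching with retries. It is not the S3AFileSystem implementation: the function names, the MultiObjectDeleteError stand-in class, and the reading that "5 retries" means up to five additional attempts after the first are all assumptions made for illustration.

```python
class MultiObjectDeleteError(Exception):
    """Stand-in for MultiObjectDeleteException (illustration only)."""


def chunk_keys(keys, batch_size=1000):
    """Split keys into consecutive batches of at most batch_size."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def delete_keys(keys, delete_batch, batch_size=1000, retries=5):
    """Delete keys in batches, retrying a batch when delete_batch raises
    MultiObjectDeleteError.

    delete_batch is a caller-supplied callable that deletes one batch.
    Assumption: `retries` counts additional attempts after the first.
    """
    for batch in chunk_keys(keys, batch_size):
        failures = 0
        while True:
            try:
                delete_batch(batch)
                break
            except MultiObjectDeleteError:
                failures += 1
                if failures > retries:
                    raise


if __name__ == "__main__":
    sizes = []
    delete_keys(["key-%d" % i for i in range(2500)],
                lambda batch: sizes.append(len(batch)))
    print(sizes)
```

With the defaults from the list above, 2500 keys would be sent as three requests of 1000, 1000, and 500 keys.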

Fix   HADTWO-1033: Fixed an issue due to which the cache was not honored for Avro tables.
Fix   MW-1221: The "Error creating user" error that occurred while creating a new account through the API has been fixed.
Fix   ZEP-1230: The issue in which changes in notebook paragraphs appeared lost has been fixed.


