Qubole Release Notes for QDS-on-AWS Version R48 24-Oct-2017

Release Version: 48.3.0

Qubole provides Qubole Data Service (QDS) on Amazon Web Services (AWS), Microsoft Azure, and Oracle Cloud Infrastructure (OCI).

Note: This set of release notes is for QDS-on-AWS.

 

For details of what has changed in this version of QDS, see:

  • What is New in QDS-on-AWS
  • New Alpha/Beta Features in QDS-on-AWS
  • List of Changes and Bug Fixes in QDS-on-AWS

 

What is New in QDS-on-AWS

New in AWS Cluster Management

These are the new enhancements in AWS clusters in this QDS release:

  • The Clusters UI page supports object-level access control lists (ACLs). This feature is not enabled by default; create a ticket with Qubole Support to enable it on the QDS account. For more information, see the documentation.
  • When using EBS upscaling with multiple initial EBS volumes, one logical volume is created per initial volume. Earlier, all the volumes were added to the same logical volume.
  • Hadoop 2 clusters support upscaling based on reducers through a configuration option, which is disabled by default. It can be enabled by adding it as a Hadoop override:
    mapred.reducer.autoscale.factor=1
  • Cluster-level permissions can be set for the default clusters created when an account is created or cloned.

QDS Supports BlockOutputStream in the S3a Filesystem

QDS now supports BlockOutputStream in the s3a filesystem. To enable it, set fs.s3a.fast.upload to true. BlockOutputStream is an output stream mechanism in which large files/streams are uploaded as blocks whose size is set by fs.s3a.multipart.size. The blocks can be buffered on disk, in an array (on JVM heap memory), or in byte buffers. The buffering mechanism is set with the property fs.s3a.fast.upload.buffer; its valid values are disk, array, and bytebuffer, and the default is disk.

If the block output type is array or bytebuffer, there is a limit on how many blocks can be queued for upload. The queue size is 15 by default, which keeps the default memory footprint small. The queue size is equal to fs.s3a.max.total.tasks and can therefore be set to a higher value for large JVMs.
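
A minimal sketch of these settings, written as Hadoop overrides in properties style (the values shown are illustrative, not recommendations):

    # Enable BlockOutputStream uploads.
    fs.s3a.fast.upload=true
    # Buffer location for blocks: disk (default), array (JVM heap), or bytebuffer.
    fs.s3a.fast.upload.buffer=disk
    # Size of each uploaded block, in bytes (here, 100 MB).
    fs.s3a.multipart.size=104857600
    # Limit on queued blocks when buffering in memory (15 by default).
    fs.s3a.max.total.tasks=15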

New Enhancements in Hive

These are the Hive enhancements in this release:

  • QDS supports importing data in Parquet format to Hive.
  • HiveServer2, when enabled, can use Java 8 and G1 GC, but this is not enabled by default. Create a ticket with Qubole Support to enable it on the account.
  • hive.optimize.index.filter is set to true by default. This causes fewer splits to be created, by using predicate pushdown (PPD) for ORC files.

 

New Enhancements in Presto

These are the Presto enhancements in this release:

  • Presto 0.180 is Generally Available and is the latest stable version supported on QDS.
  • Presto 0.119 has reached end of life and is no longer configurable.

New Enhancements in Spark

These are the Qubole Spark enhancements in this release:

  • Dataframe reads with partition information in the path work as expected with Qubole's optimization for partition discovery. Create a ticket with Qubole Support to enable this feature on the account.
  • The Structured Streaming tab is available in the Spark UI to show progress on active streaming queries. The UI shows the following details for all active streaming queries:
    • Query ID and Name
    • Source and Sink Description
    • Plot of Trigger Execution vs. Timestamp
    • Plot of Processed & Input Rows Per Second vs. Timestamp
  • Spark cluster bring-up time has been improved. Create a ticket with Qubole Support to enable this feature on your Spark clusters.
  • The spark_version parameter is mandatory when creating a Spark cluster through the REST API/SDK.

 

New Enhancements in Notebooks

These are the Notebooks enhancements in this release:

  • Notebook Folders’ object policy access-control-lists (ACLs) can be set through the REST API call.
  • The Markdown (md) interpreter in notebooks is now available with the pegdown parser. Newly created md interpreters use the new parser by default.
  • Deletion of notebooks is not allowed if they have active scheduled jobs/commands.

 

UI Enhancements

These are the UI enhancements in this release:

  • A new and improved search experience is available in the Analyze UI. It supports keyword/text searches on the query text and lets users search through command history for the past 3 months. The feature is being rolled out to QDS users as part of a segmented feature rollout; to use it in advance, create a ticket with Qubole Support to enable it on your account. For more information, see the documentation.
  • QDS allows creating a new group without adding any group members to it.

 

New Alpha/Beta Features in QDS-on-AWS

Qubole Supports Package Management on Spark Clusters

The QDS UI supports creating an environment to easily install and upgrade R and Python packages on Spark clusters, including seamless installation/upgrade of packages on a running Spark cluster. The name of the environment is editable. Packages can be added in the Package Management UI in two modes: simple mode (the default) and advanced mode, which suggests the syntax for adding package names. For more information, see the documentation.

This feature is available for alpha access. To get it enabled on your QDS account, create a ticket with Qubole Support.

 

List of Changes and Bug Fixes in QDS-on-AWS

AIRFLOW

New Features

New   EAM-328: Run airflow backup_dags on an Airflow cluster to back up the DAGs to the S3 location. After backing up, the DAGs can be deleted. To restore the DAGs on the Airflow cluster node, run airflow restore_dags. This DAG backup-and-restore feature is supported only on Airflow version 1.8.2.
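
A minimal sketch of the workflow on an Airflow 1.8.2 cluster node, using only the two commands named above (the S3 destination is the one configured for the cluster):

    # Back up all DAGs to the configured S3 location; local DAGs can then be deleted.
    airflow backup_dags
    # Restore the backed-up DAGs onto an Airflow cluster node.
    airflow restore_dags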

Enhancements

Change   QBOL-6184: QDS supports multiple versions of Airflow. The supported versions are 1.7.0 and 1.8.2 (beta).

AWS CLUSTER MANAGEMENT

Bug Fixes

Fix   ACM-826: Improved validation of compute credentials on the cluster.
Fix   ACM-1428: The issue in which an open security group was getting created even after enabling required security restrictions on the cluster has been resolved now.
Fix   ACM-1542: The cluster object policy's deny-access restricts users from performing operations such as starting, deleting, editing, viewing, and terminating clusters. However, a user who was denied read permission could still see such a cluster in the cluster drop-downs in the UI and run commands against it. This fix removes such clusters from the UI so that commands cannot be run against them.

Fix   ACM-1582: The issue in which cloning a cluster changed the instance type in the cloned cluster has been resolved.

 

Enhancements

Change   ACM-1431: It is now possible to enable object-level access-control-lists (ACL) for clusters through the QDS UI. This feature is not enabled by default. Create a ticket with Qubole Support to enable this feature on the QDS account.
Change   ACM-1508: The Job History Server now starts before the node bootstrap and Tez configuration are executed.
Change   MW-1343: Cluster-level permissions can be set for the default clusters created when an account is created or cloned.

 

HADOOP 2

New Features

New   HADTWO-1015: QDS now supports BlockOutputStream in the s3a filesystem. To enable it, set fs.s3a.fast.upload to true. BlockOutputStream is an output stream mechanism in which large files/streams are uploaded as blocks whose size is set by fs.s3a.multipart.size. The blocks can be buffered on disk, in an array (on JVM heap memory), or in byte buffers. The buffering mechanism is set with the property fs.s3a.fast.upload.buffer; its valid values are disk, array, and bytebuffer, and the default is disk.

If the block output type is array or bytebuffer, there is a limit on how many blocks can be queued for upload. The queue size is 15 by default, which keeps the default memory footprint small. The queue size is equal to fs.s3a.max.total.tasks and can therefore be set to a higher value for large JVMs.

Bug Fixes

Fix   HADTWO-1058: The issue in which the decommissioning-node count was incorrectly displayed in the Resource Manager UI page has been resolved.
Fix   HADTWO-1069: The issue in which the Job History Server and Resource Manager out files displayed repetitive warning messages has been resolved.
Fix   HADTWO-1077: The issue in which the Resource Manager displayed a negative node count and nodes were not downscaling has been resolved.

Fix   HADTWO-1096: Fixed an issue in which a node sometimes got recommissioned even when it was already present in yarn.exclude. QDS now decommissions a node as soon as it is added to yarn.exclude.


 

Enhancements

Change   HADTWO-871: Hadoop 2 clusters support upscaling based on reducers through a configuration option, which is disabled by default. It can be enabled by adding it as a Hadoop override:
mapred.reducer.autoscale.factor=1

 

HIVE

New Features

New   EAM-232: QDS supports importing data in the Parquet format to Hive.

 

Bug Fixes

Fix   MW-1064: The issue in which HiveQL commands failed because of whitespace before comments has been resolved.

Enhancements

Change   HIVE-2244: Earlier, in Hive 1.2, password details were shown in logs. This change masks the original passwords.
Change   HIVE-2285: HiveServer2, when enabled, can use Java 8 and G1 GC, but this is not enabled by default. Create a ticket with Qubole Support to enable it on the account.
Change   HIVE-2491: Ported a feature from Qubole Hive 1.2 into Qubole Hive 2.1.1: with Hive Authorization enabled, you can set hive.qubole.authz.strict.show.tables to true so that the show tables query result includes only the tables the user has SELECT access to.
Change   HIVE-2516: hive.optimize.index.filter is set to true by default. This causes fewer splits to be created, by using predicate pushdown (PPD) for ORC files.


PRESTO

Bug Fixes

Fix   PRES-1062: Users can set hive.metastore-cache-ttl-bulk as a Presto override to speed up Presto queries; for example, hive.metastore-cache-ttl-bulk=24h. Enabling this option caches tables fetched from the metastore for the configured duration, so fetching tables/columns through JDBC drivers becomes faster.
Fix   PRES-1220: The permission issue causing the exception java.io.FileNotFoundException: /media/ephemeral0/logs/yarn/scaling.log (Permission denied) in Hadoop jobs has been resolved.

 

Enhancements

Change   PRES-1177: Presto 0.180 is Generally Available on QDS and is the latest supported stable version.
Change   PRES-1241: Presto 0.119 has reached end of life and is no longer configurable on QDS.

 

QDS

New Features

New   AN-71: A new and improved search experience is available in the Analyze UI. It supports keyword/text searches on the query text and lets users search through command history for the past 3 months.

This search feature is being rolled out to QDS users as part of a segmented feature rollout. To use it in advance, create a ticket with Qubole Support to enable it on your account.

 

Bug Fixes

Fix   AD-77: QDS now resets changes to the account settings form when a user navigates away without saving the form.
Fix   AD-134: The issue in which the All Commands report in the QDS Usage UI returned fewer results than the selected range has been resolved.
Fix   AN-255: The issue related to cloning workflow templates has been resolved.
Fix   AN-345: The FairScheduler pool value is now reflected for the selected cluster in all scenarios.
Fix   AN-359: If a user with the admin privilege tries to access a command from a different account, this message is displayed:

(This command belongs to account id : <acc_id> (<acc_name>). Please switch to that account in order to run it. Switch Account).
Clicking Switch Account takes the user to the account to which the command belongs.
Fix   MW-1266: A timeout for command execution can be set in seconds in the command API call; the default is 129600 seconds (36 hours). QDS checks each command for timeout every 60 seconds, so a command with a timeout of 80 seconds is killed at the next check, that is, after 120 seconds.

Enhancements

Change   AD-144: QDS allows creating a new group without adding any group members to it.
Change   EAM-255: The Business Edition sign-up page reflects the preferred sign-up method.
Change   EAM-274: When Bastion Node is enabled, QDS supports account-level custom SSH keys in the data store/metastore. To add the custom SSH keys, create a ticket with Qubole Support.

 

SCHEDULER

Bug Fixes

Fix   SCHED-153: The issue in which the Scheduler rerun got triggered multiple times has been resolved.
Fix   SCHED-161: Rerunning a scheduler instance now creates the command under the name of the user who reran the instance, rather than the scheduler's owner.

 

Enhancements

Change   EAM-294: The Scheduler UI displays security breaches for the scheduled jobs.
Change   SCHED-132: Qubole Scheduler honors either the schedule frequency or the cron expression, based on whichever was updated most recently on that schedule.

 

SPARK

Bug Fixes

Fix   SPAR-1723: The Structured Streaming tab is available in the Spark UI to show progress on active streaming queries. The UI shows the following details for all active streaming queries:

  • Query ID and Name
  • Source and Sink Description
  • Plot of Trigger Execution vs. Timestamp
  • Plot of Processed & Input Rows Per Second vs. Timestamp

Fix   SPAR-1738: The issue in which a Spark Scala command failed due to an out-of-memory error has been resolved.
Fix   SPAR-1880: Dataframe reads with partition information in the path work as expected with Qubole's optimization for partition discovery. Create a ticket with Qubole Support to enable this feature on the account.
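
A minimal sketch of the access pattern this fix covers; the bucket, path layout, and column name are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partition-discovery-example").getOrCreate()

    // Data laid out with partition information in the path, for example
    //   s3://my-bucket/events/date=2017-10-01/part-00000.parquet
    // Partition discovery infers a "date" column from the directory names.
    val df = spark.read.parquet("s3://my-bucket/events")

    // With the optimization enabled, this prunes to the matching partition
    // directory instead of listing and scanning every path.
    df.filter(df("date") === "2017-10-01").count()
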
Fix   SPAR-1891: The Java 8 security policy has been updated. Users will now be able to use Kafka streaming with Spark 2.2.0.

Fix   SPAR-1935: Spark applications can hang if important listener events are lost. If you set spark.qubole.setIdleShutdownThreadAsDaemon to true and spark.qubole.killApplicationOnEventDrop to true, the application crashes when an event is lost instead of hanging. You can then increase the listener bus size (set spark.scheduler.listenerbus.eventqueue.size) and rerun the application.
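
A minimal sketch of setting these properties at application start; the property names are the ones in this note, and the queue-size value is illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("listener-event-drop-example")
      // Crash the application instead of letting it hang when an event is dropped.
      .config("spark.qubole.setIdleShutdownThreadAsDaemon", "true")
      .config("spark.qubole.killApplicationOnEventDrop", "true")
      // Illustrative: a larger listener bus queue for the rerun.
      .config("spark.scheduler.listenerbus.eventqueue.size", "20000")
      .getOrCreate()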

 

Enhancements

Change   SPAR-1713: Spark cluster bring-up time has been improved. Create a ticket with Qubole Support to enable this feature on your Spark clusters.
Change   SPAR-1894: The spark_version parameter is mandatory when creating a Spark cluster through the REST API/SDK.
Change   SPAR-1993: The default value of spark.sql.autoBroadcastJoinThreshold in Spark 2.2.0 is changed from 50 MB to 10 MB.

 

STREAMX

Enhancements

Change   SX-58: The directory /media/ephemeral0/streamx on a StreamX cluster is now mounted inside the Docker container at the same location.
Change   SX-60: QDS supports connecting from the StreamX cluster to SSL-enabled Kafka clusters.

 

TEZ

Bug Fixes

Fix   QTEZ-190: The Application UI link issue under the Resources tab in the Analyze UI for Hive-on-Tez queries has been resolved.

Enhancements

Change   QTEZ-203: QDS supports offline Tez-UI for Hive 2.1.1. With this change, the Tez Application UI can be viewed even after the cluster is down.

ZEPPELIN/NOTEBOOKS

New Features

New   ZEP-1234: Notebook Folders’ object policy access-control-lists (ACLs) can be set through the REST API call.

Bug Fixes

Fix   ZEP-1394: The Markdown (md) interpreter in notebooks is now available with the pegdown parser. Newly created md interpreters use the new parser by default.
Fix   ZEP-1493: Canceling paragraphs in the original notebook no longer affects pending/running paragraphs in the cloned notebook.

Enhancements

Change   ZEP-1104: Simplified Spark interpreter properties by removing unnecessary default interpreter properties. Create a ticket with Qubole Support to enable this feature on your account. When enabled, this does not affect existing Spark interpreters.
Change   ZEP-1158: Qubole Notebooks provides relevant autocomplete suggestions for Scala commands.
Change   ZEP-1307: The interpreter list is shown as expandable/collapsible blocks so that users can switch between interpreters quickly. However, the Edit button for an interpreter appears only after expanding its settings; this will be fixed in the next release.
Change   ZEP-1338: Changes to the spark.yarn.queue interpreter property are restricted for Spark interpreters.

Change   ZEP-1369: Packages can be added in the Package Management UI in two modes: simple mode (the default) and advanced mode, which suggests the syntax for adding package names.
Change   ZEP-1346: The opened notebook name is displayed in the web browser tab.
Change   ZEP-1411: The environment name field is editable for an environment in Package Management.
Change   ZEP-1457: Deletion of notebooks is not allowed if they have active scheduled jobs/commands.
Change   ZEP-1476: The issue in which the %knitr interpreter was not working has been resolved.
Change   ZEP-1492: Context menu is available on notebooks and folders.
Change   ZEP-1528: UI Changes for Job Stage Progress.
Change   ZEP-1540: Ported ZEPPELIN-2904: show the Remove Paragraph button upfront.

 

List of Hotfixes in QDS-on-AWS Since 19th September 2017

Fix   AN-130: If a command that has a saved query associated with it is edited and run/saved from History, the edited query is added as a new version of the corresponding saved query (the same as the Save and Run functionality in Workspace).
Fix   AN-229: Switching to the preview tab while editing a query is now prevented.
Fix   AN-257: Results for the commands saved from Analyze > History are now retained.
Fix   AN-301: When autocomplete is displayed, pressing ESC closes the autocomplete widget and retains cursor focus in the editor.
Change   EAM-278: QDS allows the user to self-start the Cloud Agents free trial through the UI and the REST API.
Fix   PRES-1279: Presto versions 0.157 and 0.180 used to delete all data in a partitioned table even when no rows were inserted. This behavior has been changed: old data is deleted only when some rows are inserted, and the deletion is limited to the partitions into which rows were inserted.
