Qubole Release Notes for QDS Version R44 14-Jun-2017

Release Version: 44.53.3


For details of what has changed in this version of QDS, see What is New, New Beta Features, and List of Changes and Bug Fixes in this Release.


A Heads-up on Notebook Folders

Qubole will enable Notebook folders on all QDS accounts after the R44 deployment.

What is New

QDS Supports KMS-based EBS encryption for EBS-only Instances

Qubole now supports KMS-based EBS encryption for EBS-only instances. An encrypted snapshot is required to use this feature. To enable this on the QDS account, create a ticket with Qubole Support.

Qubole Supports Bucket-to-region Mapping in the S3A Filesystem

A user can provide a JSON object that specifies the bucket-to-region mapping in the S3A filesystem. This feature is enabled by specifying the following Hadoop override.

fs.s3.awsBucketToRegionMapping =
{"acm-354-test": "s3.ap-northeast-2.amazonaws.com", "acm-920": "s3.ap-south-1.amazonaws.com"}

For more information, see the documentation
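
As a minimal sketch of how such an override value could be assembled, the Python snippet below serializes a bucket-to-endpoint dictionary into a single override line. The helper name build_bucket_region_override is hypothetical and is not part of any Qubole library; the property name, bucket names, and endpoints are the ones from the example above.

```python
import json

# Property name taken from the release note above.
OVERRIDE_KEY = "fs.s3.awsBucketToRegionMapping"


def build_bucket_region_override(mapping):
    """Return a 'key=value' override line for a bucket-to-endpoint mapping.

    Hypothetical helper for illustration only.
    """
    for bucket, endpoint in mapping.items():
        if not bucket or not endpoint:
            raise ValueError("bucket and endpoint must be non-empty")
    # sort_keys keeps the generated line stable across runs.
    return "{}={}".format(OVERRIDE_KEY, json.dumps(mapping, sort_keys=True))


override_line = build_bucket_region_override({
    "acm-354-test": "s3.ap-northeast-2.amazonaws.com",
    "acm-920": "s3.ap-south-1.amazonaws.com",
})
print(override_line)
```

Generating the value with a JSON serializer rather than by hand avoids quoting mistakes when the mapping grows.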

New Enhancements in Hive

These are new enhancements in Hive in this QDS release:

  • Hive Metadata Cache can be enabled on a Hadoop 2 (Hive) cluster and also from the Hadoop 2 (Hive) cluster configuration UI. For more information, see the documentation.
  • Qubole supports a user-personalized Hive bootstrap through a REST API call. For more information, see the documentation
  • QDS supports Hive Authorization in Hive 2.1.1 when HiveServer2 is enabled.

 

New Enhancements in Presto

These are the new enhancements in Presto in this QDS release:

  • The Query Tracker link in Presto logs now shows the latest query tracker only when the query is running or when the Presto cluster that ran the query has information about it. Otherwise, the legacy query tracker is shown.
  • Presto 0.174 is now available as an unstable release. It is available for development and POC workloads. OSS Release notes are available at https://prestodb.io/docs/current/release/release-0.174.html
  • All Qubole features are available in Presto 0.174.
  • RubiX is upgraded to 0.2.9, which includes a fix that prevents data corruption when handling files larger than 2 GB.
  • Presto clusters support Datadog settings that are set at the QDS account level. Users can add their Datadog API and app tokens and get the Presto query metrics in their Datadog account. These metrics are displayed in the Datadog account:

    presto.queuedQueries
    presto.planningQueries
    presto.startingQueries
    presto.runningQueries
    presto.finishingQueries
    presto.finishedQueries
    presto.failedQueries
  • Presto-ui-clusterid now works even for an SSL-enabled Presto cluster.

New Enhancements in Notebooks

These are the new enhancements in Notebooks in this QDS release:

  • Notebook examples are now fetched from a web service and shown by category.
  • A cluster dropdown is added within the notebook page that helps a user to select and assign a cluster to the notebook.
  • The Notebooks UI now supports importing a notebook from a web URL containing a valid notebook JSON.

For more information on notebook enhancements, see the documentation.

UI Enhancements

Qubole has a few enhancements in this QDS release that are listed below:

  • Run queries by highlighting them in the Analyze UI query editor.
  • When a user highlights a particular part of the query and clicks the Run button, the selected part is run as a separate query.
  • When the number of columns is greater than 30, only the first 30 columns are rendered, and an option lets the user select the columns to view.
  • The Include header option in the Download Results dropdown in Analyze is now selected by default.
  • Query Export/Query Import that is run on Hadoop clusters will have the first running Hadoop 2/Hadoop 1 cluster selected by default.


New Beta Features

Notebook Dashboards

Notebook Dashboards are available for beta access. To enable this on the account, create a ticket with Qubole Support.
Dashboards provide an interface for sharing analytics built using Qubole notebooks. Users can publish notebooks as dashboards and share them with specific users within their organization. For more information, see the documentation.

 

Spark supports Qubole Hive Authorization

Spark supports authorization of Hive objects and honors the privileges and roles set in Hive as per Hive authorization. This feature is available for beta access. Create a ticket with Qubole Support to get this feature enabled on your account. For more information, see the documentation.


List of Changes and Bug Fixes in this Release

AIRFLOW

Enhancements

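Change   AIR-7: Airflow can be used to submit Notebook commands. A sample that runs a parameterized notebook is given below.

# Import path as in the Airflow contrib package; `dag` is assumed to be an existing DAG object.
from airflow.contrib.operators.qubole_operator import QuboleOperator

t2 = QuboleOperator(
    task_id='spark_cmd',
    command_type="sparkcmd",
    note_id="36995",
    qubole_conn_id='qubole_prod',
    arguments='{"name":"hello world"}',
    dag=dag)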
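
The arguments value in the sample is a JSON object passed as a string. A small sketch of building that string from a Python dictionary, so that quoting is handled by the serializer, might look like the following. The helper name notebook_arguments is hypothetical and is not part of Airflow or QDS.

```python
import json


def notebook_arguments(params):
    """Serialize notebook parameters to a JSON string (hypothetical helper)."""
    if not isinstance(params, dict):
        raise TypeError("params must be a dict of parameter names to values")
    return json.dumps(params)


# Same parameter as in the sample above.
args = notebook_arguments({"name": "hello world"})
print(args)
```

The resulting string can then be passed as the arguments parameter of the operator.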


AWS CLUSTER MANAGEMENT

New Features

New   ACM-1084: Qubole now supports KMS-based EBS encryption for EBS-only instances. An encrypted snapshot is required to use this feature. To enable this on the QDS account, create a ticket with Qubole Support.


Bug Fixes

Fix   ACM-1144: This fix moves the downloading of the node bootstrap from s3cmd to the Hadoop client for AWS clusters. Qubole uses S3AFileSystem to connect to S3 if the S3A filesystem is enabled. Otherwise, Qubole uses NativeS3FileSystem.
Fix   ACM-1150: This resolves a bug where the cluster logs for a cluster instance were stored at an inconsistent location in some cases.
Fix   ACM-1174: The issue where autoscaling logs were missing spot loss messages has been fixed.
Fix   ACM-1188: Sporadic connectivity failures to Qubole-managed metastore from different clusters have been fixed.
Fix   MW-634: The maximum limit for custom EC2 tags has been increased from 7 to 20.


Enhancements

Change   ACM-720: Qubole now ensures that even on instance reboot, daemons such as datanode, nodemanager, sparkhistoryserver, timelineserver, resourcemanager, namenode, and historyserver get restarted.


HADOOP 2

Bug Fixes

Fix   HADTWO-869: This fix makes the dominant resource fairness policy the default scheduling policy, so that scheduling decisions honor both memory and vcores.
Fix   HADTWO-894: This fixes an issue with the S3A File System where it was falling back to the next role in the chain of credentials on role assumption failures.
Fix   HADTWO-895: This fix makes the FileSystem object cache aware of credentials (IAM roles).
Fix   HADTWO-899: This fixes an issue due to which already decommissioned nodes were recommissioned in Hadoop causing job delays.
Fix   HADTWO-902: Reduced the retry time for which the ApplicationMaster (AM) keeps waiting to connect to a NodeManager on lost nodes. This fixes the issue where the AM got stuck for a long duration if a spot loss happened while the job was running.
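
The idea behind dominant resource fairness (HADTWO-869) can be sketched in a few lines: each application's dominant share is the largest fraction it holds of any single resource, and the next allocation goes to the application with the smallest dominant share. The snippet below illustrates that general idea only; it is not the YARN scheduler code, and the cluster sizes and application names are made up.

```python
def dominant_share(usage, capacity):
    """Largest fraction of any single resource that an application is using."""
    return max(usage[resource] / capacity[resource] for resource in capacity)


def next_app_to_schedule(apps, capacity):
    """Pick the application with the smallest dominant share."""
    return min(apps, key=lambda name: dominant_share(apps[name], capacity))


# Hypothetical cluster and applications, for illustration only.
cluster = {"memory_mb": 100000, "vcores": 50}
apps = {
    "etl": {"memory_mb": 40000, "vcores": 5},   # memory-heavy application
    "ml": {"memory_mb": 10000, "vcores": 15},   # CPU-heavy application
}
chosen = next_app_to_schedule(apps, cluster)
print(chosen)
```

A memory-only policy would rank these applications by memory alone; taking the maximum over both resources is what lets vcores influence the decision.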


Enhancements

Change   HADTWO-906: Qubole supports bucket-to-region mapping in the S3A filesystem.


HIVE 1.2

New Features

New   HIVE-2158: QDS supports Hive Authorization in Hive 2.1.1 when HiveServer2 is enabled.


Bug Fixes

Fix   HIVE-1515: This fix supports parallel INSERT INTO values from the same Hive session. A Hive session ID is now generated randomly for each query, which avoids race conditions in session directories.
Fix   HIVE-2079: This fix appends a random ID to the empty file created in getCloudFSCurrentTime to get the S3 bucket’s timestamp. This avoids a race condition between dynamic partition INSERT queries.
Fix   HIVE-2082: This fixes an issue where disk space filled up on a long-running cluster because of LocalMapRedTask logs.
Fix   HIVE-2098: This fix adds retry logic in case of NoSuchObjectError for databases, tables, and partitions.
Fix   HIVE-2109: This fixes an issue where the Hive CLI allowed authorization-related configuration to be overridden by the user.
Fix   HIVE-2153: This fixes a NullPointerException thrown by MSCK REPAIR TABLE when it runs single-threaded.
Fix   HIVE-2166: The Hive 2.1.1 version on the cluster creation page now shows the beta tag.
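
The random-ID approach described in HIVE-1515 and HIVE-2079 can be illustrated with a short sketch: concurrent writers each get a path with a random suffix, so they never touch the same file or directory. This only demonstrates the general technique and is not the Hive implementation; the helper name and paths are hypothetical.

```python
import uuid


def unique_scratch_path(base_dir, prefix):
    """Return a path under base_dir with a random suffix to avoid collisions."""
    return "{}/{}_{}".format(base_dir.rstrip("/"), prefix, uuid.uuid4().hex)


# Two queries from the same session get distinct paths.
first = unique_scratch_path("/tmp/hive/", "session")
second = unique_scratch_path("/tmp/hive/", "session")
print(first)
print(second)
```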


Enhancements

Change   HIVE-2150: Hive Metadata Cache can be enabled on a Hadoop 2 (Hive) cluster.
Change   MW-443: Qubole supports a user-personalized Hive bootstrap through a REST API call.
Change   UI-5852: Hive metadata caching can now be enabled from the Hadoop 2 (Hive) cluster configuration UI.

 


PRESTO

Bug Fixes

Fix   PRES-998: The Ruby client issue of not displaying errors for misspelled DDL has been fixed.
Fix   PRES-999: This fixes the query failure that occurred when a user issued a Presto query on tables with LZO data.
Fix   PRES-1015: The Ruby client has been fixed to minimize the number of DNS lookups.
Fix   PRES-1019: Output is no longer printed while setting the cli.headers session property, which prevents it from polluting the actual result.


Enhancements

Change   PRES-536: The Query Tracker link in Presto logs now shows the latest query tracker only when the query is running or when the Presto cluster that ran the query has information about it. Otherwise, the legacy query tracker is shown.
Change   PRES-584: New catalogs specified in Presto overrides are now automatically added to datasources in the config.properties file.
Change   PRES-646: Users can add their Datadog API and app tokens and get the Presto query metrics in their Datadog account.
Change   PRES-1031: presto-ui-clusterid now works even for an SSL-enabled Presto cluster.
Change   PRES-1037: Presto 0.174 is now available as an unstable release. It is available for development and POC workloads. OSS release notes are available at https://prestodb.io/docs/current/release/release-0.174.html. All Qubole features are available in Presto 0.174.
Change   UI-5498: In the Presto Cluster UI, unstable versions of Presto are now clearly marked.

QDS

New Features

New   UI-5264: Run queries by highlighting them in the Analyze UI query editor.
When a user highlights a particular part of the query and clicks the Run button, the selected part is run as a separate query.

Bug Fixes

Fix   MW-633: The issue where multiple queries were failing with a "NoneType object has no attribute" error has been fixed.
Fix   MW-640: The GET command templates call now returns only active command templates.
Fix   MW-834: V4 signature S3 buckets are now supported in the Explore UI.
Fix   UI-5518: In Analyze, the Show more button is hidden when there are no additional results to be displayed.

Enhancements

Change   UI-5105: When the number of columns is greater than 30, only the first 30 columns are rendered, and an option lets the user select the columns to view.
Change   UI-5198: The Include header option in the Download Results dropdown in Analyze is now selected by default.
Change   UI-5228: Query Export/Query Import that is run on Hadoop clusters will have the first running Hadoop 2/Hadoop 1 cluster selected by default.
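
The column-limiting behavior described in UI-5105 can be sketched as follows: render the first 30 columns by default, or the user's own selection when one is given. This is an illustrative sketch in Python, not the Analyze UI code; the function and variable names are hypothetical.

```python
MAX_VISIBLE_COLUMNS = 30


def visible_columns(columns, selected=None, limit=MAX_VISIBLE_COLUMNS):
    """Return the user's selection if given, else the first `limit` columns."""
    if selected:
        wanted = set(selected)
        # Preserve the original column order of the result set.
        return [column for column in columns if column in wanted]
    return columns[:limit]


columns = ["col{}".format(i) for i in range(45)]
default_view = visible_columns(columns)
custom_view = visible_columns(columns, selected=["col40", "col2"])
print(len(default_view), custom_view)
```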



SCHEDULER

Enhancements

Change   SCHED-127: Qubole allows reusing schedule names of killed periodic jobs.

 


SPARK

New Features

New   QBOL-5679: An error code summary is now included in the GET API response for failed Spark Scala commands. This feature is available for beta access. Create a ticket with Qubole Support to get this feature enabled on your account. For more information, see the documentation.


New   SPAR-1426: Spark supports authorization of Hive objects and honors the privileges and roles set in Hive as per Hive authorization. This feature is available for beta access. Create a ticket with Qubole Support to get this feature enabled on your account. For more information, see the documentation.


Bug Fixes

Fix   MW-604: A Spark script location specified as an S3 path now works.
Fix   SPAR-1319: DirectFileOutputCommitter (DFOC) is enabled by default in Spark 2.1.0.
Fix   SPAR-1359: Earlier, setting the driver memory through the Spark configuration override parameter had no effect. This issue is fixed in this release.
Fix   SPAR-1498: The issue where Spark 2.1 UI was not displaying active/completed tasks info has been fixed.


Enhancements

Change   HIVE-2020 and HIVE-2114: This change makes loading Hive tables from Spark more robust to the eventual consistency errors generally expected from cloud blob stores. It can be enabled per job by passing the following command-line option.

--conf spark.hadoop.hive.qubole.consistent.loadpartition=true

Change   SPAR-827: Multiple fixes for Spark application-level metrics (bytes read/bytes written). The first task's metrics and the last task's metrics are now accounted for. Scheme matching is fixed in a couple of cases where it was broken.
Change   SPAR-1507: Qubole auto-scaling now cancels or lowers the number of requested executors as the need for executors changes.
Change   TES-2164: The R language version in the Qubole AMI has been upgraded to 3.3.3.
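
For jobs launched from a script, the --conf option listed under HIVE-2020 and HIVE-2114 can be appended to the submit arguments programmatically. The sketch below builds such an argument list; the helper name and the com.example.LoadJob class are hypothetical, and only the configuration key comes from the note above.

```python
CONSISTENT_LOAD_OPTION = "spark.hadoop.hive.qubole.consistent.loadpartition=true"


def with_consistent_loadpartition(submit_args):
    """Return a copy of submit_args with the option added as a --conf pair."""
    if CONSISTENT_LOAD_OPTION in submit_args:
        # Already present; do not add it twice.
        return list(submit_args)
    return list(submit_args) + ["--conf", CONSISTENT_LOAD_OPTION]


# Hypothetical job arguments, for illustration only.
args = with_consistent_loadpartition(["--class", "com.example.LoadJob"])
print(" ".join(args))
```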


TEZ

Bug Fixes

Fix   HIVE-1952: This fix changes the extraction directory for Jetty files and adds request and out logs for the Tez UI.
Fix   QTEZ-135: This fixes concurrent access to the offline LevelDB by caching the connection.
Fix   QTEZ-159: This fix keeps track of lost nodes when launching containers.


Enhancements

Change   QTEZ-143: Added RollingLevelDb support for the Timeline Server.


ZEPPELIN/NOTEBOOKS

New Features

New   MW-850: New System Users in Dashboards.
New   UI-5615: The Notebooks UI now supports importing a notebook from a web URL containing a valid notebook JSON.
New   UI-5758: View Details is available for dashboards from the notebooks page.

Bug Fixes

Fix   UI-5860: A correct message appears while deleting a notebook.
Fix   UI-5922: The Import button’s tooltip has been changed.
Fix   ZEP-824: Notebooks now retry downloading the interpreter configuration file if the operation failed in its previous attempt.
Fix   ZEP-866: Previously, a notebook run through the API also ran paragraphs that were disabled. After this fix, paragraphs that are disabled for run are not run as part of a notebook API run or schedule.
Fix   ZEP-881: This fixes streaming output for notebooks.
Fix   ZEP-956: The notebook hanging issue has been fixed.
Fix   ZEP-963: The issue where spark.driver.memory was not honored when zeppelin_s3_package_name was used has been fixed.
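
The retry behavior described in ZEP-824 follows a common pattern: attempt the download, and if it fails, try again up to a fixed number of times. The sketch below shows that pattern in general terms; it is not the Zeppelin code, and the function names, attempt count, and the simulated download are assumptions made for illustration.

```python
import time


def download_with_retry(download, attempts=3, delay_seconds=0.0):
    """Call download() until it succeeds or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except IOError as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    # All attempts failed; surface the last error to the caller.
    raise last_error


calls = {"count": 0}


def flaky_download():
    # Stand-in for the real download: fails once, then succeeds.
    calls["count"] += 1
    if calls["count"] < 2:
        raise IOError("temporary failure")
    return "interpreter configuration"


result = download_with_retry(flaky_download)
print(result, calls["count"])
```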



Enhancements

Change   UI-4116: A cluster dropdown is added within the notebook page that helps a user to select and assign a cluster to the notebook.
Change   UI-4486: The label in the cluster dropdown now uses the full width.
Change   UI-5472: The interactive mode action message is now shown for the Dashboard.
Change   UI-5700: Notebook examples are now fetched from a web service and shown by category.
Change   UI-5787: The delete Notebook confirmation dialog now shows the Notebook name instead of the ID.
Change   UI-5806: Notebook and Dashboard: The select styling of items has been updated.
Change   UI-5816: Qubole does not show a feature if the user does not have read permission on that feature's resource.
Change   UI-5939: The Add paragraph button is more prominent in a Notebook.
Change   ZEP-891: The scroll bar does not get stuck inside paragraph while scrolling through the notebook.
Change   ZEP-947: Dashboards are moved to the new cluster when the cluster associated with a notebook is changed.

 


List of Hotfixes Since 11th May 2017


Fix   ACM-1226: To enhance security in clusters, Qubole has added support to remove wide-open outbound (egress) rules in the cluster security group. Create a ticket with Qubole Support to get this feature enabled on your account.
Fix   MW-832: The issue where notebook actions did not reset the QDS idle session timeout has been fixed.
Fix   PRES-1058: RubiX has been upgraded to 0.2.10, which uses far fewer resources and prevents errors such as "Too many errors talking to worker node" that occurred due to RubiX load.
Fix   UI-5107: The issue of a table appearing in a double-view display in the Analyze UI has been fixed.
Fix   UI-5909: The query editor in Analyze > Workspace is no longer cleared when there are no saved queries present.
