Qubole Release Notes for QDS Version R47 19-Sep-2017

Release Version: 47.44.0

Qubole provides Qubole Data Service (QDS) on Amazon Web Service (AWS), Microsoft Azure, and Oracle Bare Metal Cloud (BMC).

Note: This set of release notes only contains details about the release of QDS-on-AWS.

For details of what has changed in this version of QDS-on-AWS, see:

  • What is New in QDS-on-AWS
  • New Beta Features in QDS-on-AWS
  • List of Changes and Bug Fixes in QDS-on-AWS

 

What is New in QDS-on-AWS
New in AWS Cluster Management
These are the new enhancements in AWS clusters in this QDS release:

  • During EBS upscaling, EBS volumes are now tagged at creation, which earlier happened in a separate API call. This saves one API call and also lets you specify a stricter policy based on volume tags.
  • When the node type is changed from one EBS-compatible type to another EBS-compatible type, EBS settings are preserved. Otherwise, they are reset.

New Enhancements in Hive
These are the Hive enhancements in this release:

  • With Hive Authorization enabled, you can set hive.qubole.authz.strict.show.tables to true so that the show tables query result lists only tables on which the user has SELECT access.
  • Tez is the default engine for HiveServer2, which is enabled on a Hadoop 2 (Hive) cluster. HiveServer2 is supported on Hive versions 1.2.0 and 2.1.1.
  • Tez is also the default execution engine for queries on Hive version 2.1.1.
  • All Hive queries executed through the QDS UI use Tez as the execution engine, and Tez is the default execution engine for all new accounts. However, these two features will be rolled out through the Segmented Feature Rollout for different levels of customers. To get this feature enabled or for more information, create a ticket with Qubole Support.
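As a sketch, the strict show tables behavior from the first item above could be turned on with a configuration override such as the following. The property name is taken from this release; placing it in a Hive configuration override file is an assumption for illustration.

```xml
<!-- Illustrative Hive configuration override; requires Hive Authorization
     to be enabled. With this set, "show tables" lists only tables the
     current user has SELECT access to. -->
<property>
  <name>hive.qubole.authz.strict.show.tables</name>
  <value>true</value>
</property>
```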

QDS Supports Importing ipynb Notebooks

With this release, Qubole supports importing ipynb notebooks. For information on how to import notebooks, see the documentation.

Spark 2.2.0 is the Latest Supported Version On Qubole Spark

Spark 2.2.0 is the latest version supported on Qubole Spark and it is reflected as 2.2 latest (2.2.0) on the Spark cluster UI. For information on how to set the Spark version on a cluster, see the documentation.

Spark 2.2.0 has the following additional configuration set by default:

spark.sql.autoBroadcastJoinThreshold=52428800
spark.network.timeout=1200s
spark.hadoop.mapreduce.use.parallelmergepaths=true
spark.hadoop.hive.qubole.consistent.loadpartition=true
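Note that spark.sql.autoBroadcastJoinThreshold is specified in bytes; the default above works out to exactly 50 MB:

```python
# The broadcast-join threshold above is given in bytes;
# dividing by 1024 * 1024 converts it to megabytes.
threshold_bytes = 52428800
print(threshold_bytes / (1024 * 1024))  # -> 50.0 (MB)
```

These defaults can be overridden per application through the usual Spark configuration mechanisms.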

These are known issues of Spark 2.2.0:

  • Avro writes fail with org.apache.spark.SparkException: Task failed while writing rows. This is a known open-source issue. Until it is fixed, append the following to your node bootstrap as a workaround:
    rm -rf /usr/lib/spark/assembly/target/scala-2.11/jars/spark-avro_2.11-3.2.0.jar
    /usr/lib/hadoop2/bin/hadoop fs -get s3://paid-qubole/spark/jars/spark-avro/spark-avro_2.11
  • java.security.AccessControlException with Java 8. Users cannot use Kafka streaming with 2.2.0 because of the Java 8 security policy in the AMI. This issue will be fixed in the next release.

New Enhancements in Spark

These are the Qubole Spark enhancements in this release: 

  • You can use Hive metastore v1.2 with Spark. Create a ticket with Qubole Support for more information.
  • Improved performance of ALTER TABLE RECOVER partitions.

New Enhancements in Notebooks

These are the Notebooks enhancements in this release:

  • Under the Users folder in Notebooks/Dashboards, users are not allowed to create, move, or delete folders.
  • When the user right-clicks an object on the Notebooks UI sidebar, or the sidebar itself, its options menu is shown, similar to the context menus in operating systems.
  • Qubole Notebooks support full-screen mode.

UI Enhancements
These are the UI Enhancements in this release:

  • Data Model Recommendations on the Usage page has these filter recommendations:
    • Type
    • Schema
    • Tables 
    • Date (day, week, month)
  • Added Refresh Table as a supported command type under the Workflow command in the Analyzer/Scheduler UI query composer.
  • In Analyze > Workspace and Scheduler, clicking the Save button saves the whole text instead of saving only the selected text. 

New Beta Features in QDS-on-AWS

 Launching a Web Terminal for Clusters on QDS-on-AWS

A new feature for launching a web terminal for a cluster has been introduced with this release. It is available for beta access; create a ticket with Qubole Support to get it enabled on your account. You can launch a web terminal at <AWS environment>/butterfly-terminal-<cluster ID>, where <AWS environment> can be https://api.qubole.com, https://us.qubole.com, or https://india.qubole.com.

Example: https://api.qubole.com/butterfly-terminal-<cluster ID>
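The URL pattern above can be sketched as a small helper (the helper name and the cluster ID value are illustrative, not part of the product):

```python
def web_terminal_url(env: str, cluster_id: str) -> str:
    """Compose the web-terminal URL from a QDS environment endpoint
    and a cluster ID, per the pattern described above."""
    return f"{env.rstrip('/')}/butterfly-terminal-{cluster_id}"

# Example with a made-up cluster ID:
print(web_terminal_url("https://api.qubole.com", "1234"))
# https://api.qubole.com/butterfly-terminal-1234
```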

Qubole Supports Package Management on Spark Clusters

The QDS UI supports creating an environment to easily install and upgrade R and Python packages on Spark clusters. It also supports seamless installation and upgrade of packages on a running Spark cluster. This feature is available for alpha access. To get this feature enabled on your QDS account, create a ticket with Qubole Support. For more information, see this documentation.


List of Changes and Bug Fixes in QDS-on-AWS

AWS CLUSTER MANAGEMENT
Bug Fixes

Fix   UI-5781: When the node type is changed from one EBS-compatible type to another EBS-compatible type, EBS settings are preserved. Otherwise, they are reset.


Enhancements

Change   ACM-1342: During EBS upscaling, EBS volumes are now tagged at creation, which earlier happened in a separate API call. This saves one API call and also lets you specify a stricter policy based on volume tags.

Change   ACM-1451: A new feature for launching a web terminal for a cluster has been introduced with this release. It is available for beta access; create a ticket with Qubole Support to get it enabled on your account. You can launch a web terminal at <AWS environment>/butterfly-terminal-<cluster ID>, where <AWS environment> can be https://api.qubole.com, https://us.qubole.com, or https://india.qubole.com.

Example: https://api.qubole.com/butterfly-terminal-<cluster ID> 


HADOOP 2

Bug Fixes

Fix   HADTWO-358: This fix binds Hadoop daemons such as the ResourceManager, NameNode, Timeline Server, and Job History Server to 0.0.0.0. Earlier, these daemons were bound to the master DNS, which caused a problem when a customer attached an Elastic IP address to the private IP address.
Fix   HADTWO-1012: Fixed an issue so that DNS resolution is avoided when it is not required.
Fix   HADTWO-1047: The worker node picks up the value of fs.s3a.buffer.dir from the node where the job starts. If the job starts on an EBS-only node and the worker node has only instance store, the worker node throws an exception in the LocalDirAllocator class because it does not contain any volume specified in the buffer directory. Because EBS is symlinked to ephemeral on EBS-only worker nodes, the ephemeral path is now appended to the value of the buffer directory. In addition, the value of fs.s3n.cache.dir has been changed to the ephemeral0 path when only EBS or NVMe volumes are present.


Enhancements

Change   HADTWO-988: The Requester Pays option is now supported in the S3a file system. It is disabled by default; to enable it, set fs.s3a.requester-pays.enabled to true.
Change   HADTWO-1026: This introduces a new metrics collector in the S3a file system that collects metrics for all S3 requests. By default, the collector is enabled and logs only those requests that were throttled or encountered internal server errors (status code 5xx). The collector can be enabled or disabled using the property fs.s3a.metrics-collection.enabled. To enable logging for all S3 requests, set fs.s3a.request.log.enable to true. Note that fs.s3a.request.log.enable works only if the collector is enabled.
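As a sketch, the HADTWO-1026 collector settings could be expressed as a Hadoop configuration fragment. The property names are taken from the note above; placing them in a site configuration file is an assumption for illustration.

```xml
<!-- Illustrative Hadoop configuration fragment for the S3a metrics collector. -->
<property>
  <name>fs.s3a.metrics-collection.enabled</name>
  <value>true</value>   <!-- collector is on by default -->
</property>
<property>
  <name>fs.s3a.request.log.enable</name>
  <value>true</value>   <!-- log all S3 requests; effective only when the collector is enabled -->
</property>
```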

HIVE 1.2

Bug Fixes

Fix   HIVE-1503: Enforce bucketing with dynamic partitions with hive.allow.move.on.s3 creates empty buckets as required.
Fix   HIVE-1504: Enforce bucketing with dynamic partitions with dynamic prefix enabled does not create extra empty buckets.
Fix   HIVE-2293: Critical information shown in the HiveServer2 UI is now hidden.
Fix   HIVE-2361: Provide the user with different error messages depending on the Thrift client exception code. Related Open Source Jira - HIVE-9423.


Enhancements

Change   HIVE-2307: If Qubole's S3 Storage Authorization is enabled, non-HiveServer2 (HS2) shell commands are disallowed. HS2 shell commands were already disallowed.
Change   HIVE-2326: With Hive Authorization enabled, you can set hive.qubole.authz.strict.show.tables to true so that the show tables query result lists only tables on which the user has SELECT access.


Change   HIVE-2332: Qubole Hive supports creating a socket with an unresolved DNS in Beeline. This enables the Beeline client to contact a cluster inside a public subnet.

Change   HIVE-2403: Tez is the default execution engine for all queries on Hive version 2.1.1.

Change   HIVE-2410: Tez is the default engine for HiveServer2, which is enabled on a Hadoop 2 (Hive) cluster. HiveServer2 is supported on Hive versions 1.2.0 and 2.1.1.

Change   HIVE-2412: All Hive queries executed through the QDS UI use Tez as the execution engine. However, this feature will be rolled out in the Segmented Feature Rollout for different levels of customers. To get this feature enabled or for more information, create a ticket with Qubole Support.
Change   HIVE-2413: Tez is the default execution engine for all new accounts. However, this feature will be rolled out in the Segmented Feature Rollout. To get this feature enabled or for more information, create a ticket with Qubole Support.
Change   HIVE-2429: Aux jars are now picked up from the aux lib directory created during packaging, instead of from hive-site.xml as before. These jars are added to HIVE.AUX.JARS.PATH.


PRESTO

Bug Fixes

Fix   PRES-1147: The value of query.max-memory-per-node is set to 32% of RAM in Presto 0.157 and above, similar to Presto 0.142.
Fix   PRES-1165: If SSL is enabled, HTTP service is not started on slave nodes to ensure that HTTPS is used for the internode communication.


Enhancements

Change   PRES-1115: Jitter-based retry logic is supported in S3 ObjectListing.

Change   PRES-1152: Predicate Pushdown has been enabled by default in the Parquet optimized reader.


QDS

New Features

New   AD-7: Data Model Recommendations on the Usage page has these filter recommendations:

  • Type
  • Schema
  • Tables
  • Date (day, week, month)

New   UI-6141: Added Refresh Table as a supported command type under the Workflow command in the Analyzer/Scheduler UI query composer.


Bug Fixes

Fix   AN-209: Single-row results are processed correctly even if they contain only a space character.
Fix   MOJ-265: Table Views are not considered in AIR (Alerts, Insights, and Recommendations) features.
Fix   MW-1255: The command results API and the status_with_results API support a new parameter called raw. If raw=true, the results are returned in raw format without converting ^A delimiters to tabs. One exception: for a Hive command with fewer than 1000 result rows, the delimiters are still tabs. For Presto commands, raw works as expected.
Fix   UI-6258: In Analyze > Workspace and Scheduler, clicking the Save button saves the whole text instead of saving only the selected text.
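For illustration, a minimal model of the delimiter conversion that the MW-1255 raw parameter skips (a sketch, not the service code):

```python
def format_results(raw_text: str, raw: bool = False) -> str:
    """Illustrative model of the formatting choice the raw parameter
    controls: by default the ^A (0x01) column delimiter is converted
    to tabs; with raw=True the text is returned unchanged."""
    if raw:
        return raw_text
    return raw_text.replace("\x01", "\t")

row = "alice\x0130\x01engineer"
print(format_results(row))            # columns tab-separated
print(format_results(row, raw=True))  # delimiters left as ^A
```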

Enhancements

Change   MW-1125: Multiple issues with accessing data in S3 buckets and validating the default location for IAM-Role-based and IAM-Keys-based accounts in v4 regions have been fixed.
Change   UI-4100: Unvalidated data stores are supported in DB import/export commands.


SPARK

Backports from Open Source Spark

Fix   SPAR-1659: Backport SPARK-17685 from OS 2.2.0 to 2.1.0 - Make SortMergeJoinExec's currentVars is null when calling createJoinKey.


Bug Fixes

Fix   SPAR-1350: Fixed the issue in which the Spark session name was printed as SparkContext.


Enhancements

Change   SPAR-1276: You can use Hive metastore v1.2 with Spark. Create a ticket with Qubole Support for more information.

Change   SPAR-1700: Spark 2.2.0 has the following additional configuration set by default:

spark.sql.autoBroadcastJoinThreshold=52428800
spark.network.timeout=1200s
spark.hadoop.mapreduce.use.parallelmergepaths=true
spark.hadoop.hive.qubole.consistent.loadpartition=true

Change   SPAR-1702: Spark 2.2.0 is the latest version supported on Qubole Spark and it is reflected as 2.2 latest (2.2.0) on the Spark cluster UI.
Change   SPAR-1823: Improved performance of ALTER TABLE RECOVER partitions.

 

STREAMX
Enhancements

Change   SX-56: The issue in which the node bootstrap did not run on a StreamX cluster has been resolved.


ZEPPELIN/NOTEBOOKS

New Features

New   UI-5598: When the user right-clicks an object on the Notebooks UI sidebar, or the sidebar itself, its options menu is shown, similar to the context menus in operating systems.

New   ZEP-858: With this release, Qubole supports importing ipynb notebooks.
New   ZEP-1235: Under the Users folder in Notebooks/Dashboards, users are not allowed to create, move, or delete folders.

 


Bug Fixes

Fix   ZEP-1179: The label Permission has been changed to Permissions on the Dashboards page.
Fix   ZEP-1180: A typo is fixed in the Dashboards message.
Fix   ZEP-1205: Changed the font for Interpreter buttons and Notebooks paragraphs.


Enhancements

Change   ZEP-928: The cursor appearance in the Notebooks UI has been changed.
Change   ZEP-1152: A user cannot see other users' bindings in Interpreter bindings.
Change   ZEP-1153: The Interpreter page does not show other users’ interpreters by default.
Change   ZEP-1186: Qubole Notebooks support full-screen mode.


List of Hotfixes in QDS-on-AWS Since 16th August 2017


Fix   ACM-1328: The cluster is now terminated due to inactivity when the node's public and private DNS are not enabled/resolved and the cluster is inactive for the configured time.
Fix   HADTWO-1042: Fixed an issue where the /api/v1.2/commands/<command-id>/jobs API did not work for certain users.
Fix   HADTWO-1051: QDS now supports V2 of the List Objects API in the S3a file system. This version improves performance, especially for listings performed on versioned buckets. To enable it, set fs.s3a.list-objects.v2.enabled to true.

Fix   SPAR-1863: The issue in which the Spark UI was not accessible has been resolved.

Fix   SPAR-1880 and SPAR-1882: A spark.read issue with Parquet has been resolved.

 

List of Hotfixes in QDS-on-AWS After 19th September 2017


Fix   AN-130: If a command that has a saved query associated with it is edited and run or saved from History, the edited query is added as a new version to the corresponding saved query (same as the Save and Run functionality in Workspace).
Fix   AN-229: Switching to the Preview tab while editing a query is prevented.
Fix   AN-257: Results for the commands saved from Analyze > History are retained.
Fix   AN-301: When autocomplete is displayed, pressing ESC closes the autocomplete widget and retains cursor focus on the editor.
Change   EAM-278: QDS allows users to self-start the Cloud Agents free trial through the UI and the REST API.
Fix   PRES-1279: Presto versions 0.157 and 0.180 used to delete all data of a partitioned table even if no rows were inserted. This behavior has been changed so that old data is deleted only when some rows are inserted, and the deletion is limited to the partitions into which rows were inserted.
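The corrected overwrite semantics in PRES-1279 can be modeled with a small sketch (the dict-based table and row shape are hypothetical illustrations, not Presto internals):

```python
def overwrite_partitions(table, inserted_rows):
    """Model of the fixed INSERT OVERWRITE behavior: old data is dropped
    only in partitions that actually receive new rows. With no inserted
    rows, nothing is deleted."""
    touched = {row["partition"] for row in inserted_rows}
    for part in touched:
        table[part] = []  # clear old data only in touched partitions
    for row in inserted_rows:
        table[row["partition"]].append(row["value"])
    return table
```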
