Qubole Release Notes for QDS Version R40 12-Jan-2017

Release Version: 40.0.0

For details of what has changed in this version of QDS, see What is New, List of Changes and Bug Fixes in this Release, and New Beta Features.

 

What is New


Progress-rate EBS Volume Upscaling Configuration Options

Storage-capacity upscaling in Hadoop2/Spark clusters using EBS volumes also supports upscaling based on the rate of increase of used capacity. For more information, see this documentation.

The following two new options are added in ebs_upscaling_config:

  • sampling_interval - It is the frequency at which the capacity of the logical volume is sampled. Its default value is 30 seconds. 
  • sampling_window - It is the number of sampling_intervals over which Qubole evaluates the rate of increase of used capacity. Its default value is 5. This means that the rate is evaluated over 150 (30 * 5) seconds by default. To disable upscaling based on rate and use only thresholds, this value may be set to 1. The logical volume is upscaled if, based on the current rate, it is estimated to get full in (sampling_interval + 120) seconds. The additional 120 seconds is because the addition of a new EBS volume to a heavily loaded volume group has been observed to take up to 120 seconds.
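
The rate-based check described above can be sketched roughly as follows. This is an illustration only, not Qubole's implementation: the function name, the linear extrapolation between the oldest and newest sample, and the way samples are passed in are all assumptions. Only the 30-second default interval and the 120-second margin come from the notes above.

```python
from typing import Sequence

# Observed worst-case time to add an EBS volume to a loaded volume group (from the notes).
ATTACH_MARGIN_SECONDS = 120


def should_upscale(used_gb: Sequence[float], capacity_gb: float,
                   sampling_interval: int = 30) -> bool:
    """Return True if the volume is projected to fill within
    (sampling_interval + 120) seconds at the current growth rate.

    used_gb holds one reading per sampling interval, oldest first
    (a hypothetical layout; the readings together span the window).
    """
    if len(used_gb) < 2:
        return False  # a single sample gives no rate
    elapsed = (len(used_gb) - 1) * sampling_interval
    rate = (used_gb[-1] - used_gb[0]) / elapsed  # GB per second
    if rate <= 0:
        return False  # usage is flat or shrinking
    seconds_to_full = (capacity_gb - used_gb[-1]) / rate
    return seconds_to_full <= sampling_interval + ATTACH_MARGIN_SECONDS
```

With six readings taken 30 seconds apart, the rate is evaluated over 150 seconds, which matches the default window of 5 intervals.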

 

UI Enhancements

Qubole has a few enhancements in this QDS release that are listed below:

  • Qubole has added a new homepage for QDS UI that will be the default landing page for new and existing users. For more information, see this documentation.
  • Qubole supports keyboard shortcuts on the Analyze page. For more information, see this documentation.
  • Users can send multiple email invites at once in the Control Panel > Manage Users section.
  • Qubole has added filters for logs such as errors and warnings on the Analyze page. 
  • Results are fetched and displayed for commands that have failed, provided they are available on Analyze.
  • ParseException errors in Hive queries are highlighted at the appropriate location in the query editor.
  • Analyze results and logs tabs support complete raw result downloads.
  • A user’s email ID can be seen in the profile drop-down list available on the QDS UI top-bar.

 

New Beta Features

 

Presto Trial Versions

Presto 0.157 is now available as a trial version. Going forward, Presto 0.150 will be deprecated. Please migrate development and testing work that uses Presto 0.150 to Presto 0.157.

Disabling Public IP address and DNS for Nodes launched in Private Subnets

Currently, Qubole assigns a public IP address and DNS name to all nodes launched in private subnets. Since instances launched in private subnets cannot be reached from outside the subnet, the public IP address and DNS name are redundant. You can now disable public IP and DNS attachment for nodes launched in private subnets.
This feature is available for beta access. Contact help@qubole.com to enable this feature for the account.


List of Changes and Bug Fixes in this Release


AIRFLOW

Fix   QBOL-5769: Access Denied while fetching Datastores on the Airflow cluster configuration page for system-user
Fix   QBOL-5835: Fixed an issue where Airflow sometimes did not pick up jobs, by changing num_runs in the scheduler from infinite to 20.
Changed default Airflow instance type to m1.xlarge
Change   QBOL-5834: Disabled the fetch_logs feature in the Qubole Operator. To check the QDS logs, use the Goto QDS navigation hook from the Airflow Webserver. If this parameter is already set in the DAG script, it no longer has any effect.
Change   QBOL-5967: Authentication is now enabled on the Airflow web server and cannot be overridden. The Qubole tier takes care of this automatically, so no input is needed from users.

 

AWS CLUSTER MANAGEMENT

New   ACM-71: Currently, Qubole assigns a public IP address and DNS name to all nodes launched in private subnets. Since instances launched in private subnets cannot be reached from outside the subnet, the public IP address and DNS name are redundant. You can now disable public IP and DNS attachment for nodes launched in private subnets.
This feature is available for beta access. Contact help@qubole.com to enable this feature for the account.
Fix   ACM-460: Fixed an issue where a number of nodes in the QDS UI were not being updated at a regular and predictable frequency.
Fix   ACM-694: Logs were cleaned up to get rid of misleading messages such as MissingParameter Source group ID missing.
Fix   ACM-780: Fixed an issue where a cluster start would fail because of a delay in getting public DNS for master in AWS.
Change   ACM-78: Qubole has added a validation for Bastion host in the list of validations it does before a cluster start. As a part of the validation, Qubole checks if the Bastion host is reachable from Qubole tunnel servers. If it is not, then Qubole throws an error before launching instances.
Change   ACM-621: The logs were cleaned up to get rid of unnecessary log statements.
Change   ACM-728: Cluster health checks are now run at a predictable and regular frequency.
Change   ACM-750: Fixed an issue where a dead master node of a previous cluster instance was popping up in the current cluster instance.


HADOOP

New   HAD-562: Added a REST API to gracefully shut down cluster nodes in Hadoop clusters (to be run only by admins).


HADOOP 2

Fix   HADTWO-520: Fixes the race condition in a container completed operation in FairScheduler. This race condition sometimes results in negative core count and memory values on the RM UI.
Fix   HADTWO-601: Increasing maximum number of retries for DFS block write failures.
Fix   HADTWO-667: Changed defaults of the S3A file system for performance improvements.
Fix   HADTWO-683: Handled the exception in continuous scheduling due to changes of available resources while sorting the nodes.
Fix   HADTWO-692: Fixing autoscaling issues:

  • Qubole was using capacity for nodes in GS for resource calculation 
  • Not recommissioning nodes from GS when the cluster is at maximum cluster size 
  • Not recommissioning node when node is not in HDFS Decommissioning state 
  • Code cleanup

Fix   HADTWO-705: EBS upscaling now supports disks with block device encryption.
Fix   HADTWO-735: Enable HTTPS in S3A file system if the AWS KMS client-side encryption is enabled.
Fix   HADTWO-706: Old log files for the namenode, Resource Manager, datanode, and so on are now zipped to save space.
Change   HADTWO-590: Added infrastructure to check spot loss notification periodically on spot nodes.
Change   HADTWO-593: Improved defaults for container packing. With this change, container packing would only be enabled on up-scaled clusters.
Change   HADTWO-625: Show more info for RM Node on the Hadoop UI.
Change   HADTWO-634: Storage capacity upscaling in Hadoop2/Spark using EBS volumes now supports upscaling based on the rate of increase of used capacity. The following two new options are added in ebs_upscaling_config:

  • sampling_interval - It is the frequency at which the capacity of the logical volume is sampled. Its default value is 30 seconds.
  • sampling_window - It is the number of sampling_intervals over which Qubole evaluates the rate of increase of used capacity. Its default value is 5. This means that the rate is evaluated over 150 (30 * 5) seconds by default. To disable upscaling based on rate and use only thresholds, this value may be set to 1. The logical volume is upscaled if, based on the current rate, it is estimated to get full in (sampling_interval + 120) seconds (the additional 120 seconds is because the addition of a new EBS volume to a heavily loaded volume group has been observed to take up to 120 seconds.)
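
As a rough sketch of how the two options fit together, the snippet below builds an ebs_upscaling_config fragment and derives the evaluation period from it. The helper names and the idea of wrapping the options in a dictionary are assumptions made for illustration; only the option names and their defaults are taken from this entry, and other ebs_upscaling_config settings are omitted.

```python
import json


def ebs_rate_options(sampling_interval: int = 30, sampling_window: int = 5) -> dict:
    """Build a config fragment holding the two rate-based options (hypothetical helper)."""
    if sampling_interval <= 0:
        raise ValueError("sampling_interval must be a positive number of seconds")
    if sampling_window < 1:
        raise ValueError("sampling_window must be at least 1")
    return {
        "ebs_upscaling_config": {
            "sampling_interval": sampling_interval,
            "sampling_window": sampling_window,
        }
    }


def evaluation_period_seconds(fragment: dict) -> int:
    """Seconds over which the rate of increase is evaluated."""
    cfg = fragment["ebs_upscaling_config"]
    return cfg["sampling_interval"] * cfg["sampling_window"]


if __name__ == "__main__":
    defaults = ebs_rate_options()
    print(json.dumps(defaults, indent=2))
    print(evaluation_period_seconds(defaults))
```

Passing sampling_window=1 produces the setting that, per the entry above, disables rate-based upscaling and leaves only the thresholds in effect.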

Change   HADTWO-690: Created an environment setting to disable/enable SSL for S3 file system.


HBASE

Fix   HBAS-181: Now, all the metrics reported by HBase to Ganglia have a threshold reporting time of 300 seconds. It means that if a metric is not reported again within 300 seconds, it will be dropped from Ganglia.
Change   HBAS-183: Qubole introduces two new properties that can be used to throttle the resources utilized while running a backup:

  • hbase.snapshot.export.map.bandwidth.mb - It can be used to specify the maximum bandwidth for every map task used for taking backups.
  • hbase.snapshot.mapreduce.job.maps - It can be used to specify the number of map tasks used for taking backups.
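
A minimal sketch of setting the two throttling properties together is shown below. The key=value rendering, the helper name, and the sample values are assumptions for illustration; the notes do not say where the properties are supplied or what unit the bandwidth value uses (the property name suggests megabytes).

```python
def backup_throttle_overrides(bandwidth_mb: int, num_maps: int) -> str:
    """Render the two backup-throttling properties as key=value lines (hypothetical format)."""
    if bandwidth_mb <= 0 or num_maps <= 0:
        raise ValueError("both values must be positive")
    props = {
        "hbase.snapshot.export.map.bandwidth.mb": bandwidth_mb,
        "hbase.snapshot.mapreduce.job.maps": num_maps,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Example values only: cap each map task's bandwidth and use four map tasks.
    print(backup_throttle_overrides(bandwidth_mb=50, num_maps=4))
```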

 

HIVE 0.13

Fix   HIVE-1310: Fixing the index location while throwing exception in GenericUDFNvl

 

HIVE 1.2


New   PER-61: Pulled the following fixes from open source:

  • HIVE-13985: ORC improvements for reducing the file system calls in task side (Prasanth Jayachandran reviewed by Sergey Shelukhin)
  • HIVE-13146: OrcFile table property values are case sensitive (Yongzhi Chen, reviewed by Aihua Xu)
  • HIVE-12498: ACID: Setting OrcRecordUpdater.OrcOptions.tableProperties() has no effect (Prasanth Jayachandran reviewed by Eugene Koifman)
  • HIVE-11928: ORC footer section can also exceed protobuf message limit (Prasanth Jayachandran reviewed by Sergey Shelukhin and Owen O'Malley)
  • HIVE-11592: ORC metadata section can sometimes exceed protobuf message size limit (Prasanth Jayachandran reviewed by Sergey Shelukhin)
  • HIVE-11546: Projected columns read size should be scaled to split size for ORC Splits (Prasanth Jayachandran reviewed by Sergey Shelukhin)
  • HIVE-13291: ORC BI Split strategy should consider block size instead of file size (Prasanth Jayachandran reviewed by Gopal V)
  • HIVE-10651: ORC file footer cache should be bounded (Prasanth Jayachandran reviewed by Sergey Shelukhin) 
  • HIVE-13841: Orc split generation returns different strategies with cache enabled vs disabled (Prasanth Jayachandran reviewed by Sergey Shelukhin) 
  • HIVE-11541: ORC: Split Strategy should depend on global file count, not per-partition (Gopal V reviewed by Prasanth Jayachandran) 
  • HIVE-11043: ORC split strategies should adapt based on number of files (Gopal V reviewed by Prasanth Jayachandran) 
  • HIVE-13216: ORC Reader will leave file open until GC when opening a malformed ORC file (Sergey Shelukhin, reviewed by Prasanth Jayachandran)
  • HIVE-13840: Orc split generation is reading file footers twice (Prasanth Jayachandran reviewed by Owen O'Malley)

Fix   HIVE-1310: Fixing the index location while throwing exception in GenericUDFNvl.
Fix   HIVE-1581: Removed the resetting of maxOffset in OrcRecordReader/OrcRawRecordMerger.
Fix   HIVE-1737: OS-HIVE-10811: RelFieldTrimmer throws NoSuchElementException in some cases.
Change   HIVE-1645: Removing Non-static threadlocals in the metastore code that can potentially cause memory leak. Related open source jira: HIVE-10925.
Change   HIVE-1646: Added support for crc32 UDF. Related open source jira: HIVE-10641.
Change   HIVE-1647: Fixed a memory leak in HS2 when used with .hiverc file. Related open source jira: HIVE-12660.
Change   HIVE-1688: Qubole has enabled map-side Join by default.
Change   HIVE-1807: Qubole has made changes to reduce the latency of Hive queries.
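
For the crc32 UDF added under HIVE-1646, the snippet below gives a way to cross-check values outside Hive. It assumes the UDF computes the standard CRC-32 checksum of the UTF-8 bytes of a string, as described for the open source HIVE-10641; the helper name is hypothetical.

```python
import zlib


def crc32_value(text: str) -> int:
    """Standard CRC-32 of the UTF-8 bytes of text, as an unsigned 32-bit integer."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


if __name__ == "__main__":
    # "123456789" is the conventional CRC-32 check input.
    print(hex(crc32_value("123456789")))
```

If a Hive query such as SELECT crc32('123456789') returns a different number, the assumption about the checksum variant or the string encoding does not hold for that setup.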


PRESTO

Fix   PRES-759: Rubix configurations set in Hadoop overrides are now picked up by the Presto cluster automatically. As a result, the hadoop.cache.data.dirprefix.list and hadoop.cache.data.block-size options must only be provided as Hadoop Overrides.
Fix   PRES-814: Cluster bringup logs are now visible in command logs of Presto command submitted via the Ruby client.
Fix   PRES-820: Fetching all columns via JDBC drivers is now optimized to run faster.
Fix   PRES-841: Increasing upper limit of Rubix BookKeeper process to prevent too many open files errors which would cause queries to hang.
Fix   PRES-862: Fixed an issue where Presto queries that bring up the cluster would hang when there were a lot of retries in the bringup process.
Fix   QBOL-5396: The Presto Ruby client now works with VPC and private subnets as well.
Change   PRES-838: Trial Versions:

Presto 0.157 is now available as a trial version. Going forward, Presto 0.150 will be deprecated. Please migrate development and testing work that uses Presto 0.150 to Presto 0.157.


QDS

New   QBOL-4548: Upgraded the version of Sqoop used from 1.4.2 to 1.4.6

New   QBOL-5197: Ability to suspend/resume an account with a message via API has been added:

  • A system-admin-user can now temporarily suspend an account and block access for other, non-system-admin users. While the account is suspended, only system-admin-users can work on it.
  • A system-admin-user can resume the account, thus restoring access to non-system-admin users.

New   QBOL-5351: Support for using unvalidated data stores in data import/export commands.
New   QBOL-5907: Fixed the issue with accessing Query Hist across accounts.
New   UI-3958: Qubole has created a new Home Page for QDS that will be the default homepage for new and existing users.
New   UI-4122: Qubole has added a new option in Explore to add and use a custom metastore other than the Qubole-managed metastore.
New   UI-4314: Results are fetched and displayed for commands that have failed, provided they are available on Analyze.
Fix   MW-187: Validation of branding link is not required when branding the Qubole logo through the API.
Fix   UI-1106: While creating or editing a query in Analyze, if the user navigates away from the query composer window, a confirmation is displayed asking whether or not to discard the changes.
Change   QBOL-4883: If auto-termination is enabled, a cluster is terminated once the buffer time (in the range 0-6 hrs) has elapsed after it goes idle.
Change   UI-1441: Added an option to print headers in Analyze download results.
Change   UI-3284: Analyze Results and Logs tabs support complete raw result downloads.
Change   UI-3894: Qubole now supports keyboard shortcuts in Analyze.
Change   UI-4229: Users can send multiple email invites at once in the Control Panel > Manage Users section.
Change   UI-4620: Qubole has added filters for logs such as errors and warnings on the Analyze page.
Change   UI-4836: A user’s email ID can be seen in the profile drop-down list available on the QDS UI top-bar.


SPARK

Fix   SPAR-556: When SQL queries were submitted through the QDS SDK, some commands failed with NoViableAltException if the last command ended with a semicolon. This was caused by the split logic that creates the list of SQL commands from the input: it treated empty lines as separate commands and tried to execute them. This fix checks for such empty commands and filters them out before execution.
Fix   SPAR-1266: Qubole has added spark.sql.qubole.parquet.cacheMetadata to skip table-data caching. It is a default configuration available at Spark cluster and job levels. It is also an interpreter property available by default in a Spark Notebook. This configuration avoids table-data-access query failures in case of any change in the table's S3 location. You can set the property to false if you do not want to skip table-data caching.

You can now ignore FileNotFoundExceptions for files unavailable in the table's S3 location by setting spark.sql.qubole.ignoreFNFExceptions to true. This configuration can also be set at the Spark cluster and job levels and as a Spark notebook's interpreter property.
Fix   SPAR-1347: Fixed a deadlock in exception handling during a Spark context cleanup.
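
The kind of filtering described for SPAR-556 can be sketched as below. This is not the QDS code: the function name is hypothetical and the split on ";" is deliberately naive (it would break on semicolons inside string literals). It only shows how dropping empty fragments keeps a trailing semicolon from producing an empty command.

```python
from typing import List


def split_sql_commands(script: str) -> List[str]:
    """Split a script on semicolons and drop empty or whitespace-only commands."""
    fragments = (fragment.strip() for fragment in script.split(";"))
    return [fragment for fragment in fragments if fragment]


if __name__ == "__main__":
    # The trailing semicolon and newline no longer yield an empty command.
    print(split_sql_commands("SELECT 1;\nSELECT 2;\n"))
```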

 

TEZ

Fix   QTEZ-100: Fixed empty file handling issue in TEZ.

 

ZEPPELIN/Notebooks

Change   QBOL-5956: On the notebook-folders page, you can now specify the new location for the notebook.
Change   UI-4306: Folders can be copied, renamed, moved, and deleted.
Change   ZEP-444: You can access the Spark Job UI from the paragraph of a Spark notebook.
Change   ZEP-516: UDF should run fine for Spark 2.0.

Change   ZEP-599: There is an improvement in the notebook experience. When a large notebook is opened, you will see the first two paragraphs immediately, while the remaining paragraphs are loaded and displayed subsequently.


List of Hotfixes Since 17th November 2016


Fix   ACM-762: Fixed an issue where a heterogeneous cluster could not be terminated.
Change   ZEP-489: Access Control for Notebook Folders. The Notebook folders feature is available for beta access. Contact help@qubole.com to enable this feature for the account.

 

List of Hotfixes After 12th January 2017

Fix   ACM-809: Qubole has made the sshmaster command faster by eliminating redundant AWS calls.
Fix   ACM-877: The AZ filter can now be used in the describe-instances call in AWS. This is useful if there is a large number of instances in your AWS account and the describe-instances call takes a very long time to return. Contact help@qubole.com to enable this feature for the QDS account.

 
