Qubole Release Notes for QDS Version R42 12-Apr-2017

Release Version: 42.41.0

 

For details of what has changed in this version of QDS, see What is New, New Beta Features, and List of Changes and Bug Fixes in this Release.


What is New

QDS Supports Spot Block Instances

QDS supports configuring Spot Block instances on the new cluster UI page and through cluster API calls. Spot Blocks are Spot instances that run continuously for a finite duration (1 to 6 hours). Depending on the requested duration, they are 30% to 40% cheaper than On-Demand instances. They are more stable than regular Spot nodes because they are not reclaimed during the specified duration. However, they are always terminated once the requested duration is over. To learn more about Spot Blocks, refer to AWS spot blocks.

QDS ensures that Spot blocks are acquired at a price lower than On-Demand nodes. It also ensures that autoscaled nodes are acquired for the remaining duration of the cluster. For example, if the duration of a Spot block cluster is 5 hours and there is a need to autoscale at the 2nd hour, Spot blocks are acquired for 3 hours. For more information, see the documentation.
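
The duration arithmetic in the example above can be sketched as follows. This is an illustration only, not QDS code: the function name remaining_spot_block_hours is hypothetical, and the 1 to 6 hour bounds are taken from the Spot Block description in this section.

```python
def remaining_spot_block_hours(cluster_duration_hours: int, elapsed_hours: int) -> int:
    """Return the Spot Block duration to request for a node added by autoscaling.

    Hypothetical helper: the node is requested only for the time that is
    left in the cluster's own Spot Block duration.
    """
    if not 1 <= cluster_duration_hours <= 6:
        raise ValueError("Spot Block duration must be between 1 and 6 hours")
    if not 0 <= elapsed_hours < cluster_duration_hours:
        raise ValueError("elapsed time must fall within the cluster duration")
    return cluster_duration_hours - elapsed_hours


# A 5-hour Spot Block cluster that autoscales at the 2nd hour.
print(remaining_spot_block_hours(5, 2))  # prints 3
```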

QDS Supports Dedicated Instances at Cluster Level

QDS provides instance tenancy at the cluster level, and only in a VPC. The tenancy choices are default and dedicated. With dedicated tenancy, the launched instances do not share physical hardware with instances that belong to any other AWS account. Currently, tenancy can be set through the instance_tenancy parameter under ec2_settings in an edit cluster API call. A cloned cluster inherits the instance_tenancy setting if the parent cluster had it configured.
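
As a rough sketch of the request body described above, the snippet below builds a JSON fragment that nests instance_tenancy under ec2_settings. Only those two names and the two tenancy values come from this release note. The helper name build_tenancy_payload is hypothetical, and the endpoint, authentication, and remaining cluster fields are deliberately left out; take them from the cluster API documentation.

```python
import json

# Tenancy values named in this release note.
ALLOWED_TENANCIES = {"default", "dedicated"}


def build_tenancy_payload(tenancy: str) -> str:
    """Build the ec2_settings fragment of an edit cluster request body.

    Hypothetical helper: a real request carries more fields than this.
    """
    if tenancy not in ALLOWED_TENANCIES:
        raise ValueError("instance_tenancy must be 'default' or 'dedicated'")
    return json.dumps({"ec2_settings": {"instance_tenancy": tenancy}})


print(build_tenancy_payload("dedicated"))
```

Validating the value on the client side is a design choice of this sketch; the API performs its own validation.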

R4 Instance Type Supported in Additional AWS Regions

QDS supports r4 type EC2 instances in ap-northeast-1, ap-northeast-2, ap-southeast-1, eu-central-1, and sa-east-1 AWS regions.

Resuming Suspended Schedules Can Now Skip Missed Instances

The Scheduler Resume API now has an option to skip the missed instances using the no_catch_up parameter. When it is set to true, all instances which were missed from the time the job was suspended will be skipped.
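
The effect of no_catch_up can be modeled with a small simulation. This is a simplified sketch, not the scheduler's implementation: the function name is hypothetical, and it assumes that instances fall due at a fixed interval starting from the suspension time and that, without no_catch_up, every missed instance is run after resuming.

```python
from datetime import datetime, timedelta
from typing import List


def instances_after_resume(suspended_at: datetime, resumed_at: datetime,
                           interval: timedelta, no_catch_up: bool) -> List[datetime]:
    """Return the missed instance times that run once the schedule is resumed."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if no_catch_up:
        return []  # every instance missed during the suspension is skipped
    missed = []
    due = suspended_at
    while due < resumed_at:
        missed.append(due)
        due += interval
    return missed


suspended = datetime(2017, 4, 1, 0, 0)
resumed = datetime(2017, 4, 1, 3, 0)
hourly = timedelta(hours=1)
print(len(instances_after_resume(suspended, resumed, hourly, no_catch_up=False)))  # prints 3
print(len(instances_after_resume(suspended, resumed, hourly, no_catch_up=True)))   # prints 0
```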

Enhancements in Notebooks

These are the new enhancements in Notebooks in this QDS release:

  • Added a new section named Recents in notebooks, which displays all recent notebooks.
  • The per-user interpreter mode is selected by default for new clusters. For the older behavior, an admin can use the legacy mode.
  • Added the ability to specify data types for table columns. String and Number are the supported data types.

UI Enhancements

These are the UI enhancements in this QDS release:

  • You can now pin custom paths in the Explore page.
  • Search in the Help Center has been improved, and a new ANNOUNCEMENTS section has been added to the Help Center. For more information, see the documentation.
  • You can now disable keyboard shortcuts from the My Profile tab in Control Panel. For more information, see the documentation.

New Beta Features

QDS supports Hive Server 2 on Hive 1.2

QDS supports Hive Server 2, an open-source feature that helps reduce the latency of Hive queries. For more information, see the documentation. This feature is available for beta access. Create a ticket with Qubole Support to enable this feature for the QDS account.


List of Changes and Bug Fixes in this Release

AWS CLUSTER MANAGEMENT

New   ACM-606: QDS provides instance tenancy at the cluster level, and only in a VPC. The tenancy choices are default and dedicated. With dedicated tenancy, the launched instances do not share physical hardware with instances that belong to any other AWS account. Currently, tenancy can be set through the instance_tenancy parameter under ec2_settings in an edit cluster API call. A cloned cluster inherits the instance_tenancy setting if the parent cluster had it configured.
Fix   ACM-908: QDS supports r4 type EC2 instances in ap-northeast-1, ap-northeast-2, ap-southeast-1, eu-central-1, and sa-east-1 AWS regions.

Fix   ACM-1063: Fixed an issue where R4 instances were not provisioned with the correct permissions and soft links for the media/ephemera0 directory.
Change   ACM-975: Added the following counters for tracking in the Datadog cluster monitoring metrics:

  • yarn.QueueMetrics.AvailableMB
  • yarn.QueueMetrics.AllocatedMB
  • yarn.QueueMetrics.ReservedMB
  • yarn.QueueMetrics.AvailableVCores
  • yarn.QueueMetrics.AllocatedVCores
  • yarn.QueueMetrics.ReservedVCores
  • yarn.QueueMetrics.AllocatedContainers
  • yarn.QueueMetrics.ReservedContainers
  • yarn.ClusterMetrics.NumActiveNMs
  • yarn.ClusterMetrics.NumDecommissionedNMs
  • yarn.ClusterMetrics.NumDecommissioningNMs
  • yarn.ClusterMetrics.NumLostNMs
  • yarn.ClusterMetrics.NumRebootedNMs
  • yarn.ClusterMetrics.NumUnhealthyNMs
  • dfs.FSNamesystem.CapacityUsed
  • dfs.FSNamesystem.CapacityTotal
  • dfs.FSNamesystem.CapacityRemaining
  • dfs.FSNamesystem.TotalLoad
  • dfs.FSNamesystem.BlocksTotal
  • dfs.FSNamesystem.FilesTotal
  • dfs.FSNamesystem.MissingBlocks
  • dfs.FSNamesystem.CorruptBlocks
  • dfs.FSNamesystem.PendingReplicationBlocks
  • dfs.FSNamesystem.PendingDeletionBlocks
  • dfs.FSNamesystem.UnderReplicatedBlocks
  • dfs.FSNamesystem.ScheduledReplicationBlocks


HADOOP 2

Fix   HADTWO-768: Added support for EBS volumes for the S3n cache directory.
Fix   HADTWO-790: Fixed an NPE in the FairScheduler that occurred when a node transitioned from UNHEALTHY to GS mode and the scheduler tried to update its resources.
Fix   HADTWO-814: In case the fair-scheduler.xml file is corrupt at cluster start, the Resource Manager falls back to a default configuration for the FairScheduler. The default configuration is used until a correct fair-scheduler.xml is pushed to the cluster.
Fix   HADTWO-834: Added retries in copying job jar and other remote files to HDFS.
Change   HADTWO-778: QDS reports the following YARN autoscaling metrics to Ganglia:

  • Number of nodes decommissioned from YARN
  • Number of nodes removed from the cluster
  • Number of new nodes added to the cluster
  • Number of nodes recommissioned
  • Number of running nodes reported by the cloud provider
  • Number of nodes reported by YARN
  • Number of nodes lost due to Spot loss

Change   HADTWO-812: Added capability in FairScheduler to set MaxResourceDefault for each queue.
Ported from open-source Hadoop: YARN-2913
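
The fallback behavior described in HADTWO-814 can be illustrated with a minimal sketch. It is not the Resource Manager's code: the function name and the DEFAULT_ALLOCATIONS stand-in are hypothetical, and the only check made here is that the file parses as XML with an allocations root element.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Hypothetical stand-in for the scheduler's built-in default configuration.
DEFAULT_ALLOCATIONS = "<allocations></allocations>"


def load_allocations(xml_text: str) -> Tuple[ET.Element, bool]:
    """Parse fair-scheduler.xml content and return (root, used_fallback)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        root = None  # corrupt or truncated file
    if root is None or root.tag != "allocations":
        return ET.fromstring(DEFAULT_ALLOCATIONS), True
    return root, False


# A truncated file triggers the fallback; a corrected file is used as-is.
_, used_fallback = load_allocations("<allocations><queue name='etl'>")
print(used_fallback)  # prints True
_, used_fallback = load_allocations("<allocations><queue name='etl'/></allocations>")
print(used_fallback)  # prints False
```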


HIVE 1.2

Fix   HIVE-1575: Fixed an issue where, on an S3a-enabled cluster, the Hive command still reported replacing s3 with s3n.

Fix   HIVE-1918: Fixed an issue causing a failure while creating an EXTERNAL TABLE on HDFS path with Hive authorization enabled.
Fix   HIVE-1962: Improved the performance of dynamic partition loading.

Fix   HIVE-1974: Fixed an issue where the Hive 1.2-based Metastore did not start when configured to bypass the reverse tunnel.

Fix   HIVE-1979: Fixed an issue in HiveServer2 which caused masking of errors in the event of a failure.
Change   HIVE-1940: Deprecated the Hive configuration flag hive.qubole.dynpart.track.s3 and introduced a new configuration option, hive.qubole.dynpart.track.cloudfs.
Change   QBOL-5747: Get Hive Table Definition API now supports fetching AVRO table schema as well.


PRESTO

Fix   PRES-849: Fixed an issue where running multiple queries on a Presto cluster caused a "no nodes available to run query" error.

Fix   PRES-962: Increased open file descriptors limit to 100000 for a Presto server.

 

QDS

New   UI-4964: Spot Blocks are now supported on the new cluster page. Spot Blocks are Spot instances that run continuously for a finite duration (1 to 6 hours).

Fix   QBOL-6026: Fixed an issue in Explore where fetching sample table rows did not work.
Fix   SQOOP-96: The correct parameters are now passed to Sqoop.
Fix   UI-5138: Scheduler: Added a UI option to skip missed instances while resuming a suspended job.
Fix   UI-5420: Fixed an unintentional run that occurred when saving a command from history.
Change   MW-332: The All Commands Report API now has an option to filter by tags, and reports now include a tags column. However, this change has the following known issues:

  • Fetching the all_commands report with the Sort Column specified results in an error.
  • Fetching commands through the Reports API with an invalid tag returns all commands.
  • The All Commands Report has UI issues since the Tag field was added.

Change   MW-400: The Scheduler Resume API now has an option to skip the missed instances using the no_catch_up parameter. When it is set to true, all instances which were missed from the time the job was suspended will be skipped.
Change   SQOOP-61: Added support for additional Sqoop options, such as map-column-hive, when running data import/export commands through the API.
Change   UI-3722: You can now pin custom paths in the Explore page.
Change   UI-4478: Search in the Help Center has been improved, and a new ANNOUNCEMENTS section has been added to the Help Center.

Change   UI-5123: The cron expression for periodic jobs no longer accepts invalid values.

Change   UI-5141: You can now disable keyboard shortcuts from the My Profile tab in Control Panel.

 

SPARK

Fix   SPAR-1232: Fixed the issue where the Spark History server threads got stuck and Spark UI links did not work.

Fix   SPAR-1512: The issue where the Spark History server was showing version as 1.5.1 for Spark 2.0.2 clusters has been fixed.

Change   SPAR-1244: Added infrastructure support for Spark programs larger than 64k.

 

TEZ

Fix   QTEZ-107: Fixed task log URL for running Tez jobs.
Fix   QTEZ-116: The underlying Tez job is now killed when a Hive command is cancelled.
Fix   QTEZ-121: The inner record reader is now created only if required. This avoids a hanging reader (in the case of vectorization), which was leading to running out of HTTP connections.
Fix   QTEZ-127: Disabled the All Dags link for the offline Tez UI.

Fix   QTEZ-130: Fixed the ORC memory issue in 0.7.0 that was resolved in 0.7.1.


ZEPPELIN/NOTEBOOKS

Fix   ZEP-811: Fixed a case where two paragraphs were run when Shift + Enter was pressed.
Change   UI-5076: Added a new section named Recents in notebooks, which displays all recent notebooks.
Change   UI-5111: The per-user interpreter mode is selected by default for new clusters. For the older behavior, an admin can use the legacy mode.
Change   ZEP-686: Added the ability to specify data types for table columns. String and Number are the supported data types.


List of Hotfixes Since 14th March 2017

Fix   HADTWO-781: Fixed an issue in which an app was never scheduled after an exception in the FairScheduler.
Pulled in the following open-source patches:

  • YARN-2910. FSLeafQueue can throw ConcurrentModificationException. 
  • YARN-2975. FSLeafQueue app lists are accessed without required locks.

Fix   HADTWO-818: Fixed an issue in which all cluster nodes went into the graceful decommissioning state.
Fix   UI-5440: Added a Back button in the interpreter page to return to the Notebook page.
Fix   UI-5449: Fixed an issue in the Notebook page where it was not possible to select a cluster from a long list.
