This article describes when Qubole downscales a node and explains the concept of "Graceful Decommissioning" in Hadoop 2 clusters.
When does Qubole downscale a cluster node?
QDS can downscale a node only when these three conditions are fulfilled:
1. No containers are running on the node
2. No Shuffle service is running on the node
3. Log aggregation for all the tasks that ran on that node is completed
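The three conditions above can be read as a single predicate over a node's state. The following is a minimal Python sketch, not QDS code: the `NodeState` class, its field names, and the `can_downscale` function are hypothetical and only illustrate that all three checks must hold at the same time.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of a node; the field names are illustrative, not a QDS API."""
    running_containers: int       # containers currently running on the node
    shuffle_service_active: bool  # True while a shuffle service is still running on the node
    log_aggregation_done: bool    # True once logs for all tasks that ran on the node are aggregated


def can_downscale(node: NodeState) -> bool:
    """Return True only when all three downscaling conditions hold."""
    return (
        node.running_containers == 0
        and not node.shuffle_service_active
        and node.log_aggregation_done
    )
```

For example, a node with no containers and no shuffle service, but with log aggregation still pending, would not yet be eligible under this sketch.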
What does it mean when nodes are in the "Graceful_decommissioning" stage?
This means that the nodes are eligible for decommissioning and will be decommissioned as soon as pending activities on them, such as log aggregation, are complete.
Once QDS determines that the cluster does not have enough load and that no containers are running on a node, it puts that node into "graceful_decommissioning" mode so that no new containers are allocated on it. QDS then waits for log aggregation to complete before removing the node.
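The sequence just described can be sketched as a small state transition function. This is an illustrative Python sketch under assumptions: apart from the "graceful_decommissioning" name, the state names, the `next_state` function, and its inputs are hypothetical and do not reflect QDS internals.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    GRACEFUL_DECOMMISSIONING = "graceful_decommissioning"
    REMOVED = "removed"  # hypothetical label for a node that has been taken out of the cluster


def next_state(state, cluster_load_low, containers_running, log_aggregation_done):
    """Return the node's next state for the downscaling path described above."""
    if state is State.NORMAL and cluster_load_low and containers_running == 0:
        # Idle node on a lightly loaded cluster: stop allocating new containers to it.
        return State.GRACEFUL_DECOMMISSIONING
    if state is State.GRACEFUL_DECOMMISSIONING and log_aggregation_done:
        # Pending log aggregation has finished, so the node can be removed.
        return State.REMOVED
    # Otherwise the node stays where it is.
    return state
```

In this sketch a node remains in the graceful decommissioning state for as long as log aggregation is still pending.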
However, if more jobs are submitted and the load increases, QDS returns these nodes to normal mode and containers are allocated on them again. Users who want faster downscaling can override this behavior by setting the following property in the Hadoop cluster overrides: