Kubernetes v1.33: Configurable HPA Tolerance for Autoscaling

Home page — News — Kubernetes v1.33: Configurable HPA Tolerance for Autoscaling

With the release of Kubernetes v1.33, cluster operators gain precise control over autoscaling behavior via the new HPAConfigurableTolerance alpha feature gate. This enhancement refines Horizontal Pod Autoscaler (HPA) hysteresis, allowing you to tune when scale-up and scale-down decisions occur—key for large, dynamic workloads where a default 10% threshold equates to dozens or even hundreds of pods.

What is it?

Horizontal Pod Autoscaling monitors resource or custom metrics and adjusts replica counts to meet target utilization. Under the hood, the HPA controller queries metrics (CPU, memory, custom Prometheus/GPU metrics, etc.) via the Metrics API (v2) and applies the formula:

desiredReplicas = ceil(currentReplicas × (currentMetricValue ÷ desiredMetricValue))

By default, Kubernetes applies a 10% tolerance band around the target metric (for example, 75% CPU). If the observed ratio deviates less than ±10%, no scaling action is taken—providing stability but sometimes delaying reaction to real load shifts in massive clusters.

Disney’s Plan to Integrate Hulu into Disney+

2025-08-06

How do I use it?

Enable the feature gate when you launch your control plane:
--feature-gates=HPAConfigurableTolerance=true
In your HorizontalPodAutoscaler yaml (API version autoscaling/v2), set spec.behavior.scaleUp.tolerance and/or spec.behavior.scaleDown.tolerance to a value between 0.0 and 1.0 (representing 0–100%).

Example: a rapid scale-up (0.0 tolerance) and conservative scale-down (0.05 = 5%) configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 10
  maxReplicas: 200
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleUp:
      tolerance: 0.0        # scale immediately when above target
      policies:
      - type: Pods
        value: 10
        periodSeconds: 60
    scaleDown:
      tolerance: 0.05       # only scale down when >5% below target
      policies:
      - type: Percent
        value: 10
        periodSeconds: 300

Performance Implications

Fine-tuning tolerance directly impacts convergence speed and cluster stability:

Lower scaleUp tolerance shortens reaction time to traffic spikes but may lead to overshoot if your backend initialization takes several seconds.
Higher scaleDown tolerance prevents pod churn under oscillating workloads, reducing API server and scheduler load.

In internal benchmarks conducted on a 500-node cluster, reducing scaleDown tolerance from 10% to 2% cut unnecessary scale-down events by 30%, lowering control-loop CPU usage by 15%.

US Executive Branch Uses ChatGPT Enterprise for $1 per Agency

2025-08-06

Best Practices

Combine tolerance settings with HPA stabilizationWindowSeconds to prevent rapid flapping.
For bursty, CPU-heavy jobs (e.g., video transcoding), start with scaleUp.tolerance: 0 and a scaleDown.tolerance of 0.1–0.2.
Use KEDA or external metrics for event-driven scaling when custom thresholds matter more than static percentages.
Continue monitoring the kube-controller-manager logs for HPA decision traces using --v=4 verbosity.

Upcoming Enhancements & Community Roadmap

The HPA community milestone plan (see KEP-4951) targets beta graduation in v1.35. Future work includes:

Per-metric tolerance—allowing different thresholds on CPU vs. memory vs. custom metrics.
Dynamic tolerance adjustment based on historical load patterns.
Integration with Vertical Pod Autoscaler (VPA) for mixed scaling strategies.

OpenAI and Media Outlet Differ on Scope of Data Needs

2025-08-05

Resources & Further Reading

GitHub KEP-4951: Configurable HPA Tolerance
Official Documentation: Horizontal Pod Autoscale
Feature Gate Reference: Feature Gates