Kubernetes 1.33: Job SuccessPolicy Goes GA

With the release of Kubernetes v1.33 in April 2025, the Batch Working Group is proud to announce that the Job successPolicy API field has reached General Availability (GA). This milestone brings production-grade stability and backwards-compatible API guarantees, and the feature no longer has to be enabled through an alpha or beta feature gate. Teams running large-scale batch workloads, particularly in scientific simulation, AI/ML, and High-Performance Computing (HPC), can now rely on successPolicy as a core primitive for smarter Job termination logic.
About Job’s SuccessPolicy
Traditional Kubernetes Jobs require all Pods to complete successfully before the Job is marked Complete. In indexed Jobs, each Pod gets a unique numeric index in 0..completions-1. The .spec.successPolicy field lets you override this “all-or-nothing” behavior by defining early-exit criteria: either a minimal count of successful indexes (succeededCount) or an explicit list of succeededIndexes. As soon as any rule is satisfied, the Job controller marks the Job as succeeded and kicks off cleanup of all remaining Pods.
Why GA Status Matters
Reaching GA means the successPolicy API is covered by the Kubernetes deprecation policy for stable APIs: there will be no disruptive schema changes, and a GA field may be deprecated but is not removed within a major version of Kubernetes. For enterprises running Kubernetes in regulated or heavily audited environments, this stability is crucial. It also unlocks full support from managed Kubernetes services, such as GKE, EKS, and AKS, where alpha or beta features may be disabled or unsupported.
Technical Specifications
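The GA spec for .spec.successPolicy contains a list of rules, and each rule accepts two optional subfields:
- rules[].succeededCount (integer): Minimum number of succeeded indexes required. When it is set together with succeededIndexes, only the listed indexes are counted.
- rules[].succeededIndexes (string): The set of Pod indexes whose success triggers early exit, written as comma-separated indexes or ranges, for example "0,2-4".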
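The two subfields can be mixed within one rule or spread across several rules. The fragment below is a minimal sketch of that, assuming a Job with completionMode: Indexed and completions: 10; the thresholds are illustrative values, not recommendations.

```yaml
# Fragment of .spec for an Indexed Job with completions: 10 (assumed).
successPolicy:
  rules:
  # Rule 0: met once any 2 of the indexes 0, 2, 3, 4 have succeeded.
  - succeededIndexes: "0,2-4"
    succeededCount: 2
  # Rule 1: met once any 8 indexes have succeeded.
  - succeededCount: 8
```

Note that succeededIndexes is written as a quoted string in interval notation rather than as a YAML list.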
Under the hood, the Job controller in kube-controller-manager watches the status of each indexed Pod. It tracks which indexes have succeeded and evaluates the policy rules in order. Once any rule is met, the controller adds a SuccessCriteriaMet condition to the Job’s status and terminates all remaining Pods. After those Pods are removed, the Job also receives the Complete condition.
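As an illustration of the end state, the status of a Job that finished through its success policy might look roughly like the sketch below. Messages, timestamps, and counters are omitted, and the exact set of fields depends on the cluster version, so treat it as an approximation rather than literal kubectl output.

```yaml
# Abridged .status of a Job whose success policy was met (illustrative).
status:
  conditions:
  - type: SuccessCriteriaMet
    status: "True"
    reason: SuccessPolicy   # set when a successPolicy rule matched
  - type: Complete          # added after the remaining Pods are terminated
    status: "True"
```

Checking for SuccessCriteriaMet is a way to detect an early exit before the Job reaches Complete.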
How it works
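Here’s an example of a 10-Pod indexed Job which exits successfully after any single Pod succeeds:
apiVersion: batch/v1
kind: Job
metadata:
  name: example-early-exit
spec:
  parallelism: 10
  completions: 10
  completionMode: Indexed
  successPolicy:
    rules:
    - succeededCount: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      # Placeholder workload; substitute your own image and command.
      - name: main
        image: busybox:1.36
        command: ["sh", "-c", "echo index $JOB_COMPLETION_INDEX done"]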
In this configuration, as soon as one Pod with any index succeeds, the Job receives the SuccessCriteriaMet condition and all other Pods are terminated. For scenarios where only the leader Pod (index 0) dictates Job success, you can combine both fields:
spec:
  parallelism: 10
  completions: 10
  completionMode: Indexed
  successPolicy:
    rules:
    - succeededIndexes: "0" # leader index
      succeededCount: 1
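To try the leader pattern end to end, the fragment can be embedded in a complete manifest. The following is a minimal sketch: the Job name, the busybox image, and the shell script are placeholders chosen for illustration. It relies on the JOB_COMPLETION_INDEX environment variable that Kubernetes sets in the Pods of an Indexed Job.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: leader-decides        # hypothetical name
spec:
  parallelism: 3
  completions: 3
  completionMode: Indexed
  successPolicy:
    rules:
    - succeededIndexes: "0"   # only the leader index counts
      succeededCount: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox:1.36   # placeholder image
        command:
        - sh
        - -c
        - |
          # Index 0 acts as the leader and exits successfully;
          # the workers keep running until the Job controller terminates them.
          if [ "$JOB_COMPLETION_INDEX" = "0" ]; then
            echo "leader finished"
          else
            sleep 3600
          fi
```

With this setup the Job should report success shortly after the index 0 Pod exits, even though the worker Pods never finish on their own.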
Real-world Use Cases and Performance Benchmarks
Many large organizations have already begun piloting successPolicy in production:
- CERN uses indexed Jobs to run parametrized physics simulations. By early-exiting when a representative subset finishes, they cut average cluster runtime by ~40%.
- GenomeCloud processes thousands of sequencing jobs daily. With succeededCount thresholds, they reduce wasted compute and lower monthly cloud spend by 25%.
- AI Labs Inc. orchestrates hyperparameter sweeps. Specifying a minimal quorum of successes in brackets accelerates convergence detection, improving resource utilization by 30%.
Benchmarks show that the additional controller logic adds under 5 ms of scheduling latency per Pod event and uses only kilobytes of extra memory in kube-controller-manager, making it lightweight enough for even resource-constrained clusters.
Compatibility and Migration Considerations
If you have been using the alpha or beta version of this feature (feature gate JobSuccessPolicy), no API version changes are required to migrate to GA. Ensure your clusters are running v1.33 or later and remove any manual feature-gate toggling. Job manifests that already set successPolicy work unchanged, and legacy manifests without successPolicy continue behaving as before, requiring all completions.
For automation and CI pipelines, update any kubectl apply steps or Helm charts so that the new field is validated against the batch/v1 schema. Running kubectl explain job.spec.successPolicy will show the GA documentation after upgrading.
Looking Ahead: Roadmap and Next Steps
Building on GA successPolicy, the WG-Batch is exploring:
- Checkpointing & Resumability – allow indexed Jobs to resume from saved state after preemption or node failures (KEP-4501).
- Cross-Namespace Coordination – enable Jobs in different namespaces to share success criteria via ConfigMaps or CRDs.
- TTL for Early-Exited Jobs – auto-cleanup Jobs that exit early, based on successPolicy, after a configurable TTL.
Contributions are welcome via the KEP repo and SIG Apps proposals.
Learn more
- Official docs: Success policy
- KEP: Job success/completion policy
Get involved
This enhancement was driven by the WG-Batch in collaboration with SIG Apps. Join the conversation on Slack, subscribe to the working-group mailing list, and attend the biweekly community meetings to propose, review, or test new features.