Kubernetes v1.33: Improvements in Pod Scheduling and CSI Allocatable Counts

Scheduling stateful applications reliably depends heavily on accurate information about resource availability on nodes. Kubernetes v1.33 introduces an alpha feature called Mutable CSI Node Allocatable Count, allowing Container Storage Interface (CSI) drivers to dynamically update the reported maximum number of volumes a node can handle. By continuously reconciling actual attachable capacity with the scheduler’s view, this feature significantly improves scheduling precision and reduces pod setup failures due to stale volume limits.
Background
Traditionally, CSI drivers advertise a static maximum volume attachment limit during initialization—often a value hard-coded in driver manifests or Kubernetes defaults. In production, actual attach capacity evolves throughout a node’s lifecycle due to:
- External volume operations: Manual attach/detach actions outside the Kubernetes control loop (cloud console, cloud CLI).
- Dynamic hardware allocations: GPUs, SR-IOV NICs, or NVMe devices consuming attachment slots or PCIe lanes.
- Multi-driver interference: One CSI driver’s attachments reducing headroom for another (for example, AWS EBS vs. AWS FSx for Lustre).
When Kubernetes bases scheduling on stale capacity data, pods requesting new volumes can become stuck in ContainerCreating or fail with ambiguous errors. Administrators must then manually reconcile volume usage or cordon nodes, impacting cluster reliability.
Dynamically Adapting CSI Volume Limits
The new feature gate MutableCSINodeAllocatableCount empowers CSI drivers to adjust node-level volume limits at runtime. Kubelet writes these updates to the driver's allocatable volume count on the CSINode object, ensuring the scheduler always sees the latest attach capacity.
How It Works
When enabled, Kubelet and kube-apiserver coordinate two update mechanisms:
- Periodic updates: the CSIDriver object sets nodeAllocatableUpdatePeriodSeconds (minimum 10s), and Kubelet invokes the gRPC NodeGetInfo call at each interval. The NodeGetInfoResponse.max_volumes_per_node field is then used to refresh the node's allocatable count.
- Reactive updates: on any volume-attach failure returning the gRPC ResourceExhausted error (code 8), Kubelet immediately triggers a NodeGetInfo call to correct the capacity. This avoids repeated scheduling attempts against an invalid limit.
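The periodic path boils down to the driver recomputing its attachable headroom on every NodeGetInfo call. The Go sketch below is a minimal, stdlib-only illustration: nodeGetInfoResponse stands in for the real csi.NodeGetInfoResponse proto message (whose field is max_volumes_per_node), and the slot arithmetic (total instance slots minus out-of-band attachments) is an assumed example of how a driver might derive the value.

```go
package main

import "fmt"

// nodeGetInfoResponse is a simplified stand-in for the CSI
// NodeGetInfoResponse message; the real proto field is max_volumes_per_node.
type nodeGetInfoResponse struct {
	maxVolumesPerNode int64
}

// nodeGetInfo recomputes attachable-volume headroom on every call, so
// Kubelet's periodic poll always observes current capacity. The inputs are
// hypothetical: a driver might derive them from instance metadata and
// device enumeration.
func nodeGetInfo(totalSlots, usedOutOfBand int64) nodeGetInfoResponse {
	free := totalSlots - usedOutOfBand
	if free < 0 {
		free = 0 // never report a negative limit
	}
	return nodeGetInfoResponse{maxVolumesPerNode: free}
}

func main() {
	// 28 instance attachment slots, 3 consumed by out-of-band attachments.
	resp := nodeGetInfo(28, 3)
	fmt.Println(resp.maxVolumesPerNode) // prints 25
}
```

Because the value is recomputed per call, any slots consumed outside Kubernetes (cloud console attachments, NVMe devices, SR-IOV NICs) are reflected at the next poll.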
Enabling the Feature
To test the mutable CSI node allocatable count in a v1.33 cluster, perform these steps:
- Edit the kube-apiserver manifest (or command line) to include --feature-gates=MutableCSINodeAllocatableCount=true.
- Restart the kubelet with the same feature gate enabled.
- Set nodeAllocatableUpdatePeriodSeconds in the spec of your CSIDriver object:
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.k8s.io
spec:
  nodeAllocatableUpdatePeriodSeconds: 60
This instructs Kubelet to poll NodeGetInfo every 60 seconds, updating the driver's allocatable volume count on the CSINode object to match max_volumes_per_node. Intervals below 10 seconds are rejected to avoid API pressure.
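The 10-second floor can be pictured as a simple admission check. This is a hypothetical sketch of the rule, not Kubernetes' actual validation code:

```go
package main

import "fmt"

// minUpdatePeriodSeconds mirrors the documented lower bound for
// nodeAllocatableUpdatePeriodSeconds.
const minUpdatePeriodSeconds = 10

// validateUpdatePeriod rejects intervals below the minimum, matching the
// behavior described above (values under 10s are refused, not clamped).
func validateUpdatePeriod(seconds int64) error {
	if seconds < minUpdatePeriodSeconds {
		return fmt.Errorf("nodeAllocatableUpdatePeriodSeconds must be >= %d, got %d",
			minUpdatePeriodSeconds, seconds)
	}
	return nil
}

func main() {
	fmt.Println(validateUpdatePeriod(60) == nil) // accepted
	fmt.Println(validateUpdatePeriod(5) != nil)  // rejected
}
```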
Immediate Updates on Attachment Failures
In addition to the periodic refresh, Kubernetes reacts swiftly to ResourceExhausted errors. When a CSI driver returns gRPC code 8 during a volume attach, Kubelet triggers an on-demand NodeGetInfo call to immediately reconcile the reported allocatable count. This proactive correction prevents repeated scheduling retries and keeps the cluster healthy.
Performance and Scalability Impact
Enabling frequent updates carries CPU and network overhead. Each NodeGetInfo call is a lightweight gRPC request, but at scale (thousands of nodes) the resulting updates add traffic to the Kubernetes API server. Benchmarks from early adopters show:
- ~30% reduction in pod startup latency for StatefulSets on nodes with high volume churn.
- Negligible CPU overhead (<0.5% on Kubelet) when using 30s update intervals.
- Recommendation: stagger update intervals across nodes (e.g., add jitter) to smooth API server load.
According to SIG-Storage lead Junjie Zhou, “Once in beta, we expect to integrate this mechanism into the scheduler’s internal cache invalidation logic, further reducing stale state windows.”
Compatibility and Upgrade Path
The mutable allocatable count feature requires CSI spec >= 1.7 support for max_volumes_per_node in NodeGetInfoResponse. Major cloud drivers (AWS EBS, Azure Disk, GCP PD) are already in the process of upstreaming the necessary code. Before upgrading:
- Verify your driver implements max_volumes_per_node in NodeGetInfo (check driver logs for capability negotiation).
- Test in a staging cluster with mixed volume types (block and file systems).
- Review CSI driver release notes for any breaking changes in gRPC error mappings.
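When verifying driver support, note that the CSI spec treats a max_volumes_per_node of zero (or unset) as "no limit reported", leaving the decision to the container orchestrator. A quick check might look like this hypothetical helper:

```go
package main

import "fmt"

// reportsVolumeLimit returns true when a NodeGetInfo response carries a
// usable attach limit; per the CSI spec, zero means the driver did not
// report one and the orchestrator must decide.
func reportsVolumeLimit(maxVolumesPerNode int64) bool {
	return maxVolumesPerNode > 0
}

func main() {
	fmt.Println(reportsVolumeLimit(25)) // driver reports a limit
	fmt.Println(reportsVolumeLimit(0))  // no limit reported
}
```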
Best Practices and Troubleshooting
To get the most from mutable allocatable counts:
- Enable the MetricsEndpoint on your CSI driver to surface NodeGetInfo latencies and success rates.
- Configure Prometheus alerts for repeated gRPC ResourceExhausted errors.
- Use kubectl get csinode <node-name> -o yaml to inspect how the driver's reported allocatable volume count changes over time.
If you see spikes in API requests, consider increasing nodeAllocatableUpdatePeriodSeconds
or rolling out updates gradually.
Future Directions
This alpha feature sets the stage for richer node capacity management. Upcoming Kubernetes releases plan to:
- Promote MutableCSINodeAllocatableCount to beta in v1.34, with GA slated for v1.35.
- Introduce a CSIStorageCapacity informer plugin, enabling the scheduler to cache dynamic storage limits at cluster scale.
- Extend reactive updates to include NodeGetVolumeStats calls, providing real-time usage metrics for ephemeral volumes.
Community feedback is vital. Join the discussion in the SIG-Storage repo to share use cases, performance data, and feature requests.
Source: Kubernetes Blog