etcd v3.6.0: Improved Performance and Security Features

This announcement originally appeared on the etcd blog.
Security Enhancements and Supply-Chain Hardening
In etcd v3.6.0, we’ve deepened our commitment to security by integrating govulncheck into the Go build pipeline, performing static analysis against the Go vulnerability database, and adding trivy scans of container images to detect CVEs in OS packages and language runtimes. Both checks are enforced in our CI/CD workflows and have been backported to all supported v3.x stable branches. We continue to follow our Security Release Process for triaging and disclosing vulnerabilities. As an additional layer, binary artifacts are now signed with GPG and published alongside checksums to ensure end-to-end integrity in automated deploy pipelines.
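For illustration, the new checks reduce to invocations like the following; the exact wiring lives in etcd's CI workflows, and the image tag below is a placeholder:

```bash
# Static analysis of the Go module against the Go vulnerability database.
go install golang.org/x/vuln/cmd/govulncheck@latest
govulncheck ./...

# Scan a released container image for known CVEs in OS packages and runtimes.
trivy image gcr.io/etcd-development/etcd:v3.6.0
```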
New Features and Architectural Progress
Migration to v3store: Unified Data Model
Since etcd v3.4, the legacy v2store has been deprecated. In v3.6.0 the --enable-v2 flag has been removed entirely, and v3store is now the sole source of truth for cluster membership and key–value data. v3store leverages bbolt v1.4 transactional semantics, supports ACID guarantees, and scales better under high-concurrency workloads. If your clusters still contain residual v2 data, use etcdutl check v2store (available in v3.5.18+) to validate that only membership records remain. The final removal of internal v2snapshot bootstrap is tracked in issues/12913.
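A minimal sketch of that validation step, run offline against a stopped member (the data directory path is illustrative):

```bash
# Fails if the v2 store holds anything beyond membership records,
# signalling that residual v2 application data must be handled first.
etcdutl check v2store --data-dir /var/lib/etcd
```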
Robust Downgrade Support
etcd v3.6.0 is the first release to fully implement a safe, schema-aware downgrade path. The two-phase process (validation and enable) allows you to roll back from 3.6 to 3.5 without data loss; downgrades proceed one minor version at a time. Under the hood, the etcd server migrates its internal protobuf schema for key revisions and compactions, then gracefully rejects operations that the target version does not support. Always take a snapshot before proceeding and follow our Downgrade-3.6 guide to avoid pitfalls.
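As a sketch, the two phases map onto the etcdctl downgrade subcommands, with a snapshot taken first:

```bash
# Snapshot first, so the pre-downgrade state can always be restored.
etcdctl snapshot save pre-downgrade.db

# Phase 1: verify that the cluster can move to the target version.
etcdctl downgrade validate 3.5

# Phase 2: enable the downgrade; the server migrates its schema so that
# v3.5 binaries can then be rolled in one member at a time.
etcdctl downgrade enable 3.5
```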
Kubernetes-Style Feature Gates
We’ve adopted a three-stage lifecycle (Alpha, Beta, GA) for new features, managed by feature gates modeled after Kubernetes. This approach replaces the old --experimental flag prefix, eliminates breaking changes when flags graduate, and gives users precise control over stability and support guarantees in production. See feature-gates for details on available gates and default states.
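Gates are toggled through a single comma-separated server flag; for example (the gate names below are two of the documented v3.6 gates, so consult feature-gates before relying on them):

```bash
# Enable one gate and explicitly pin another to a chosen state at start-up.
etcd --feature-gates=StopGRPCServiceOnDefrag=true,CompactHashCheck=false
```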
Health Probes: /livez and /readyz Endpoints
To integrate smoothly with Kubernetes liveness and readiness probes, etcd now exposes /livez (the instance is running and its event loop is alive) and /readyz (raft has applied committed entries and the server can serve linearizable reads). These endpoints coexist with the legacy /health endpoint and allow operators to distinguish fail-fast scenarios (/livez) from service unavailability (/readyz).
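You can exercise both endpoints directly from a node; the ?verbose query parameter, following the Kubernetes probe convention, should return a per-check breakdown:

```bash
# Liveness: 200 while the process and its event loop are healthy.
curl -fsS http://127.0.0.1:2379/livez

# Readiness: 200 only once the member can serve linearizable reads.
curl -fsS 'http://127.0.0.1:2379/readyz?verbose'
```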
v3discovery Protocol
The new v3discovery protocol, built on clientv3, supersedes the old v2 discovery by broadcasting TLS-secured peer metadata via the etcd discovery service. This streamlines bootstrapping and ensures all members learn each other’s endpoints without mixing APIs from multiple major versions.
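A bootstrap sketch, assuming the --discovery-token and --discovery-endpoints flags from the v3discovery documentation (the token and endpoint below are placeholders):

```bash
# Each joining member points at an existing etcd cluster acting as the
# discovery service; members sharing a token discover each other's peers.
etcd --name infra0 \
  --discovery-token my-cluster-token \
  --discovery-endpoints https://discovery.example.com:2379
```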
Performance Optimizations
Memory Footprint Reduced by up to 50%
Through two key changes, lowering the default --snapshot-count from 100,000 to 10,000 and applying more aggressive Raft log compaction (PR/18825), we’ve cut average heap usage in half. In long-running clusters handling millions of operations, this translates to steadier performance and reduced garbage-collection pauses.
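Operators who prefer the previous trade-off of fewer, larger snapshots can restore it explicitly:

```bash
# Revert to the pre-3.6 default of 100,000 applied entries per snapshot.
etcd --snapshot-count=100000
```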
10%+ Throughput Gains
A thorough benchmark suite, running on c5.4xlarge EC2 instances with up to 512 concurrent clients, shows an average 10–25% improvement in read and write throughput. Optimizations include streamlined free-page queries in bbolt (PR/419), reduced lock contention in the read path, and telemetry updates that incur less serialization overhead.
Breaking Changes and Upgrade Notes
Data Schema Incompatibility: Older binaries cannot open newer data directories. Always follow the documented upgrade and downgrade procedures.
Peer vs. Client Endpoints: Peer URLs (configured via --initial-advertise-peer-urls) no longer serve client traffic.
CLI Tool Delineation: etcdctl now only supports online operations; offline snapshot handling and defragmentation moved to etcdutl, as shown below.
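In practice the split looks like this (all paths are illustrative):

```bash
# Online operations talk to a live member via etcdctl:
etcdctl snapshot save /backups/snap.db

# Offline operations act on files directly via etcdutl:
etcdutl snapshot restore /backups/snap.db --data-dir /var/lib/etcd-new
etcdutl defrag --data-dir /var/lib/etcd
```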
Critical Bug Fixes
Crash-under-load consistency: Fixed a window where a crash between updating the consistent index and commit could lose entries (PR 13854).
Durability in single-node clusters: Ensured fsync ordering so clients can trust a successful write response (PR 14413).
Defragmentation stability: Eliminated duplicate re-application of entries after a crash mid-defrag (PR 14730).
Expanded Platform Support
Linux/ARM64 graduates to Tier 1 support with official release binaries and CI coverage. etcd v3.6.0 continues to support x86_64 Linux, macOS, FreeBSD, Windows, and s390x. See supported-platform for full details.
Community, Governance, and SIG-etcd
etcd has formally joined the Kubernetes Special Interest Groups as SIG-etcd, strengthening cross-project collaboration, aligning roadmaps, and adopting SIG Release best practices. We welcome three new maintainers and two new reviewers, and have spun up a release team modeled on SIG Release roles. The new etcd-operator Working Group will focus on automated cluster lifecycle management in Kubernetes.
Benchmarking Methodology
Our benchmarks utilized etcd’s perf suite with write-heavy and read-heavy workloads. Tests ran on AWS c5.4xlarge instances (16 vCPU, 32 GiB RAM) with NVMe SSD-backed EBS volumes. We configured Raft election timeouts to 500 ms and measured p99 tail latencies across 10-minute runs. Third-party auditors at CNCF validated the test harness to ensure reproducibility.
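The workloads can be driven with etcd's bundled benchmark tool; a representative invocation, with parameters chosen to mirror the description above rather than reproduce the exact harness, might look like:

```bash
# Write-heavy workload: 512 concurrent clients issuing puts.
benchmark put --clients=512 --conns=64 --total=1000000 \
  --key-size=256 --val-size=1024

# Read-heavy workload: linearizable range reads at the same concurrency.
benchmark range key --clients=512 --conns=64 --total=1000000 --consistency=l
```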
Storage Engine Deep Dive
bbolt v1.4 introduces space reclamation optimizations and reduced page allocation overhead. We experimented with LMDB and BadgerDB but found bbolt’s append-only B+-tree and memory-mapped I/O to provide the best read–write trade-off for typical Kubernetes-style workloads. Future research includes integrating an LSM-tree layer for write-amplification reduction in very high write-rate scenarios.
Operational Best Practices
For production clusters, we recommend:
- Using liveness (/livez) and readiness (/readyz) probes in Kubernetes Deployments.
- Scheduling etcd on dedicated nodes with low network jitter, preferably 10 GbE or better, and isolating storage on NVMe SSDs.
- Automating backups via etcdctl snapshot save every 5–15 minutes, combined with an off-cluster backup store (e.g., S3); see the sketch after this list.
- Monitoring metrics (leader changes, Raft proposal durations, db read/write latency) via Prometheus and Grafana dashboards.
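A minimal cron-style sketch of the backup step, assuming the aws CLI and a bucket named my-etcd-backups (both placeholders):

```bash
#!/usr/bin/env bash
# Take an online snapshot and ship it off-cluster; schedule every 5-15 min.
set -euo pipefail
ts=$(date -u +%Y%m%dT%H%M%SZ)
etcdctl snapshot save "/backups/etcd-${ts}.db"
aws s3 cp "/backups/etcd-${ts}.db" "s3://my-etcd-backups/etcd-${ts}.db"
```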
Looking Ahead
We’re closing out the legacy v2 bootstrap path and investigating streaming range queries to better support large-scale Kubernetes APIs. See our roadmap for upcoming work on end-to-end encryption at rest, cross-cluster replication, and enhanced multi-tenant isolation.
A heartfelt thank you to the 200+ contributors who made etcd v3.6.0 possible!
Source: Kubernetes Blog