v2.14 · Now with multi-cluster federation

Container fleets thatthink for themselves.

A control plane that maps every pod, node, and service in real time — self-healing, auto-scaling, never sleeping. Built for the engineers whose pagers don't rest.

Rolling updates · zero downtime

kube dashboard · deploy view
live
scroll for proof
live cluster metrics
0/day
Deployments
across all clusters today
0.0s
Mean Recovery
pod failure → healthy state
0.0%
Cluster Uptime
trailing 12 months
0.0M
Containers
orchestrated right now

uptime · 99.998%

Zero-downtime deploys aren't aspirational. They're the default.

Every rolling update holds traffic in-flight, waits for readiness probes, and only terminates old pods after new ones are healthy. The incident timeline is real — captured from a production cluster at 3:14 AM.

90-day uptime99.998%
90 days ago
operationaldegraded
today
deployment · prod · api-gateway
03:14:02deploy triggered
api-gateway v1.8.3 → v1.8.4 · 5 replicas
03:14:04rolling update started
strategy: RollingUpdate · maxSurge: 1 · maxUnavailable: 0
03:14:08replica 1/5 replaced
pod api-gateway-7d8f9c-xk2p1 · readiness probe passed
03:14:14replica 2/5 replaced
pod api-gateway-7d8f9c-mn3q8 · 0 dropped connections
03:14:21replica 3/5 replaced
pod api-gateway-7d8f9c-pz7w4 · traffic rerouted cleanly
03:14:29replica 4/5 replaced
pod api-gateway-7d8f9c-rs9k2 · health: 200 OK
03:14:36replica 5/5 replaced
pod api-gateway-7d8f9c-ty1n6 · all probes green
03:14:37deployment complete35s total
downtime: 0ms · error rate: 0.00% · p99: 12ms

mean recovery · 4.2s

The 3 a.m. page that never comes.

Watch a real OOMKill event get detected, classified, and fully remediated in 4.2 seconds. No runbook. No human.

kube heal-watch
00:00.000pod crash detected

api-worker-6c8d9f · OOMKill · exit 137 · container memory limit 512Mi

00:00.041failure classified

type: OOMKill · severity: medium · restart policy: OnFailure

00:00.089replacement scheduled

new pod api-worker-6c8d9f-r7x2k · node: node-pool-3a · priority: high

00:00.112image pulled from cache

gcr.io/kube/api-worker:v2.1.4 · layer cache hit · 0ms pull

00:00.890container initializing

init containers: 2/2 complete · env injection: done

00:04.201readiness probe passed

GET /healthz → 200 OK · endpoint registered in service mesh

00:04.203traffic restored

api-worker-6c8d9f-r7x2k · READY 1/1 · 0 requests dropped

recovery time breakdown

detection41ms
scheduling71ms
init + pull801ms
readiness3200ms
mesh sync2ms

cluster health

2,847
pods running
0
failed pods
48/48
nodes ready
0
alerts firing
from the trenches
"

We used to have a Slack channel called #prod-fires. It's been archived for 8 months. Kube's self-healing caught every OOM event before our on-call even got the page.

0P0 incidents since deploy
MO

Marcus Okonkwo

Principal SRE · DataStream Labs

"

Managing 50+ microservices across GCP, AWS, and on-prem felt impossible. Kube's federation layer makes it one cluster in my mental model. The topology view alone saved 3 hours last week.

3cloud providers, one plane
PV

Priya Venkataraman

Platform Engineering Lead · Meridian Fintech

"

The rolling update incident timeline feature is honestly what sold our CTO. We could show exactly what happened during every deploy — zero ambiguity, full audit trail.

847deploys/day, zero downtime
TS

Tobias Schreiber

DevOps Architect · Helios Infrastructure

ready to ship

Your cluster is one command away.

Install the CLI, connect your kubeconfig, and get a live dashboard in under 90 seconds. No YAML ceremony. No agent installation.

terminal
$brew install kube-ctl && kube init

requires kubectl ≥ 1.26 · k8s ≥ 1.24 · works with EKS, GKE, AKS, and bare metal

Try the Dashboard →
14k+clusters connected
Apache 2.0open source
SOC2 Type IIcertified