v2.14 · Now with multi-cluster federation

Container fleets thatthink for themselves.

A control plane that maps every pod, node, and service in real time — self-healing, auto-scaling, never sleeping. Built for the engineers whose pagers don't rest.

Rolling updates · zero downtime

kube dashboard · deploy view

live

scroll for proof

live cluster metrics

0/day

Deployments

across all clusters today

0.0s

Mean Recovery

pod failure → healthy state

0.0%

Cluster Uptime

trailing 12 months

0.0M

Containers

orchestrated right now

uptime · 99.998%

Zero-downtime deploys aren't aspirational. They're the default.

Every rolling update holds traffic in-flight, waits for readiness probes, and only terminates old pods after new ones are healthy. The incident timeline is real — captured from a production cluster at 3:14 AM.

90-day uptime99.998%

90 days ago

operationaldegraded

today

deployment · prod · api-gateway

03:14:02deploy triggered

api-gateway v1.8.3 → v1.8.4 · 5 replicas

03:14:04rolling update started

strategy: RollingUpdate · maxSurge: 1 · maxUnavailable: 0

03:14:08replica 1/5 replaced

pod api-gateway-7d8f9c-xk2p1 · readiness probe passed

03:14:14replica 2/5 replaced

pod api-gateway-7d8f9c-mn3q8 · 0 dropped connections

03:14:21replica 3/5 replaced

pod api-gateway-7d8f9c-pz7w4 · traffic rerouted cleanly

03:14:29replica 4/5 replaced

pod api-gateway-7d8f9c-rs9k2 · health: 200 OK

03:14:36replica 5/5 replaced

pod api-gateway-7d8f9c-ty1n6 · all probes green

03:14:37deployment complete35s total

downtime: 0ms · error rate: 0.00% · p99: 12ms

mean recovery · 4.2s

The 3 a.m. page that never comes.

Watch a real OOMKill event get detected, classified, and fully remediated in 4.2 seconds. No runbook. No human.

kube heal-watch

00:00.000○ pod crash detected

api-worker-6c8d9f · OOMKill · exit 137 · container memory limit 512Mi

00:00.041○ failure classified

type: OOMKill · severity: medium · restart policy: OnFailure

00:00.089○ replacement scheduled

new pod api-worker-6c8d9f-r7x2k · node: node-pool-3a · priority: high

00:00.112○ image pulled from cache

gcr.io/kube/api-worker:v2.1.4 · layer cache hit · 0ms pull

00:00.890○ container initializing

init containers: 2/2 complete · env injection: done

00:04.201○ readiness probe passed

GET /healthz → 200 OK · endpoint registered in service mesh

00:04.203○ traffic restored

api-worker-6c8d9f-r7x2k · READY 1/1 · 0 requests dropped

recovery time breakdown

detection41ms

scheduling71ms

init + pull801ms

readiness3200ms

mesh sync2ms

cluster health

2,847

pods running

failed pods

48/48

nodes ready

alerts firing

from the trenches

We used to have a Slack channel called #prod-fires. It's been archived for 8 months. Kube's self-healing caught every OOM event before our on-call even got the page.

0P0 incidents since deploy

Marcus Okonkwo

Principal SRE · DataStream Labs

Managing 50+ microservices across GCP, AWS, and on-prem felt impossible. Kube's federation layer makes it one cluster in my mental model. The topology view alone saved 3 hours last week.

3cloud providers, one plane

Priya Venkataraman

Platform Engineering Lead · Meridian Fintech

The rolling update incident timeline feature is honestly what sold our CTO. We could show exactly what happened during every deploy — zero ambiguity, full audit trail.

847deploys/day, zero downtime

Tobias Schreiber

DevOps Architect · Helios Infrastructure

ready to ship

Your cluster is one command away.

Install the CLI, connect your kubeconfig, and get a live dashboard in under 90 seconds. No YAML ceremony. No agent installation.

terminal

$brew install kube-ctl && kube init

requires kubectl ≥ 1.26 · k8s ≥ 1.24 · works with EKS, GKE, AKS, and bare metal

Try the Dashboard →

14k+clusters connected

Apache 2.0open source

SOC2 Type IIcertified