Backup & Restore
openDesk's enterprise backup feature: K8up + Restic for every Restic-backed source (PG, MariaDB, LDAP, Cassandra, K8s Secrets, namespace YAML, STACKIT Secrets Manager, PVC snapshots, control-plane forensic snapshot) and rclone + crypt for application S3 buckets. End-to-end encrypted, opt-in, per-instance, multi-tenant-safe.
Audience: operators implementing or auditing the backup story for one or many openDesk clusters.
State of this topic (2026-06-03)
Active and fully wired. Two independent cluster verifications:
- 2026-05-08 — instance ta8ce612e / cluster od-timi / namespace playground-timi. 37 distinct snapshots produced.
- 2026-06-03 — instance tbcd30494 / cluster od-timy-3 / namespace playground-timi-3-c-timy. 33 distinct snapshots produced (per-snapshot content audit verified bytes per source).
The shape difference (stage embedded in namespace) proved the dynamic-discovery contract works across cluster layouts.
The original 3,500-line spec (enterprise-backup-s3-postgresql.md) has been fully decomposed into the page list below. Anyone working on backup should start from this README or MOC; the spec file is being archived.
Pages
Concepts (design)
- concept-backup-architecture — architecture overview, principles, scheduling
- concept-source-methods — per-source backup methods (one section per source)
- concept-restic-vs-rclone — why two engines, disjoint source sets
- concept-bucket-and-encryption — bucket structure + full encryption matrix
- concept-per-tier-retention — per-source 30-day defaults, all tunable
- concept-two-tier-credentials — backup-credentials.json (file) ↔ K8s Secrets
- concept-multi-tenant-discovery — dynamic per-instance discovery from runtime-state.json
- concept-monitoring-alerting — health checks + the snapshot-size-not-just-CR-status pattern
- concept-resources-portability — compute/network/storage budget + provider migration
- concept-security-hardening — TLS audit, secretKeyRef migration, verify-ca, checklist
- concept-limitations-and-alternatives — known limitations + considered + rejected alternatives
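The per-source retention idea from concept-per-tier-retention (a 30-day default that can be tuned per source) can be illustrated with a minimal sketch. The function name, the source names, and the plain-dict override shape are hypothetical and not taken from the renderer; only the 30-day default comes from this page.

```python
DEFAULT_RETENTION_DAYS = 30  # per-source default stated on this page


def resolve_retention(sources, overrides=None):
    """Return {source: retention_days}, applying overrides on top of the default.

    `sources` and `overrides` are hypothetical inputs used for illustration.
    """
    overrides = overrides or {}
    # An override for a source that does not exist is most likely a typo.
    unknown = set(overrides) - set(sources)
    if unknown:
        raise ValueError(f"override for unknown source(s): {sorted(unknown)}")
    resolved = {}
    for source in sources:
        days = overrides.get(source, DEFAULT_RETENTION_DAYS)
        if days <= 0:
            raise ValueError(f"retention for {source} must be positive, got {days}")
        resolved[source] = days
    return resolved


if __name__ == "__main__":
    print(resolve_retention(["postgresql", "ldap"], {"ldap": 90}))
```

Rejecting overrides for unknown sources is a design choice in this sketch: a misspelled source name would otherwise fall back silently to the default.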
Component
- component-backup-service — port 8094, 9-step pipeline, plan_enumerator
Runbooks
- runbook-deploy-backup — first-time deploy + snapshot-size content verification
- runbook-destroy-backup — destroy semantics + bucket rotation
- runbook-rotate-bucket — manual key/bucket rotation paths
- runbook-restore-preparation — what to keep offline + restore inputs matrix + manual restore flow
- runbook-reimplement-from-scratch — fork-master phase order with all gotchas pre-fixed
Fixes / gotchas
- fix-kubeconfig-anchor — anchor InstanceDir before LoadRuntimeState
- fix-k8up-crds-not-bundled — kubectl apply CRDs first
- fix-container-image-pins — image pins + alpine/k8s kubectl path quirk
- fix-backup-command-wrapper — set +e + exit 0 pattern (trade-off: silent 0-byte snapshots)
- fix-unique-container-names — silent data loss if not unique per source
- fix-keycloak-extensions-0bytes — the one legitimate 0-byte snapshot (schema-ownership gap)
- fix-rwo-multiattach — scratch PVC auto-annotation
- fix-destroy-three-layer-contract — 5 layers, 3 cryptic errors if one is missing
- fix-multi-namespace-footprint — K8up Schedules are namespace-scoped
- fix-busybox-date-in-alpine — date -d "@<epoch>" only in rclone CronJobs
- fix-pod-executor-sa-quirk — K8up auto-creates pod-executor only in target namespaces
- fix-skip-if-running-lock — ConfigMap-backed lock-guard for rclone CronJobs
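The fix-unique-container-names entry warns of silent data loss when container names are not unique per source. A pre-render check along the following lines could surface collisions before anything is deployed. This is only a sketch: the mapping from source to container name and all names in the example are hypothetical, and the real renderer may derive names differently.

```python
from collections import defaultdict


def find_name_collisions(container_names):
    """Map each duplicated container name to the sorted list of sources sharing it.

    `container_names` is a hypothetical {source: container_name} mapping.
    """
    by_name = defaultdict(list)
    for source, name in container_names.items():
        by_name[name].append(source)
    # Keep only names that are used by more than one source.
    return {name: sorted(srcs) for name, srcs in by_name.items() if len(srcs) > 1}


if __name__ == "__main__":
    example = {
        "postgresql": "backup-dump",
        "mariadb": "backup-dump",   # collides with postgresql
        "ldap": "backup-ldap",
    }
    for name, sources in find_name_collisions(example).items():
        print(f"container name {name!r} is shared by: {', '.join(sources)}")
```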
Incidents
- incident-2026-06-03-silent-content-bugs — LDAP + SM dump silent-content audit + canonical "verify snapshot SIZE" lesson
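The "verify snapshot SIZE" lesson can be sketched as a small check that flags empty snapshots unless the source is known to produce one legitimately. The record shape, the allowlist entry, and the source names below are assumptions made for illustration. Where the size figures come from is left open here, since this page does not say how the real audit collects them.

```python
from dataclasses import dataclass

# Hypothetical allowlist of sources whose 0-byte snapshot is expected.
# The entry mirrors the fix-keycloak-extensions-0bytes page; the identifier
# the real pipeline uses for that source is an assumption.
EXPECTED_EMPTY = frozenset({"keycloak-extensions"})


@dataclass(frozen=True)
class Snapshot:
    source: str      # logical backup source, e.g. "ldap"
    size_bytes: int  # payload size reported for the snapshot


def find_suspect_snapshots(snapshots, expected_empty=EXPECTED_EMPTY, min_bytes=1):
    """Return snapshots smaller than min_bytes whose source is not allowlisted."""
    return [
        s for s in snapshots
        if s.size_bytes < min_bytes and s.source not in expected_empty
    ]


if __name__ == "__main__":
    sample = [
        Snapshot("postgresql", 52_428_800),
        Snapshot("ldap", 0),                 # job reported success, captured nothing
        Snapshot("keycloak-extensions", 0),  # allowlisted empty snapshot
    ]
    for s in find_suspect_snapshots(sample):
        print(f"SUSPECT: {s.source} is {s.size_bytes} bytes")
```

A check like this complements CR status rather than replacing it: a backup can report success and still be empty, which is the trade-off noted for the set +e + exit 0 wrapper.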
Related topics
- infrastructure — provides the runtime-state.json the backup deployer reads for DB/SM/S3 discovery
- security — STACKIT Secrets Manager dump is the identity-continuity layer the backup captures
- deployment — the deployer's cfg.Namespace is what the backup renderer uses as the application namespace
- monitoring — backupMissedMax alert threshold lives in monitoring/Alertmanager config
When to add a page here
- A backup-related incident occurs (incident-*)
- A new backup source is added (concept-* + renderer update)
- A new fix is applied to the renderer or a runtime patch (fix-*)
- A new operator runbook is exercised — restore drills, bucket migration, provider migration (runbook-*)