Deployment
The end-to-end story of how openDesk gets onto a STACKIT cluster: the two Go microservices that drive it, the conceptual model (rendered-manifests mode, per-release Applications + sync-wave, helmfile, two-process model), the 35-step pipeline, the source patches (fork pipeline) and render-time YAML fixes (declarative `runtime-patches/`) that make the open-source charts actually deploy, the architectural decisions, and the runbooks/incidents that pile up around all of it. This is the bulk of the wiki because it's where the project's complexity lives.
Answers: "What does step N do?", "Why is patch X applied?", "How do I force-sync an Application?", "Why was CMP mode removed?", "Where do source patches live vs render-time fixes?", "Where does the deployer record its progress?".
Audience: anyone working on the deployer itself, debugging a deployment in flight, or onboarding to the project.
Pages
Components
- component-deployer-service — `opendesk-deployer` Go service on port 8092; runs the 35-step pipeline
- component-web-service — `opendesk-web` Go service on port 8090; Vue 3 UI, project/instance CRUD, EventSource fan-out
Concepts
- concept-rendered-manifests-mode — pre-render YAML, push to git, ArgoCD syncs without CMP — the only supported mode today
- concept-applicationset — historical: ApplicationSet model, since superseded by per-release Applications + sync-waves; kept for context
- concept-helmfile — declarative Helm release manager; how `helmfile template -e prod` is invoked
- concept-rendering-pipeline — the render step's full pipeline: `helmfile template --output-dir` → patches → JIT → overrides → dedup → lint → inventory → ESO → redact → guards (SCIM, orphan-sidecars, namespace markers, required-patch fatal)
- concept-component-split — historical 8-bucket split, since superseded by one directory per helmfile release
- concept-rolling-sync — historical 4-phase RollingSync, since superseded by sync-wave annotations from `release-graph.yaml`
- concept-argocd-hooks — PreSync / Sync / PostSync; `helm.sh/hook` vs `argocd.argoproj.io/hook`
- concept-checkpointing — HISTORICAL: checkpointing removed; every step runs every deploy (liveness = inflight marker + K8s Lease)
- concept-deploy-modules — per-component deploys; a module is a mode (`module:<key>`, 10 deployer + 6 monitoring modules)
- concept-deploy-runtime-switches — the 2026-06-12 fix-series kill-switches (mid-run kubeconfig refresh, run budget, loud-failure switches, skew guard)
- concept-fork-repo — the upstream fork branch model + the manifest-driven source patches in `upstream-patches/` (`manifest.yaml` → `lib.py`, run by the Apply-Patches CI pipeline; `scripts/create-fork-branch.py` is the legacy predecessor)
- concept-fixes-repo — the shared `runtime-patches/rendered-patches/*.yaml` declarative patch model + `apply.py` primitives DSL (per-patch version scoping via `appliesTo`)
- concept-two-process-model — server vs `--mode-runner` subprocess, keyed per-instance sessions, graceful stop, inflight admission
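As a rough illustration of how per-release sync-waves could be derived from a dependency graph, the sketch below assigns each release a wave one higher than its deepest dependency. The graph shape (release name mapped to the releases it depends on) and the release names are assumptions made for this example; the real `release-graph.yaml` schema is documented in the concept pages and may differ. Only the `argocd.argoproj.io/sync-wave` annotation key is taken from ArgoCD itself.

```python
from graphlib import TopologicalSorter


def assign_sync_waves(depends_on: dict[str, list[str]]) -> dict[str, int]:
    """Return a wave per release: 0 for roots, else 1 + the highest wave among its dependencies."""
    waves: dict[str, int] = {}
    # static_order() yields each release only after all of its dependencies.
    # It raises graphlib.CycleError if the graph contains a cycle.
    for release in TopologicalSorter(depends_on).static_order():
        deps = depends_on.get(release, [])
        waves[release] = 1 + max((waves[d] for d in deps), default=-1)
    return waves


def wave_annotation(wave: int) -> dict[str, str]:
    """Build the ArgoCD annotation for an Application; annotation values must be strings."""
    return {"argocd.argoproj.io/sync-wave": str(wave)}


# Hypothetical release names, used only to show the shape of the result.
graph = {
    "postgres-init": [],
    "keycloak": ["postgres-init"],
    "nextcloud": ["keycloak", "postgres-init"],
}
waves = assign_sync_waves(graph)
for name in sorted(waves, key=waves.get):
    print(name, wave_annotation(waves[name]))
```

ArgoCD applies lower waves first, so a release is only synced after everything it depends on has been applied.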
Decisions
- decision-rendered-manifests-only — why CMP mode was removed; rendered-manifests is the only path
- decision-applicationset-vs-app — historical context for the per-release Application model that replaced both single-Application and ApplicationSet
Steps (35)
Authoritative order is in `runner.go` (`NewDeployerRunner`). Neither the file-number prefixes (`00_`, `01b_`, `05_`, `07_`…) nor the page filenames' step numbers match the current execution positions: page filenames are historical, and the `[NN]` below is the current 0-based position. There is no checkpointing, so every step runs every deploy (concept-checkpointing); the per-instance deploy Lease is taken before step [00].
In-Cluster Infrastructure:
- [00] step-00-preflight-external-deps — verify SKE / Postgres / S3 / Redis / DNS / SM (`OPENDESK_SKIP_PREFLIGHT=1` bypasses)
- [01] step-01-refresh-kubeconfig — fetch kubeconfig from STACKIT SKE (3h TTL; `MaybeRefreshKubeconfig` re-fetches mid-run)
- [02] step-02-deploy-mariadb-galera — in-cluster Galera (gated by `terraform.provisionGalera`; tier-sized memory/buffer-pool/max_connections/storage)
- [03] step-03-register-mariadb-credentials — read passwords, copy Secrets cross-namespace, wire `cfg.ExternalServices.MariaDB`
- [04] step-04-deploy-coturn — TURN/STUN LoadBalancer for Element 1-on-1 calls (single replica by design; one-sided-TURN limitation)
- [05] step-05-deploy-redis-proxy — ghostunnel TLS termination + upstream reachability probe (tier-sized, bounded `--max-concurrent-conns`)
Deployment Setup:
- [06] step-06-clone-opendesk — clone fork, resolve `__NAMESPACE__`, neutralise post-renderers
- [07] step-07-ensure-jit-pull-secret — pull Secret for the JIT plugin image (no-op unless `externalIdp.userSync`)
- [08] step-07-generate-helmfile-values — write `cluster/secrets/policy/sizing.yaml.gotmpl`
- [09] step-08-setup-enterprise — apply enterprise license keys, registry, OpenKruise, Dovecot keys
- [10] step-09-render-manifests — `helmfile template --output-dir` + the full rendering pipeline
- [11] step-10-push-rendered-manifests — one directory per helmfile release + one Application YAML per release + `render-meta.yaml`
- [12] step-11-install-argocd — `helm upgrade --install argo-cd` (no CMP)
- [13] step-12-install-eso — External Secrets Operator (skipped if SecretsManager unset)
- [14] step-13-prepare-argocd-secrets — register repo creds, create nginx-fake-ca, clean legacy CMP artifacts
- [15] step-14-create-argocd-application — create the per-release Applications (Application-of-Applications)
- [16] step-15-ensure-certificate-issued — wait for `opendesk-certificates` Cert; auto-recover stuck ACME
Prerequisites (run BEFORE the ArgoCD sync wait):
- [17] step-17-ensure-ox-mariadb-database — pre-create `PRIMARYDB_9` to break OX bootstrap deadlock
- [18] step-18-ox-bootstrap-fix — `initconfigdb`, register filestore/server/database
- [19] step-24-ensure-ox-context — create OX context 1 if missing; restart ox-connector (PRE-sync: the sync wedges without it)
- [20] step-20-reconcile-ox-admin-password — roll OX master-admin hash forward across master rotations
- [21] step-21-ensure-scim-token — per-customer SCIM bearer-token Secret (no-op unless `scim.enabled`)
- [22] step-19-nextcloud-init — create `fs_config_store` schema if NC pods crash-loop
- [23] step-20-restart-nextcloud — rollout restart to fix trusted-domain race
- [24] step-24-openproject-migrate — run pending OpenProject migrations via one-off Job (starved-Sync-hook fix)
- [25] step-25-invalidate-keycloak-bootstrap — delete keycloak-bootstrap Jobs when realm-config inputs changed
ArgoCD sync:
- [26] step-21-wait-for-argocd-sync — poll every 30s, max 90 min; fatal on timeout
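The wait step's contract (poll every 30s, give up after 90 min, treat a timeout as fatal) can be sketched as a bounded polling loop. This is a minimal illustration, not the deployer's Go implementation: `wait_for_sync`, `SyncTimeout`, and the `all_synced` callback are hypothetical names, and the clock and sleep functions are injected only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


class SyncTimeout(RuntimeError):
    """Raised when the budget runs out before every Application is synced."""


def wait_for_sync(
    all_synced: Callable[[], bool],
    interval_s: float = 30.0,
    timeout_s: float = 90 * 60,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll all_synced() until it returns True; return the number of polls made."""
    deadline = clock() + timeout_s
    polls = 0
    while True:
        polls += 1
        if all_synced():
            return polls
        if clock() + interval_s > deadline:
            # Another sleep would overshoot the budget, so fail loudly instead.
            raise SyncTimeout(
                f"not synced after {polls} polls ({timeout_s:.0f}s budget)"
            )
        sleep(interval_s)
```

Raising an exception rather than returning a flag mirrors the "fatal on timeout" behaviour: the caller cannot accidentally continue into the post-fix steps on an unsynced cluster.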
Post-fixes (need a synced cluster; run BEFORE health validation since 2026-06-12):
- [27] step-16-ensure-2fa-browser-flow — Keycloak safety net: copy built-in `browser` flow into `2fa-browser` if empty/missing
- [28] step-23-configure-idp-federation — verify `sso-federation-idp` IdP entry; (when set) kcadm-patch UMS LDAP `usernameAttribute`
- [29] step-29-surface-wire-saml-metadata — surface the Wire SAML IdP artifacts (gated by `wireSSO.enabled`)
- [30] step-25-register-matrix-accounts — register UVS + neodatefix-bot Synapse accounts
- [31] step-26-store-keycloak-credentials — write Keycloak + Administrator creds to STACKIT SM
Health checks:
- [32] step-22-validate-deployment-health — OX DB, Nextcloud domain, OpenProject bootstrap, Notes/Impress schema; moved after the post-fixes (2026-06-12)
Finalisation:
- [33] step-27-ensure-keycloak-admin-ingress — manages the `keycloak-admin` Ingress (gated by `keycloak.exposeAdminConsole`) AND the `opendesk-force-login` Ingress (created when `externalIdp.enabled`)
- [34] step-28-cleanup-failed-pods — delete `status.phase=Failed` pods (cosmetic, non-fatal)
Source patches — applied by the Apply-Patches CI pipeline (scripts/apply-patches.py → upstream-patches/manifest.yaml → upstream-patches/lib.py; baked into the fork's stable/v1.X.Y branch, manifest-gated per version via applies_to). scripts/create-fork-branch.py is the legacy/superseded predecessor, called by no pipeline. The migration renamed/split the older patches:
- patch-nubus-values — SPLIT into `patch_nubus_data_loader` (`dataLoader.enabled: true`) + `patch_nubus_security_context` (drop `securityContext.enabled` + duplicate `seccompProfile`, < v1.14.0). AWS_DEFAULT_REGION dedupe is NO longer a source patch; it is handled post-render by render patch `021-dedup-env-vars`.
- patch-secrets-file — now `patch_sha1sum`: remove `| sha1sum` from LDAP password derivation (cracklib rejects the hex output)
- patch-helmfile-children — now `patch_wait_for_jobs`: `waitForJobs: true` failing when batch/job CRD not yet installed
- patch-chart-verification — now `patch_chart_verify`: `verify: true` needs `.prov`; OCI registries don't serve them
- patch-namespace-refs — `patch_namespace_refs` (same name): `.Release.Namespace` resolving to `argocd` instead of deployment ns (deployer-side `resolveNamespacePlaceholder` is the runtime backstop)
- patch-haproxy-rewrite-target — NOT in `upstream-patches/`; HAProxy ingress rewrite-target is now handled by a Go stage in `modules/opendesk/internal/steps/07b_render_manifests.go`
- patch-nextcloud-php-ca — `patch_nextcloud_php_ca`: Nextcloud PHP custom CA bundle wiring (restored to `upstream-patches/` 2026-06-23 after being dropped in the migration)
Removed earlier as redundant (pages kept as historical records): patch-ox-core-mw-values (runtime patches 004/009), patch-post-renderers (now source patch patch_post_renderers; the deployer's neutralizePostRenderers is the clone-step backstop), patch-intercom-redis-username (runtime patches 018/019).
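Per-version gating, such as a patch that only applies below v1.14.0, comes down to comparing version tuples numerically rather than as strings. The sketch below shows one way to evaluate a single-comparator constraint. The constraint syntax and the helper names `parse_version` and `applies` are assumptions made for illustration; the actual `applies_to` format lives in `upstream-patches/manifest.yaml` and may be richer.

```python
import operator
import re

_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def parse_version(text: str) -> tuple[int, int, int]:
    """Turn 'v1.14.0' (or '1.14') into (1, 14, 0); missing parts count as 0."""
    parts = [int(p) for p in text.strip().lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def applies(constraint: str, version: str) -> bool:
    """Evaluate a single-comparator constraint such as '< v1.14.0' against a version."""
    match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*(v?\d+(?:\.\d+)*)\s*", constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    op, bound = match.groups()
    # Tuples compare element by element, so 1.9.0 sorts below 1.10.0.
    return _OPS[op](parse_version(version), parse_version(bound))


print(applies("< v1.14.0", "v1.13.2"))  # patch is applied
print(applies("< v1.14.0", "v1.14.0"))  # patch is skipped
```

Comparing integer tuples avoids the classic string-comparison trap where "1.9.0" would sort after "1.10.0".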
Render-time fixes — applied as declarative YAML in the shared runtime-patches/rendered-patches/*.yaml (per-patch version scoping via appliesTo)
- fix-subpath-casing — `subpath:` → `subPath:` (SSA rejects lowercase)
- fix-spurious-port — remove spurious `port:` in containerPort blocks
- fix-probe-enabled — remove `enabled:` from probes (ox-connector chart bug)
- fix-empty-protocol — `protocol:` (empty) → `protocol: TCP` (SSA merge-key conflict)
- fix-nginx-mountpath — fix matrix widget chart mountPath / ConfigMap key placement
- fix-ox-initconfigdb-root-pwd — fix initconfigdb root password and `--skip-ssl-verify-server-cert`
- fix-mysql-root-pwd-injection — inject MYSQL_ROOT_PASSWD into OX bootstrap Job
- fix-bootstrap-ssl-certs — inject self-signed CA into Keycloak/OpenProject/Nextcloud bootstrap Jobs
- fix-hook-deletion-policy — `HookSucceeded` → `BeforeHookCreation,HookSucceeded` on Jobs (SSA immutability)
- fix-helm-hook-to-argo-hook — inject `argocd.argoproj.io/hook: PostSync` for `helm.sh/hook`-only Jobs
- fix-jgroups-timeout — Keycloak JGroups `sock_conn_timeout=1000` for fast stale-member detection
- fix-resource-dedup — deduplicate resource documents (some helmfile child-charts render twice)
- fix-secrets-to-eso — convert plaintext Secrets to ExternalSecret CRs (when SM configured)
- fix-openproject-bootstrap-backoff — raise OpenProject bootstrap `backoffLimit` from 6 to 20
- fix-inventory-guard-orphan-sidecars — guard against orphan Synapse sidecars without homeserver
- fix-ums-probes — inject probes into `ums-udm-rest-api` Deployment
- fix-ox-appsuite-ingress — (legacy <2.28) OX appsuite-api ingress regex
- fix-intercom-redis-username — `username` Secret stringData + `REDIS_USER` env
- fix-synapse-federation-whitelist — Synapse `federation_domain_whitelist`
- fix-dedup-env-vars — workload env-var dedup
- fix-jobs-remove-hook-ttl — strip `ttlSecondsAfterFinished` on hook Jobs
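To make three of the fixes above concrete (subPath casing, empty protocol, env-var dedup), here is a small sketch that applies them to a rendered workload already parsed into Python dicts. It does not reproduce the declarative patch files or the `apply.py` primitives DSL, whose syntax is not shown on this page; the function names are hypothetical, and keeping the last value for a duplicated env var is an assumption of this example.

```python
def containers_of(doc: dict) -> list[dict]:
    """Return containers and initContainers of a workload-shaped document."""
    pod_spec = ((doc.get("spec") or {}).get("template") or {}).get("spec") or {}
    return (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])


def fix_subpath_casing(doc: dict) -> None:
    """Rename the lowercase 'subpath' key on volume mounts to 'subPath'."""
    for container in containers_of(doc):
        for mount in container.get("volumeMounts") or []:
            if "subpath" in mount:
                mount["subPath"] = mount.pop("subpath")


def fix_empty_protocol(doc: dict) -> None:
    """Replace an empty or null port protocol with an explicit 'TCP'."""
    for container in containers_of(doc):
        for port in container.get("ports") or []:
            if "protocol" in port and not port["protocol"]:
                port["protocol"] = "TCP"


def dedup_env_vars(doc: dict) -> None:
    """Keep one entry per env name: the last value, at the first occurrence's position."""
    for container in containers_of(doc):
        env = container.get("env")
        if not env:
            continue
        by_name: dict[str, dict] = {}
        for entry in env:
            by_name[entry["name"]] = entry  # later duplicates overwrite earlier ones
        container["env"] = list(by_name.values())


def apply_fixes(doc: dict) -> dict:
    """Run the three fixes in place and return the same document."""
    for fix in (fix_subpath_casing, fix_empty_protocol, dedup_env_vars):
        fix(doc)
    return doc
```

Each function is idempotent, so running the set twice over the same document leaves it unchanged, which is a useful property for fixes that run on every render.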
Runbooks
- runbook-local-dev-stack — build/run/stop/status the 5-service dev stack locally with `start.sh` (`start`/`stop`/`status` commands + `--container-based`/`--background`/`--git-token`/`--reset-admin`)
- runbook-force-sync-application — force-sync an ArgoCD Application
- runbook-hard-refresh-application — hard-refresh (clear cache) an Application
- runbook-reset-checkpoint — HISTORICAL (checkpointing removed); now points to the Lease/`--force-lease` escape hatch
- runbook-list-customer-apps — list every Application for a `customerId`
- runbook-debug-deployer-stream — tail the deployer's SSE event stream
- runbook-check-argocd-sync-status — query sync/health/operationPhase for an Application
- runbook-remove-finalizer — remove `resources-finalizer.argocd.argoproj.io` before delete
- runbook-get-argocd-password — extract the ArgoCD initial admin password
Incidents
- incident-force-sync-vs-ssa — force-sync interacting badly with Server-Side Apply
- incident-comparisonerror-duplicate-env-vars — ComparisonError from duplicate env vars in init containers
- incident-helm-template-no-release-name — `helm template` fails without `--release-name`; root cause
- incident-applicationset-finalizer-wedge — ApplicationSet wedge from leftover `resources-finalizer` (legacy era)
- incident-cmp-mode-removed — context for the CMP-mode removal; what cleanup remains
- incident-sev1-cross-instance-contamination — sev1 where one instance's state leaked into another
- incident-synapse-sidecars-without-homeserver — orphan Synapse sidecars rendered without their homeserver
- incident-logs-menu-bug — Cockpit UI Logs page hidden when changing menu
Related topics
- infrastructure — `runtime-state.json` is the input gate for the deployer; STACKIT-side recovery lives there
- apps — the apps these steps actually deploy; per-app incidents and runbooks
- idp — Keycloak/Nubus realm bootstrap and IdP-related incidents
- security — `concept-master-password` and ESO setup are referenced by the rendering pipeline
- config — input fields the deployer consumes
- sizing — `platformSizing` is read by `step-07-generate-helmfile-values`
When to add a page here
- A new deployer step, helmfile feature, or ArgoCD mechanism — `concept-*`, `step-*`
- A new upstream-chart workaround — `patch-*` (fork pipeline) or `fix-*` (render-time YAML)
- A new operational procedure for the deployer / ArgoCD / git-side artifacts — `runbook-*`
- A deployer or ArgoCD incident with a distinct root cause — `incident-*`
- A deployer-architecture decision — `decision-*`
App-specific runtime issues that don't change the deployer's behavior belong in apps. STACKIT/infra-side issues belong in infrastructure. IdP-specific deployment issues belong in idp (cross-link from here when the deployer code is involved).