Architecture Decision Records — Orchestrator Portal
This directory contains the ADRs for the Orchestrator Portal: the Go-based multi-service platform (web UI + 5 backend services) that provisions STACKIT infrastructure and deploys the openDesk digital workplace suite via ArgoCD.
These ADRs document cross-cutting, portal-wide decisions — how the system
is organised, how deployments flow end-to-end, how upgrades are made safe, how
tenants are isolated. For service-level decisions (one ADR per service:
infrastructure, deployer, monitoring, backup, IdP, web UI, secrets), see
docs/ADRs/.
Index
Service architecture
How the portal is split into processes, how those processes communicate, where state lives.
| # | Title | Subject |
|---|---|---|
| 001 | Microservices architecture | Five separate Go binaries (web, infra, deployer, monitoring, backup) over HTTP — not a monolith |
| 002 | Server / runner dual-mode pattern | One binary, two modes: HTTP server by default, --mode-runner subprocess for step execution |
| 003 | SSE / EventSource log streaming | text/event-stream on /stream with replay-on-connect — not WebSockets, not polling |
| 004 | File-based instance state | projects/<uuid>/instances/<iid>/ on disk — no database, tar czf is the backup |
Deployment architecture
How rendered manifests reach the cluster, how ArgoCD is structured, how upstream is consumed.
| # | Title | Subject |
|---|---|---|
| 005 | Rendered-manifests deployment mode | helmfile template locally → push plain YAML to git → ArgoCD syncs YAML. CMP sidecar removed. |
| 006 | Application-of-Applications pattern | One parent Application + one child per helmfile release, generated in Go. No ApplicationSet. |
| 007 | Sync-wave ordering via release-graph.yaml | 8 waves declared as data; ArgoCD enforces, deployer only polls |
| 008 | Fork-and-pin upstream openDesk | stable/v<X.Y.Z> branch in internal fork; source patches applied once per release |
Step engine / orchestration
How the deployer sequences ~28 steps, resumes after failure, and recovers from convergence problems.
| # | Title | Subject |
|---|---|---|
| 009 | Checkpoint-based step resumption | deployer-checkpoint.json + classifyRun (Fresh / Resume / Upgrade) + version-sensitive replay |
| 010 | Always-re-run steps | Five steps bypass the checkpoint (preflight, kubeconfig, Galera, upgrade-preflight, clone) |
| 011 | Prerequisites before ArgoCD sync | Steps 17–20 mutate the cluster while ArgoCD is still converging to break bootstrap deadlocks |
| 012 | ArgoCD wait + recovery semantics | 90-min poll, Failed-grace, label-scoped cert recovery, ghost-hook detection, FATAL timeout |
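
The run classification in ADR-009 and the checkpoint bypass in ADR-010 can be sketched as follows. This is a hypothetical reconstruction from the descriptions in this README, including the rule that a fully successful run keeps the checkpoint with CompletedSteps cleared so that LastSuccessfulTag survives. The JSON field names, step IDs and helper names are assumptions, and the real classifyRun also performs version-sensitive replay, which is not modelled here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
)

type runKind string

const (
	Fresh   runKind = "Fresh"
	Resume  runKind = "Resume"
	Upgrade runKind = "Upgrade"
)

// checkpoint is a hypothetical mirror of deployer-checkpoint.json.
type checkpoint struct {
	CompletedSteps    []string `json:"completedSteps"`
	LastSuccessfulTag string   `json:"lastSuccessfulTag"`
}

// loadCheckpoint returns (nil, nil) when the file does not exist.
func loadCheckpoint(p string) (*checkpoint, error) {
	data, err := os.ReadFile(p)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var cp checkpoint
	if err := json.Unmarshal(data, &cp); err != nil {
		return nil, err
	}
	return &cp, nil
}

// classifyRun decides how the next run should proceed.
func classifyRun(cp *checkpoint) runKind {
	switch {
	case cp == nil:
		return Fresh // never deployed
	case len(cp.CompletedSteps) > 0:
		return Resume // an earlier run stopped part-way
	case cp.LastSuccessfulTag != "":
		return Upgrade // previous run succeeded; CompletedSteps was cleared
	default:
		return Fresh
	}
}

// alwaysRerun holds the five steps that bypass the checkpoint.
// The step IDs are illustrative.
var alwaysRerun = map[string]bool{
	"preflight": true, "kubeconfig": true, "galera": true,
	"upgrade-preflight": true, "clone": true,
}

// shouldRun reports whether a step must execute in this run.
func shouldRun(step string, cp *checkpoint) bool {
	if alwaysRerun[step] || cp == nil {
		return true
	}
	for _, done := range cp.CompletedSteps {
		if done == step {
			return false
		}
	}
	return true
}

func main() {
	cp := &checkpoint{CompletedSteps: []string{"clone", "create-cluster"}, LastSuccessfulTag: "v1.0.0"}
	fmt.Println(classifyRun(cp))                 // Resume
	fmt.Println(shouldRun("clone", cp))          // true: always re-run
	fmt.Println(shouldRun("create-cluster", cp)) // false: already completed
}
```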
Patching, tenancy, sizing
How upstream is customised, how multiple customers coexist, how the platform scales by tier.
| # | Title | Subject |
|---|---|---|
| 013 | Two-stage patch architecture | Fork-pipeline source patches + render-time YAML patches — deployer no longer carries patch code |
| 014 | Declarative YAML patches | runtime-patches/v<X.Y.Z>/rendered-patches/*.yaml evaluated by apply.py; Go reserved for the dynamic minority |
| 015 | customerId multi-tenancy | One rendered-manifests repo serves many customers via directory isolation + label scoping |
| 016 | Registered-user sizing tiers | Tier keys = registered-user counts (50 … 100k); t-shirt aliases kept for backward compat |
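
For ADR-016, tier keys that are registered-user counts make it straightforward to validate a key and to pick a tier for a given user count. The sketch below uses the 12 tier keys listed in the stale-documentation note of this README; tierFor (choose the smallest tier that covers a user count) and the other helper names are hypothetical, and the t-shirt aliases are omitted because their mapping is not listed here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tierKeys are the 12 registered-user tiers, smallest first.
var tierKeys = []string{"50", "100", "150", "500", "1k", "2k", "3k", "5k", "10k", "20k", "50k", "100k"}

// users converts a tier key such as "1k" into a user count (1000).
func users(key string) (int, error) {
	num, mult := key, 1
	if strings.HasSuffix(key, "k") {
		num, mult = strings.TrimSuffix(key, "k"), 1000
	}
	n, err := strconv.Atoi(num)
	if err != nil {
		return 0, fmt.Errorf("invalid tier key %q: %w", key, err)
	}
	return n * mult, nil
}

// validTier reports whether key is one of the canonical tier keys.
func validTier(key string) bool {
	for _, k := range tierKeys {
		if k == key {
			return true
		}
	}
	return false
}

// tierFor returns the smallest tier that covers n registered users.
func tierFor(n int) (string, error) {
	for _, k := range tierKeys {
		u, err := users(k)
		if err != nil {
			return "", err
		}
		if n <= u {
			return k, nil
		}
	}
	return "", fmt.Errorf("%d registered users exceeds the largest tier", n)
}

func main() {
	// Prints: 40 -> 50, 120 -> 150, 2500 -> 3k, 100000 -> 100k (one per line)
	for _, n := range []int{40, 120, 2500, 100000} {
		k, _ := tierFor(n)
		fmt.Println(n, "->", k)
	}
}
```

Keeping the tier list in a single slice like this is one way to avoid the duplication across files noted below; whether the portal consolidates it that way is not decided here.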
Format
Each ADR follows the same structure:
# ADR-NNN: <Title>
## Status — Accepted / Superseded / Deprecated
## Date — ISO date of acceptance
## Context — The problem and constraints that forced a decision
## Decision — What we chose, in numbered subsections, with file:line references
## Consequences
### Positive
### Negative / Trade-offs
### Neutral
## Alternatives Considered — What we rejected and why
## Related — Sibling ADRs, code paths, wiki pages
ADRs are immutable once accepted. To change a decision, write a new ADR that explicitly supersedes the old one, then update the Status of the superseded ADR; that Status change is the only edit an accepted ADR receives.
Scope and adjacent docs
- Portal-wide cross-cutting decisions → here (docs/adr/)
- Service-level decisions (one ADR per backend service) → docs/ADRs/
- How-to / runbook content → operations/runbooks/
- Atomic technical reference (one page per concept, fix, patch, incident) → wiki/, entry point wiki/HOME.md, index wiki/MOC.md
- Deep code reference (step sequence, config fields, debugging) → docs/, specifically DEPLOYER.md, CONFIG.md, DEBUGGING.md, PATCHES.md
When the same topic appears in both places, the wiki holds the searchable, detailed reference and the ADR explains why we built it this way. Update both when the underlying code changes.
A note on stale documentation
While authoring these ADRs, the agents found several points where CLAUDE.md
and the brief had drifted from the live code; each was corrected against the code:
- The frontend is Vue 3, not React.
- The Go workspace is Go 1.24.0 with 8 modules, not 1.22.3 / 7.
- The checkpoint file is preserved on full success (with CompletedSteps cleared); it is not deleted, because the next run needs LastSuccessfulTag to classify Fresh vs Upgrade.
- The git host is gitlab.opencode.de, not git.opencode.de.
- The Kubernetes label is opendesk.io/release-name, not opendesk.io/release.
- generateChildAppYAMLs lives in 07c_push_rendered_manifests.go, not 07_create_argocd_app.go.
- The sizing-tier set has 12 entries (50, 100, 150, 500, 1k, 2k, 3k, 5k, 10k, 20k, 50k, 100k), and the validator and canonical map are duplicated across four files.
Each affected ADR documents the actual state. CLAUDE.md should be refreshed
in a follow-up pass.