
Architecture Decision Records — Orchestrator Portal

This directory contains the ADRs for the Orchestrator Portal: the Go-based multi-service platform (web UI + 5 backend services) that provisions STACKIT infrastructure and deploys the openDesk digital workplace suite via ArgoCD.

These ADRs document cross-cutting, portal-wide decisions — how the system is organised, how deployments flow end-to-end, how upgrades are made safe, how tenants are isolated. For service-level decisions (one ADR per service: infrastructure, deployer, monitoring, backup, IdP, web UI, secrets), see docs/ADRs/.


Index

Service architecture

How the portal is split into processes, how those processes communicate, where state lives.

| # | Title | Subject |
|---|---|---|
| 001 | Microservices architecture | Five separate Go binaries (web, infra, deployer, monitoring, backup) over HTTP, not a monolith |
| 002 | Server / runner dual-mode pattern | One binary, two modes: HTTP server by default, --mode-runner subprocess for step execution |
| 003 | SSE / EventSource log streaming | text/event-stream on /stream with replay-on-connect; not WebSockets, not polling |
| 004 | File-based instance state | projects/<uuid>/instances/<iid>/ on disk; no database, tar czf is the backup |
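
The replay-on-connect behaviour from ADR 003 can be illustrated with a small sketch. It is not the portal's actual handler: the `logHub` type, its methods, the buffer size, and the drop-on-slow-client policy are all assumptions made for illustration. Only the `text/event-stream` content type and the idea of replaying earlier lines to a late subscriber come from the index above.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// logHub keeps every line published so far and fans new lines out to
// connected clients. All names here are hypothetical.
type logHub struct {
	mu      sync.Mutex
	history []string
	subs    map[chan string]struct{}
}

func newLogHub() *logHub {
	return &logHub{subs: make(map[chan string]struct{})}
}

// publish records a line and forwards it to every subscriber.
func (h *logHub) publish(line string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.history = append(h.history, line)
	for ch := range h.subs {
		select {
		case ch <- line:
		default: // slow client: drop the line rather than block the publisher
		}
	}
}

// subscribe returns a copy of the history plus a channel for later lines.
// Both happen under one lock, so a concurrent publish lands either in the
// snapshot or on the channel, never in both.
func (h *logHub) subscribe() ([]string, chan string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	ch := make(chan string, 64)
	h.subs[ch] = struct{}{}
	return append([]string(nil), h.history...), ch
}

func (h *logHub) unsubscribe(ch chan string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.subs, ch)
}

// ServeHTTP replays the history first, then streams live lines until the
// client disconnects. Lines are assumed to contain no newlines.
func (h *logHub) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	past, live := h.subscribe()
	defer h.unsubscribe(live)
	for _, line := range past {
		fmt.Fprintf(w, "data: %s\n\n", line)
	}
	flusher.Flush()

	for {
		select {
		case <-r.Context().Done():
			return
		case line := <-live:
			fmt.Fprintf(w, "data: %s\n\n", line)
			flusher.Flush()
		}
	}
}

func main() {
	hub := newLogHub()
	hub.publish("step 1 started")
	hub.publish("step 1 done")

	// Simulate a client that connects late and then disconnects at once.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	req := httptest.NewRequest(http.MethodGet, "/stream", nil).WithContext(ctx)
	rec := httptest.NewRecorder()
	hub.ServeHTTP(rec, req)
	fmt.Print(rec.Body.String()) // the two replayed events
}
```

In a real server the hub would be registered with `http.Handle("/stream", hub)`. A browser's `EventSource` reconnects on its own, which is why replaying on every connect is enough to give a late or reconnecting client the full log.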

Deployment architecture

How rendered manifests reach the cluster, how ArgoCD is structured, how upstream is consumed.

| # | Title | Subject |
|---|---|---|
| 005 | Rendered-manifests deployment mode | helmfile template locally → push plain YAML to git → ArgoCD syncs YAML. CMP sidecar removed. |
| 006 | Application-of-Applications pattern | One parent Application + one child per helmfile release, generated in Go. No ApplicationSet. |
| 007 | Sync-wave ordering via release-graph.yaml | 8 waves declared as data; ArgoCD enforces, deployer only polls |
| 008 | Fork-and-pin upstream openDesk | stable/v<X.Y.Z> branch in internal fork; source patches applied once per release |
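
To make ADRs 006 and 007 concrete, the following is a minimal sketch of generating one child Application per release with a sync-wave annotation. It is not the deployer's `generateChildAppYAMLs`. The `release` type, `renderChildApp`, the repository URL, the directory layout, the `argocd` namespace, and the `default` project are assumptions for illustration. The `argocd.argoproj.io/sync-wave` annotation and the `argoproj.io/v1alpha1` Application kind are standard Argo CD.

```go
package main

import (
	"fmt"
	"path"
)

// release is a hypothetical view of one helmfile release and its wave.
type release struct {
	Name      string
	Namespace string
	Wave      int
}

// childAppFormat is a minimal Argo CD Application manifest. The sync-wave
// value must be a quoted string because annotations are strings.
const childAppFormat = `apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: %s
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "%d"
spec:
  project: default
  source:
    repoURL: %s
    targetRevision: %s
    path: %s
  destination:
    server: https://kubernetes.default.svc
    namespace: %s
`

// renderChildApp returns the manifest for one child Application whose
// source path is the release's directory of pre-rendered YAML.
func renderChildApp(repoURL, revision, baseDir string, r release) string {
	return fmt.Sprintf(childAppFormat,
		r.Name, r.Wave, repoURL, revision, path.Join(baseDir, r.Name), r.Namespace)
}

func main() {
	// Placeholder releases; real names and waves would come from
	// release-graph.yaml.
	releases := []release{
		{Name: "example-db", Namespace: "opendesk", Wave: 1},
		{Name: "example-app", Namespace: "opendesk", Wave: 2},
	}
	for _, r := range releases {
		fmt.Print("---\n", renderChildApp(
			"https://git.example.invalid/rendered.git", "main", "customers/acme", r))
	}
}
```

Because the wave is carried as an annotation on each child, the parent Application only needs to sync the directory that holds these manifests. Argo CD then applies lower waves before higher ones.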

Step engine / orchestration

How the deployer sequences ~28 steps, resumes after failure, and recovers from convergence problems.

| # | Title | Subject |
|---|---|---|
| 009 | Checkpoint-based step resumption | deployer-checkpoint.json + classifyRun (Fresh / Resume / Upgrade) + version-sensitive replay |
| 010 | Always-re-run steps | Five steps bypass the checkpoint (preflight, kubeconfig, Galera, upgrade-preflight, clone) |
| 011 | Prerequisites before ArgoCD sync | Steps 17–20 mutate the cluster while ArgoCD is still converging to break bootstrap deadlocks |
| 012 | ArgoCD wait + recovery semantics | 90-min poll, Failed-grace, label-scoped cert recovery, ghost-hook detection, FATAL timeout |
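
The interplay of ADRs 009 and 010 can be sketched as follows. This is one plausible reading, not the deployer's `classifyRun`. The `checkpoint` struct reuses the field names `CompletedSteps` and `LastSuccessfulTag` mentioned in this document, but the classification rules, the step identifiers, and the `classify` and `shouldRun` helpers are assumptions for illustration.

```go
package main

import "fmt"

// checkpoint is a hypothetical in-memory form of deployer-checkpoint.json.
type checkpoint struct {
	LastSuccessfulTag string
	CompletedSteps    []string
}

type runKind string

const (
	Fresh   runKind = "fresh"
	Resume  runKind = "resume"
	Upgrade runKind = "upgrade"
)

// classify decides how a run should treat the checkpoint. Assumed rules:
// unfinished steps mean Resume; a finished run at a different tag means
// Upgrade; everything else is treated as Fresh.
func classify(cp *checkpoint, targetTag string) runKind {
	switch {
	case cp == nil:
		return Fresh
	case len(cp.CompletedSteps) > 0:
		return Resume
	case cp.LastSuccessfulTag != "" && cp.LastSuccessfulTag != targetTag:
		return Upgrade
	default:
		return Fresh
	}
}

// alwaysRun lists the five steps that bypass the checkpoint. The exact
// identifiers are assumptions; only the step names come from ADR 010.
var alwaysRun = map[string]bool{
	"preflight":         true,
	"kubeconfig":        true,
	"galera":            true,
	"upgrade-preflight": true,
	"clone":             true,
}

// shouldRun reports whether a step must execute in this run.
func shouldRun(step string, cp *checkpoint) bool {
	if cp == nil || alwaysRun[step] {
		return true
	}
	for _, done := range cp.CompletedSteps {
		if done == step {
			return false
		}
	}
	return true
}

func main() {
	cp := &checkpoint{CompletedSteps: []string{"clone", "render"}}
	fmt.Println(classify(cp, "v1.2.0"))  // unfinished steps present
	fmt.Println(shouldRun("clone", cp))  // always-re-run step
	fmt.Println(shouldRun("render", cp)) // already completed
}
```

Keeping the checkpoint after a fully successful run, with `CompletedSteps` cleared, is what lets a later run tell an upgrade apart from a first install: the tag survives even though no step is marked done.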

Patching, tenancy, sizing

How upstream is customised, how multiple customers coexist, how the platform scales by tier.

| # | Title | Subject |
|---|---|---|
| 013 | Two-stage patch architecture | Fork-pipeline source patches + render-time YAML patches; the deployer no longer carries patch code |
| 014 | Declarative YAML patches | runtime-patches/v<X.Y.Z>/rendered-patches/*.yaml evaluated by apply.py; Go reserved for the dynamic minority |
| 015 | customerId multi-tenancy | One rendered-manifests repo serves many customers via directory isolation + label scoping |
| 016 | Registered-user sizing tiers | Tier keys = registered-user counts (50 to 100k); t-shirt aliases kept for backward compat |
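
As an illustration of ADR 016, the sketch below keeps the tier set in a single table and validates keys against it. The 12 tier keys are the ones listed later in this document. The `tier` type, `normalizeTier`, `tierForUsers`, the "smallest tier that fits" rule, and the alias mapping in `main` are assumptions for illustration; the real t-shirt aliases are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// tier pairs a canonical key with the registered-user count it covers.
type tier struct {
	Key   string
	Users int
}

// tiers is the single source of truth, sorted by ascending user count.
var tiers = []tier{
	{"50", 50}, {"100", 100}, {"150", 150}, {"500", 500},
	{"1k", 1000}, {"2k", 2000}, {"3k", 3000}, {"5k", 5000},
	{"10k", 10000}, {"20k", 20000}, {"50k", 50000}, {"100k", 100000},
}

// normalizeTier maps an alias to its canonical key (if the caller supplies
// an alias table) and rejects anything that is not a known tier.
func normalizeTier(key string, aliases map[string]string) (string, error) {
	k := strings.ToLower(strings.TrimSpace(key))
	if canonical, ok := aliases[k]; ok {
		k = canonical
	}
	for _, t := range tiers {
		if t.Key == k {
			return k, nil
		}
	}
	return "", fmt.Errorf("unknown sizing tier %q", key)
}

// tierForUsers picks the smallest tier that covers n registered users.
func tierForUsers(n int) (string, error) {
	for _, t := range tiers {
		if n <= t.Users {
			return t.Key, nil
		}
	}
	return "", fmt.Errorf("%d users exceeds the largest tier", n)
}

func main() {
	// Hypothetical alias table; the real mapping may differ.
	aliases := map[string]string{"m": "500"}
	fmt.Println(normalizeTier("M", aliases))
	fmt.Println(tierForUsers(120))
}
```

Deriving both the validator and the lookup from one table is one way to avoid the duplication across files noted under stale documentation below.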

Format

Each ADR follows the same structure:

# ADR-NNN: <Title>

## Status — Accepted / Superseded / Deprecated
## Date — ISO date of acceptance
## Context — The problem and constraints that forced a decision
## Decision — What we chose, in numbered subsections, with file:line references
## Consequences
### Positive
### Negative / Trade-offs
### Neutral
## Alternatives Considered — What we rejected and why
## Related — Sibling ADRs, code paths, wiki pages

ADRs are immutable once accepted. To change a decision, write a new ADR that explicitly supersedes the old one and update the Status of the superseded ADR.


Scope and adjacent docs

  • Portal-wide cross-cutting decisions → here (docs/adr/)
  • Service-level decisions (one ADR per backend service) → docs/ADRs/
  • How-to / runbook content → operations/runbooks/
  • Atomic technical reference (one page per concept, fix, patch, incident) → wiki/, entry point wiki/HOME.md, index wiki/MOC.md
  • Deep code reference (step sequence, config fields, debugging) → docs/, specifically DEPLOYER.md, CONFIG.md, DEBUGGING.md, PATCHES.md

When the same topic appears in two places, the wiki holds the searchable detail and the ADRs explain why we built it this way. Update both when the underlying code changes.


A note on stale documentation

While authoring these ADRs, the agents found several places where CLAUDE.md and the brief had drifted from the live code. The ADRs follow the code on each point:

  • The frontend is Vue 3, not React.
  • The Go workspace uses Go 1.24.0 with 8 modules, not Go 1.22.3 with 7.
  • The checkpoint file is preserved on full success (with CompletedSteps cleared) — it is not deleted, because the next run needs LastSuccessfulTag to classify Fresh vs Upgrade.
  • The git host is gitlab.opencode.de, not git.opencode.de.
  • The Kubernetes label is opendesk.io/release-name, not opendesk.io/release.
  • generateChildAppYAMLs lives in 07c_push_rendered_manifests.go, not 07_create_argocd_app.go.
  • The sizing-tier set has 12 entries (50, 100, 150, 500, 1k, 2k, 3k, 5k, 10k, 20k, 50k, 100k), and the validator + canonical map are duplicated across four files.

Each affected ADR documents the actual state. CLAUDE.md should be refreshed in a follow-up pass.