Operations
Who is this page for?
Operators deploying and running GAME. Pairs with Configuration Reference
(every variable), Security (hardening), and Observability
(signals). The repository also keeps DEPLOYMENT.md and
KUBERNETES_SETUP.md as quick references.
Deployment topology
GAME is stateless: replicas keep no state of their own, and everything persistent lives in the database (and, optionally, Redis). You therefore scale it by running more identical replicas behind a load balancer; PostgreSQL and Redis are the only shared state.
┌────────────┐     ┌──────────────┐     ┌──────────────┐
│ Ingress /  │────►│  GAME API    │────►│  PostgreSQL  │
│ Load bal.  │     │ (N replicas) │     └──────────────┘
└────────────┘     │  gunicorn +  │────►┌──────────────┐
      │            │  uvicorn     │     │    Redis     │  (optional:
      │            └──────┬───────┘     └──────────────┘   rate-limit +
      ▼                   │                                apikey cache)
┌────────────┐            ▼
│  Keycloak  │◄───── JWT validation (JWKS)
└────────────┘
The process model in containers is gunicorn managing uvicorn workers
(app/gunicorn_conf.py, app/start-prod.sh).
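To make the process model concrete, here is a minimal sketch of what a gunicorn configuration for uvicorn workers can look like. It is not the contents of the repository's app/gunicorn_conf.py; the bind address, the use of WEB_CONCURRENCY as an override, and the timeout values are assumptions chosen for illustration.

```python
# Illustrative gunicorn config; NOT the repository's app/gunicorn_conf.py.
import multiprocessing
import os

# Address the master process listens on (port is an assumption).
bind = os.getenv("BIND", "0.0.0.0:8000")

# uvicorn's gunicorn-compatible worker class, so each worker runs the ASGI app.
worker_class = "uvicorn.workers.UvicornWorker"

# Size workers to CPU; WEB_CONCURRENCY as an override is an assumption here.
workers = int(os.getenv("WEB_CONCURRENCY", multiprocessing.cpu_count()))

# Seconds a worker gets to finish in-flight work after a restart/stop signal.
graceful_timeout = 30

# Seconds to hold idle keep-alive connections open (useful behind a load balancer).
keepalive = 5
```

A file like this is passed to gunicorn with its -c option. Keeping the worker count tied to CPU and overridable per environment makes it easier to reason about database connection totals when you add replicas.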
Local & dev with Docker Compose
The repository ships several Compose files and a Makefile that wraps them
(auto-detecting docker compose v2 vs docker-compose v1):
| Make target | What it does |
|---|---|
| … | First-run: installs Docker if missing, creates … |
| … | Dev stack ( … |
| … | Dev stack without a bundled DB (bring your own). |
| … | Integrated stack ( … |
| … | Start in background / foreground. |
| … | Tail logs (all services / just the API). |
| … | Show running containers. |
| … | Shell into the API container / … |
| … | Stop+remove containers / …and volumes (destructive). |
| … | Run … |
Override the compose file or command per invocation, e.g.
make up FILE=docker-compose.yml DC="docker-compose".
Raw Compose, without Make:
# Dev
docker-compose -f docker-compose-dev.yml up --build
docker-compose -f docker-compose-dev.yml down --remove-orphans
# Production-style single host
docker-compose up --build -d
docker-compose logs -f
docker-compose up --scale app=3 # horizontal scale
Production deployment
1. Configure the environment for ENV=prod (or stage). The fail-fast guards will block boot on missing secrets; that is intended. See Configuration Reference and Security.
2. Run migrations before serving traffic (see below).
3. Deploy the image with your orchestrator (Compose, Kubernetes, or a managed container platform), behind an ingress that terminates TLS.
4. Set TRUSTED_PROXY_IPS to the ingress IP/CIDR so per-IP rate limits work and forwarding headers are trusted.
5. Protect /metrics at the ingress, or set METRICS_ENABLED=false.
6. Externalize shared state: set REDIS_URL and switch ABUSE_PREVENTION_BACKEND / APIKEY_CACHE_BACKEND to redis so limits and key revocations are consistent across replicas.
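Why the trusted-proxy setting matters for per-IP rate limits can be sketched as follows. This is not GAME's implementation: the function names are hypothetical, and the comma-separated format for the trusted list is an assumption. The idea is that X-Forwarded-For is only believed when the direct peer is a trusted proxy; otherwise any client could spoof its address.

```python
import ipaddress
from typing import List, Optional


def parse_trusted(value: str) -> List:
    """Parse a comma-separated list of IPs/CIDRs (format is an assumption)."""
    return [ipaddress.ip_network(p.strip(), strict=False)
            for p in value.split(",") if p.strip()]


def is_trusted(ip: str, trusted: List) -> bool:
    """True if ip falls inside any trusted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in trusted)


def client_ip(peer_ip: str, x_forwarded_for: Optional[str], trusted: List) -> str:
    """Pick the address to rate-limit on (hypothetical helper)."""
    # Ignore the header unless the request arrived via a trusted proxy.
    if not x_forwarded_for or not is_trusted(peer_ip, trusted):
        return peer_ip
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    try:
        # Walk right to left: the first hop that is not one of our proxies
        # is the closest address we did not add ourselves.
        for hop in reversed(hops):
            if not is_trusted(hop, trusted):
                return hop
    except ValueError:
        # Malformed header entry: fall back to the socket peer.
        return peer_ip
    return peer_ip


trusted = parse_trusted("10.0.0.0/8")
print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.9", trusted))  # 203.0.113.7
print(client_ip("198.51.100.2", "1.2.3.4", trusted))            # 198.51.100.2
```

Without a trusted list, every request behind the ingress appears to come from the ingress itself, so all clients would share a single rate-limit bucket.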
Kubernetes
Manifests live under kubernetes/ and a helper script
deploy-kubernetes.sh is provided. See KUBERNETES_SETUP.md for the full
walkthrough. Operational notes:
- Define liveness/readiness probes: GET /api/v1/kpi/health_check is a natural readiness target.
- Provide configuration via ConfigMap (non-secret) and Secret (SECRET_KEY, DB password, KEYCLOAK_CLIENT_SECRET).
- Roll back with kubectl rollout undo deployment/<name>; Kubernetes keeps the deployment history.
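The same readiness check can be run by hand after a rollout. The following is a minimal sketch of a post-deploy smoke check; the helper names are hypothetical, and it assumes the health endpoint answers with a 2xx status when the service is ready.

```python
import http.client
import time
import urllib.request

HEALTH_PATH = "/api/v1/kpi/health_check"


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if the URL answers with a 2xx status (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and non-2xx HTTP errors.
        return False


def wait_until_ready(base_url, attempts=10, delay=1.0,
                     probe=http_ok, sleep=time.sleep) -> bool:
    """Poll the health endpoint until it succeeds or attempts run out."""
    url = base_url.rstrip("/") + HEALTH_PATH
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling wait_until_ready("https://game.example.internal") (a placeholder host) from a deploy script gives a simple gate: if it returns False, roll back instead of shifting traffic.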
Database migrations (Alembic)
Schema changes are Alembic migrations (migrations/, alembic.ini). The
golden rule: run migrations in CI/CD, before the new code serves traffic.
# Local / Poetry
poetry run alembic upgrade head
# Inside a running container
docker-compose exec app alembic upgrade head
# Generate a new migration after a model change (review before committing!)
poetry run alembic revision --autogenerate -m "describe change"
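In a deploy script, the "migrate first" rule amounts to a gate: run the migration command and start the new code only if it exits cleanly. A minimal sketch, in which the helper name and the start callback are hypothetical:

```python
import subprocess
import sys

# Same command as above; adjust to how Alembic is invoked in your environment.
MIGRATE_CMD = ("alembic", "upgrade", "head")


def migrate_then(start, migrate_cmd=MIGRATE_CMD):
    """Run the migration command; call start() only if it exits with 0."""
    result = subprocess.run(list(migrate_cmd))
    if result.returncode != 0:
        # Fail the deploy rather than serve traffic against an old schema.
        raise RuntimeError(
            f"migration failed with exit code {result.returncode}"
        )
    return start()


if __name__ == "__main__":
    # Stand-in command so the sketch runs without a database.
    ok_cmd = (sys.executable, "-c", "pass")
    print(migrate_then(lambda: "serving", migrate_cmd=ok_cmd))
```

Running the migration as a separate, single step (rather than from every replica at boot) avoids several replicas racing to apply the same revision.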
Health, readiness & graceful shutdown
- Health: GET /api/v1/kpi/health_check.
- Graceful shutdown: the FastAPI lifespan hook flushes the DSL execution-log queue on shutdown (observer.aclose()) so buffered audit rows aren't lost. Give the container a few seconds of termination grace so the flush completes.
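The flush-on-shutdown pattern can be sketched without FastAPI itself. The observer class below is a hypothetical stand-in, not GAME's observer; only the aclose() name comes from the text above. FastAPI accepts an async context manager of this shape as its lifespan argument.

```python
import asyncio
from collections import deque
from contextlib import asynccontextmanager


class BufferedObserver:
    """Hypothetical stand-in for the DSL execution-log observer."""

    def __init__(self):
        self.pending = deque()  # rows buffered in memory
        self.persisted = []     # stands in for rows written to the database

    def record(self, row):
        self.pending.append(row)

    async def aclose(self):
        # Flush everything still buffered before the process exits.
        while self.pending:
            self.persisted.append(self.pending.popleft())
            await asyncio.sleep(0)  # yield, as a real async write would


def make_lifespan(observer):
    @asynccontextmanager
    async def lifespan(app):
        try:
            yield  # the application serves requests here
        finally:
            await observer.aclose()  # shutdown: flush buffered audit rows
    return lifespan


async def demo():
    observer = BufferedObserver()
    lifespan = make_lifespan(observer)
    async with lifespan(app=None):
        observer.record({"strategy": "demo", "status": "ok"})
        observer.record({"strategy": "demo", "status": "error"})
    return observer


flushed = asyncio.run(demo())
print(len(flushed.persisted), len(flushed.pending))  # 2 0
```

The flush only runs if the process is allowed to exit normally, which is why the termination grace period matters: if the orchestrator kills the container before the shutdown hook finishes, whatever is still in the buffer is lost.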
Scaling guidance
| Lever | Guidance |
|---|---|
| Replicas / workers | Scale horizontally; the app is stateless. Size gunicorn workers to CPU. |
| DB pool | Total connections ≈ replicas × workers × ( … |
| Rate-limit & cache backend | Use … |
| DSL trace sink | Watch … |
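The DB pool arithmetic is worth checking before adding replicas. In this sketch the third factor is a generic per-worker connection ceiling; which settings make up that ceiling in GAME is not shown here, so treat the parameter name and the example numbers as assumptions. The limit of 100 and the 3 reserved slots are PostgreSQL's defaults for max_connections and superuser_reserved_connections.

```python
def total_db_connections(replicas: int, workers: int, per_worker_pool: int) -> int:
    """Worst-case number of connections the whole deployment can open."""
    return replicas * workers * per_worker_pool


def fits_database(replicas: int, workers: int, per_worker_pool: int,
                  max_connections: int = 100, reserved: int = 3) -> bool:
    """True if the worst case stays within the server limit minus reserved slots."""
    total = total_db_connections(replicas, workers, per_worker_pool)
    return total <= max_connections - reserved


# Illustrative numbers only.
print(total_db_connections(replicas=3, workers=4, per_worker_pool=15))  # 180
print(fits_database(3, 4, 15))  # False: 180 > 97
print(fits_database(3, 4, 5))   # True: 60 <= 97
```

If the product exceeds the server limit, either shrink the per-worker ceiling, raise the database limit, or put a connection pooler in front of PostgreSQL.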
Load & performance testing
A k6 load suite ships in tests/load with a runner:
./scripts/run_load_test.sh --mode 100 # 100 VUs
./scripts/run_load_test.sh --mode 1000 # stress
./scripts/run_load_test.sh --vus 300 \
--mix-a 60 --mix-b 30 --mix-c 10 \
--warmup 20s --hold 2m --ramp-down 20s
See Contributing for the full testing story and the README for every flag.
Runbooks
- DSL strategy incidents (a published strategy erroring, hitting limits, or needing rollback) → docs/dsl/runbook.md and Strategies.
- "Network Error" in the dashboard → almost always a backend 500; check API logs (Observability).
- Boot failure in prod/stage → a fail-fast guard tripped; the error names the variable (Configuration Reference).