ECS Auto-Deploy Runbook
One-time setup to enable GitHub Actions → Cloudflare Tunnel SSH → ECS deploys. Once this setup is complete, every push to main deploys automatically with health-gated rollback. See docs/superpowers/specs/2026-05-14-ecs-deploy-design.md for the design rationale.
Prereqs
- ECS box already provisioned via deploy/alibaba-ecs-install.sh. cloudflared tunnel twilight-backend already running on ECS.
- Repo admin access on GitHub (for secrets + variables).
- Cloudflare account access with Zero Trust → Access permissions.
1. Generate deploy SSH key (on your laptop)
bash
ssh-keygen -t ed25519 -f ~/.ssh/twilight-ecs-deploy -C "twilight-ecs-deploy" -N ""
# Public key: ~/.ssh/twilight-ecs-deploy.pub -> upload to ECS
# Private key: ~/.ssh/twilight-ecs-deploy -> paste into GH secret
2. Provision deploy user on ECS
Copy the public key to the box, then re-run install.sh with the DEPLOY_PUBKEY_FILE variable pointing at it:
bash
scp ~/.ssh/twilight-ecs-deploy.pub root@39.106.170.204:/tmp/deploy.pub
ssh root@39.106.170.204
cd ~/twilight/source
DEPLOY_PUBKEY_FILE=/tmp/deploy.pub bash deploy/alibaba-ecs-install.sh
This creates user deploy, adds it to the docker group, installs /usr/local/bin/twilight-deploy, and writes the forced-command entry to /home/deploy/.ssh/authorized_keys.
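With a forced-command entry, the deploy key can only ever run /usr/local/bin/twilight-deploy. Whatever the client asks for is handed to that script in SSH_ORIGINAL_COMMAND (standard OpenSSH behavior) instead of being executed. The real script is not reproduced in this runbook, so the following is only a sketch of the kind of shape check such a wrapper might do. The helper name parse_deploy_request and the accepted pattern (sha- followed by 7 to 40 lowercase hex characters) are assumptions; the real script evidently checks more than shape, since a well-formed but fake SHA is also reported as an invalid tag.

```shell
# Sketch only: NOT the real /usr/local/bin/twilight-deploy.
# parse_deploy_request is a hypothetical helper; the tag pattern is an assumption.
parse_deploy_request() (
  request="${1-}"
  tag="${request#deploy }"   # drop the leading verb, if present
  hex="${tag#sha-}"          # drop the assumed "sha-" prefix, if present
  valid=yes
  [ "$request" != "$tag" ] || valid=no   # the verb "deploy " was missing
  [ "$tag" != "$hex" ] || valid=no       # the prefix "sha-" was missing
  # Reject empty input and anything that is not lowercase hex
  # (this also rejects spaces, semicolons and newlines).
  case "$hex" in
    ''|*[!0-9a-f]*) valid=no ;;
  esac
  [ "${#hex}" -ge 7 ] && [ "${#hex}" -le 40 ] || valid=no
  if [ "$valid" = yes ]; then
    printf '%s\n' "$tag"
  else
    printf 'invalid tag: %s\n' "$tag" >&2
    exit 1
  fi
)

# Possible use inside a forced-command script:
#   tag=$(parse_deploy_request "${SSH_ORIGINAL_COMMAND-}") || exit 1
```

Because the request is only ever compared against a fixed pattern and never evaluated, a client holding the key cannot smuggle in extra shell commands.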
Verify:
bash
sudo -u deploy /usr/local/bin/twilight-deploy </dev/null 2>&1 || true
# Expected: "invalid tag:" (this confirms the script is reachable).
3. Configure Cloudflare Access
In Cloudflare dashboard → Zero Trust → Access → Applications → Add:
- Application type: Self-hosted
- Application name: twilight-ssh-ecs
- Subdomain: ssh-ecs
- Domain: fsagent.cc
- Identity providers: none (uncheck all)
- Policy: name service-token-only, Action Service Auth, Include Service Token → create a new token named gh-actions-deploy
- Copy the Client ID and Client Secret that appear; they are shown only once.
4. Add SSH ingress to cloudflared
On ECS, edit the tunnel config (location varies; commonly /etc/cloudflared/config.yml or ~/.cloudflared/config.yml):
Add the snippet from deploy/cloudflared-ssh-ingress.yml.snippet above the catch-all 404 entry, then:
bash
sudo systemctl restart cloudflared
Add the DNS record pointing the hostname at the tunnel:
bash
cloudflared tunnel route dns twilight-backend ssh-ecs.fsagent.cc
5. Wire GitHub repo
In Settings → Secrets and variables → Actions:
Secrets:
- ECS_SSH_PRIVATE_KEY: paste contents of ~/.ssh/twilight-ecs-deploy
- CF_ACCESS_CLIENT_ID: from step 3
- CF_ACCESS_CLIENT_SECRET: from step 3
Variables:
- ECS_DEPLOY_USER=deploy
- ECS_SSH_HOSTNAME=ssh-ecs.fsagent.cc
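The same values can be set from a terminal instead of the web UI. This is a sketch that assumes the GitHub CLI (gh) is installed, authenticated with admin rights, and run from inside a clone of the repo. The function names run and wire_repo, the DRY_RUN switch, and the idea of passing the Cloudflare credentials in through the shell variables CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET are illustrative assumptions.

```shell
# Sketch: set the step 5 secrets and variables with the GitHub CLI.
# With DRY_RUN=1 (the default here) the commands are only printed.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

wire_repo() {
  key_file="${1:-$HOME/.ssh/twilight-ecs-deploy}"
  # Secret values are fed on stdin so they never appear in the argument list.
  run gh secret set ECS_SSH_PRIVATE_KEY < "$key_file"
  printf '%s' "${CF_ACCESS_CLIENT_ID-}" | run gh secret set CF_ACCESS_CLIENT_ID
  printf '%s' "${CF_ACCESS_CLIENT_SECRET-}" | run gh secret set CF_ACCESS_CLIENT_SECRET
  # Variables are not sensitive, so --body is fine.
  run gh variable set ECS_DEPLOY_USER --body deploy
  run gh variable set ECS_SSH_HOSTNAME --body ssh-ecs.fsagent.cc
}

# Review the printed commands first, then apply for real:
#   wire_repo
#   DRY_RUN=0 wire_repo
```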
6. Smoke test from your laptop first
Confirm the path works before letting CI try:
bash
cloudflared access ssh-config --hostname ssh-ecs.fsagent.cc
# Then:
ssh -o ProxyCommand="cloudflared access ssh --hostname %h" \
  -i ~/.ssh/twilight-ecs-deploy \
  deploy@ssh-ecs.fsagent.cc "deploy sha-abc1234"
# Expected: "invalid tag" if you used a fake SHA, OR a real deploy attempt
# if you used a real one.
7. Trigger a deploy
Push any small change to main. The Docker build runs; on success the deploy workflow fires automatically. Watch progress in the Actions tab.
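The deploy workflow itself is not reproduced in this runbook. As a rough sketch of what its SSH step could look like, the function below prints an ssh_config stanza that tunnels through cloudflared, assuming the workflow exposes the step 5 variables as environment variables of the same names. Since the Access policy from step 3 only admits service tokens, the client has to present one; to my knowledge cloudflared access ssh picks it up from TUNNEL_SERVICE_TOKEN_ID and TUNNEL_SERVICE_TOKEN_SECRET, but treat that mapping, the function name write_ssh_config, and the key path as assumptions to verify against your cloudflared version.

```shell
# Sketch: render an ssh_config stanza for the CI runner.
# ECS_SSH_HOSTNAME and ECS_DEPLOY_USER are the step 5 variables.
write_ssh_config() (
  host="${ECS_SSH_HOSTNAME:?set ECS_SSH_HOSTNAME}"
  user="${ECS_DEPLOY_USER:?set ECS_DEPLOY_USER}"
  key_path="${1:-$HOME/.ssh/twilight-ecs-deploy}"
  cat <<EOF
Host $host
  User $user
  IdentityFile $key_path
  IdentitiesOnly yes
  ProxyCommand cloudflared access ssh --hostname %h
EOF
)

# Possible use in a workflow step (assumed, not taken from the real workflow):
#   export TUNNEL_SERVICE_TOKEN_ID="$CF_ACCESS_CLIENT_ID"
#   export TUNNEL_SERVICE_TOKEN_SECRET="$CF_ACCESS_CLIENT_SECRET"
#   write_ssh_config ~/.ssh/deploy_key >> ~/.ssh/config
#   ssh "$ECS_SSH_HOSTNAME" "deploy sha-abc1234"
```

Passing the token through the environment rather than as command-line flags keeps it out of the rendered config file and out of the process list.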
8. One-time: mark initial Prisma migration as applied on ECS
After merging PR-D (Prisma option B switch), run this once on ECS before the next deploy. This tells Prisma the 0001_init migration is already applied (it was applied via db push in prior deploys).
bash
# SSH to ECS (this logs in as root; DATABASE_URL is read from the app .env below)
ssh root@39.106.170.204
# Get DATABASE_URL from app .env
export DATABASE_URL=$(grep '^DATABASE_URL=' /home/twilight/twilight-app/.env | cut -d= -f2-)
# Pull the new backend image (sha with 0001_init in migrations/)
docker pull crpi-61k80mbsluufppdv.cn-beijing.personal.cr.aliyuncs.com/lacatfly/twilight-drive-backend:latest
# Mark migration as already applied — DOES NOT run any SQL, only inserts a row
# into _prisma_migrations table. Safe to run on live DB.
docker run --rm \
  -e DATABASE_URL="$DATABASE_URL" \
  --network twilight-app_default \
  crpi-61k80mbsluufppdv.cn-beijing.personal.cr.aliyuncs.com/lacatfly/twilight-drive-backend:latest \
  npx prisma migrate resolve --applied 0001_init
# Verify
docker run --rm \
  -e DATABASE_URL="$DATABASE_URL" \
  --network twilight-app_default \
  crpi-61k80mbsluufppdv.cn-beijing.personal.cr.aliyuncs.com/lacatfly/twilight-drive-backend:latest \
  npx prisma migrate status
# Expected: "All migrations have been applied"
After this, prisma migrate deploy in ecs-deploy.sh will work correctly: it will find 0001_init already applied and skip it, running only new migrations.
⚠️ If this step is skipped: migrate deploy will try to run 0001_init against an existing DB → fails on duplicate tables → deploy aborts → no data loss, but deploy blocked until resolve is run.
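One caveat with the grep and cut extraction used above: if the value in .env is wrapped in quotes, the quotes end up inside DATABASE_URL and Prisma will reject the URL. A small helper along these lines would strip one pair of matching surrounding quotes. read_env_var is a hypothetical name, and whether the real .env quotes its values is not stated in this runbook.

```shell
# Sketch: read one KEY=value entry from a .env file,
# stripping a single pair of surrounding quotes if present.
read_env_var() (
  name="$1"
  file="$2"
  # First matching line only; keep everything after the first "=".
  value=$(grep "^${name}=" "$file" | head -n 1 | cut -d= -f2-)
  case "$value" in
    \"*\") value="${value#\"}"; value="${value%\"}" ;;
    \'*\') value="${value#\'}"; value="${value%\'}" ;;
  esac
  printf '%s\n' "$value"
)

# Possible use in place of the grep | cut line:
#   export DATABASE_URL=$(read_env_var DATABASE_URL /home/twilight/twilight-app/.env)
```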
Recovery
If a deploy auto-rollback leaves the box on a known-good prior version, the workflow run is red but the service is healthy. Investigate at leisure.
If both new and previous fail (extremely unlikely — would mean previous was already broken), SSH as root, set TWILIGHT_VERSION to a known-good tag in ~/twilight/.env, then docker compose -f ~/twilight/source/deploy/compose.yml up -d data.
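For orientation, the two outcomes described above (rolled back and healthy, or both versions failing) can be pictured with a health-gated deploy loop like the one below. This is a sketch, not the deploy script itself: start_version and is_healthy are hypothetical hooks that the caller must define (for example a docker compose up -d with TWILIGHT_VERSION set, and a curl against a health endpoint), and the retry count and delay are assumed values.

```shell
# Sketch of a health-gated deploy with rollback.
# start_version <tag> and is_healthy are hypothetical hooks defined by the caller.
wait_healthy() {
  attempts="${HEALTH_ATTEMPTS:-10}"
  while [ "$attempts" -gt 0 ]; do
    if is_healthy; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${HEALTH_DELAY:-3}"
  done
  return 1
}

deploy_with_rollback() {
  new="$1"
  previous="$2"
  start_version "$new"
  if wait_healthy; then
    echo "deployed $new"
    return 0
  fi
  echo "health check failed for $new, rolling back to $previous" >&2
  start_version "$previous"
  if wait_healthy; then
    echo "rolled back to $previous"
    return 1   # service is healthy again, but the run is still reported as failed
  fi
  echo "rollback to $previous is also unhealthy, manual recovery needed" >&2
  return 2
}
```

Returning a non-zero status after a successful rollback is what would make the workflow run red while the service stays up; status 2 corresponds to the manual recovery case.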
Revocation
To cut off deploy access without touching ECS, delete the CF Access service token in the Cloudflare dashboard. Even with the SSH private key, no connection succeeds without a valid CF Access token. As a second layer, remove ECS_SSH_PRIVATE_KEY from GH secrets.