ECS Auto-Deploy Runbook

One-time setup to enable GitHub Actions → Cloudflare Tunnel SSH → ECS deploys. Once setup is complete, every push to main deploys automatically, with health-gated rollback. See docs/superpowers/specs/2026-05-14-ecs-deploy-design.md for the design rationale.

Prereqs

  • ECS box already provisioned via deploy/alibaba-ecs-install.sh.
  • cloudflared tunnel twilight-backend already running on ECS.
  • Repo admin access on GitHub (for secrets + variables).
  • Cloudflare account access with Zero Trust → Access permissions.

1. Generate deploy SSH key (on your laptop)

bash
ssh-keygen -t ed25519 -f ~/.ssh/twilight-ecs-deploy -C "twilight-ecs-deploy" -N ""
# Public key:  ~/.ssh/twilight-ecs-deploy.pub  -> upload to ECS
# Private key: ~/.ssh/twilight-ecs-deploy      -> paste into GH secret

2. Provision deploy user on ECS

Copy the public key to the box, then re-run deploy/alibaba-ecs-install.sh with the DEPLOY_PUBKEY_FILE variable pointing at it:

bash
scp ~/.ssh/twilight-ecs-deploy.pub root@39.106.170.204:/tmp/deploy.pub
ssh root@39.106.170.204
cd ~/twilight/source
DEPLOY_PUBKEY_FILE=/tmp/deploy.pub bash deploy/alibaba-ecs-install.sh

This creates user deploy, adds it to the docker group, installs /usr/local/bin/twilight-deploy, and writes the forced-command entry to /home/deploy/.ssh/authorized_keys.

Verify:

bash
sudo -u deploy /usr/local/bin/twilight-deploy </dev/null 2>&1 || true
# Expected: "invalid tag:" — the script is reachable.
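The forced-command entry means the deploy key can only ever run twilight-deploy; whatever the client asks for arrives in the SSH_ORIGINAL_COMMAND environment variable, which sshd sets for forced commands. The sketch below shows how such a script could validate that input. It is not the real twilight-deploy: the function name parse_deploy_tag and the "sha-" plus lowercase hex format are assumptions, and the real script may apply further checks (for example, whether the image exists).

```shell
# Hypothetical sketch of tag validation inside a forced-command script.
# parse_deploy_tag is an illustrative name, not taken from twilight-deploy.
parse_deploy_tag() {
  local cmd="$1" tag=""
  # Accept only commands of the form "deploy sha-<something>".
  case "$cmd" in
    "deploy sha-"*) tag="${cmd#deploy }" ;;
  esac
  # Reject an empty tag or anything other than lowercase hex after "sha-".
  case "${tag#sha-}" in
    ""|*[!0-9a-f]*)
      echo "invalid tag: ${tag}" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$tag"
}

# Usage inside a forced command (sshd sets SSH_ORIGINAL_COMMAND):
#   tag=$(parse_deploy_tag "${SSH_ORIGINAL_COMMAND:-}") || exit 1
```

With no original command, as in the verification call above, the sketch prints "invalid tag:" with an empty tag and returns non-zero. Matching against a strict pattern, rather than passing the string to a shell, keeps input such as "deploy sha-abc; rm -rf /" from ever being executed.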

3. Configure Cloudflare Access

In Cloudflare dashboard → Zero Trust → Access → Applications → Add:

  • Application type: Self-hosted
  • Application name: twilight-ssh-ecs
  • Subdomain: ssh-ecs
  • Domain: fsagent.cc
  • Identity providers: none (uncheck all)
  • Policy: name service-token-only, action Service Auth, include rule Service Token. Create a new service token named gh-actions-deploy for the include rule.
  • Copy the Client ID and Client Secret that appear; they are shown only once.

4. Add SSH ingress to cloudflared

On ECS, edit the tunnel config (location varies; commonly /etc/cloudflared/config.yml or ~/.cloudflared/config.yml). Add the snippet from deploy/cloudflared-ssh-ingress.yml.snippet above the catch-all 404 entry, then restart the service:

bash
sudo systemctl restart cloudflared
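cloudflared evaluates ingress rules from top to bottom and uses the first match, which is why the snippet has to sit above the catch-all. A quick order check before the restart can catch a misplaced snippet. The sketch below assumes the snippet adds a rule containing "hostname: ssh-ecs.fsagent.cc" and that the catch-all is written as "service: http_status:404"; the function name is hypothetical.

```shell
# Hypothetical helper: succeed only if the SSH hostname rule appears
# before the first catch-all rule in the given cloudflared config file.
ssh_rule_before_catch_all() {
  local config="$1" hostname="$2" host_line catch_line
  # grep -n prefixes each match with "<line number>:"; keep the first number.
  host_line=$(grep -n -F "hostname: ${hostname}" "$config" | head -n 1 | cut -d: -f1)
  catch_line=$(grep -n -F "service: http_status:404" "$config" | head -n 1 | cut -d: -f1)
  [ -n "$host_line" ] && [ -n "$catch_line" ] && [ "$host_line" -lt "$catch_line" ]
}

# Usage (pick whichever config path applies on the box):
#   ssh_rule_before_catch_all /etc/cloudflared/config.yml ssh-ecs.fsagent.cc \
#     && echo "ingress order OK"
```

This is a plain text check, not a YAML parser, so it only guards against the most common mistake of appending the snippet at the end of the file.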

Add the DNS record pointing the hostname at the tunnel:

bash
cloudflared tunnel route dns twilight-backend ssh-ecs.fsagent.cc

5. Wire GitHub repo

In Settings → Secrets and variables → Actions:

Secrets:

  • ECS_SSH_PRIVATE_KEY — paste contents of ~/.ssh/twilight-ecs-deploy
  • CF_ACCESS_CLIENT_ID — from step 3
  • CF_ACCESS_CLIENT_SECRET — from step 3

Variables:

  • ECS_DEPLOY_USER = deploy
  • ECS_SSH_HOSTNAME = ssh-ecs.fsagent.cc
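A missing secret or variable tends to surface late, as an opaque SSH or Access failure. A preflight check at the start of the deploy job can fail fast instead. The sketch below assumes the workflow exports the three secrets and two variables as environment variables under the same names; require_vars is a hypothetical helper, not part of the existing workflow.

```shell
# Hypothetical preflight helper: report every required name that is
# unset or empty in the environment, and return non-zero if any is missing.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # printenv prints the value of an exported variable, or nothing if unset.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage at the top of the deploy step:
#   require_vars ECS_SSH_PRIVATE_KEY CF_ACCESS_CLIENT_ID CF_ACCESS_CLIENT_SECRET \
#                ECS_DEPLOY_USER ECS_SSH_HOSTNAME || exit 1
```

The helper prints only names, never values, so it is safe to leave in CI logs.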

6. Smoke test from your laptop first

Confirm the path works before letting CI try:

bash
cloudflared access ssh-config --hostname ssh-ecs.fsagent.cc
# Then:
ssh -o ProxyCommand="cloudflared access ssh --hostname %h" \
    -i ~/.ssh/twilight-ecs-deploy \
    deploy@ssh-ecs.fsagent.cc "deploy sha-abc1234"
# Expected: "invalid tag" if you used a fake SHA, OR a real deploy attempt
# if you used a real one.
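Because the Access application from step 3 accepts only service tokens, the smoke test above may need the token supplied explicitly; without it, cloudflared would fall back to a browser login that no identity provider can satisfy. The sketch below wraps the same ssh call and passes the token through the TUNNEL_SERVICE_TOKEN_ID and TUNNEL_SERVICE_TOKEN_SECRET environment variables, which cloudflared access ssh reads as an alternative to its --service-token-id and --service-token-secret flags. The function name and the use of CF_ACCESS_CLIENT_ID / CF_ACCESS_CLIENT_SECRET as local shell variables are assumptions.

```shell
# Hypothetical wrapper around the smoke test. Before calling it, set
# CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET in your shell to the
# values copied in step 3.
smoke_test() {
  local tag="$1"
  # The two TUNNEL_* variables are inherited by the ProxyCommand process.
  TUNNEL_SERVICE_TOKEN_ID="$CF_ACCESS_CLIENT_ID" \
  TUNNEL_SERVICE_TOKEN_SECRET="$CF_ACCESS_CLIENT_SECRET" \
  ssh -o ProxyCommand="cloudflared access ssh --hostname %h" \
      -o IdentitiesOnly=yes \
      -i ~/.ssh/twilight-ecs-deploy \
      deploy@ssh-ecs.fsagent.cc "deploy ${tag}"
}

# Usage:
#   smoke_test sha-abc1234
```

IdentitiesOnly=yes keeps ssh from offering other keys loaded in your agent, so the test exercises the deploy key and nothing else.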

7. Trigger a deploy

Push any small change to main. The Docker build runs first; when it succeeds, the deploy workflow fires automatically. Watch progress in the Actions tab.

8. One-time: mark initial Prisma migration as applied on ECS

After merging PR-D (Prisma option B switch), run this once on ECS before the next deploy. This tells Prisma the 0001_init migration is already applied (it was applied via db push in prior deploys).

bash
# SSH to ECS (as root here; DATABASE_URL is read from the app .env below)
ssh root@39.106.170.204

# Get DATABASE_URL from app .env
export DATABASE_URL="$(grep '^DATABASE_URL=' /home/twilight/twilight-app/.env | cut -d= -f2-)"

# Pull the new backend image (it must contain 0001_init in migrations/)
docker pull crpi-61k80mbsluufppdv.cn-beijing.personal.cr.aliyuncs.com/lacatfly/twilight-drive-backend:latest

# Mark migration as already applied — DOES NOT run any SQL, only inserts a row
# into _prisma_migrations table. Safe to run on live DB.
docker run --rm \
  -e DATABASE_URL="$DATABASE_URL" \
  --network twilight-app_default \
  crpi-61k80mbsluufppdv.cn-beijing.personal.cr.aliyuncs.com/lacatfly/twilight-drive-backend:latest \
  npx prisma migrate resolve --applied 0001_init

# Verify
docker run --rm \
  -e DATABASE_URL="$DATABASE_URL" \
  --network twilight-app_default \
  crpi-61k80mbsluufppdv.cn-beijing.personal.cr.aliyuncs.com/lacatfly/twilight-drive-backend:latest \
  npx prisma migrate status
# Expected: "Database schema is up to date!" (no pending migrations listed)

After this, prisma migrate deploy in ecs-deploy.sh will work correctly: it will find 0001_init already applied and skip it, running only new migrations.

⚠️ If this step is skipped, migrate deploy tries to run 0001_init against the existing database, fails on duplicate tables, and aborts the deploy. No data is lost, but deploys stay blocked until migrate resolve is run.
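One detail in the commands for this step: the grep and cut pipeline returns the DATABASE_URL value exactly as written, so if the .env file wraps it in quotes, the quotes end up inside the variable and the connection string becomes invalid. A minimal sketch of a more forgiving reader follows; read_env_value is a hypothetical helper, and it handles only plain KEY=value lines with an optional single pair of surrounding quotes.

```shell
# Hypothetical helper: print the value of KEY from a .env-style file,
# stripping one pair of matching single or double quotes if present.
read_env_value() {
  local env_file="$1" key="$2" value
  # Use the last assignment of the key; cut -f2- keeps any "=" inside the value.
  value=$(grep "^${key}=" "$env_file" | tail -n 1 | cut -d= -f2-)
  case "$value" in
    \"*\") value="${value#\"}"; value="${value%\"}" ;;
    \'*\') value="${value#\'}"; value="${value%\'}" ;;
  esac
  printf '%s\n' "$value"
}

# Usage:
#   export DATABASE_URL="$(read_env_value /home/twilight/twilight-app/.env DATABASE_URL)"
```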

Recovery

If a deploy auto-rollback leaves the box on a known-good prior version, the workflow run is red but the service is healthy. Investigate at leisure.

If both the new and the previous version fail (extremely unlikely, since it would mean the previous version was already broken), SSH in as root, set TWILIGHT_VERSION to a known-good tag in ~/twilight/.env, then run docker compose -f ~/twilight/source/deploy/compose.yml up -d data.
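Editing the .env file by hand during an incident is easy to get wrong, for instance by leaving two TWILIGHT_VERSION lines behind. The manual pin can be sketched as a small helper instead; set_twilight_version is a hypothetical name, and the sketch assumes the file uses plain TWILIGHT_VERSION=value lines.

```shell
# Hypothetical helper: pin TWILIGHT_VERSION in an env file to the given tag,
# replacing any existing assignment so exactly one remains.
set_twilight_version() {
  local env_file="$1" tag="$2" tmp
  tmp=$(mktemp)
  # Keep every line except existing TWILIGHT_VERSION assignments.
  grep -v '^TWILIGHT_VERSION=' "$env_file" > "$tmp"
  printf 'TWILIGHT_VERSION=%s\n' "$tag" >> "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
}

# Usage (as root on ECS; replace the placeholder with a known-good tag):
#   set_twilight_version ~/twilight/.env <known-good-tag>
#   docker compose -f ~/twilight/source/deploy/compose.yml up -d data
```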

Revocation

Revoke deploy access without touching ECS by deleting the CF Access service token in the Cloudflare dashboard. Even with the SSH private key, no connection through the tunnel succeeds without a valid CF Access token. As a second layer, remove ECS_SSH_PRIVATE_KEY from the GitHub secrets.

Internal team documentation