Topic
Deploy pipeline — follow-up tasks
Date: 2026-05-16
Status: Open (pick up after /clear)
Continuation backlog after today's ACR cutover. Today's work landed PRs #74–#86 and produced a live twilight-data container served from ACR via `ecs-deploy.sh` (with the `--env-file` fix). Everything below is still TODO.
Issue references in (R#) point to entries in docs/err/2026-05-16-deploy-pipeline-acr-cutover.md.
Priority 1 — finish the auto-deploy story
T1. Symlink /usr/local/bin/twilight-deploy to repo path (R2)
Stops the drift between repo deploy/ecs-deploy.sh and the version actually executed by the SSH forced-command. One-shot on the ECS:
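```bash
# Run as root on the ECS host. The quoted heredoc stops $(date +%s) from
# expanding locally and avoids ssh re-splitting a multi-line -c string.
ssh root@ssh-ecs.fsagent.cc 'bash -l -s' <<'EOF'
# Back up the installed script, then point the forced-command path at the repo copy.
cp /usr/local/bin/twilight-deploy "/usr/local/bin/twilight-deploy.bak.$(date +%s)"
ln -sf /home/twilight/twilight/source/deploy/ecs-deploy.sh /usr/local/bin/twilight-deploy
ls -la /usr/local/bin/twilight-deploy
EOF
```

Then verify that a `deploy latest` SSH call still works.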
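To catch the same drift again later, a small check can confirm that the installed path is a symlink resolving to the repo script. This is only a sketch: the function name `deploy_script_is_linked` is hypothetical, and it assumes a `readlink` that supports `-f`, as GNU coreutils on the Linux host does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if $1 is a symlink that resolves to $2.
deploy_script_is_linked() {
  local installed=$1 expected=$2
  # A plain copy of the script is exactly the drift T1 is meant to remove.
  [ -L "$installed" ] || return 1
  [ "$(readlink -f "$installed")" = "$(readlink -f "$expected")" ]
}

# Example with the paths from T1:
# deploy_script_is_linked /usr/local/bin/twilight-deploy \
#   /home/twilight/twilight/source/deploy/ecs-deploy.sh && echo "linked"
```

Comparing resolved paths rather than file contents means an identical but separate copy still fails the check.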
T2. Consolidate .env paths (R1)
Either:
- symlink `/home/deploy/twilight/.env -> /home/twilight/twilight/.env` and remove the duplicate file, OR
- change the forced command to `TWILIGHT_HOME=/home/deploy/twilight` (must `chown -R deploy:deploy /home/deploy/twilight/source` first; `source` is currently a symlink into `/home/twilight/twilight/source`).
Recommend the symlink — minimal change, no ownership thrash.
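A minimal sketch of the symlink option, assuming the two `.env` files should be identical before one is dropped. The function name `link_env_file` is hypothetical, and the check that the deploy user can read the canonical file is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper for T2: replace a duplicate .env with a symlink to the
# canonical one, refusing if the two files have diverged.
link_env_file() {
  local duplicate=$1 canonical=$2
  [ -f "$canonical" ] || { echo "missing canonical file: $canonical" >&2; return 1; }
  if [ -L "$duplicate" ]; then
    # Already a symlink; just repoint it.
    ln -sf "$canonical" "$duplicate"
    return 0
  fi
  if [ -f "$duplicate" ]; then
    if ! cmp -s "$duplicate" "$canonical"; then
      echo "refusing: $duplicate differs from $canonical; reconcile first" >&2
      return 1
    fi
    # Keep a timestamped backup instead of deleting outright.
    mv "$duplicate" "$duplicate.bak.$(date +%s)"
  fi
  ln -s "$canonical" "$duplicate"
}

# Example with the paths from T2:
# link_env_file /home/deploy/twilight/.env /home/twilight/twilight/.env
```

Refusing on divergence is a deliberate choice here: silently picking one file could drop a value that only exists in the other.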
T3. Make source/ a real git repo (R3)
Needed for Phase 3 (the git pull step in app-stack-auto-deploy.md).
```bash
# Run as root on the ECS host; git itself runs as the twilight user.
ssh root@ssh-ecs.fsagent.cc 'bash -l -s' <<'EOF'
cd /home/twilight/twilight/source
sudo -u twilight git init
sudo -u twilight git remote add origin https://github.com/LaCatFly/twilight-drive.git
sudo -u twilight git fetch --quiet origin main
sudo -u twilight git reset --hard origin/main
# Track origin/main so that a bare `git pull` works later.
sudo -u twilight git branch --set-upstream-to=origin/main
# The deploy user needs read on .git too (X keeps directories traversable):
chgrp -R twilight /home/twilight/twilight/source/.git
chmod -R g+rX /home/twilight/twilight/source/.git
EOF
```

After this, `git pull` is the source-update mechanism on the host.
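Before relying on `git pull`, it may be worth confirming that the directory really is a repository with the expected remote. The sketch below assumes git 2.7 or newer (for `git remote get-url`); the function name `source_tracks_origin` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds if $1 holds its own .git directory and its
# "origin" remote is exactly $2.
source_tracks_origin() {
  local dir=$1 expected_url=$2 actual
  [ -d "$dir/.git" ] || return 1
  actual=$(git -C "$dir" remote get-url origin 2>/dev/null) || return 1
  [ "$actual" = "$expected_url" ]
}

# Example with the values from T3:
# source_tracks_origin /home/twilight/twilight/source \
#   https://github.com/LaCatFly/twilight-drive.git && echo "repo ok"
```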
T4. Phase 3 — extend ecs-deploy.sh for the app stack (R4)
Already drafted in 2026-05-16-app-stack-auto-deploy.md (Phase 3 section). Has been blocked on T1+T3. Once they land:
- pull data + app-backend + app-frontend in parallel,
- run `prisma db push --skip-generate` for the NestJS schema,
- 4-service health gate (data + backend + frontend + mcp-tushare),
- all-or-nothing rollback.
The Pre-Deploy Checklist (Phase 6.0 in the same doc) is also still unimplemented. Externally-dependent checks (NewAPI reachability, SearXNG, Hermes upstream image) belong in the GH Action, not in the SSH-executed script.
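The Phase 3 flow (parallel pull, schema push, health gate, all-or-nothing rollback) could be shaped roughly as follows. This is a sketch, not the drafted implementation: all five hook functions are hypothetical placeholders that only echo, and the ordering of the schema push relative to the restart is an assumption.

```bash
#!/usr/bin/env bash
# Placeholder hooks (all hypothetical). Real versions would wrap the registry
# pull, the Prisma push, the compose restart, the health probes and the rollback.
pull_image()     { echo "pull $1"; }
migrate_schema() { echo "prisma db push --skip-generate"; }
start_service()  { echo "start $1"; }
health_check()   { echo "healthy $1"; }
rollback_all()   { echo "rollback"; }

PULL_SERVICES=(data app-backend app-frontend)
GATE_SERVICES=(data backend frontend mcp-tushare)

deploy_all_or_nothing() {
  local svc pid failed=0
  local -a pids=()
  # Stage 1: pull in parallel; a failed pull aborts before anything restarts.
  for svc in "${PULL_SERVICES[@]}"; do
    pull_image "$svc" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
  done
  if [ "$failed" -ne 0 ]; then
    echo "pull failed; running stack left untouched" >&2
    return 1
  fi
  # Stage 2: schema push, then restart. Any failure rolls the whole stack back.
  if ! migrate_schema; then
    rollback_all
    return 1
  fi
  for svc in "${PULL_SERVICES[@]}"; do
    if ! start_service "$svc"; then
      rollback_all
      return 1
    fi
  done
  # Stage 3: health gate across all four services.
  for svc in "${GATE_SERVICES[@]}"; do
    if ! health_check "$svc"; then
      rollback_all
      return 1
    fi
  done
}
```

Collecting every `wait` status before deciding keeps one slow pull from hiding another pull's failure. Note that rolling back images does not undo a schema change, so the real script would need its own answer for that case.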
T5. Add mcp-tushare to the matrix or formally document opt-out (R5)
Either:
- add a fourth row to the `docker-build.yml` matrix (`name: mcp-tushare`, `dockerfile: mcp/tushare_mcp/Dockerfile`, `context: .`, `repo: twilight-mcp-tushare`) and wire it through to the compose `image:` line, OR
- write a one-line ops doc explicitly saying "mcp-tushare is hand-built and hand-tagged; bump it by editing `/home/twilight/twilight/.env`'s `MCP_TUSHARE_VERSION=` line, then `compose up -d mcp-tushare`".
Pick one. Current state is "neither", which is what produced the "deploys mcp-tushare but doesn't build it" gap (R5).
Priority 2 — hygiene
T6. Secret-rotation helper (R8)
scripts/deploy/secrets-rotate.sh. Inputs: which secret (ACR fixed password / GHCR PAT / NewAPI admin pwd / etc.). Outputs:
- update macOS keychain entry,
- `gh secret set` the new value,
- ssh to ECS, refresh `~/.docker/config.json` and any `.env` line.
Reduces the multi-hop manual sync that the cutover exposed.
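One possible shape for the helper, with a dry-run mode so the hops can be reviewed first. Everything here is a sketch under assumptions: the function name, the argument order, the GitHub secret name, the registry user and host, and the keychain account (`$USER`) are placeholders, and the `.env` line update is not covered. Passing the value to `security` with `-w` exposes it briefly in the process list, which may or may not be acceptable.

```bash
#!/usr/bin/env bash
# Sketch of a secrets-rotate helper. Every argument is supplied by the caller.
# Usage: printf '%s\n' "$NEW_VALUE" | \
#   rotate_secret KEYCHAIN_ITEM GH_SECRET SSH_TARGET REG_USER REG_HOST
rotate_secret() {
  local keychain_item=$1 gh_secret=$2 ssh_target=$3 reg_user=$4 reg_host=$5
  local new_value
  # Read the new value from stdin so it stays out of shell history.
  IFS= read -r new_value
  if [ -z "$new_value" ]; then
    echo "no secret value on stdin" >&2
    return 1
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Describe each hop without ever echoing the value itself.
    echo "would update keychain item: $keychain_item"
    echo "would set GitHub secret: $gh_secret"
    echo "would refresh docker login on: $ssh_target ($reg_user@$reg_host)"
    return 0
  fi

  # 1. macOS keychain; -U updates the item if it already exists.
  security add-generic-password -U -a "$USER" -s "$keychain_item" -w "$new_value" || return 1
  # 2. GitHub Actions secret; gh reads the value from stdin.
  printf '%s' "$new_value" | gh secret set "$gh_secret" || return 1
  # 3. Re-login on the host, which rewrites ~/.docker/config.json there.
  printf '%s' "$new_value" |
    ssh "$ssh_target" "docker login --username '$reg_user' --password-stdin '$reg_host'"
}
```

Stopping at the first failed hop leaves the stores out of sync, so a real version would probably report which hops had already succeeded.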
T7. core (ruff + pytest) is red on main (R7)
Either restore core.data or delete the orphaned tests. Today everyone is colour-blind because every PR fails this check identically. gh pr merge --squash ignored UNSTABLE so 10 PRs went in anyway. Don't let that become permanent — fix or remove the test.
T8. Unlock local ~/.docker/config.json (R6)
chflags nouchg ~/.docker/config.json (or whatever set the lock). Document in docs/deploy/acr-mirror-setup.md so future manual.sh users do not waste an hour on the DOCKER_CONFIG=/tmp/... workaround.
T9. Audit-trail entry for the ACR password in keychain
Memory secrets_keychain.md doesn't yet list aliyun-acr-twilight-drive. Update on next memory pass.
Priority 3 — adjacent work that the cutover deferred
T10. Hermes lifecycle rebuild
Stays open. Plan: 2026-05-16-instance-lifecycle-rebuild.md. Pre-reqs all merged (PR #79 health, PR #80 mcp healthz, PR #81 worker graceful). Next concrete step:
- Phase 1 Prisma schema migration (destructive: needs PR-review + `pg_dump` red-line per `app-stack-auto-deploy.md` Phase 4).
- Then phases 2/3/4 (worker rewrite, service-runtime, frontend state machine).
T11. Hermes image pin enforcement on host
PR #75 merged the :0.13.0 pin into scripts/admin/spawn-profile.sh. Once T3 (git repo on ECS) is done, the host script will track the pin automatically. Until then, the ECS still has whatever was last extracted from the install tarball — verify by reading /home/twilight/twilight/source/scripts/admin/spawn-profile.sh and re-installing if needed.
T12. ACR repo retention policy
Personal Edition has a 5GB/day egress quota and storage caps. With daily-ish merges of three images, the registry will fill up. Add a weekly cron (or GH Action) that deletes ACR tags older than N days, keeping only latest, main, and the last few sha-*. Aliyun CLI supports this; can be a workflow step.
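The selection rule (keep `latest`, `main`, and the newest few `sha-*` tags; delete anything else past the age limit) can be sketched independently of the registry API. The function below is hypothetical and assumes its input arrives as `tag epoch_seconds` lines on stdin. Listing and deleting tags through the Aliyun CLI is deliberately left out, since the exact invocation is not given here.

```bash
#!/usr/bin/env bash
# Hypothetical selector: reads "tag epoch_seconds" lines on stdin and prints
# the tags that are safe to delete.
#   $1 = max age in days, $2 = number of newest sha-* tags to always keep,
#   $3 = "now" in epoch seconds (optional, defaults to the current time).
select_expired_tags() {
  local max_age_days=$1 keep_sha=$2 now=${3:-$(date +%s)}
  local cutoff=$(( now - max_age_days * 86400 ))
  # Sort newest first so the first $keep_sha sha-* rows are the ones to keep.
  sort -k2,2nr | awk -v cutoff="$cutoff" -v keep="$keep_sha" '
    $1 == "latest" || $1 == "main"       { next }  # protected tags
    ($1 ~ /^sha-/) && (++seen <= keep)   { next }  # newest sha-* tags
    $2 < cutoff                          { print $1 }
  '
}

# Example: 7-day limit, keep the 5 newest sha-* tags.
# list_tags_somehow | select_expired_tags 7 5
```

Keeping the rule as a pure filter makes it easy to run against a saved tag listing before anything is actually deleted.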
How to resume after /clear
1. Read this file plus `docs/err/2026-05-16-deploy-pipeline-acr-cutover.md`.
2. Check `git log origin/main --oneline -20`; PRs #74-#86 should all be in.
3. Health-check live state:

   ```bash
   curl -fsS https://api.fsagent.cc/healthz      # data
   curl -fsS https://backend.fsagent.cc/health   # backend
   curl -fsS https://app.fsagent.cc/             # frontend
   ```

4. Pick the highest-priority unfinished task (T1 if no other input).
The ACR password is in macOS keychain under `aliyun-acr-twilight-drive`; CF Access service token is under `twilight-cf-access-deploy`; SSH key is `~/.ssh/alibaba-ecs` (root) or `~/.ssh/twilight-ecs-deploy` (deploy user, restricted to `deploy <tag>` via forced command).