NewAPI Deployment Plan
Status: ⏳ Revised 2026-05-14 (v2) · ready to execute
Scope: Deploy NewAPI LLM proxy gateway on the Vultr Japan VPS (outside the GFW), fronted by a dedicated Cloudflare Tunnel. Both ECS-side and Vultr-side workloads call it.
Goal: Put a single reliable LLM gateway in front of all model calls — twilight-backend (on Alibaba ECS) and Hermes profile containers (also on ECS) both route through one endpoint at https://llm.fsagent.cc/v1, instead of each container managing its own provider keys and outbound proxy.
v1 → v2 Pivot Summary
The original v1 plan placed NewAPI on the Alibaba ECS in Chengdu and added a sing-box + shadowsocks-rust hop to a Vultr Japan relay so NewAPI could reach overseas APIs. v2 moves NewAPI itself to Vultr Japan, which sits outside the GFW. That collapses three moving parts (gateway + sing-box + relay) into one and removes the GFW bypass plumbing entirely.
Consequences:
- No sing-box on ECS. No SOCKS5 env vars in NewAPI's container.
- The ss-rust server already installed on Vultr (port 41388, see [[ss_rust_vultr_jp]]) is now unused for this purpose. Keep or remove (see §11).
- Vultr-side memory budget is tighter than ECS's would have been. Hard-cap NewAPI at 384 MB.
- ECS-side consumers (twilight-backend, Hermes profile containers) reach NewAPI through the public Cloudflare Tunnel at https://llm.fsagent.cc/v1, secured with CF Access service tokens.
Architecture
    Hermes containers (Alibaba ECS, Chengdu) ─┐
    twilight-backend (Alibaba ECS, Chengdu) ──┤ → https://llm.fsagent.cc/v1
                                              │   (CF Tunnel, Access service token)
                                              │
                                  cloudflared (Vultr Japan)
                                              │
                                   http://127.0.0.1:3000
                                              ▼
                                    NewAPI (Vultr Japan)
                                              │
                             (direct; Japan reaches upstream)
                                              ▼
                         OpenAI · Anthropic · OpenRouter · Groq · …

The Vultr-local twilight-backend zombie container (if still running) can also hit http://127.0.0.1:3000/v1 via loopback. Anything in a separate network namespace (ECS hosts, Hermes containers) must go through the tunnel.
Repo Surface
deploy/
├── newapi-compose.yml # NewAPI container (bridge, :3000 loopback)
├── newapi-env.example # SESSION_SECRET skeleton
└── newapi-cloudflared-config.yml.template # Tunnel config for llm.fsagent.cc

Host Topology on Vultr
| User | Owns | Notes |
|---|---|---|
| openclaw | twilight-backend zombie container, prd-dashboard cloudflared | PRD-related; do not co-mingle |
| root | NewAPI (/opt/newapi), NewAPI's cloudflared (system service) | New ownership for this work |
| nobody (system) | shadowsocks-rust (/etc/shadowsocks-rust/config.json) | Already running; keep or remove (§11) |
| linuxuser, ops | (unrelated) | Out of scope |
NewAPI deploys as root in /opt/newapi. Creating a dedicated twilight user on Vultr was considered and rejected — too much new infra (systemd-user lingering, docker group membership, ssh key install) for a single-service deployment. Root + /opt is the simplest match for Vultr's existing pattern.
§1 — Resource Budget
Vultr is 1 vCPU / 951 MB RAM / 3 GB swap (as of 2026-05-14). Pre-NewAPI footprint:
- twilight-backend (zombie): ~191 MB
- fordefi-signer: ~30 MB
- cloudflared (openclaw, prd-dashboard): ~50 MB
- ssserver: ~6.5 MB
- dockerd + host services: ~100 MB
Available: ~398 MB free, but 1.8 GB swap is already in use — the host is memory-pressured. NewAPI's typical RSS under low load is 150–250 MB; under burst it can climb past 400 MB.
Cap: cpus: 0.50, memory: 384m. Prefer OOM-kill over swap-thrash. If NewAPI gets evicted frequently under real load, the answer is to upsize Vultr, not to relax the cap.
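The cap decision above can be checked on the host before deploying. The following is a minimal sketch, assuming a Linux host that exposes `MemAvailable` in `/proc/meminfo`; the helper names `mem_available_mb` and `cap_fits` are illustrative and not part of the repo.

```bash
# Pre-flight sketch: does the host have room for NewAPI's 384 MB cap?
# CAP_MB mirrors the `memory: 384m` limit; helper names are hypothetical.
CAP_MB=384

# Print MemAvailable in whole MB from a meminfo-formatted file.
mem_available_mb() {
  local meminfo
  meminfo="${1:-/proc/meminfo}"
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
}

# Succeed when the available memory (MB) covers the cap (MB).
cap_fits() {
  local avail_mb
  local cap_mb
  avail_mb="$1"
  cap_mb="${2:-$CAP_MB}"
  [ "$avail_mb" -ge "$cap_mb" ]
}

# Usage on the Vultr host:
#   cap_fits "$(mem_available_mb)" || echo "less than ${CAP_MB} MB available" >&2
```

A failing check does not by itself mean the deployment will fail, since the kernel can still reclaim memory or swap; it only signals that the cleanup options below may be worth running first.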
Pre-flight cleanup options (only if cap is too tight in practice):
- Stop the Vultr-side twilight-backend zombie. Verify nothing depends on it first: api.fsagent.cc should already resolve to the ECS tunnel.
- Audit fordefi-signer: 7 weeks uptime, unrelated workload, owner unknown.
§2 — Cloudflare Tunnel
A new dedicated tunnel named newapi-jp carries llm.fsagent.cc. It is NOT the same as openclaw's prd-dashboard tunnel, even though both processes live on the same host. Separation of concerns: PRD failures or config errors must not knock out twilight-drive's LLM gateway, and vice versa.
Run NewAPI's cloudflared as a system service (/etc/systemd/system/newapi-cloudflared.service), not as a --user unit, since NewAPI itself runs as root. The system unit is also reproducible by cloudflared service install.
Cloudflare dashboard checklist:
- Zone fsagent.cc → SSL/TLS → Full (not Full Strict; origin is HTTP loopback)
- Zone fsagent.cc → SSL/TLS → Always Use HTTPS on
- Zone fsagent.cc → Network → WebSockets on, HTTP/2 to origin off, HTTP/3 on
- Zero Trust → Networks → Tunnels → create newapi-jp
- Zero Trust → Networks → Tunnels → newapi-jp → Public Hostname → llm.fsagent.cc → http://127.0.0.1:3000 with advanced settings: connectTimeout: 30s, keepAliveTimeout: 120s, noTLSVerify: false, http2Origin: false
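For reference, the public-hostname settings in the checklist map onto a cloudflared ingress rule. The sketch below renders such a config for a locally managed tunnel; it is not the contents of deploy/newapi-cloudflared-config.yml.template. The function name, the tunnel UUID argument, and the credentials path under /etc/cloudflared are assumptions. A dashboard-managed tunnel installed with a token keeps the same settings remotely and needs no local file.

```bash
# Render a cloudflared config for llm.fsagent.cc to stdout.
# $1 = tunnel UUID (placeholder; issued when newapi-jp is created).
# The credentials-file location is an assumption, not taken from the repo.
render_tunnel_config() {
  local tunnel_id
  tunnel_id="$1"
  cat <<EOF
tunnel: ${tunnel_id}
credentials-file: /etc/cloudflared/${tunnel_id}.json
ingress:
  - hostname: llm.fsagent.cc
    service: http://127.0.0.1:3000
    originRequest:
      connectTimeout: 30s
      keepAliveTimeout: 120s
      noTLSVerify: false
      http2Origin: false
  # cloudflared requires a final catch-all rule.
  - service: http_status:404
EOF
}

# Usage (hypothetical target path):
#   render_tunnel_config "<TUNNEL_UUID>" > /etc/cloudflared/config.yml
```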
§3 — Cloudflare Access + WAF
| Surface | Policy |
|---|---|
| /v1/* | CF Access service token required (machine-to-machine) |
| /api/* | CF Access email OTP required (human admin) |
| / (root, login UI) | CF Access email OTP required (human admin) |
| Everything else | Default-deny via the WAF rule below |
WAF rate-limit rule scoped to /v1/* only — admin endpoints stay un-rate-limited. Suggested limit: 60 requests / 10 seconds per IP. Adjust after seeing real traffic.
The service token used by twilight-backend / Hermes is added to their .env as NEWAPI_CF_ACCESS_CLIENT_ID and NEWAPI_CF_ACCESS_CLIENT_SECRET. NewAPI itself never sees these — they're CF Access headers, terminated at the edge.
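On the consumer side, the two variables become request headers. A minimal sketch of a helper that emits them and fails fast when either is missing could look like this; the function name `cf_access_headers` is hypothetical, while the header names are the ones used elsewhere in this plan.

```bash
# Print the two CF Access headers, one per line, or fail if either
# variable from the consumer .env is missing.
cf_access_headers() {
  if [ -z "${NEWAPI_CF_ACCESS_CLIENT_ID:-}" ] || [ -z "${NEWAPI_CF_ACCESS_CLIENT_SECRET:-}" ]; then
    echo "CF Access service token not set" >&2
    return 1
  fi
  printf 'CF-Access-Client-Id: %s\n' "$NEWAPI_CF_ACCESS_CLIENT_ID"
  printf 'CF-Access-Client-Secret: %s\n' "$NEWAPI_CF_ACCESS_CLIENT_SECRET"
}

# Usage (curl 7.55+ reads headers from stdin with -H @-):
#   cf_access_headers | curl -H @- -H "Authorization: Bearer $NEWAPI_API_KEY" \
#     https://llm.fsagent.cc/v1/models
```

Failing before the request is sent keeps a misconfigured container from producing a stream of 403s at the edge that would look like an Access policy problem.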
§4 — Cloudflared Install Path
Vultr can reach pkg.cloudflare.com and github.com directly. Pick one of:
Option A — apt (preferred):
bash
curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg | tee /usr/share/keyrings/cloudflare-main.gpg >/dev/null
echo 'deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared jammy main' | tee /etc/apt/sources.list.d/cloudflared.list
apt-get update && apt-get install -y cloudflared

Option B — binary:
bash
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
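chmod +x /usr/local/bin/cloudflared

Either works. Option A picks up new releases through normal apt upgrades; Option B stays on whatever release was latest at download time until the binary is replaced by hand. Default to A.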
§5 — NewAPI Compose Hardening
deploy/newapi-compose.yml (already written) does:
- Bridge networking, 127.0.0.1:3000:3000 (public ingress flows through cloudflared only)
- cpus: 0.50, memory: 384m (fits Vultr's headroom)
- logging: json-file with rotation (10 MB × 3 files)
- healthcheck: GET http://127.0.0.1:3000/ every 30 s
- No proxy env vars (Vultr egress is unfiltered)
- TZ=Asia/Tokyo (matches host)
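To make the bullets concrete, here is a sketch of what a compose file with these properties might look like. It is a reconstruction from the list above, not the actual deploy/newapi-compose.yml. The `latest` fallback tag, the `/data` mount point inside the container, the `restart` policy, and the availability of `wget` in the image are assumptions.

```bash
# Print an illustrative compose file to stdout. The quoted heredoc keeps
# ${...} literal so Docker Compose, not the shell, expands it from .env.
render_compose() {
  cat <<'EOF'
services:
  new-api:
    image: ${NEWAPI_IMAGE:-calciumion/new-api:latest}
    container_name: new-api
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:3000"   # loopback only; cloudflared is the ingress
    environment:
      - SESSION_SECRET=${SESSION_SECRET}
      - TZ=Asia/Tokyo
    volumes:
      - ./new-api-data:/data    # container path is an assumption
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 384m
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      # Assumes wget exists in the image; swap for whatever it ships.
      test: ["CMD-SHELL", "wget -q -O /dev/null http://127.0.0.1:3000/ || exit 1"]
      interval: 30s
EOF
}

# Usage: render_compose > /tmp/newapi-compose.sketch.yml
```

Binding the published port to 127.0.0.1 is what keeps the container off the public interface; dropping the address prefix would expose port 3000 on every host interface.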
§6 — Env File
/opt/newapi/.env (mode 600 root:root):
SESSION_SECRET=<openssl rand -hex 32>
# Optional pin:
# NEWAPI_IMAGE=calciumion/new-api:v0.7.0

NewAPI reads its config primarily from the admin UI; this file holds boot-time secrets only.
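Creating the file with the right mode from the start avoids a window where the secret is world-readable. This is a minimal sketch, assuming `openssl` is installed; the function name `write_env_file` is hypothetical, and ownership follows whoever runs it (root in this plan).

```bash
# Create <dir>/.env with a fresh SESSION_SECRET, readable by owner only.
# Refuses to overwrite so a live secret is not rotated by accident.
write_env_file() {
  local dir
  local env_file
  dir="$1"
  env_file="$dir/.env"
  if [ -e "$env_file" ]; then
    echo "$env_file already exists" >&2
    return 1
  fi
  # umask inside a subshell so the caller's umask is untouched.
  ( umask 077 && printf 'SESSION_SECRET=%s\n' "$(openssl rand -hex 32)" > "$env_file" )
}

# Usage on Vultr, as root:
#   write_env_file /opt/newapi
```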
§7 — NewAPI First-Run Checklist
After docker compose up -d and the tunnel is live:
- [ ] Visit https://llm.fsagent.cc/ and log in as admin / 123456
- [ ] Rotate admin password immediately (save to keychain entry newapi-admin-vultr-jp)
- [ ] Settings → General → disable public registration
- [ ] Settings → General → set the site URL to https://llm.fsagent.cc
- [ ] Channels → add OpenRouter (Claude family, GPT-4o family)
- [ ] Channels → add SiliconFlow (Qwen, DeepSeek): domestic, fast for ECS-side calls
- [ ] Channels → add Anthropic direct (if you hold a key)
- [ ] Tokens → create token twilight-drive, scope to needed models
- [ ] Save token to keychain entry newapi-token-twilight-drive
- [ ] Add token to deploy/.env as NEWAPI_API_KEY (twilight-backend) and to the Hermes profile template
§8 — Twilight-Drive Wiring
Add to ECS ~/twilight/.env:
bash
NEWAPI_BASE_URL=https://llm.fsagent.cc/v1
NEWAPI_API_KEY=<from §7>
NEWAPI_CF_ACCESS_CLIENT_ID=<from CF Access service token>
NEWAPI_CF_ACCESS_CLIENT_SECRET=<from CF Access service token>

These keys are not pre-baked into deploy/env.example; they are consumer-side secrets, written into .env at install time and rotated on leak. The example file documents the backend env contract; this set documents the gateway consumer contract and is mirrored in the Hermes profile template.
twilight-backend on ECS calls NewAPI through the public tunnel — there is no loopback path between ECS and Vultr. The same is true for every Hermes container on ECS.
Hermes profile template (profile/template-stock-research-pro/) gets:
- OPENAI_BASE_URL=https://llm.fsagent.cc/v1
- OPENAI_API_KEY=<NEWAPI_API_KEY>
- OPENAI_DEFAULT_HEADERS={"CF-Access-Client-Id":"...","CF-Access-Client-Secret":"..."}
The Hermes agent SDK is OpenAI-compatible, so the gateway model swap is transparent — only the headers and base URL change.
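The three Hermes variables can be derived from the gateway consumer variables in §8 rather than typed by hand. The sketch below shows one way to do that; the function name `render_hermes_env` is hypothetical, and it assumes the token values contain no double quotes or backslashes, so no JSON escaping is attempted.

```bash
# Render the OPENAI_* lines for a Hermes profile from the NEWAPI_* variables.
# Assumes token values need no JSON escaping (no quotes or backslashes).
render_hermes_env() {
  printf 'OPENAI_BASE_URL=%s\n' "${NEWAPI_BASE_URL:-https://llm.fsagent.cc/v1}"
  printf 'OPENAI_API_KEY=%s\n' "$NEWAPI_API_KEY"
  printf 'OPENAI_DEFAULT_HEADERS={"CF-Access-Client-Id":"%s","CF-Access-Client-Secret":"%s"}\n' \
    "$NEWAPI_CF_ACCESS_CLIENT_ID" "$NEWAPI_CF_ACCESS_CLIENT_SECRET"
}

# Usage, after sourcing the ECS-side .env:
#   render_hermes_env >> <profile env file>
```

Deriving both sets from one source keeps the backend and the Hermes profiles from drifting apart when the service token is rotated.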
§9 — Verification
bash
# Vultr loopback
ssh vultr 'curl -fsS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:3000'
# Public endpoint (anonymous → 403 from CF Access)
curl -I https://llm.fsagent.cc
# Public endpoint (with service token → 200)
curl -I https://llm.fsagent.cc \
-H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
-H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET"
# Model list
curl https://llm.fsagent.cc/v1/models \
-H "Authorization: Bearer $NEWAPI_API_KEY" \
-H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
-H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET"
# Streaming completion (verifies SSE through tunnel)
curl -N -X POST https://llm.fsagent.cc/v1/chat/completions \
-H "Authorization: Bearer $NEWAPI_API_KEY" \
-H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
-H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"hi"}]}'
# Live tail
ssh vultr 'journalctl -u newapi-cloudflared -f'
ssh vultr 'docker logs -f new-api'
ssh vultr 'docker stats --no-stream new-api'

§10 — Upgrade / Rollback
bash
ssh vultr
cd /opt/newapi
docker compose pull && docker compose up -d
docker logs -f new-api
# Rollback: pin `NEWAPI_IMAGE=calciumion/new-api:<prev-tag>` in .env,
# then run `docker compose up -d`. Data at /opt/newapi/new-api-data survives.

§11 — ss-rust on Vultr (orphaned)
The shadowsocks-rust SIP022 server installed during v1 (port 41388) has no consumer in v2. Options:
| Option | Cost | When to pick |
|---|---|---|
| Keep as a generic outbound relay | ~6.5 MB RAM | If any future workload on ECS (or elsewhere) needs an offshore exit |
| Remove | nil after one-time cleanup | If no consumer materializes within the next sprint |
Recommend keep for now — it's tiny, already hardened, and removing it now and reinstalling later is more work than leaving it. Revisit after v0.4.0 lands. See [[ss_rust_vultr_jp]].
Cleanup commands if removing:
bash
ssh root@139.180.196.53
systemctl disable --now shadowsocks-rust
rm -f /etc/systemd/system/shadowsocks-rust.service /usr/local/bin/ssserver /usr/local/bin/sslocal
systemctl daemon-reload  # make systemd forget the removed unit file
rm -rf /etc/shadowsocks-rust
ufw delete allow 41388/tcp
ufw delete allow 41388/udp
# Then delete the keychain entry locally:
security delete-generic-password -s "ss-rust-vultr-jp"

§12 — Vultr Zombie Cleanup (separate decision)
The Vultr twilight-backend container is left over from before the ECS cutover. It's still running (healthy) but api.fsagent.cc now resolves through ECS's ec125552-... tunnel, so no public traffic reaches it.
Two actions to take soon, not blocking this plan:
- Decommission the Vultr twilight-backend container: docker stop then docker rm, and prune the openclaw-owned compose tree at /home/openclaw/twilight/source/deploy.
- Verify fordefi-signer ownership and decide whether it stays.
Removing both would free roughly 220 MB on Vultr (about 191 MB + 30 MB per §1), which gives NewAPI more breathing room and lets the 384 MB cap be raised if needed.
Execution Order
1. Pre-flight on Vultr: confirm pkg.cloudflare.com reachable, check free memory, check ports 3000 / 9092 free.
2. Bootstrap /opt/newapi: root-owned tree, docker-compose.yml, .env with fresh SESSION_SECRET.
3. docker compose pull && docker compose up -d, then verify curl -I http://127.0.0.1:3000 returns 200/302.
4. Create CF tunnel newapi-jp: Zero Trust → Networks → Tunnels.
5. Install cloudflared on Vultr (Option A apt, or B binary).
6. cloudflared service install <TOKEN>: registers /etc/systemd/system/cloudflared.service.
7. Add public hostname llm.fsagent.cc → http://127.0.0.1:3000 (with the originRequest block).
8. CF SSL/TLS settings: Full + Always HTTPS + WebSockets on + HTTP/2 origin off.
9. CF Access policies: service token for /v1/*, email OTP for /api/* and /.
10. CF WAF: rate-limit /v1/*.
11. NewAPI first-run (§7).
12. Wire twilight-drive (§8): update deploy/.env on ECS, restart backend.
13. End-to-end verification (§9).
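The port part of the pre-flight item can be scripted. The sketch below is a hypothetical helper that inspects `ss -ltn` output, assuming the usual column layout where the fourth field is the local address and port; the function name `port_is_free` is not from the repo.

```bash
# Succeed if nothing in the given `ss -ltn` output listens on the port.
# $1 = port number, $2 = captured `ss -ltn` output.
port_is_free() {
  local port
  local listing
  port="$1"
  listing="$2"
  # Field 4 is "addr:port"; anchor on ":<port>" at the end of the field
  # so that 3000 does not match 30000 or 13000.
  ! printf '%s\n' "$listing" |
    awk -v p=":$port" '$4 ~ (p "$") { found = 1 } END { exit !found }'
}

# Usage on the Vultr host:
#   listing="$(ss -ltn)"
#   for port in 3000 9092; do
#     port_is_free "$port" "$listing" || echo "port $port is in use" >&2
#   done
```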
Risks
| Risk | Mitigation |
|---|---|
| Vultr memory exhaustion → NewAPI OOM-killed | memory: 384m hard cap; monitor docker stats new-api; clean up zombie twilight-backend (§12) if cap proves tight |
| cloudflared system unit vs openclaw --user unit collision | Both units are named cloudflared.service but live in different scopes (system vs --user), so they do not conflict by design; verify with systemctl status cloudflared after install |
| Vultr is single point of failure | NewAPI down = all model calls fail. Accept for v0.4.0; revisit after billing data justifies HA |
| ECS → Vultr added latency for LLM calls | Tunnel adds ~50 ms vs loopback; SSE streaming amortizes this. Acceptable for v0.4.0 |
| CF Access service token leakage | Store in Keychain only; rotate on any chat-paste leak (same pattern as [[secrets_keychain]]) |
| GFW blocks llm.fsagent.cc later | Domain is on Cloudflare; if blocked, swap to a different sub-zone or move to Cloudflare Workers fronting |
Out of Scope
- Hermes image rebuild to bake in NewAPI base URL / headers (separate v0.4.0 work)
- Payment → provisioning bridge (P1.1)
- Prometheus scrape of NewAPI metrics (:9092 is wired but no scraper exists yet)
- HA / multi-region NewAPI (revisit after real usage data)
- Decommissioning the Vultr zombie twilight-backend (tracked separately, §12)