NewAPI Deployment Plan

Status: ⏳ Revised 2026-05-14 (v2) · ready to execute

Scope: Deploy the NewAPI LLM proxy gateway on the Vultr Japan VPS (outside the GFW), fronted by a dedicated Cloudflare Tunnel. Both ECS-side and Vultr-side workloads call it.

Goal: Put a single reliable LLM gateway in front of all model calls — twilight-backend (on Alibaba ECS) and Hermes profile containers (also on ECS) both route through one endpoint at https://llm.fsagent.cc/v1, instead of each container managing its own provider keys and outbound proxy.

v1 → v2 Pivot Summary

The original v1 plan placed NewAPI on the Alibaba ECS in Chengdu and added a sing-box + shadowsocks-rust hop to a Vultr Japan relay so NewAPI could reach overseas APIs. v2 moves NewAPI itself to Vultr Japan, which sits outside the GFW. That collapses three moving parts (gateway + sing-box + relay) into one and removes the GFW bypass plumbing entirely.

Consequences:

No sing-box on ECS. No SOCKS5 env vars in NewAPI's container.
The ss-rust server already installed on Vultr (port 41388, see [[ss_rust_vultr_jp]]) is now unused for this purpose. Keep or remove (see §11).
Vultr-side memory budget is tighter than ECS's would have been. Hard-cap NewAPI at 384 MB.
ECS-side consumers (twilight-backend, Hermes profile containers) reach NewAPI through the public Cloudflare Tunnel at https://llm.fsagent.cc/v1, secured with CF Access service tokens.

Architecture

Hermes containers (Alibaba ECS, Chengdu) ─┐
twilight-backend (Alibaba ECS, Chengdu) ──┤  →  https://llm.fsagent.cc/v1
                                          │       (CF Tunnel, Access service token)
                                          │
                            cloudflared (Vultr Japan)
                                          │
                              http://127.0.0.1:3000
                                          ▼
                                   NewAPI (Vultr Japan)
                                          │
                          (direct — Japan reaches upstream)
                                          ▼
                  OpenAI · Anthropic · OpenRouter · Groq · …

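The Vultr-local twilight-backend zombie container (if still running) can also hit http://127.0.0.1:3000/v1 via loopback, provided it shares the host network namespace; from inside a bridged container, 127.0.0.1 is the container's own loopback. Anything on another host or in a separate network namespace (ECS hosts, Hermes containers) must go through the tunnel.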

Repo Surface

deploy/
├── newapi-compose.yml                       # NewAPI container (bridge, :3000 loopback)
├── newapi-env.example                       # SESSION_SECRET skeleton
└── newapi-cloudflared-config.yml.template   # Tunnel config for llm.fsagent.cc

Host Topology on Vultr

| User | Owns | Notes |
| --- | --- | --- |
| `openclaw` | `twilight-backend` zombie container, `prd-dashboard` cloudflared | PRD-related; do not co-mingle |
| `root` | NewAPI (`/opt/newapi`), NewAPI's cloudflared (system service) | New ownership for this work |
| `nobody` (system) | `shadowsocks-rust` (`/etc/shadowsocks-rust/config.json`) | Already running; keep or remove (§11) |
| `linuxuser`, `ops` | (unrelated) | Out of scope |

NewAPI deploys as root in /opt/newapi. Creating a dedicated twilight user on Vultr was considered and rejected — too much new infra (systemd-user lingering, docker group membership, ssh key install) for a single-service deployment. Root + /opt is the simplest match for Vultr's existing pattern.

§1 — Resource Budget

Vultr is 1 vCPU / 951 MB RAM / 3 GB swap (as of 2026-05-14). Pre-NewAPI footprint:

twilight-backend (zombie): ~191 MB
fordefi-signer: ~30 MB
cloudflared (openclaw, prd-dashboard): ~50 MB
ssserver: ~6.5 MB
dockerd + host services: ~100 MB

Available: ~398 MB free, but 1.8 GB swap is already in use — the host is memory-pressured. NewAPI's typical RSS under low load is 150–250 MB; under burst it can climb past 400 MB.

Cap: cpus: 0.50, memory: 384m. Prefer OOM-kill over swap-thrash. If NewAPI gets evicted frequently under real load, the answer is to upsize Vultr, not to relax the cap.
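The budget arithmetic above can be rechecked on the host with a few helpers. This is a minimal sketch, not part of the deployed tooling: the function names are hypothetical, and mem_available_mb assumes a Linux /proc/meminfo. The listed pre-NewAPI footprints sum to 377.5 MB.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for rechecking the §1 memory budget.

# Sum a list of per-process footprints in MB (decimals allowed).
sum_mb() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Print MemAvailable from /proc/meminfo in whole MB (Linux only).
mem_available_mb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Succeed when available memory (MB) is at least the container cap (MB).
fits_cap() {
  local avail_mb=$1 cap_mb=$2
  [ "$avail_mb" -ge "$cap_mb" ]
}

# Example, run on Vultr:
#   sum_mb 191 30 50 6.5 100
#   fits_cap "$(mem_available_mb)" 384 || echo "cap exceeds available memory"
```

With the ~398 MB free figure quoted above, fits_cap 398 384 succeeds, but only by 14 MB, which is why the cleanup options below exist.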

Pre-flight cleanup options (only if cap is too tight in practice):

Stop the Vultr-side twilight-backend zombie. Verify nothing depends on it first — api.fsagent.cc should already resolve to the ECS tunnel.
Audit fordefi-signer — 7 weeks uptime, unrelated workload, owner unknown.

§2 — Cloudflare Tunnel

A new dedicated tunnel named newapi-jp carries llm.fsagent.cc. It is NOT the same as openclaw's prd-dashboard tunnel, even though both processes live on the same host. Separation of concerns: PRD failures or config errors must not knock out twilight-drive's LLM gateway, and vice versa.

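Run NewAPI's cloudflared as a system service (/etc/systemd/system/newapi-cloudflared.service), not as a --user unit, since NewAPI itself runs as root. A system unit is also reproducible by cloudflared service install, although that command registers the unit under its default name, cloudflared.service; keeping the newapi-cloudflared name means renaming that unit or writing it by hand.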
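If the dedicated newapi-cloudflared unit name is wanted, one option is to write the unit by hand instead of relying on the installer's default. The sketch below is an assumption-laden illustration, not the plan's method: the function name, the env-file path /etc/newapi-cloudflared.env, and the default binary path /usr/bin/cloudflared are all hypothetical. It relies on cloudflared reading the tunnel token from the TUNNEL_TOKEN environment variable, which keeps the token out of the process list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a hand-rolled systemd unit for the newapi-jp tunnel.
#   $1  unit path (default: the name used in this plan)
#   $2  env file holding TUNNEL_TOKEN=<token>, mode 600 (path is an assumption)
#   $3  cloudflared binary (Option B in §4 installs to /usr/local/bin instead)
write_newapi_cloudflared_unit() {
  local unit_path=${1:-/etc/systemd/system/newapi-cloudflared.service}
  local env_file=${2:-/etc/newapi-cloudflared.env}
  local bin=${3:-/usr/bin/cloudflared}
  cat > "$unit_path" <<EOF
[Unit]
Description=cloudflared tunnel newapi-jp (llm.fsagent.cc)
After=network-online.target
Wants=network-online.target

[Service]
EnvironmentFile=$env_file
ExecStart=$bin --no-autoupdate tunnel run
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF
}

# Example, run as root on Vultr:
#   write_newapi_cloudflared_unit
#   systemctl daemon-reload && systemctl enable --now newapi-cloudflared
```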

Cloudflare dashboard checklist:

Zone fsagent.cc → SSL/TLS → Full (not Full Strict — origin is HTTP loopback)
Zone fsagent.cc → SSL/TLS → Always Use HTTPS on
Zone fsagent.cc → Network → WebSockets on, HTTP/2 to origin off, HTTP/3 on
Zero Trust → Networks → Tunnels → create newapi-jp
Zero Trust → Networks → Tunnels → newapi-jp → Public Hostname → llm.fsagent.cc → http://127.0.0.1:3000 with advanced:
```
connectTimeout: 30s
keepAliveTimeout: 120s
noTLSVerify: false
http2Origin: false
```

§3 — Cloudflare Access + WAF

| Surface | Policy |
| --- | --- |
| `/v1/*` | CF Access service token required (machine-to-machine) |
| `/api/*` | CF Access email OTP required (human admin) |
| `/` (root, login UI) | CF Access email OTP required (human admin) |
| Everything else | Default-deny via the WAF rule below |

Scope the WAF rate-limit rule to /v1/* only; admin endpoints stay un-rate-limited. Suggested limit: 60 requests / 10 seconds per IP (6 requests per second sustained). Adjust after seeing real traffic.

The service token used by twilight-backend / Hermes is added to their .env as NEWAPI_CF_ACCESS_CLIENT_ID and NEWAPI_CF_ACCESS_CLIENT_SECRET. NewAPI itself never sees these — they're CF Access headers, terminated at the edge.

§4 — Cloudflared Install Path

Vultr can reach pkg.cloudflare.com and github.com directly. Pick one of:

Option A — apt (preferred):

```bash
# Run as root. "jammy" assumes Ubuntu 22.04; substitute the host's release codename.
curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg | tee /usr/share/keyrings/cloudflare-main.gpg >/dev/null
echo 'deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared jammy main' | tee /etc/apt/sources.list.d/cloudflared.list
apt-get update && apt-get install -y cloudflared
```

Option B — binary:

```bash
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
chmod +x /usr/local/bin/cloudflared
```

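Either works. Option A picks up new releases through normal apt upgrades; Option B stays at whichever release was latest at download time until the binary is fetched again. Default to A.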

§5 — NewAPI Compose Hardening

deploy/newapi-compose.yml (already written) does:

Bridge networking, 127.0.0.1:3000:3000 — public ingress flows through cloudflared only
cpus: 0.50, memory: 384m — fits Vultr's headroom
logging: json-file with rotation (10 MB × 3 files)
healthcheck: GET http://127.0.0.1:3000/ every 30 s
No proxy env vars — Vultr egress is unfiltered
TZ=Asia/Tokyo — matches host
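After the container is up, the cap and the loopback-only binding listed above can be confirmed from the Docker side. A minimal sketch, assuming the container is named new-api as in §9; the function names are hypothetical. Docker reports the memory limit in bytes, and the m suffix in memory: 384m means MiB, so the expected value is 384 × 1024 × 1024.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for confirming the §5 hardening took effect.

# Convert a MiB count to bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Check the memory cap and the published-port binding of a running container.
#   $1  container name (default: new-api)
#   $2  expected memory cap in MiB (default: 384)
check_newapi_limits() {
  local name=${1:-new-api} want_mb=${2:-384}
  local got want
  got=$(docker inspect --format '{{.HostConfig.Memory}}' "$name") || return 2
  want=$(mb_to_bytes "$want_mb")
  if [ "$got" != "$want" ]; then
    echo "memory cap is $got bytes, expected $want" >&2
    return 1
  fi
  # The published port should be bound to loopback only.
  if ! docker port "$name" 3000/tcp | grep -qx '127.0.0.1:3000'; then
    echo "port 3000 is not loopback-only" >&2
    return 1
  fi
}

# Example, run on Vultr:
#   check_newapi_limits new-api 384 && echo "limits ok"
```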

§6 — Env File

/opt/newapi/.env (mode 600 root:root):

SESSION_SECRET=<openssl rand -hex 32>
# Optional pin:
# NEWAPI_IMAGE=calciumion/new-api:v0.7.0

NewAPI reads its config primarily from the admin UI; this file holds boot-time secrets only.
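One way to create the file with the right mode from the start is sketched below. The function names are hypothetical, and the /dev/urandom fallback is an assumption added for hosts without openssl; the plan itself only specifies openssl rand -hex 32.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for bootstrapping /opt/newapi/.env.

# Print 32 random bytes as 64 hex characters.
gen_session_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback (assumption, not in the plan): same byte count from /dev/urandom.
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
  fi
}

# Write a fresh SESSION_SECRET to the env file, readable by the owner only.
# Overwrites any existing file at that path.
write_newapi_env() {
  local path=${1:-/opt/newapi/.env}
  ( umask 077; printf 'SESSION_SECRET=%s\n' "$(gen_session_secret)" > "$path" )
  chmod 600 "$path"
}

# Example, run as root on Vultr:
#   write_newapi_env /opt/newapi/.env
```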

§7 — NewAPI First-Run Checklist

After docker compose up -d and the tunnel is live:

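[ ] Visit https://llm.fsagent.cc/ and log in with the default admin credentials (recorded here as admin / 123456; One API-derived builds have historically shipped root / 123456, so confirm against the pinned image)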
[ ] Rotate admin password immediately (save to keychain entry newapi-admin-vultr-jp)
[ ] Settings → General → disable public registration
[ ] Settings → General → set the site URL to https://llm.fsagent.cc
[ ] Channels → add OpenRouter (Claude family, GPT-4o family)
[ ] Channels → add SiliconFlow (Qwen, DeepSeek) — domestic, fast for ECS-side calls
[ ] Channels → add Anthropic direct (if you hold a key)
[ ] Tokens → create token twilight-drive, scope to needed models
[ ] Save token to keychain entry newapi-token-twilight-drive
[ ] Add token to deploy/.env as NEWAPI_API_KEY (twilight-backend) and to the Hermes profile template

§8 — Twilight-Drive Wiring

Add to ECS ~/twilight/.env:

```bash
NEWAPI_BASE_URL=https://llm.fsagent.cc/v1
NEWAPI_API_KEY=<from §7>
NEWAPI_CF_ACCESS_CLIENT_ID=<from CF Access service token>
NEWAPI_CF_ACCESS_CLIENT_SECRET=<from CF Access service token>
```

These keys are not pre-baked into deploy/env.example — they're consumer-side secrets, written into .env at install time and rotated on leak. The example file documents the backend env contract; this set documents the gateway consumer contract and is mirrored in the Hermes profile template.

twilight-backend on ECS calls NewAPI through the public tunnel — there is no loopback path between ECS and Vultr. The same is true for every Hermes container on ECS.

Hermes profile template (profile/template-stock-research-pro/) gets:

OPENAI_BASE_URL=https://llm.fsagent.cc/v1
OPENAI_API_KEY=<NEWAPI_API_KEY>
OPENAI_DEFAULT_HEADERS={"CF-Access-Client-Id":"...","CF-Access-Client-Secret":"..."}

The Hermes agent SDK is OpenAI-compatible, so the gateway model swap is transparent — only the headers and base URL change.
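The header JSON for the profile template can be generated from the same two service-token variables used in twilight-backend's .env, which avoids hand-editing quotes. A minimal sketch with a hypothetical function name; it assumes the token values contain no double quotes or backslashes, so no JSON escaping is attempted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the OPENAI_DEFAULT_HEADERS value from a
# CF Access service token (client ID and client secret).
build_access_headers() {
  local id=$1 secret=$2
  printf '{"CF-Access-Client-Id":"%s","CF-Access-Client-Secret":"%s"}\n' "$id" "$secret"
}

# Example:
#   OPENAI_DEFAULT_HEADERS=$(build_access_headers \
#     "$NEWAPI_CF_ACCESS_CLIENT_ID" "$NEWAPI_CF_ACCESS_CLIENT_SECRET")
```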

§9 — Verification

```bash
# Vultr loopback (expect 200 or 302)
ssh vultr 'curl -fsS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:3000'

# Public endpoint, anonymous: blocked by CF Access.
# Per §3, /v1/* is service-token-only (expect 403) and / is email OTP (expect a login redirect, not 200).
curl -s -o /dev/null -w "%{http_code}\n" https://llm.fsagent.cc/v1/models
curl -s -o /dev/null -w "%{http_code}\n" https://llm.fsagent.cc/

# Public endpoint, service token only: passes CF Access.
# Expect NewAPI's own auth error for the missing API key, not a CF Access 403.
curl -s -o /dev/null -w "%{http_code}\n" https://llm.fsagent.cc/v1/models \
  -H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET"

# Model list (service token + API key → 200)
curl https://llm.fsagent.cc/v1/models \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET"

# Streaming completion (verifies SSE through tunnel)
curl -N -X POST https://llm.fsagent.cc/v1/chat/completions \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"hi"}]}'

# Live tail (use -u cloudflared if the unit kept the installer's default name)
ssh vultr 'journalctl -u newapi-cloudflared -f'
ssh vultr 'docker logs -f new-api'
ssh vultr 'docker stats --no-stream new-api'
```
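The status-code checks above lend themselves to a small pass/fail wrapper, so a full run does not depend on reading each response by eye. This is a sketch with hypothetical function names, not an existing script in the repo.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test helpers for the §9 checks.

# Print only the HTTP status code; all arguments pass through to curl.
http_code() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Compare an observed status code against one or more acceptable codes.
#   $1  label for the check
#   $2  observed code
#   $3+ acceptable codes
expect_code() {
  local label=$1 got=$2
  shift 2
  local want
  for want in "$@"; do
    if [ "$got" = "$want" ]; then
      echo "ok   $label ($got)"
      return 0
    fi
  done
  echo "FAIL $label: got $got, wanted one of: $*" >&2
  return 1
}

# Example:
#   expect_code "loopback" "$(ssh vultr 'curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:3000')" 200 302
#   expect_code "anonymous /v1" "$(http_code https://llm.fsagent.cc/v1/models)" 403
```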

§10 — Upgrade / Rollback

```bash
ssh vultr
cd /opt/newapi
docker compose pull && docker compose up -d
docker logs -f new-api

# Rollback: pin `NEWAPI_IMAGE=calciumion/new-api:<prev-tag>` in .env,
# `docker compose up -d`. Data at /opt/newapi/new-api-data survives.
```

§11 — ss-rust on Vultr (orphaned)

The shadowsocks-rust SIP022 server installed during v1 (port 41388) has no consumer in v2. Options:

| Option | Cost | When to pick |
| --- | --- | --- |
| Keep as a generic outbound relay | ~6.5 MB RAM | If any future workload on ECS (or elsewhere) needs an offshore exit |
| Remove | nil after one-time cleanup | If no consumer materializes within the next sprint |

Recommendation: keep it for now. It is tiny and already hardened, and removing it now only to reinstall it later is more work than leaving it in place. Revisit after v0.4.0 lands. See [[ss_rust_vultr_jp]].

Cleanup commands if removing:

```bash
ssh root@139.180.196.53
systemctl disable --now shadowsocks-rust
rm -f /etc/systemd/system/shadowsocks-rust.service /usr/local/bin/ssserver /usr/local/bin/sslocal
rm -rf /etc/shadowsocks-rust
ufw delete allow 41388/tcp
ufw delete allow 41388/udp
# Then, back on the local machine (macOS), delete the keychain entry:
security delete-generic-password -s "ss-rust-vultr-jp"
```

§12 — Vultr Zombie Cleanup (separate decision)

The Vultr twilight-backend container is left over from before the ECS cutover. It's still running (healthy) but api.fsagent.cc now resolves through ECS's ec125552-... tunnel, so no public traffic reaches it.

Two actions to take soon, not blocking this plan:

Decommission the Vultr twilight-backend container — docker stop then docker rm, and prune the openclaw-owned compose tree at /home/openclaw/twilight/source/deploy.
Verify fordefi-signer ownership and decide whether it stays.

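Together they free roughly 220 MB on Vultr (by the §1 figures), which gives NewAPI more breathing room. Only with that headroom reclaimed would raising the 384 MB cap be consistent with §1, which otherwise prefers upsizing Vultr over relaxing the cap.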

Execution Order

1. Pre-flight on Vultr: confirm pkg.cloudflare.com is reachable, check free memory, check that ports 3000 and 9092 are free.
2. Bootstrap /opt/newapi: root-owned tree, docker-compose.yml, .env with a fresh SESSION_SECRET.
3. docker compose pull && docker compose up -d, then verify that curl -I http://127.0.0.1:3000 returns 200 or 302.
4. Create CF tunnel newapi-jp: Zero Trust → Networks → Tunnels.
5. Install cloudflared on Vultr (Option A apt, or B binary).
6. cloudflared service install <TOKEN>: registers /etc/systemd/system/cloudflared.service. §2 and §9 refer to this unit as newapi-cloudflared, so either rename the unit or use the cloudflared name consistently.
7. Add public hostname llm.fsagent.cc → http://127.0.0.1:3000 (with the originRequest block).
8. CF zone settings (§2): SSL/TLS Full, Always Use HTTPS on, WebSockets on, HTTP/2 to origin off, HTTP/3 on.
9. CF Access policies: service token for /v1/*, email OTP for /api/* and /.
10. CF WAF: rate-limit /v1/*.
11. NewAPI first-run (§7).
12. Wire twilight-drive (§8): update deploy/.env on ECS, restart backend.
13. End-to-end verification (§9).
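The port check in the pre-flight step can be scripted against ss output. A minimal sketch with a hypothetical function name; it assumes the iproute2 ss tool, whose -ltnH output puts the local address and port in the fourth column.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper.
# Reads `ss -ltnH` output on stdin; succeeds if any listener is bound to the given port.
port_listed() {
  awk -v p=":$1" '
    {
      # Match ":<port>" as an exact suffix of the local address column.
      n = length($4) - length(p)
      if (n >= 0 && substr($4, n + 1) == p) found = 1
    }
    END { exit !found }
  '
}

# Example, run on Vultr:
#   for port in 3000 9092; do
#     if ss -ltnH | port_listed "$port"; then echo "port $port is already in use"; fi
#   done
```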

Risks

| Risk | Mitigation |
| --- | --- |
| Vultr memory exhaustion → NewAPI OOM-killed | `memory: 384m` hard cap; monitor `docker stats new-api`; clean up zombie `twilight-backend` (§12) if cap proves tight |
| cloudflared system unit vs openclaw `--user` unit collision | The system unit and openclaw's `--user` unit live in different systemd scopes, so even an identical `cloudflared.service` name does not conflict; verify with `systemctl status cloudflared` (system) and `systemctl --user status cloudflared` (as openclaw) after install |
| Vultr is single point of failure | NewAPI down = all model calls fail. Accept for v0.4.0; revisit after billing data justifies HA |
| ECS → Vultr added latency for LLM calls | Tunnel adds ~50 ms vs loopback; SSE streaming amortizes this. Acceptable for v0.4.0 |
| CF Access service token leakage | Store in Keychain only; rotate on any chat-paste leak (same pattern as [[secrets_keychain]]) |
| GFW blocks `llm.fsagent.cc` later | Domain is on Cloudflare; if blocked, swap to a different sub-zone or move to Cloudflare Workers fronting |

Out of Scope

Hermes image rebuild to bake in NewAPI base URL / headers (separate v0.4.0 work)
Payment → provisioning bridge (P1.1)
Prometheus scrape of NewAPI metrics (:9092 is wired but no scraper exists yet)
HA / multi-region NewAPI (revisit after real usage data)
Decommissioning the Vultr zombie twilight-backend (tracked separately, §12)
