
NewAPI Deployment Plan

Status: ⏳ Revised 2026-05-14 (v2) · ready to execute

Scope: Deploy NewAPI LLM proxy gateway on the Vultr Japan VPS (outside the GFW), fronted by a dedicated Cloudflare Tunnel. Both ECS-side and Vultr-side workloads call it.

Goal: Put a single reliable LLM gateway in front of all model calls — twilight-backend (on Alibaba ECS) and Hermes profile containers (also on ECS) both route through one endpoint at https://llm.fsagent.cc/v1, instead of each container managing its own provider keys and outbound proxy.


v1 → v2 Pivot Summary

The original v1 plan placed NewAPI on the Alibaba ECS in Chengdu and added a sing-box + shadowsocks-rust hop to a Vultr Japan relay so NewAPI could reach overseas APIs. v2 moves NewAPI itself to Vultr Japan, which sits outside the GFW. That collapses three moving parts (gateway + sing-box + relay) into one and removes the GFW bypass plumbing entirely.

Consequences:

  • No sing-box on ECS. No SOCKS5 env vars in NewAPI's container.
  • The ss-rust server already installed on Vultr (port 41388, see [[ss_rust_vultr_jp]]) is now unused for this purpose. Keep or remove (see §11).
  • Vultr-side memory budget is tighter than ECS's would have been. Hard-cap NewAPI at 384 MB.
  • ECS-side consumers (twilight-backend, Hermes profile containers) reach NewAPI through the public Cloudflare Tunnel at https://llm.fsagent.cc/v1, secured with CF Access service tokens.

Architecture

Hermes containers (Alibaba ECS, Chengdu) ─┐
twilight-backend (Alibaba ECS, Chengdu) ──┤  →  https://llm.fsagent.cc/v1
                                          │       (CF Tunnel, Access service token)
                                          ▼
                            cloudflared (Vultr Japan)
                                          │
                              http://127.0.0.1:3000
                                          ▼
                                   NewAPI (Vultr Japan)
                                          │
                          (direct — Japan reaches upstream)
                                          ▼
                  OpenAI · Anthropic · OpenRouter · Groq · …

The Vultr-local twilight-backend zombie container (if still running) can also hit http://127.0.0.1:3000/v1 via loopback. Anything in a separate network namespace (ECS hosts, Hermes containers) must go through the tunnel.


Repo Surface

deploy/
├── newapi-compose.yml                       # NewAPI container (bridge, :3000 loopback)
├── newapi-env.example                       # SESSION_SECRET skeleton
└── newapi-cloudflared-config.yml.template   # Tunnel config for llm.fsagent.cc

Host Topology on Vultr

| User | Owns | Notes |
|---|---|---|
| openclaw | twilight-backend zombie container, prd-dashboard cloudflared | PRD-related; do not co-mingle |
| root | NewAPI (/opt/newapi), NewAPI's cloudflared (system service) | New ownership for this work |
| nobody (system) | shadowsocks-rust (/etc/shadowsocks-rust/config.json) | Already running; keep or remove (§11) |
| linuxuser, ops | (unrelated) | Out of scope |

NewAPI deploys as root in /opt/newapi. Creating a dedicated twilight user on Vultr was considered and rejected — too much new infra (systemd-user lingering, docker group membership, ssh key install) for a single-service deployment. Root + /opt is the simplest match for Vultr's existing pattern.


§1 — Resource Budget

Vultr is 1 vCPU / 951 MB RAM / 3 GB swap (as of 2026-05-14). Pre-NewAPI footprint:

  • twilight-backend (zombie): ~191 MB
  • fordefi-signer: ~30 MB
  • cloudflared (openclaw, prd-dashboard): ~50 MB
  • ssserver: ~6.5 MB
  • dockerd + host services: ~100 MB

Available: ~398 MB free, but 1.8 GB swap is already in use — the host is memory-pressured. NewAPI's typical RSS under low load is 150–250 MB; under burst it can climb past 400 MB.

Cap: cpus: 0.50, memory: 384m. Prefer OOM-kill over swap-thrash. If NewAPI gets evicted frequently under real load, the answer is to upsize Vultr, not to relax the cap.
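
The margin between available memory and the cap can be checked on the host before deploying. The following is a minimal sketch, assuming a Linux /proc/meminfo; the function names are hypothetical and not part of the repo, and the 384 MB default is the cap from this plan.

```bash
#!/usr/bin/env bash
# Sketch: compare available memory against the NewAPI cap.
# Function names are hypothetical; the 384 MB default comes from this plan.

# Print MemAvailable in whole MB, read from a meminfo-format file.
mem_available_mb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Print the margin (available minus cap) in MB; return non-zero if negative.
cap_margin_mb() {
  local avail_mb=$1
  local cap_mb=${2:-384}
  local margin=$(( avail_mb - cap_mb ))
  echo "$margin"
  [ "$margin" -ge 0 ]
}
```

On the host this would be called as `cap_margin_mb "$(mem_available_mb)"`. With the ~398 MB figure quoted above, `cap_margin_mb 398` prints 14, which shows how little slack the cap leaves before the host starts swapping.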

Pre-flight cleanup options (only if cap is too tight in practice):

  1. Stop the Vultr-side twilight-backend zombie. Verify nothing depends on it first — api.fsagent.cc should already resolve to the ECS tunnel.
  2. Audit fordefi-signer — 7 weeks uptime, unrelated workload, owner unknown.

§2 — Cloudflare Tunnel

A new dedicated tunnel named newapi-jp carries llm.fsagent.cc. It is NOT the same as openclaw's prd-dashboard tunnel, even though both processes live on the same host. Separation of concerns: PRD failures or config errors must not knock out twilight-drive's LLM gateway, and vice versa.

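Run NewAPI's cloudflared as a system service (/etc/systemd/system/newapi-cloudflared.service), not as a --user unit, since NewAPI itself runs as root. The system unit can be reproduced with cloudflared service install; note that this command registers the unit as cloudflared.service, so the newapi-cloudflared name applies only if the unit file is renamed or written by hand.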

Cloudflare dashboard checklist:

  • Zone fsagent.cc → SSL/TLS → Full (not Full Strict — origin is HTTP loopback)
  • Zone fsagent.cc → SSL/TLS → Always Use HTTPS on
  • Zone fsagent.cc → Network → WebSockets on, HTTP/2 to origin off, HTTP/3 on
  • Zero Trust → Networks → Tunnels → create newapi-jp
  • Zero Trust → Networks → Tunnels → newapi-jp → Public Hostname → llm.fsagent.cc → http://127.0.0.1:3000 with advanced:
    connectTimeout: 30s
    keepAliveTimeout: 120s
    noTLSVerify: false
    http2Origin: false
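
If the tunnel is managed from a local config file rather than the dashboard (the repo ships deploy/newapi-cloudflared-config.yml.template for that case), the public hostname and advanced settings above map onto an ingress rule with an originRequest block. This is a sketch only: the tunnel ID argument, the credentials path, and the render_tunnel_config function are assumptions, not the contents of the real template.

```bash
# Sketch: emit a cloudflared config for a locally-managed tunnel.
# The tunnel ID and credentials path are placeholders (assumptions).
render_tunnel_config() {
  local tunnel_id=${1:?usage: render_tunnel_config <tunnel-id>}
  cat <<EOF
tunnel: ${tunnel_id}
credentials-file: /etc/cloudflared/${tunnel_id}.json

ingress:
  - hostname: llm.fsagent.cc
    service: http://127.0.0.1:3000
    originRequest:
      connectTimeout: 30s
      keepAliveTimeout: 120s
      noTLSVerify: false
      http2Origin: false
  # cloudflared requires a final catch-all rule.
  - service: http_status:404
EOF
}
```

The output could be redirected to a config file and checked with `cloudflared tunnel ingress validate` before the service is started. With the token-based install used in the execution order, the same settings live in the dashboard instead and no local ingress block is needed.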

§3 — Cloudflare Access + WAF

| Surface | Policy |
|---|---|
| /v1/* | CF Access service token required (machine-to-machine) |
| /api/* | CF Access email OTP required (human admin) |
| / (root, login UI) | CF Access email OTP required (human admin) |
| Everything else | Default-deny via the WAF rule below |

WAF rate-limit rule scoped to /v1/* only — admin endpoints stay un-rate-limited. Suggested limit: 60 requests / 10 seconds per IP. Adjust after seeing real traffic.

The service token used by twilight-backend / Hermes is added to their .env as NEWAPI_CF_ACCESS_CLIENT_ID and NEWAPI_CF_ACCESS_CLIENT_SECRET. NewAPI itself never sees these — they're CF Access headers, terminated at the edge.


§4 — Cloudflared Install Path

Vultr can reach pkg.cloudflare.com and github.com directly. Pick one of:

Option A — apt (preferred):

bash
curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg | tee /usr/share/keyrings/cloudflare-main.gpg >/dev/null
echo 'deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared jammy main' | tee /etc/apt/sources.list.d/cloudflared.list
apt-get update && apt-get install -y cloudflared

Option B — binary:

bash
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
chmod +x /usr/local/bin/cloudflared

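Either works. Option A picks up new releases through normal apt upgrades; Option B stays on whatever release was current at download time until the binary is fetched again. Default to A.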


§5 — NewAPI Compose Hardening

deploy/newapi-compose.yml (already written) does:

  • Bridge networking, 127.0.0.1:3000:3000 — public ingress flows through cloudflared only
  • cpus: 0.50, memory: 384m — fits Vultr's headroom
  • logging: json-file with rotation (10 MB × 3 files)
  • healthcheck: GET http://127.0.0.1:3000/ every 30 s
  • No proxy env vars — Vultr egress is unfiltered
  • TZ=Asia/Tokyo — matches host
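
For orientation, the bullets above correspond roughly to the compose file sketched below. It is an illustration, not a copy of deploy/newapi-compose.yml: the default image tag, the container name, the /data mount, the timeout and retry values, and the wget-based healthcheck (which assumes wget exists in the image) are all assumptions.

```bash
# Sketch: print a compose file matching the bullets above.
# Image default, container name, data path, and the wget healthcheck
# are assumptions; the real deploy/newapi-compose.yml may differ.
render_newapi_compose() {
  cat <<'EOF'
services:
  new-api:
    image: ${NEWAPI_IMAGE:-calciumion/new-api:latest}
    container_name: new-api
    restart: unless-stopped
    env_file: .env
    environment:
      - TZ=Asia/Tokyo
    ports:
      - "127.0.0.1:3000:3000"   # loopback only; cloudflared is the sole ingress
    volumes:
      - ./new-api-data:/data
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 384m
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD-SHELL", "wget -q -O /dev/null http://127.0.0.1:3000/ || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
EOF
}
```

Binding the published port to 127.0.0.1 is what keeps the gateway off the public interface: a bare `3000:3000` mapping would listen on all addresses and bypass CF Access.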

§6 — Env File

/opt/newapi/.env (mode 600 root:root):

SESSION_SECRET=<openssl rand -hex 32>
# Optional pin:
# NEWAPI_IMAGE=calciumion/new-api:v0.7.0

NewAPI reads its config primarily from the admin UI; this file holds boot-time secrets only.
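
Creating the file with the right mode from the start avoids a window in which the secret is world-readable. A minimal sketch, assuming openssl is installed on the host; the write_newapi_env function name is hypothetical.

```bash
# Sketch: create the env file with a fresh secret and 0600 permissions.
# The function name is hypothetical; the default path follows this section.
write_newapi_env() {
  local env_path=${1:-/opt/newapi/.env}
  local secret
  secret=$(openssl rand -hex 32) || return 1
  # Set umask in a subshell so a newly created file is never group/world readable.
  ( umask 077 && printf 'SESSION_SECRET=%s\n' "$secret" > "$env_path" )
  # Also fix the mode in case the file already existed with looser permissions.
  chmod 600 "$env_path"
}
```

Running this against an existing file replaces its contents, including the secret, so it is meant for first bootstrap only.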


§7 — NewAPI First-Run Checklist

After docker compose up -d and the tunnel is live:

  • [ ] Visit https://llm.fsagent.cc/ and log in as admin / 123456
  • [ ] Rotate admin password immediately (save to keychain entry newapi-admin-vultr-jp)
  • [ ] Settings → General → disable public registration
  • [ ] Settings → General → set the site URL to https://llm.fsagent.cc
  • [ ] Channels → add OpenRouter (Claude family, GPT-4o family)
  • [ ] Channels → add SiliconFlow (Qwen, DeepSeek) — domestic, fast for ECS-side calls
  • [ ] Channels → add Anthropic direct (if you hold a key)
  • [ ] Tokens → create token twilight-drive, scope to needed models
  • [ ] Save token to keychain entry newapi-token-twilight-drive
  • [ ] Add token to deploy/.env as NEWAPI_API_KEY (twilight-backend) and to the Hermes profile template

§8 — Twilight-Drive Wiring

Add to ECS ~/twilight/.env:

bash
NEWAPI_BASE_URL=https://llm.fsagent.cc/v1
NEWAPI_API_KEY=<from §7>
NEWAPI_CF_ACCESS_CLIENT_ID=<from CF Access service token>
NEWAPI_CF_ACCESS_CLIENT_SECRET=<from CF Access service token>

These keys are not pre-baked into deploy/env.example — they're consumer-side secrets, written into .env at install time and rotated on leak. The example file documents the backend env contract; this set documents the gateway consumer contract and is mirrored in the Hermes profile template.

twilight-backend on ECS calls NewAPI through the public tunnel — there is no loopback path between ECS and Vultr. The same is true for every Hermes container on ECS.

Hermes profile template (profile/template-stock-research-pro/) gets:

  • OPENAI_BASE_URL=https://llm.fsagent.cc/v1
  • OPENAI_API_KEY=<NEWAPI_API_KEY>
  • OPENAI_DEFAULT_HEADERS={"CF-Access-Client-Id":"...","CF-Access-Client-Secret":"..."}

The Hermes agent SDK is OpenAI-compatible, so swapping in the gateway is transparent to the agent code: only the base URL and the request headers change.
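
The OPENAI_DEFAULT_HEADERS value can be derived from the two service-token variables instead of being typed by hand. A small sketch, assuming the ID and secret contain no double quotes or backslashes (no JSON escaping is attempted); the function name is hypothetical.

```bash
# Sketch: build the OPENAI_DEFAULT_HEADERS JSON from the service-token vars.
# Assumes the values need no JSON escaping; the function name is hypothetical.
build_cf_access_headers() {
  : "${NEWAPI_CF_ACCESS_CLIENT_ID:?not set}"
  : "${NEWAPI_CF_ACCESS_CLIENT_SECRET:?not set}"
  printf '{"CF-Access-Client-Id":"%s","CF-Access-Client-Secret":"%s"}\n' \
    "$NEWAPI_CF_ACCESS_CLIENT_ID" "$NEWAPI_CF_ACCESS_CLIENT_SECRET"
}
```

A profile install script could then write `OPENAI_DEFAULT_HEADERS=$(build_cf_access_headers)` into the profile's environment, keeping the two secrets in a single source of truth.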


§9 — Verification

bash
# Vultr loopback
ssh vultr 'curl -fsS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:3000'

# Public endpoint (anonymous → 403 from CF Access on the service-token path)
curl -I https://llm.fsagent.cc/v1/models

# Public endpoint (with service token → passes CF Access; the response now comes from NewAPI, not a CF 403)
curl -I https://llm.fsagent.cc/v1/models \
  -H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET"

# Model list
curl https://llm.fsagent.cc/v1/models \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET"

# Streaming completion (verifies SSE through tunnel)
curl -N -X POST https://llm.fsagent.cc/v1/chat/completions \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "CF-Access-Client-Id: $NEWAPI_CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $NEWAPI_CF_ACCESS_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"hi"}]}'

# Live tail
ssh vultr 'journalctl -u newapi-cloudflared -f'
ssh vultr 'docker logs -f new-api'
ssh vultr 'docker stats --no-stream new-api'

§10 — Upgrade / Rollback

bash
ssh vultr
cd /opt/newapi
docker compose pull && docker compose up -d
docker logs -f new-api

# Rollback: pin `NEWAPI_IMAGE=calciumion/new-api:<prev-tag>` in .env,
# `docker compose up -d`. Data at /opt/newapi/new-api-data survives.
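
The rollback pin can be scripted so the NEWAPI_IMAGE line is replaced rather than duplicated. A sketch under the assumption that the compose file reads NEWAPI_IMAGE from .env; the pin_newapi_image function name is hypothetical.

```bash
# Sketch: pin NEWAPI_IMAGE in an env file, replacing any existing
# (commented or uncommented) NEWAPI_IMAGE line. Function name is hypothetical.
pin_newapi_image() {
  local tag=${1:?usage: pin_newapi_image <tag> [env-file]}
  local env_path=${2:-/opt/newapi/.env}
  local tmp
  tmp=$(mktemp) || return 1
  # Keep every line except old NEWAPI_IMAGE entries, then append the new pin.
  grep -Ev '^#? *NEWAPI_IMAGE=' "$env_path" > "$tmp" || true
  printf 'NEWAPI_IMAGE=calciumion/new-api:%s\n' "$tag" >> "$tmp"
  # Write back with cat (not mv) so the file keeps its owner and 0600 mode.
  cat "$tmp" > "$env_path" && rm -f "$tmp"
}
```

After `pin_newapi_image <prev-tag>`, running `docker compose up -d` in /opt/newapi recreates the container on the pinned image.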

§11 — ss-rust on Vultr (orphaned)

The shadowsocks-rust SIP022 server installed during v1 (port 41388) has no consumer in v2. Options:

| Option | Cost | When to pick |
|---|---|---|
| Keep as a generic outbound relay | ~6.5 MB RAM | If any future workload on ECS (or elsewhere) needs an offshore exit |
| Remove | nil after one-time cleanup | If no consumer materializes within the next sprint |

Recommend keep for now — it's tiny, already hardened, and removing it now and reinstalling later is more work than leaving it. Revisit after v0.4.0 lands. See [[ss_rust_vultr_jp]].

Cleanup commands if removing:

bash
ssh root@139.180.196.53
systemctl disable --now shadowsocks-rust
rm -f /etc/systemd/system/shadowsocks-rust.service /usr/local/bin/ssserver /usr/local/bin/sslocal
rm -rf /etc/shadowsocks-rust
ufw delete allow 41388/tcp
ufw delete allow 41388/udp
# Then delete the keychain entry locally:
security delete-generic-password -s "ss-rust-vultr-jp"

§12 — Vultr Zombie Cleanup (separate decision)

The Vultr twilight-backend container is left over from before the ECS cutover. It's still running (healthy) but api.fsagent.cc now resolves through ECS's ec125552-... tunnel, so no public traffic reaches it.

Two actions to take soon, not blocking this plan:

  1. Decommission the Vultr twilight-backend container — docker stop then docker rm, and prune the openclaw-owned compose tree at /home/openclaw/twilight/source/deploy.
  2. Verify fordefi-signer ownership and decide whether it stays.

If both are removed, they free roughly 220 MB on Vultr (about 191 MB for the zombie and 30 MB for fordefi-signer, per §1), which gives NewAPI more breathing room and lets the 384 MB cap be raised if needed.


Execution Order

  1. Pre-flight on Vultr — confirm pkg.cloudflare.com reachable, check free memory, check ports 3000 / 9092 free.
  2. Bootstrap /opt/newapi — root-owned tree, docker-compose.yml, .env with fresh SESSION_SECRET.
  3. docker compose pull && docker compose up -d — verify curl -I http://127.0.0.1:3000 returns 200/302.
  4. Create CF tunnel newapi-jp — Zero Trust → Networks → Tunnels.
  5. Install cloudflared on Vultr (Option A apt, or B binary).
  6. cloudflared service install <TOKEN> — registers /etc/systemd/system/cloudflared.service.
  7. Add public hostname llm.fsagent.cc → http://127.0.0.1:3000 (with the originRequest block).
  8. CF SSL/TLS settings — Full + Always HTTPS + WebSockets on + HTTP/2 origin off.
  9. CF Access policies — service token for /v1/*, email OTP for /api/* and /.
  10. CF WAF — rate-limit /v1/*.
  11. NewAPI first-run (§7).
  12. Wire twilight-drive (§8) — update deploy/.env on ECS, restart backend.
  13. End-to-end verification (§9).
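
The port check in step 1 can be sketched as a small helper. It reads `ss -ltnH` style output on stdin, so the matching logic can be exercised without a live host; the port_in_use function name is hypothetical.

```bash
# Sketch: step 1 pre-flight. Succeeds if a TCP listener already holds the port.
# Expects `ss -ltnH` style rows on stdin; the function name is hypothetical.
port_in_use() {
  local port=$1
  # Column 4 of `ss -ltn` is Local Address:Port; match a trailing ":PORT".
  awk -v p=":$port" '$4 ~ (p "$") { found = 1 } END { exit !found }'
}
```

On Vultr this would be run as `ss -ltnH | port_in_use 3000` and again for 9092; a zero exit status means the port is taken and the deployment should stop before step 2.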

Risks

| Risk | Mitigation |
|---|---|
| Vultr memory exhaustion → NewAPI OOM-killed | memory: 384m hard cap; monitor docker stats new-api; clean up zombie twilight-backend (§12) if cap proves tight |
| cloudflared system unit vs openclaw --user unit collision | Both units are named cloudflared.service but live in different scopes (system vs openclaw's --user), so they do not conflict by design; verify with systemctl status cloudflared after install |
| Vultr is single point of failure | NewAPI down = all model calls fail. Accept for v0.4.0; revisit after billing data justifies HA |
| ECS → Vultr added latency for LLM calls | Tunnel adds ~50 ms vs loopback; SSE streaming amortizes this. Acceptable for v0.4.0 |
| CF Access service token leakage | Store in Keychain only; rotate on any chat-paste leak (same pattern as [[secrets_keychain]]) |
| GFW blocks llm.fsagent.cc later | Domain is on Cloudflare; if blocked, swap to a different sub-zone or move to Cloudflare Workers fronting |

Out of Scope

  • Hermes image rebuild to bake in NewAPI base URL / headers (separate v0.4.0 work)
  • Payment → provisioning bridge (P1.1)
  • Prometheus scrape of NewAPI metrics (:9092 is wired but no scraper exists yet)
  • HA / multi-region NewAPI (revisit after real usage data)
  • Decommissioning the Vultr zombie twilight-backend (tracked separately, §12)

Internal team document