
Bug Fix Sweep Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Fix 12 bugs across Python backend, core data providers, MCP proxy, and NestJS app-backend — all identified by code review on 2026-05-16.

Architecture: Four independent subsystems (Python service, core, and mcp; TypeScript backend). Fixes within each subsystem are ordered by dependency (fix callee before caller). Python changes go on one feature branch and TypeScript changes on another; a single combined branch is also acceptable.

Tech Stack: Python 3.11 (FastAPI, APScheduler, DuckDB, httpx), TypeScript (NestJS, Prisma, Axios), pytest


Files Modified

| File | Bug |
| --- | --- |
| core/core/data/_market.py | L71: <= should be < for CLOSE_TIME |
| core/core/data/tushare.py | L41: data["data"] can be None, chained .get() raises AttributeError |
| core/core/data/audit_log.py | L207: if value_b relies on truthiness (0.0 is falsy); use != 0 |
| core/core/data/providers/tushare_daily.py | L83: no explicit sort on adj_items[0]; L161-168: None factor silently stored |
| mcp/tushare_mcp/proxy.py | L131: catches all Exception incl. programming errors; narrow to (duckdb.Error, RuntimeError) |
| src/service/cache.py | L103, L166: concurrent put() causes DuckDB file-lock conflict; add threading.Lock |
| src/service/auth.py | L75: paid_until string vs date comparison; use date.fromisoformat() |
| backend/src/modules/newapi/newapi.service.ts | L51: missing await on retry path |
| backend/src/modules/billing/billing.service.ts | L213: ignores provisionPaidOrder return value |
| backend/src/modules/provisioning/provisioning.service.ts | L72-73: NewAPI failure logged but instance stays PENDING; mark DEGRADED |
| backend/src/modules/auth/auth.service.ts | L171-173, L299-301: OTP race, two concurrent verifies both succeed |
| backend/src/modules/provisioning/provision-worker.service.ts | L70: attempts read from stale pre-claim task snapshot |

Phase 1: Python core/core/data fixes (read the existing tests first, then extend them)

Task 1: Fix _market.py — 15:00 boundary

Files:

  • Modify: core/core/data/_market.py:71

  • Test: core/tests/unit/test_data_market.py (existing)

  • [ ] Step 1: Read the existing test to understand coverage

bash
cat core/tests/unit/test_data_market.py
  • [ ] Step 2: Add a failing test for 15:00 boundary

In core/tests/unit/test_data_market.py, add:

python
def test_is_market_open_at_close_time_returns_false():
    """15:00:00 exactly is CLOSED, not open."""
    from zoneinfo import ZoneInfo
    from datetime import datetime
    from core.data._market import is_market_open
    now = datetime(2024, 1, 2, 15, 0, 0, tzinfo=ZoneInfo("Asia/Shanghai"))  # Tuesday
    assert is_market_open(now=now) is False

def test_is_market_open_just_before_close():
    """14:59:59 is still open."""
    from zoneinfo import ZoneInfo
    from datetime import datetime
    from core.data._market import is_market_open
    now = datetime(2024, 1, 2, 14, 59, 59, tzinfo=ZoneInfo("Asia/Shanghai"))
    assert is_market_open(now=now) is True
  • [ ] Step 3: Run to confirm fail
bash
cd core && python -m pytest tests/unit/test_data_market.py::test_is_market_open_at_close_time_returns_false -v

Expected: FAIL (currently returns True at 15:00).
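The rule these two tests pin down is a half-open window: the opening instant counts as open, the closing instant does not. A minimal sketch of that rule follows. The 09:30 open time is an assumption for illustration (the real _OPEN_TIME and _CLOSE_TIME live in _market.py and are not shown in this plan), and in_session is a hypothetical helper, not the real is_market_open, which also handles weekdays and time zones.

```python
from datetime import time

# Assumed values for illustration; the real constants are defined in _market.py.
OPEN_TIME = time(9, 30)
CLOSE_TIME = time(15, 0)


def in_session(t: time) -> bool:
    """Half-open window [OPEN_TIME, CLOSE_TIME): open is included, close is not."""
    return OPEN_TIME <= t < CLOSE_TIME


print(in_session(time(9, 30)))       # True: the opening instant counts as open
print(in_session(time(14, 59, 59)))  # True: one second before close
print(in_session(time(15, 0)))       # False: the closing instant is excluded
```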

  • [ ] Step 4: Fix _market.py

In core/core/data/_market.py, line 71, change:

python
    if not (_OPEN_TIME <= now.time() <= _CLOSE_TIME):

to:

python
    if not (_OPEN_TIME <= now.time() < _CLOSE_TIME):
  • [ ] Step 5: Run tests
bash
cd core && python -m pytest tests/unit/test_data_market.py -v

Expected: all pass.

  • [ ] Step 6: Commit
bash
git add core/core/data/_market.py core/tests/unit/test_data_market.py
git commit -m "fix(market): exclude 15:00 from open window (< not <=)"

Task 2: Fix tushare.py — data["data"] None AttributeError

Files:

  • Modify: core/core/data/tushare.py:41

Current code:

python
items = data.get("data", {}).get("items") or []

Problem: if data["data"] is None (key exists, value is null), data.get("data", {}) returns None, then .get("items") raises AttributeError.
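The difference between an absent key and a key holding null can be checked in isolation. The sketch below uses made-up payloads and a hypothetical items_of helper; it only illustrates the dict.get behavior and is not the real tushare.call.

```python
payload_missing = {"code": 0}             # "data" key absent
payload_null = {"code": 0, "data": None}  # "data" key present, value null

# dict.get only uses its default when the key is absent.
print(payload_missing.get("data", {}))    # {}
print(payload_null.get("data", {}))       # None


def items_of(payload: dict) -> list:
    """Coalesce a missing or null "data" field to {} before the second lookup."""
    return (payload.get("data") or {}).get("items") or []


print(items_of(payload_missing))          # []
print(items_of(payload_null))             # []
```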

  • [ ] Step 1: Add failing test

In core/tests/unit/ create or add to existing tushare test:

python
# core/tests/unit/test_data_tushare.py  (add to existing file)
# respx_mock is the pytest fixture provided by the respx package.
import httpx
from core.data import tushare

def test_call_handles_null_data_field(respx_mock):
    """Tushare sometimes returns {"code":0,"data":null} — must not AttributeError."""
    respx_mock.post(tushare.TUSHARE_URL).mock(
        return_value=httpx.Response(200, json={"code": 0, "data": None})
    )
    result = tushare.call(
        api_name="trade_cal", params={}, fields="cal_date", token="tok"
    )
    assert result == []
  • [ ] Step 2: Run to confirm fail
bash
cd core && python -m pytest tests/unit/test_data_tushare.py::test_call_handles_null_data_field -v

Expected: FAIL with AttributeError: 'NoneType' object has no attribute 'get'.

  • [ ] Step 3: Fix tushare.py

In core/core/data/tushare.py, change line 41:

python
    items = data.get("data", {}).get("items") or []

to:

python
    items = (data.get("data") or {}).get("items") or []
  • [ ] Step 4: Run tests
bash
cd core && python -m pytest tests/unit/test_data_tushare.py -v
  • [ ] Step 5: Commit
bash
git add core/core/data/tushare.py core/tests/unit/test_data_tushare.py
git commit -m "fix(tushare): handle null data field without AttributeError"

Task 3: Fix audit_log.py — division by zero semantic

Files:

  • Modify: core/core/data/audit_log.py:207

Current code (line 207):

python
        diff_pct = (
            (value_a - value_b) / abs(value_b) * 100.0 if value_b else 0.0
        )

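Problem: if value_b relies on float truthiness to guard the division. That is correct in Python (0.0 and -0.0 are both falsy), but the intent is implicit. An explicit != 0 check behaves identically for every float: -0.0 is still treated as zero, and NaN is both truthy and != 0, so it still flows into the division and yields NaN. The change is therefore about readability, not behavior. The two checks differ only for non-numeric falsy values such as None, which the truthiness check maps to 0.0 and the explicit check lets fail with a TypeError.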
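A quick way to see how the two guards treat the float edge cases (0.0, -0.0, NaN) is to run them side by side. guard_truthy and guard_explicit are hypothetical names used only for this comparison.

```python
import math


def guard_truthy(value_b: float) -> bool:
    """The current guard: relies on float truthiness."""
    return bool(value_b)


def guard_explicit(value_b: float) -> bool:
    """The proposed guard: explicit comparison with zero."""
    return value_b != 0


# Each row prints the value followed by the answer from both guards.
for v in (0.0, -0.0, 1.5, math.nan):
    print(repr(v), guard_truthy(v), guard_explicit(v))
```

Both guards agree on all four values, so the existing tests should keep passing after the change.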

  • [ ] Step 1: Add test

In core/tests/unit/test_data_audit_log.py (add to existing):

python
def test_record_diff_zero_value_b_does_not_divide(tmp_path):
    """diff_pct is 0 when value_b is zero, not a ZeroDivisionError."""
    log = AuditLog(str(tmp_path / "audit.duckdb"))
    # Should not raise
    log.record_diff(
        code="600519.SH",
        trade_date="2024-01-02",
        provider_a="akshare",
        value_a=100.0,
        provider_b="tushare",
        value_b=0.0,
        severity="warn",
    )
  • [ ] Step 2: Run to confirm passes (it should already — this confirms the behavior)
bash
cd core && python -m pytest tests/unit/test_data_audit_log.py::test_record_diff_zero_value_b_does_not_divide -v
  • [ ] Step 3: Fix audit_log.py

Change line 207:

python
        diff_pct = (
            (value_a - value_b) / abs(value_b) * 100.0 if value_b else 0.0
        )

to:

python
        diff_pct = (
            (value_a - value_b) / abs(value_b) * 100.0 if value_b != 0 else 0.0
        )
  • [ ] Step 4: Run tests
bash
cd core && python -m pytest tests/unit/test_data_audit_log.py -v
  • [ ] Step 5: Commit
bash
git add core/core/data/audit_log.py core/tests/unit/test_data_audit_log.py
git commit -m "fix(audit_log): use explicit != 0 check for diff_pct denominator"

Task 4: Fix tushare_daily.py — missing sort + silent None factor

Files:

  • Modify: core/core/data/providers/tushare_daily.py
  • Test: core/tests/unit/test_data_providers_tushare_daily.py

Two bugs:

  1. adj_items[0] assumes DESC sort by trade_date but no explicit sort
  2. _extract_factors returns (None, latest) silently when target date has no factor
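An explicit sort on the date column is safe to do on the raw strings, because fixed-width YYYYMMDD text orders the same way as the dates it encodes. A small check of that assumption, using made-up rows in the same shape as adj_items:

```python
from datetime import datetime

rows = [
    ["600519.SH", "20240101", "1.0"],
    ["600519.SH", "20240103", "1.2"],
    ["600519.SH", "20240102", "1.1"],
]

# Sort newest-first two ways: by the raw string and by the parsed date.
by_string = sorted(rows, key=lambda r: str(r[1]), reverse=True)
by_date = sorted(
    rows, key=lambda r: datetime.strptime(str(r[1]), "%Y%m%d"), reverse=True
)

print(by_string == by_date)  # True
print(by_string[0])          # ['600519.SH', '20240103', '1.2']
```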
  • [ ] Step 1: Add tests

In core/tests/unit/test_data_providers_tushare_daily.py, add:

python
def test_extract_factors_sorts_by_date_desc():
    """Latest factor is the highest date regardless of API return order."""
    # adj_items deliberately out of order: the newest date sits in the middle.
    adj_items = [
        ["600519.SH", "20240101", "1.0"],
        ["600519.SH", "20240103", "1.2"],
        ["600519.SH", "20240102", "1.1"],
    ]
    target_factor, latest = TushareDailyProvider._extract_factors(adj_items, "20240102")
    assert latest == 1.2  # newest date (20240103) comes first after the DESC sort
    assert target_factor == 1.1

def test_extract_factors_raises_when_target_date_missing():
    """ProviderUnavailable raised if adj_factor for target date absent."""
    adj_items = [
        ["600519.SH", "20240103", "1.2"],
    ]
    with pytest.raises(ProviderUnavailable, match="adj_factor"):
        TushareDailyProvider._extract_factors(adj_items, "20240101")
  • [ ] Step 2: Run to confirm fail
bash
cd core && python -m pytest tests/unit/test_data_providers_tushare_daily.py::test_extract_factors_sorts_by_date_desc tests/unit/test_data_providers_tushare_daily.py::test_extract_factors_raises_when_target_date_missing -v
  • [ ] Step 3: Fix _extract_factors

In core/core/data/providers/tushare_daily.py, replace the _extract_factors staticmethod (lines 151-169):

python
    @staticmethod
    def _extract_factors(
        adj_items: list[list[Any]], target_yyyymmdd: str
    ) -> tuple[float | None, float | None]:
        if not adj_items:
            return None, None

        # Tushare returns DESC by default but we sort explicitly to be safe.
        sorted_items = sorted(adj_items, key=lambda r: str(r[1]), reverse=True)

        try:
            latest = float(sorted_items[0][2])
        except (TypeError, ValueError) as exc:
            raise CanonicalUnitViolation(
                f"adj_factor not numeric: {sorted_items[0]!r}"
            ) from exc

        target_factor: float | None = None
        for row in sorted_items:
            if str(row[1]) == target_yyyymmdd:
                try:
                    target_factor = float(row[2])
                except (TypeError, ValueError) as exc:
                    raise CanonicalUnitViolation(
                        f"adj_factor not numeric: {row!r}"
                    ) from exc
                break

        if target_factor is None:
            raise ProviderUnavailable(
                f"no adj_factor for {target_yyyymmdd} in {len(sorted_items)} rows"
            )

        return target_factor, latest
  • [ ] Step 4: Run tests
bash
cd core && python -m pytest tests/unit/test_data_providers_tushare_daily.py -v
  • [ ] Step 5: Commit
bash
git add core/core/data/providers/tushare_daily.py core/tests/unit/test_data_providers_tushare_daily.py
git commit -m "fix(tushare_daily): explicit sort on adj_items; raise ProviderUnavailable when target factor missing"

Phase 2: Python MCP proxy fix

Task 5: Fix proxy.py — overly broad exception catch

Files:

  • Modify: mcp/tushare_mcp/proxy.py:131

Current code (lines 129-134):

python
        if self._conn is not None:
            try:
                return self._duckdb_query(table, columns, date_cols, **kwargs)
            except Exception as exc:
                logger.warning("DuckDB query failed for %s, falling back: %s", method, exc)

        if self._fallback is None:
            raise RuntimeError(f"No Tushare token and DuckDB unavailable for '{method}'")

Problem: catches all Exception including programming bugs (KeyError, AttributeError). Should only catch DuckDB errors and "no data" RuntimeError. Bugs silently fall back to API, hiding real issues.
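The effect of narrowing the except clause can be sketched without DuckDB. StoreError below is a hypothetical stand-in for duckdb.Error, and query_with_fallback is an illustrative helper, not the real proxy method.

```python
import logging

logger = logging.getLogger("proxy_sketch")


class StoreError(Exception):
    """Hypothetical stand-in for duckdb.Error in this sketch."""


def query_with_fallback(primary, fallback):
    """Try primary(); fall back only on expected storage failures."""
    try:
        return primary()
    except (StoreError, RuntimeError) as exc:
        logger.warning("primary failed, falling back: %s", exc)
        return fallback()


def broken_store():
    raise StoreError("table missing")


def buggy_code():
    return {}["typo"]  # KeyError: a programming error, not a storage failure


print(query_with_fallback(broken_store, lambda: "from API"))  # from API
try:
    query_with_fallback(buggy_code, lambda: "from API")
except KeyError:
    print("KeyError propagated")
```

One caveat: NotImplementedError and RecursionError are subclasses of RuntimeError, so the narrowed tuple still catches them. If that matters, a dedicated exception class for the "no data" case would narrow the catch further.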

  • [ ] Step 1: Fix proxy.py

Change lines 129-132:

python
        if self._conn is not None:
            try:
                return self._duckdb_query(table, columns, date_cols, **kwargs)
            except (duckdb.Error, RuntimeError) as exc:
                logger.warning(
                    "DuckDB query failed for %s, falling back to API: %s",
                    method, exc,
                )

Verify import duckdb is already at the top of proxy.py. If not, add it.

  • [ ] Step 2: Run MCP tests if they exist
bash
# Skip only when there is no tests directory; do not hide real test failures.
cd mcp && if [ -d tests ]; then python -m pytest tests/ -v; else echo "no mcp tests"; fi
  • [ ] Step 3: Commit
bash
git add mcp/tushare_mcp/proxy.py
git commit -m "fix(mcp/proxy): narrow exception catch to duckdb.Error|RuntimeError"

Phase 3: Python service fixes

Task 6: Fix cache.py — concurrent put() file lock conflict

Files:

  • Modify: src/service/cache.py

DailyCache.put() and FundamentalsCache.put() each open a fresh duckdb.connect(). Concurrent FastAPI requests → simultaneous writers → IO Error: Conflicting lock. Fix: add threading.Lock per cache instance; hold lock during writes.

get() reads are safe because DuckDB allows concurrent readers.
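The per-instance lock pattern can be sketched without DuckDB. SerializedWriter is a hypothetical stand-in for a cache class; it counts how many writers are inside put() at once, to show that the lock keeps that number at one.

```python
import threading
import time


class SerializedWriter:
    """Hypothetical stand-in for a cache whose put() calls must not overlap."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0      # writers currently inside put()
        self.max_active = 0   # highest overlap observed
        self.rows: dict = {}

    def put(self, key: str, value: float) -> None:
        with self._lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            time.sleep(0.001)  # simulate the time spent writing
            self.rows[key] = value
            self._active -= 1


writer = SerializedWriter()
threads = [
    threading.Thread(target=writer.put, args=(f"202401{i:02d}", float(i)))
    for i in range(1, 21)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(writer.max_active, len(writer.rows))  # 1 20
```

A threading.Lock only coordinates threads inside one process. If the service is ever run with several worker processes against the same DuckDB file, this lock alone would not serialize their writes.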

  • [ ] Step 1: Add test

In core/tests/integration/test_service_cache.py, add:

python
import threading
from datetime import UTC, datetime  # datetime.UTC requires Python 3.11+

def test_daily_cache_concurrent_puts_no_lock_error(tmp_path):
    """Multiple threads calling put() simultaneously must not raise."""
    cache = DailyCache(str(tmp_path / "cache.duckdb"))
    errors = []

    def write(i):
        try:
            cache.put(code="600519.SH", trade_date=f"202401{i:02d}", close=float(i), fetched_at=datetime.now(UTC))
        except Exception as e:
            errors.append(e)

    threads = [threading.Thread(target=write, args=(i,)) for i in range(1, 20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert errors == [], f"concurrent put errors: {errors}"
  • [ ] Step 2: Run to confirm fail
bash
cd core && python -m pytest tests/integration/test_service_cache.py::test_daily_cache_concurrent_puts_no_lock_error -v

Expected: FAIL with IO Error: Conflicting lock or similar. The failure may be intermittent; the test above starts 19 threads (range(1, 20)), so raise the thread count if it passes by chance.

  • [ ] Step 3: Fix DailyCache

In src/service/cache.py, update DailyCache:

python
import threading  # add to imports if not present

class DailyCache:
    def __init__(self, path: str) -> None:
        self.path = path
        self._lock = threading.Lock()
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with duckdb.connect(path) as conn:
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS daily_cache (
                    code        VARCHAR   NOT NULL,
                    trade_date  VARCHAR   NOT NULL,
                    close       DOUBLE    NOT NULL,
                    fetched_at  TIMESTAMP NOT NULL,
                    PRIMARY KEY (code, trade_date)
                )
                """
            )

    # ... (keep _ttl_floor, get, get_latest unchanged) ...

    def put(
        self,
        *,
        code: str,
        trade_date: str,
        close: float,
        fetched_at: datetime,
    ) -> None:
        with self._lock:
            with duckdb.connect(self.path) as conn:
                conn.execute(
                    "INSERT INTO daily_cache(code, trade_date, close, fetched_at)"
                    " VALUES (?, ?, ?, ?)"
                    " ON CONFLICT (code, trade_date) DO UPDATE SET"
                    "   close = excluded.close,"
                    "   fetched_at = excluded.fetched_at",
                    [code, trade_date, close, _to_naive_utc(fetched_at)],
                )
  • [ ] Step 4: Fix FundamentalsCache — same pattern
python
class FundamentalsCache:
    def __init__(self, path: str) -> None:
        self.path = path
        self._lock = threading.Lock()
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with duckdb.connect(path) as conn:
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS fundamentals_cache (
                    code        VARCHAR   NOT NULL,
                    period      VARCHAR   NOT NULL,
                    payload     VARCHAR   NOT NULL,
                    fetched_at  TIMESTAMP NOT NULL,
                    PRIMARY KEY (code, period)
                )
                """
            )

    # ... (keep _ttl_floor, get unchanged) ...

    def put(
        self,
        *,
        code: str,
        period: str,
        payload: dict[str, Any],
        fetched_at: datetime,
    ) -> None:
        with self._lock:
            with duckdb.connect(self.path) as conn:
                conn.execute(
                    "INSERT INTO fundamentals_cache(code, period, payload, fetched_at)"
                    " VALUES (?, ?, ?, ?)"
                    " ON CONFLICT (code, period) DO UPDATE SET"
                    "   payload = excluded.payload,"
                    "   fetched_at = excluded.fetched_at",
                    [code, period, json.dumps(payload), _to_naive_utc(fetched_at)],
                )
  • [ ] Step 5: Run tests
bash
cd core && python -m pytest tests/integration/test_service_cache.py -v
  • [ ] Step 6: Commit
bash
git add src/service/cache.py core/tests/integration/test_service_cache.py
git commit -m "fix(cache): add threading.Lock to serialize concurrent put() calls"

Task 7: Fix auth.py — paid_until date comparison clarity

Files:

  • Modify: src/service/auth.py:75

Current code:

python
    if paid_until is not None and paid_until < date.today().isoformat():

This works (ISO strings sort lexicographically) but comparing strings to string representation of a date is fragile if format ever changes. Use date.fromisoformat() for explicit intent.
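The intent of the fix can be sketched as a small pure function. is_expired is a hypothetical helper, not the real code in auth.py, and it takes today as a parameter so the example is deterministic.

```python
from datetime import date
from typing import Optional


def is_expired(paid_until: Optional[str], today: date) -> bool:
    """True when paid_until (an ISO date string) is strictly before today."""
    if paid_until is None:
        return False
    return date.fromisoformat(paid_until) < today


today = date(2024, 10, 1)
print(is_expired("2024-09-30", today))  # True
print(is_expired("2024-10-01", today))  # False: the last paid day is still valid
print(is_expired(None, today))          # False

# String comparison goes wrong as soon as the stored text is not zero-padded.
print("2024-9-30" < today.isoformat())  # False, although 30 Sep is before 1 Oct
```

date.fromisoformat raises ValueError for a string it cannot parse, so a malformed paid_until surfaces as an exception instead of a silently wrong comparison. The caller should decide whether that maps to a rejected token or a server error.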

  • [ ] Step 1: Check existing auth tests
bash
grep -n "paid_until\|expired" core/tests/integration/test_service_auth.py | head -20
  • [ ] Step 2: Add test for expiry

In core/tests/integration/test_service_auth.py, add:

python
def test_expired_token_returns_403(client, tmp_path):
    """Token with paid_until in the past is rejected."""
    import sqlite3
    from datetime import date, timedelta
    from service.auth import hash_token

    token = "expired-token-test"
    past = (date.today() - timedelta(days=1)).isoformat()
    with sqlite3.connect(client.app.state.settings.bearer_db_path) as conn:
        conn.execute(
            "INSERT OR REPLACE INTO bearer_tokens (user_id, token_hash, plan, paid_until, revoked, created_at)"
            " VALUES (?, ?, 'paid', ?, 0, '2026-01-01')",
            ("expired-user", hash_token(token), past),
        )
    resp = client.get("/whoami", headers={"Authorization": f"Bearer {token}"})
    assert resp.status_code == 403
    assert "expired" in resp.json()["detail"]
  • [ ] Step 3: Fix auth.py

In src/service/auth.py, change line 75:

python
    if paid_until is not None and paid_until < date.today().isoformat():

to:

python
    if paid_until is not None and date.fromisoformat(paid_until) < date.today():
  • [ ] Step 4: Run tests
bash
cd core && python -m pytest tests/integration/test_service_auth.py -v
  • [ ] Step 5: Commit
bash
git add src/service/auth.py core/tests/integration/test_service_auth.py
git commit -m "fix(auth): use date.fromisoformat() for paid_until comparison"

Phase 4: TypeScript — critical business logic

Task 8: Fix newapi.service.ts — missing await on retry

Files:

  • Modify: backend/src/modules/newapi/newapi.service.ts:51

Current code (line 51):

typescript
        return fn(this.sessionToken!);

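The retry call is returned without await. Because withSession is an async function, a rejection from fn still reaches the caller either way, so this is a consistency and debuggability fix rather than a behavior change: return await matches the first call in the try block, keeps withSession in the async stack trace, and stays correct if the retry is ever wrapped in its own try/catch or finally.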

  • [ ] Step 1: Fix newapi.service.ts

In backend/src/modules/newapi/newapi.service.ts, change line 51:

typescript
        return fn(this.sessionToken!);

to:

typescript
        return await fn(this.sessionToken!);

The full withSession method after fix:

typescript
  private async withSession<T>(fn: (token: string) => Promise<T>): Promise<T> {
    if (!this.sessionToken) await this.login();
    try {
      return await fn(this.sessionToken!);
    } catch (err: unknown) {
      if (axios.isAxiosError(err) && err.response?.status === 401) {
        this.sessionToken = null;
        await this.login();
        return await fn(this.sessionToken!);
      }
      throw err;
    }
  }
  • [ ] Step 2: Build to verify no TS errors
bash
cd backend && npx tsc --noEmit

Expected: no errors.

  • [ ] Step 3: Commit
bash
git add backend/src/modules/newapi/newapi.service.ts
git commit -m "fix(newapi): add await on retry call in withSession"

Task 9: Fix billing.service.ts — unchecked provisionPaidOrder result

Files:

  • Modify: backend/src/modules/billing/billing.service.ts

Current code (line 213):

typescript
      await this.provisioning.provisionPaidOrder(orderId);

provisionPaidOrder never throws — it returns { ok: false, reason: string } on failure. The return value is silently ignored. If provisioning fails, WeChat gets a 200 OK, stops retrying, and the user is paid but not provisioned.

  • [ ] Step 1: Fix billing.service.ts

Replace line 213:

typescript
      await this.provisioning.provisionPaidOrder(orderId);

with:

typescript
      const provResult = await this.provisioning.provisionPaidOrder(orderId);
      if (!provResult.ok) {
        this.logger.error(
          `provisionPaidOrder failed for order ${orderId}: ${provResult.reason}`,
        );
        // Still return SUCCESS to WeChat to stop retry — provisioning failure
        // is recorded on the instance. Ops must manually recover via admin tools.
      }
  • [ ] Step 2: Build
bash
cd backend && npx tsc --noEmit
  • [ ] Step 3: Commit
bash
git add backend/src/modules/billing/billing.service.ts
git commit -m "fix(billing): log provisioning failure instead of silently ignoring result"

Task 10: Fix provisioning.service.ts — DEGRADED state on NewAPI failure

Files:

  • Modify: backend/src/modules/provisioning/provisioning.service.ts

Current code (lines 72-73):

typescript
        } catch (e: unknown) {
          this.logger.error("NewAPI token issuance failed (non-fatal)", (e as Error)?.message);
        }

llmApiKey stays null on the HermesInstance. The worker in runSpawn falls back to process.env.HERMES_DEFAULT_LLM_KEY (line 85 of provision-worker.service.ts), so Hermes still starts. But the logger says "non-fatal" — log it clearly and mark the instance status so ops can see it.

Check the Prisma schema for valid HermesInstance.status values before editing:

  • [ ] Step 1: Check Prisma schema for valid statuses
bash
grep -A 20 "enum HermesInstanceStatus\|status.*String\|PENDING_BIND\|DEGRADED" backend/prisma/schema.prisma | head -30
  • [ ] Step 2: If DEGRADED is a valid status, update provisioning.service.ts

If the enum includes DEGRADED:

typescript
        } catch (e: unknown) {
          this.logger.error(
            `NewAPI token issuance failed for instance ${result.instanceId} — Hermes will use default LLM key`,
            (e as Error)?.message,
          );
          await this.prisma.hermesInstance.update({
            where: { id: result.instanceId },
            data: { status: "DEGRADED" },
          });
        }

If DEGRADED is NOT a valid status: add it to the Prisma schema enum and create a migration:

bash
# Add DEGRADED to HermesInstanceStatus enum in backend/prisma/schema.prisma
# Then:
cd backend && npx prisma migrate dev --name add-degraded-instance-status
  • [ ] Step 3: Build
bash
cd backend && npx tsc --noEmit
  • [ ] Step 4: Commit
bash
git add backend/src/modules/provisioning/provisioning.service.ts backend/prisma/
git commit -m "fix(provisioning): mark instance DEGRADED when NewAPI token issuance fails"

Phase 5: TypeScript — race conditions

Task 11: Fix auth.service.ts — OTP double-use race

Files:

  • Modify: backend/src/modules/auth/auth.service.ts

Two methods have the same pattern: findFirst → bcrypt.compare → update usedAt. Two concurrent requests can both pass findFirst (same vc), both compare, and both set usedAt. Fix: use a conditional Prisma updateMany with a usedAt: null filter, so that only the first update succeeds. Check count to detect the loser.

Affects both registerWithEmail (lines 168-173) and verifyOtp (lines 296-301).
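The first-caller-wins idea behind the conditional update is not specific to Prisma. A minimal sketch of the same pattern follows, written in Python with the standard sqlite3 module. The codes table, its columns, and the claim helper are hypothetical and only illustrate the technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (id INTEGER PRIMARY KEY, used_at TEXT)")
conn.execute("INSERT INTO codes (id, used_at) VALUES (1, NULL)")


def claim(code_id: int) -> bool:
    """Mark the code used only if nobody has claimed it yet."""
    cur = conn.execute(
        "UPDATE codes SET used_at = datetime('now')"
        " WHERE id = ? AND used_at IS NULL",
        (code_id,),
    )
    conn.commit()
    # rowcount is the number of rows the UPDATE changed: 1 for the winner, 0 after.
    return cur.rowcount == 1


print(claim(1))  # True: first caller wins
print(claim(1))  # False: the row no longer matches used_at IS NULL
```

The check and the write happen in one statement, so there is no window between reading usedAt and setting it.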

  • [ ] Step 1: Fix registerWithEmail

Replace lines 168-174:

typescript
    const vc = await this.prisma.emailVerificationCode.findFirst({
      where: { email, purpose: 'REGISTER', usedAt: null, expiresAt: { gt: new Date() } },
      orderBy: { createdAt: 'desc' },
    });
    if (!vc) throw new UnauthorizedException('验证码无效或已过期,请重新发送');

    const ok = await bcryptjs.compare(codeRaw, vc.codeHash);
    await this.prisma.emailVerificationCode.update({
      where: { id: vc.id },
      data: { attempts: { increment: 1 }, ...(ok ? { usedAt: new Date() } : {}) },
    });
    if (!ok) throw new UnauthorizedException('验证码无效或已过期,请重新发送');

with:

typescript
    const vc = await this.prisma.emailVerificationCode.findFirst({
      where: { email, purpose: 'REGISTER', usedAt: null, expiresAt: { gt: new Date() } },
      orderBy: { createdAt: 'desc' },
    });
    if (!vc) throw new UnauthorizedException('验证码无效或已过期,请重新发送');

    const ok = await bcryptjs.compare(codeRaw, vc.codeHash);
    if (!ok) {
      await this.prisma.emailVerificationCode.update({
        where: { id: vc.id },
        data: { attempts: { increment: 1 } },
      });
      throw new UnauthorizedException('验证码无效或已过期,请重新发送');
    }

    // Atomic claim: only succeeds if usedAt is still null (first caller wins)
    const claimed = await this.prisma.emailVerificationCode.updateMany({
      where: { id: vc.id, usedAt: null },
      data: { usedAt: new Date(), attempts: { increment: 1 } },
    });
    if (claimed.count === 0) throw new UnauthorizedException('验证码已被使用');
  • [ ] Step 2: Fix verifyOtp

Replace lines 296-302 with the same pattern:

typescript
    const vc = await this.prisma.emailVerificationCode.findFirst({
      where: { email, purpose: 'OTP_LOGIN', usedAt: null, expiresAt: { gt: new Date() } },
      orderBy: { createdAt: 'desc' },
    });
    if (!vc) throw new UnauthorizedException('验证码无效或已过期');

    const ok = await bcryptjs.compare(codeRaw, vc.codeHash);
    if (!ok) {
      await this.prisma.emailVerificationCode.update({
        where: { id: vc.id },
        data: { attempts: { increment: 1 } },
      });
      throw new UnauthorizedException('验证码无效或已过期');
    }

    const claimed = await this.prisma.emailVerificationCode.updateMany({
      where: { id: vc.id, usedAt: null },
      data: { usedAt: new Date(), attempts: { increment: 1 } },
    });
    if (claimed.count === 0) throw new UnauthorizedException('验证码已被使用');
  • [ ] Step 3: Build
bash
cd backend && npx tsc --noEmit
  • [ ] Step 4: Commit
bash
git add backend/src/modules/auth/auth.service.ts
git commit -m "fix(auth): atomic OTP claim via updateMany with usedAt=null guard"

Task 12: Fix provision-worker.service.ts — stale attempts count

Files:

  • Modify: backend/src/modules/provisioning/provision-worker.service.ts

Current code (lines 62-64, 70):

typescript
    const claimed = await this.prisma.provisionTask.updateMany({
      where: { id: task.id, status: "PENDING" },
      data: { status: "RUNNING", attempts: { increment: 1 } },
    });
    if (claimed.count === 0) return;

    // ...
    } catch (e: unknown) {
      const msg = (e as Error)?.message ?? String(e);
      const attempts = task.attempts + 1;  // ← uses PRE-CLAIM snapshot

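task.attempts was read before the claim. In the common case task.attempts + 1 still equals the DB value after updateMany increments it. The snapshot goes stale when another worker claims, fails, and requeues the task between the initial read and this claim: the DB counter has then advanced further than the snapshot, so task.attempts + 1 undercounts and the MAX_ATTEMPTS check can fire late. Fix: re-fetch the task after a successful claim.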

  • [ ] Step 1: Fix provision-worker.service.ts

After line 64 (if (claimed.count === 0) return;), add a re-fetch:

typescript
    if (claimed.count === 0) return;

    // Re-fetch to get post-increment attempts count.
    const freshTask = await this.prisma.provisionTask.findUnique({
      where: { id: task.id },
      include: {
        order: { include: { plan: true } },
        subscription: { include: { instance: true } },
      },
    });
    if (!freshTask) return;

Then replace all subsequent references to task with freshTask in processOne:

  • task.subscription?.instance → freshTask.subscription?.instance
  • task.id → freshTask.id
  • task.order.plan.priceCnyFen → freshTask.order.plan.priceCnyFen
  • task.attempts + 1 → freshTask.attempts (it's already incremented)

Full updated processOne after re-fetch block:

typescript
    const instance = freshTask.subscription?.instance;
    if (!instance) {
      await this.failTask(freshTask.id, null, "no_instance_on_subscription");
      return;
    }

    try {
      await this.runSpawn(freshTask.id, instance.id, freshTask.order.plan.priceCnyFen);
    } catch (e: unknown) {
      const msg = (e as Error)?.message ?? String(e);
      const attempts = freshTask.attempts;  // already incremented by claim
      if (attempts >= MAX_ATTEMPTS) {
        await this.failTask(freshTask.id, instance.id, msg);
      } else {
        await this.prisma.provisionTask.update({
          where: { id: freshTask.id },
          data: { status: "PENDING", lastError: msg },
        });
        this.logger.warn(`task=${freshTask.id} attempt=${attempts} failed; requeued: ${msg}`);
      }
    }
  • [ ] Step 2: Build
bash
cd backend && npx tsc --noEmit
  • [ ] Step 3: Commit
bash
git add backend/src/modules/provisioning/provision-worker.service.ts
git commit -m "fix(provision-worker): re-fetch task after claim to get accurate attempts count"

Phase 6: Final verification

  • [ ] Run all Python tests
bash
cd core && python -m pytest tests/ -v --tb=short 2>&1 | tail -30

Expected: all pass.

  • [ ] Run TypeScript build
bash
cd backend && npx tsc --noEmit

Expected: no errors.

  • [ ] Push branch and open PR
bash
git push -u origin <branch-name>
gh pr create --title "fix: 12-bug sweep (Python core + NestJS backend)" --body "$(cat <<'EOF'
## Summary
- fix(_market): exclude 15:00 from open window
- fix(tushare): handle null data field
- fix(audit_log): explicit != 0 for diff_pct denominator
- fix(tushare_daily): explicit sort on adj_items; raise on missing factor
- fix(mcp/proxy): narrow exception catch to duckdb.Error|RuntimeError
- fix(cache): threading.Lock for concurrent put() calls
- fix(auth): date.fromisoformat() for paid_until comparison
- fix(newapi): await on retry in withSession
- fix(billing): log provisionPaidOrder failure instead of silently ignoring
- fix(provisioning): mark instance DEGRADED on NewAPI failure
- fix(auth): atomic OTP claim via updateMany
- fix(provision-worker): re-fetch task after claim

## Test Plan
- [ ] pytest core/tests/ passes
- [ ] npx tsc --noEmit passes in backend/
- [ ] Manual: pay flow end-to-end on staging
- [ ] Manual: OTP login with same code from two tabs — second should get 验证码已被使用
- [ ] Deploy to ECS, verify /quality/status shows healthy (fromisoformat fix)
EOF
)"

Internal team documentation