Skip to content

Design: T2a/T2b Warehouse Ingestion (Weekly/Monthly/Index)

Date: 2026-05-13 Status: Draft Owner: liang Parent spec: docs/planning/01-data-layer.md §3 (collection plan)

Status: ✅ Implemented (2026-05-14)

All file changes shipped:

  • src/service/schema/01_raw_tables.sql — 6 new raw tables
  • src/service/schema/02_normalized_tables.sql — 6 new normalized tables
  • src/service/ingest/weekly.pyWeeklyIngest, MonthlyIngest
  • src/service/ingest/index_daily.pyIndexDailyIngest, IndexDailybasicIngest, IndexBasicIngest, IndexWeightIngest
  • src/service/scheduler.py — 4 new jobs
  • mcp/tushare_mcp/proxy.py — 6 new warehouse methods in _WAREHOUSE_METHODS

8 MCP tools now warehouse-backed (was 5, now 11 of 28).


Problem

21 of 28 MCP tools use live Tushare HTTP passthrough. No warehouse persistence for weekly/monthly prices, index data, financials, options, etc. Slow, rate-limited, no historical context for Agent queries.

Scope (T2a + T2b Only)

This spec covers 8 new ingest jobs for weekly/monthly price data and index data. T3 (financials) and T4/T5 (events, funds, options) are future specs.

T2a: Weekly + Monthly Price

MCP ToolTushare APITableSchedule
get_weekly_priceweeklynormalized_weekly每周一 03:00 Asia/Shanghai
get_monthly_pricemonthlynormalized_monthly每月 1 日 02:30 Asia/Shanghai

Both APIs accept ts_code or trade_date. Batch strategy: Unlike daily (which supports trade_date for full market), weekly/monthly only support per-stock queries. Strategy: loop over ts_code from normalized_stock_basic where list_status='L'.

Call count: ~5,500 stocks × 1 call = 5,500 calls per run. At 500 calls/min, ~11 minutes.

T2b: Index Data

MCP ToolTushare APITableSchedule
get_index_dailyindex_dailynormalized_index_daily每日 19:00 Asia/Shanghai
get_index_basicindex_basicnormalized_index_basic每月 1 日 03:00
get_index_weightindex_weightnormalized_index_weight每月 1 日 03:30
get_index_dailybasicindex_dailybasicnormalized_index_dailybasic每日 19:00 Asia/Shanghai

Index scope: 4 indices only (沪深300 000300.SH, 中证500 000905.SH, 中证1000 000852.SH, 上证综指 000001.SH). index_weight for 沪深300 only (largest, most queries).

Call count: 2 indices × 1 call/day = 2 calls for daily. 2 calls for dailybasic. Monthly: 4 basic + 1 weight = 5 calls. Negligible.

Schema

normalized_weekly

sql
CREATE TABLE normalized_weekly (
  ts_code      VARCHAR NOT NULL,
  trade_date   DATE    NOT NULL,  -- week-end date
  open         DOUBLE,
  high         DOUBLE,
  low          DOUBLE,
  close        DOUBLE  NOT NULL,
  pre_close    DOUBLE,
  change       DOUBLE,
  pct_chg      DOUBLE,
  vol          BIGINT,
  amount       DOUBLE,
  source       VARCHAR NOT NULL DEFAULT 'tushare',
  ingested_at  TIMESTAMP NOT NULL,
  raw_id       BIGINT  NOT NULL,
  PRIMARY KEY (ts_code, trade_date, source)
);

normalized_monthly

Same columns as normalized_weekly, same PK structure.

normalized_index_daily

sql
CREATE TABLE normalized_index_daily (
  ts_code      VARCHAR NOT NULL,
  trade_date   DATE    NOT NULL,
  open         DOUBLE,
  high         DOUBLE,
  low          DOUBLE,
  close        DOUBLE  NOT NULL,
  pre_close    DOUBLE,
  change       DOUBLE,
  pct_chg      DOUBLE,
  vol          BIGINT,
  amount       DOUBLE,
  source       VARCHAR NOT NULL DEFAULT 'tushare',
  ingested_at  TIMESTAMP NOT NULL,
  raw_id       BIGINT  NOT NULL,
  PRIMARY KEY (ts_code, trade_date, source)
);

normalized_index_basic

sql
CREATE TABLE normalized_index_basic (
  ts_code      VARCHAR NOT NULL,
  name         VARCHAR,
  fullname     VARCHAR,
  market       VARCHAR,
  publisher    VARCHAR,
  category     VARCHAR,
  base_date    DATE,
  base_point   DOUBLE,
  list_date    DATE,
  source       VARCHAR NOT NULL DEFAULT 'tushare',
  ingested_at  TIMESTAMP NOT NULL,
  raw_id       BIGINT  NOT NULL,
  PRIMARY KEY (ts_code, source)
);

normalized_index_weight

sql
CREATE TABLE normalized_index_weight (
  index_code   VARCHAR NOT NULL,
  ts_code      VARCHAR NOT NULL,
  trade_date   DATE    NOT NULL,
  weight       DOUBLE,
  source       VARCHAR NOT NULL DEFAULT 'tushare',
  ingested_at  TIMESTAMP NOT NULL,
  raw_id       BIGINT  NOT NULL,
  PRIMARY KEY (index_code, ts_code, trade_date, source)
);

normalized_index_dailybasic

sql
CREATE TABLE normalized_index_dailybasic (
  ts_code         VARCHAR NOT NULL,
  trade_date      DATE    NOT NULL,
  total_mv        DOUBLE,  -- 万元
  float_mv        DOUBLE,  -- 万元
  total_share     DOUBLE,  -- 万股
  float_share     DOUBLE,  -- 万股
  turnover_rate   DOUBLE,  -- %
  pe              DOUBLE,
  pb              DOUBLE,
  source          VARCHAR NOT NULL DEFAULT 'tushare',
  ingested_at     TIMESTAMP NOT NULL,
  raw_id          BIGINT  NOT NULL,
  PRIMARY KEY (ts_code, trade_date, source)
);

Raw tables

Standard raw_<api_name> pattern (append-only JSON blob with params, fetched_at, response, row_count).

Ingest Jobs

WeeklyIngest / MonthlyIngest

Loop over all ts_code from normalized_stock_basic (list_status='L'). Each call: tushare_call(api_name="weekly"/"monthly", params={"ts_code": ts_code}). Response is all history for that stock — we take all rows, transform, upsert.

Key difference from daily: weekly/monthly return all history per stock, not just one date. Transform extracts all rows from response, not just one.

IndexDailyIngest / IndexDailybasicIngest

Fixed list of 4 (or 2) index codes. Single call per index per day with no params (returns all history). Transform filters to rows since MAX(trade_date) + 1 from existing table to avoid re-processing.

IndexBasicIngest

Single call (no params), returns all indices. Upsert by (ts_code, source).

IndexWeightIngest

Single call with {"ts_code": "000300.SH"}, returns current month's weights. Full upsert.

Scheduler Integration

New jobs in _create_scheduler():

Job IDCron (Asia/Shanghai)Misfire Grace
weekly_refreshMon 03:0086400s (24h)
monthly_refresh_price1st 02:3086400s
index_dailyDaily 19:007200s
index_monthly1st 03:00 (basic) / 03:30 (weight)86400s

index_daily runs at 19:00 (30 min after daily_batch at 18:30) to avoid DuckDB writer contention.

Backfill Strategy

TableRangeEst. CallsEst. Time
normalized_weekly2 years~5,500 stocks × 1 = 5,500~11 min
normalized_monthly2 years~5,500 stocks × 1 = 5,500~11 min
normalized_index_daily2 years4 indices × 1 = 4< 1 min
normalized_index_dailybasic2 years4 indices × 1 = 4< 1 min
normalized_index_basicall1instant
normalized_index_weight2 years1 index × 1 = 1< 1 min

Total backfill: ~22 minutes wall time (weekly + monthly dominate). Can run during off-hours.

File Changes

FileChange
src/service/schema/01_raw_tables.sqlAdd raw tables for 6 new APIs
src/service/schema/02_normalized_tables.sqlAdd 6 normalized tables
src/service/ingest/weekly.pyNew: WeeklyIngest, MonthlyIngest
src/service/ingest/index_daily.pyNew: IndexDailyIngest, IndexDailybasicIngest, IndexBasicIngest, IndexWeightIngest
src/service/scheduler.pyAdd 4 new jobs, update imports
src/service/routes/warehouse.pyAdd /warehouse/weekly, /warehouse/monthly, /warehouse/index routes
mcp/tushare_mcp/server.pyWire new tools to DuckDB (change from passthrough to warehouse)
mcp/tushare_mcp/proxy.pyUpdate TushareProxy to read from new tables

MCP Tool Migration

After warehouse tables exist, the MCP server tools switch from live passthrough to warehouse-backed:

ToolBeforeAfter
get_weekly_pricets.pro_api().weekly()DuckDB normalized_weekly
get_monthly_pricets.pro_api().monthly()DuckDB normalized_monthly
get_index_dailyts.pro_api().index_daily()DuckDB normalized_index_daily
get_index_basicts.pro_api().index_basic()DuckDB normalized_index_basic
get_index_weightts.pro_api().index_weight()DuckDB normalized_index_weight
get_index_dailybasicts.pro_api().index_dailybasic()DuckDB normalized_index_dailybasic

Tools with warehouse fallback pattern: try DuckDB first, if no rows or data too old, fall back to live Tushare.

Risks

  • Weekly/monthly loop time: 5,500 sequential calls × network latency. Could hit Tushare rate limit if network is slow. Mitigation: add small time.sleep(0.1) between calls to stay under 500/min.
  • Index weight data size: index_weight returns ~300 rows (沪深300 constituents). Monthly snapshots → ~3,600 rows/year. Negligible.
  • normalized_stock_basic as source of truth for ts_code list: If stock_basic is stale, weekly/monthly will miss new listings. Mitigation: monthly_refresh (stock_basic) runs at 02:00, monthly_refresh_price at 02:30 — stock_basic is fresh by the time weekly/monthly runs.

Out of Scope (Future Specs)

  • T3: Financial statements (income, balancesheet, cashflow, fina_indicator, forecast, express, fina_audit, fina_mainbz)
  • T4: Events (dividend, top10_holders, stock_company, name_change, hs_const, new_share, suspend_d)
  • T5: Funds & options (fund_daily, opt_basic, opt_daily)

团队内部文档