主题
Design: T2a/T2b Warehouse Ingestion (Weekly/Monthly/Index)
Date: 2026-05-13 Status: Draft Owner: liang Parent spec:
docs/planning/01-data-layer.md§3 (collection plan)
Status: ✅ Implemented (2026-05-14)
All file changes shipped:
src/service/schema/01_raw_tables.sql— 6 new raw tablessrc/service/schema/02_normalized_tables.sql— 6 new normalized tablessrc/service/ingest/weekly.py—WeeklyIngest,MonthlyIngestsrc/service/ingest/index_daily.py—IndexDailyIngest,IndexDailybasicIngest,IndexBasicIngest,IndexWeightIngestsrc/service/scheduler.py— 4 new jobsmcp/tushare_mcp/proxy.py— 6 new warehouse methods in_WAREHOUSE_METHODS
8 MCP tools now warehouse-backed (was 5, now 11 of 28).
Problem
21 of 28 MCP tools use live Tushare HTTP passthrough. No warehouse persistence for weekly/monthly prices, index data, financials, options, etc. Slow, rate-limited, no historical context for Agent queries.
Scope (T2a + T2b Only)
This spec covers 8 new ingest jobs for weekly/monthly price data and index data. T3 (financials) and T4/T5 (events, funds, options) are future specs.
T2a: Weekly + Monthly Price
| MCP Tool | Tushare API | Table | Schedule |
|---|---|---|---|
get_weekly_price | weekly | normalized_weekly | 每周一 03:00 Asia/Shanghai |
get_monthly_price | monthly | normalized_monthly | 每月 1 日 02:30 Asia/Shanghai |
Both APIs accept ts_code or trade_date. Batch strategy: Unlike daily (which supports trade_date for full market), weekly/monthly only support per-stock queries. Strategy: loop over ts_code from normalized_stock_basic where list_status='L'.
Call count: ~5,500 stocks × 1 call = 5,500 calls per run. At 500 calls/min, ~11 minutes.
T2b: Index Data
| MCP Tool | Tushare API | Table | Schedule |
|---|---|---|---|
get_index_daily | index_daily | normalized_index_daily | 每日 19:00 Asia/Shanghai |
get_index_basic | index_basic | normalized_index_basic | 每月 1 日 03:00 |
get_index_weight | index_weight | normalized_index_weight | 每月 1 日 03:30 |
get_index_dailybasic | index_dailybasic | normalized_index_dailybasic | 每日 19:00 Asia/Shanghai |
Index scope: 4 indices only (沪深300 000300.SH, 中证500 000905.SH, 中证1000 000852.SH, 上证综指 000001.SH). index_weight for 沪深300 only (largest, most queries).
Call count: 2 indices × 1 call/day = 2 calls for daily. 2 calls for dailybasic. Monthly: 4 basic + 1 weight = 5 calls. Negligible.
Schema
normalized_weekly
sql
CREATE TABLE normalized_weekly (
ts_code VARCHAR NOT NULL,
trade_date DATE NOT NULL, -- week-end date
open DOUBLE,
high DOUBLE,
low DOUBLE,
close DOUBLE NOT NULL,
pre_close DOUBLE,
change DOUBLE,
pct_chg DOUBLE,
vol BIGINT,
amount DOUBLE,
source VARCHAR NOT NULL DEFAULT 'tushare',
ingested_at TIMESTAMP NOT NULL,
raw_id BIGINT NOT NULL,
PRIMARY KEY (ts_code, trade_date, source)
);normalized_monthly
Same columns as normalized_weekly, same PK structure.
normalized_index_daily
sql
CREATE TABLE normalized_index_daily (
ts_code VARCHAR NOT NULL,
trade_date DATE NOT NULL,
open DOUBLE,
high DOUBLE,
low DOUBLE,
close DOUBLE NOT NULL,
pre_close DOUBLE,
change DOUBLE,
pct_chg DOUBLE,
vol BIGINT,
amount DOUBLE,
source VARCHAR NOT NULL DEFAULT 'tushare',
ingested_at TIMESTAMP NOT NULL,
raw_id BIGINT NOT NULL,
PRIMARY KEY (ts_code, trade_date, source)
);normalized_index_basic
sql
CREATE TABLE normalized_index_basic (
ts_code VARCHAR NOT NULL,
name VARCHAR,
fullname VARCHAR,
market VARCHAR,
publisher VARCHAR,
category VARCHAR,
base_date DATE,
base_point DOUBLE,
list_date DATE,
source VARCHAR NOT NULL DEFAULT 'tushare',
ingested_at TIMESTAMP NOT NULL,
raw_id BIGINT NOT NULL,
PRIMARY KEY (ts_code, source)
);normalized_index_weight
sql
CREATE TABLE normalized_index_weight (
index_code VARCHAR NOT NULL,
ts_code VARCHAR NOT NULL,
trade_date DATE NOT NULL,
weight DOUBLE,
source VARCHAR NOT NULL DEFAULT 'tushare',
ingested_at TIMESTAMP NOT NULL,
raw_id BIGINT NOT NULL,
PRIMARY KEY (index_code, ts_code, trade_date, source)
);normalized_index_dailybasic
sql
CREATE TABLE normalized_index_dailybasic (
ts_code VARCHAR NOT NULL,
trade_date DATE NOT NULL,
total_mv DOUBLE, -- 万元
float_mv DOUBLE, -- 万元
total_share DOUBLE, -- 万股
float_share DOUBLE, -- 万股
turnover_rate DOUBLE, -- %
pe DOUBLE,
pb DOUBLE,
source VARCHAR NOT NULL DEFAULT 'tushare',
ingested_at TIMESTAMP NOT NULL,
raw_id BIGINT NOT NULL,
PRIMARY KEY (ts_code, trade_date, source)
);Raw tables
Standard raw_<api_name> pattern (append-only JSON blob with params, fetched_at, response, row_count).
Ingest Jobs
WeeklyIngest / MonthlyIngest
Loop over all ts_code from normalized_stock_basic (list_status='L'). Each call: tushare_call(api_name="weekly"/"monthly", params={"ts_code": ts_code}). Response is all history for that stock — we take all rows, transform, upsert.
Key difference from daily: weekly/monthly return all history per stock, not just one date. Transform extracts all rows from response, not just one.
IndexDailyIngest / IndexDailybasicIngest
Fixed list of 4 (or 2) index codes. Single call per index per day with no params (returns all history). Transform filters to rows since MAX(trade_date) + 1 from existing table to avoid re-processing.
IndexBasicIngest
Single call (no params), returns all indices. Upsert by (ts_code, source).
IndexWeightIngest
Single call with {"ts_code": "000300.SH"}, returns current month's weights. Full upsert.
Scheduler Integration
New jobs in _create_scheduler():
| Job ID | Cron (Asia/Shanghai) | Misfire Grace |
|---|---|---|
weekly_refresh | Mon 03:00 | 86400s (24h) |
monthly_refresh_price | 1st 02:30 | 86400s |
index_daily | Daily 19:00 | 7200s |
index_monthly | 1st 03:00 (basic) / 03:30 (weight) | 86400s |
index_daily runs at 19:00 (30 min after daily_batch at 18:30) to avoid DuckDB writer contention.
Backfill Strategy
| Table | Range | Est. Calls | Est. Time |
|---|---|---|---|
normalized_weekly | 2 years | ~5,500 stocks × 1 = 5,500 | ~11 min |
normalized_monthly | 2 years | ~5,500 stocks × 1 = 5,500 | ~11 min |
normalized_index_daily | 2 years | 4 indices × 1 = 4 | < 1 min |
normalized_index_dailybasic | 2 years | 4 indices × 1 = 4 | < 1 min |
normalized_index_basic | all | 1 | instant |
normalized_index_weight | 2 years | 1 index × 1 = 1 | < 1 min |
Total backfill: ~22 minutes wall time (weekly + monthly dominate). Can run during off-hours.
File Changes
| File | Change |
|---|---|
src/service/schema/01_raw_tables.sql | Add raw tables for 6 new APIs |
src/service/schema/02_normalized_tables.sql | Add 6 normalized tables |
src/service/ingest/weekly.py | New: WeeklyIngest, MonthlyIngest |
src/service/ingest/index_daily.py | New: IndexDailyIngest, IndexDailybasicIngest, IndexBasicIngest, IndexWeightIngest |
src/service/scheduler.py | Add 4 new jobs, update imports |
src/service/routes/warehouse.py | Add /warehouse/weekly, /warehouse/monthly, /warehouse/index routes |
mcp/tushare_mcp/server.py | Wire new tools to DuckDB (change from passthrough to warehouse) |
mcp/tushare_mcp/proxy.py | Update TushareProxy to read from new tables |
MCP Tool Migration
After warehouse tables exist, the MCP server tools switch from live passthrough to warehouse-backed:
| Tool | Before | After |
|---|---|---|
get_weekly_price | ts.pro_api().weekly() | DuckDB normalized_weekly |
get_monthly_price | ts.pro_api().monthly() | DuckDB normalized_monthly |
get_index_daily | ts.pro_api().index_daily() | DuckDB normalized_index_daily |
get_index_basic | ts.pro_api().index_basic() | DuckDB normalized_index_basic |
get_index_weight | ts.pro_api().index_weight() | DuckDB normalized_index_weight |
get_index_dailybasic | ts.pro_api().index_dailybasic() | DuckDB normalized_index_dailybasic |
Tools with warehouse fallback pattern: try DuckDB first, if no rows or data too old, fall back to live Tushare.
Risks
- Weekly/monthly loop time: 5,500 sequential calls × network latency. Could hit Tushare rate limit if network is slow. Mitigation: add small
time.sleep(0.1)between calls to stay under 500/min. - Index weight data size:
index_weightreturns ~300 rows (沪深300 constituents). Monthly snapshots → ~3,600 rows/year. Negligible. normalized_stock_basicas source of truth for ts_code list: If stock_basic is stale, weekly/monthly will miss new listings. Mitigation:monthly_refresh(stock_basic) runs at 02:00,monthly_refresh_priceat 02:30 — stock_basic is fresh by the time weekly/monthly runs.
Out of Scope (Future Specs)
- T3: Financial statements (income, balancesheet, cashflow, fina_indicator, forecast, express, fina_audit, fina_mainbz)
- T4: Events (dividend, top10_holders, stock_company, name_change, hs_const, new_share, suspend_d)
- T5: Funds & options (fund_daily, opt_basic, opt_daily)