Batch large-table loads to bound memory and add per-table state to stats

2026-06-05 14:44:07 +02:00
parent 85bb84a1a6
commit 286a5f207d
11 changed files with 436 additions and 29 deletions
@@ -6,6 +6,20 @@ All notable changes to this project will be documented in this file.

 ---

+## [1.5.0] - 2026-06-05
+
+### Added
+- **Per-table processing state in `stats`** — `TableStats` now carries `state` (`loading` / `refreshing` / `ready` / `stale` / `error`) and `tracking` (`delta` / `ttl` / `static`), so callers can see whether each table is up to date or being processed. In-progress first loads and failed loads also surface in `stats.tables`.
+- `SQLMEM_FETCH_BATCH` env var (default `10000`) — rows fetched per batch when loading a table.
+
+### Changed
+- `pyproject.toml` — bumped version to `1.5.0`
+- **Large-table loads are streamed in batches** — `load_table` no longer calls `fetchall()` on the whole table, which double-buffered every row in Python and could run out of memory or crash on tens of millions of rows. Rows are now fetched `SQLMEM_FETCH_BATCH` at a time into a staging table, which is then swapped in atomically. Peak memory stays bounded, the previous copy stays queryable during a reload, and the network fetch no longer holds the cache lock. Delta catch-ups are streamed the same way.
+- Orphan staging tables left behind by an interrupted load (a crash or backup mid-load) are dropped on startup.
+- Delta upserts compute `row_count` once per refresh instead of running a full `COUNT(*)` after every batch, which avoids O(rows×batches) work on large catch-ups.
+
+---
+
 ## [1.4.0] - 2026-06-05

 ### Fixed