Batch large-table loads to bound memory and add per-table state to stats

All notable changes to this project will be documented in this file.

---

## [1.5.0] - 2026-06-05

### Added
- **Per-table processing state in `stats`** — `TableStats` now carries `state` (`loading` / `refreshing` / `ready` / `stale` / `error`) and `tracking` (`delta` / `ttl` / `static`), so callers can see whether each table is up to date or being processed. In-progress first loads and failed loads also surface in `stats.tables`.
- `SQLMEM_FETCH_BATCH` env var (default `10000`) — rows fetched per batch when loading a table.
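Here is a minimal sketch of how a caller might use the two additions above. It assumes `TableStats` is a plain dataclass and that `SQLMEM_FETCH_BATCH` is parsed as a positive integer. The `name` and `row_count` fields, the `fetch_batch_size()` helper and the `is_up_to_date()` method are hypothetical. Only the variable name, its `10000` default, and the `state`/`tracking` values come from the entries above.

```python
import os
from dataclasses import dataclass
from typing import Literal, Optional

# State and tracking values are taken from the 1.5.0 entry; the class layout is hypothetical.
TableState = Literal["loading", "refreshing", "ready", "stale", "error"]
Tracking = Literal["delta", "ttl", "static"]

DEFAULT_FETCH_BATCH = 10_000  # documented default for SQLMEM_FETCH_BATCH


def fetch_batch_size() -> int:
    """Hypothetical helper: read SQLMEM_FETCH_BATCH, falling back to the default."""
    raw = os.environ.get("SQLMEM_FETCH_BATCH", "").strip()
    if not raw:
        return DEFAULT_FETCH_BATCH
    value = int(raw)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError(f"SQLMEM_FETCH_BATCH must be positive, got {value}")
    return value


@dataclass
class TableStats:
    name: str  # hypothetical field
    row_count: Optional[int]  # hypothetical: None while a first load is in progress
    state: TableState
    tracking: Tracking

    def is_up_to_date(self) -> bool:
        # Only "ready" means loaded and current; "refreshing" still serves the old copy.
        return self.state == "ready"


if __name__ == "__main__":
    tables = [
        TableStats("orders", 1_200_000, "refreshing", "delta"),
        TableStats("customers", None, "loading", "ttl"),
        TableStats("countries", 250, "ready", "static"),
    ]
    print("fetch batch:", fetch_batch_size())
    for t in tables:
        print(f"{t.name:<10} {t.state:<10} {t.tracking:<7} up_to_date={t.is_up_to_date()}")
```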
### Changed
- `pyproject.toml` — bumped version to `1.5.0`
- **Large-table loads are streamed in batches** — `load_table` no longer `fetchall()`s the whole table (which double-buffered every row in Python and could OOM/crash on tens of millions of rows). Rows are now fetched `SQLMEM_FETCH_BATCH` at a time into a staging table and swapped in atomically, so peak memory stays bounded, the previous copy stays queryable during a reload, and the network fetch no longer holds the cache lock. Delta catch-ups are streamed the same way.
- Orphan staging tables left by an interrupted load (crash/backup mid-load) are dropped on startup.
- Delta upserts compute `row_count` once per refresh instead of a full `COUNT(*)` after every batch (avoids O(rows×batches) work on large catch-ups).
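Here is a minimal sketch of the staging-table pattern described in these entries. It assumes the cache is an SQLite database accessed through Python's `sqlite3` module, opened with `isolation_level=None` so that transactions are explicit, and guarded by a single `threading.Lock`. A second `sqlite3` connection stands in for the remote source.

The `load_table` signature shown here is illustrative, not the project's. `drop_orphan_staging_tables`, `STAGING_PREFIX`, `quote_ident` and `transaction` are hypothetical names.

The sketch works as follows:
- Rows move `batch_size` at a time via `fetchmany`.
- Each fetch runs outside the lock.
- The row count is taken once after the last batch.
- The old table is replaced by `DROP` plus `ALTER TABLE ... RENAME` inside one transaction.

```python
import sqlite3
import threading
from contextlib import contextmanager
from typing import Iterator

STAGING_PREFIX = "__staging_"  # hypothetical prefix; the real naming scheme is not documented


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier (table or column name)."""
    return '"' + name.replace('"', '""') + '"'


@contextmanager
def transaction(conn: sqlite3.Connection) -> Iterator[None]:
    """Explicit BEGIN/COMMIT/ROLLBACK; assumes conn was opened with isolation_level=None."""
    conn.execute("BEGIN")
    try:
        yield
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def drop_orphan_staging_tables(cache: sqlite3.Connection) -> list[str]:
    """Startup cleanup: drop staging tables left behind by an interrupted load."""
    # substr() instead of LIKE, because "_" is a LIKE wildcard.
    names = [
        row[0]
        for row in cache.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND substr(name, 1, ?) = ?",
            (len(STAGING_PREFIX), STAGING_PREFIX),
        )
    ]
    for name in names:
        cache.execute(f"DROP TABLE {quote_ident(name)}")
    return names


def load_table(source: sqlite3.Connection, cache: sqlite3.Connection,
               cache_lock: threading.Lock, table: str, batch_size: int = 10_000) -> int:
    """Stream `table` from `source` into a staging table, then swap it in atomically."""
    target = quote_ident(table)
    staging = quote_ident(STAGING_PREFIX + table)

    src = source.cursor()
    src.execute(f"SELECT * FROM {target}")
    columns = ", ".join(quote_ident(d[0]) for d in src.description)
    placeholders = ", ".join("?" for _ in src.description)

    with cache_lock:
        cache.execute(f"DROP TABLE IF EXISTS {staging}")  # leftover from a failed attempt
        cache.execute(f"CREATE TABLE {staging} ({columns})")

    while True:
        # Fetch outside the lock: readers keep querying the old copy meanwhile.
        rows = src.fetchmany(batch_size)
        if not rows:
            break
        with cache_lock, transaction(cache):
            cache.executemany(f"INSERT INTO {staging} VALUES ({placeholders})", rows)

    with cache_lock:
        # Count once after the last batch, not after every batch.
        row_count = cache.execute(f"SELECT COUNT(*) FROM {staging}").fetchall()[0][0]
        # One transaction: readers see either the old table or the new one.
        with transaction(cache):
            cache.execute(f"DROP TABLE IF EXISTS {target}")
            cache.execute(f"ALTER TABLE {staging} RENAME TO {target}")
    return row_count


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE items (id INTEGER, label TEXT)")
    source.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item-{i}") for i in range(25)])

    # check_same_thread=False because a real cache would be shared across threads.
    cache = sqlite3.connect(":memory:", isolation_level=None, check_same_thread=False)
    cache.execute(f"CREATE TABLE {quote_ident(STAGING_PREFIX + 'old')} (x)")  # simulated orphan
    print("dropped:", drop_orphan_staging_tables(cache))
    print("loaded:", load_table(source, cache, threading.Lock(), "items", batch_size=10))
```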
---

## [1.4.0] - 2026-06-05

### Fixed