Fix frozen delta watermark and add error stats, lazy source, concurrent disk reads, and per-engine config

Jan Doubravský
2026-06-08 19:35:33 +02:00
parent 209ae667ab
commit 6dc85e4f3c
17 changed files with 668 additions and 71 deletions
+23 -1
@@ -258,6 +258,7 @@ engine = CachingEngine(base_engine, in_memory=False)
- The cache can **exceed available memory** — nothing is held in RAM beyond SQLite's page cache.
- Every write **persists immediately** (WAL + `synchronous=NORMAL`), so there is no hourly backup thread, no load-into-memory step on startup, and no shutdown flush to lose.
- **Reads run concurrently** — each thread reads through its own read-only WAL connection, so a slow `SELECT` doesn't block writers (loads/upserts) or other readers.
- On open, a cache file with a mismatched schema version is wiped in place and rebuilt; `engine.reset()` drops the cached tables and `VACUUM`s the file (it does not delete the open file).
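The concurrent-read bullet above can be illustrated with the standard-library `sqlite3` module alone. This is a minimal sketch, not the library's actual implementation: the file name, the `orders` table, and the helper names are hypothetical. It shows one writer in WAL mode with `synchronous=NORMAL`, and several threads that each open their own read-only connection.

```python
import os
import sqlite3
import tempfile
import threading
from pathlib import Path

# Hypothetical cache file and table, used only for this illustration.
path = Path(tempfile.mkdtemp()) / "cache_demo.db"

# Writer connection: WAL journal plus synchronous=NORMAL, as described above.
writer = sqlite3.connect(os.fspath(path))
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("PRAGMA synchronous=NORMAL")
writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
writer.executemany("INSERT INTO orders (total) VALUES (?)", [(1.0,), (2.5,), (4.0,)])
writer.commit()

_local = threading.local()


def reader_connection() -> sqlite3.Connection:
    """Return this thread's own read-only connection, opening it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # mode=ro makes the connection read-only; uri=True enables the file: URI form.
        conn = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
        _local.conn = conn
    return conn


results = []
results_lock = threading.Lock()


def count_rows() -> None:
    conn = reader_connection()
    try:
        (n,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
        with results_lock:
            results.append(n)
    finally:
        conn.close()


threads = [threading.Thread(target=count_rows) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
writer.close()
```

Because every reader holds a separate connection, a long `SELECT` in one thread holds no lock that the writer or the other readers would need.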
The constructor argument wins over the env var; when `in_memory` is omitted it falls back to `SQLMEM_IN_MEMORY`.
@@ -277,11 +278,15 @@ Use `reset()` after a **structural change** in the source (columns added/removed
```python
stats = engine.stats # Stats snapshot
print(stats.hits, stats.misses, stats.refetches, stats.errors)
for name, t in stats.tables.items():
    print(name, t.rows, t.state, t.tracking, t.last_refresh)
    if t.consecutive_failures:
        print(f" {name} failing ×{t.consecutive_failures}: {t.last_error} ({t.last_error_at})")
```
`Stats.errors` is the total number of load/refresh failures since start. Each `TableStats` also carries `last_error`, `last_error_at`, and `consecutive_failures` (reset to 0 on the next success). A delta that fails *before* streaming, which would otherwise leave `state` looking `ready`, is therefore still visible, and the table is marked `error`.
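One way to use these fields is a small health check that flags tables stuck in a failure streak. The sketch below is hedged: the two dataclasses are stand-ins that carry only the attributes named above, the field types (string error message, `datetime` timestamp) are assumptions, and `failing_tables` with its `threshold` parameter is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class TableStats:
    """Stand-in with only the fields used here; the types are assumptions."""
    state: str = "ready"
    last_error: Optional[str] = None
    last_error_at: Optional[datetime] = None
    consecutive_failures: int = 0


@dataclass
class Stats:
    """Stand-in for the snapshot returned by engine.stats."""
    errors: int = 0
    tables: Dict[str, TableStats] = field(default_factory=dict)


def failing_tables(stats: Stats, threshold: int = 3) -> List[str]:
    """Names of tables that have failed `threshold` or more times in a row."""
    return sorted(
        name
        for name, t in stats.tables.items()
        if t.consecutive_failures >= threshold
    )


snapshot = Stats(
    errors=5,
    tables={
        "orders": TableStats(
            state="error",
            last_error="timeout",
            last_error_at=datetime(2026, 6, 8, 19, 0),
            consecutive_failures=4,
        ),
        "customers": TableStats(),
        "invoices": TableStats(last_error="timeout", consecutive_failures=1),
    },
)
alerts = failing_tables(snapshot)
```

With a real engine, the same function could be called on `engine.stats`, since it relies only on `tables` and `consecutive_failures`.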
Each `TableStats` reports a live processing **state** and how the table is kept fresh (**tracking**):
| `state` | Meaning |
@@ -321,6 +326,23 @@ Set via environment variables or a `.env` file:
| `SQLMEM_REFRESH_INTERVAL` | `300` | background refresh tick (seconds) — delta pulls and proactive TTL reloads |
| `SQLMEM_FETCH_BATCH` | `10000` | rows fetched per batch when loading a table — caps peak memory for huge tables |
Most of these can also be passed **per engine** to the constructor, overriding the env default — handy for running two engines (with separate cache files) in one process, and for tests:
```python
engine = CachingEngine(
    base_engine,
    cache_db_path="orders_cache.db",   # SQLMEM_CACHE_DB
    in_memory=False,                   # SQLMEM_IN_MEMORY
    backup_interval=3600,              # SQLMEM_BACKUP_INTERVAL
    refresh_interval=300,              # SQLMEM_REFRESH_INTERVAL
    fetch_batch=10000,                 # SQLMEM_FETCH_BATCH
    dialect="tsql",                    # SQLMEM_SQL_DIALECT
    blocking_startup_refresh=False,    # block startup until caught up? (default: no)
)
)
```
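The precedence this implies (constructor argument, then environment variable, then built-in default) could be resolved along the following lines. This is an illustrative sketch only: `resolve_int` and `_DEFAULTS` are hypothetical names, and the real constructor may resolve its settings differently. The two defaults are the ones listed in the table above.

```python
import os
from typing import Mapping, Optional

# Built-in defaults taken from the configuration table above.
_DEFAULTS = {"SQLMEM_REFRESH_INTERVAL": 300, "SQLMEM_FETCH_BATCH": 10000}


def resolve_int(
    explicit: Optional[int],
    env_name: str,
    environ: Mapping[str, str] = os.environ,
) -> int:
    """Pick the explicit argument, else the env var, else the built-in default."""
    if explicit is not None:
        return explicit
    raw = environ.get(env_name)
    return int(raw) if raw is not None else _DEFAULTS[env_name]


# A fake environment keeps the example independent of the real process env.
fake_env = {"SQLMEM_REFRESH_INTERVAL": "60"}

from_arg = resolve_int(15, "SQLMEM_REFRESH_INTERVAL", fake_env)
from_env = resolve_int(None, "SQLMEM_REFRESH_INTERVAL", fake_env)
from_default = resolve_int(None, "SQLMEM_FETCH_BATCH", fake_env)
```

Treating `None` as "not passed" lets two engines in the same process share the environment while overriding only the settings that differ.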
By default the **startup catch-up** (delta pulls and TTL reloads for tables restored from disk) runs on the background thread so it never blocks application startup; the cache may serve slightly stale data until the first refresh completes. Set `blocking_startup_refresh=True` to catch up synchronously before the engine starts serving.
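The difference between the two startup modes can be sketched with plain `threading`. The names `start_engine` and `slow_catch_up` are hypothetical and stand in for whatever the engine does internally; only the `blocking_startup_refresh` flag comes from the text above.

```python
import threading
from typing import Callable, Optional


def start_engine(
    catch_up: Callable[[], None],
    blocking_startup_refresh: bool = False,
) -> Optional[threading.Thread]:
    """Run catch_up inline when blocking, otherwise on a daemon thread."""
    if blocking_startup_refresh:
        catch_up()  # caller waits until the cache has caught up
        return None
    worker = threading.Thread(target=catch_up, name="startup-catch-up", daemon=True)
    worker.start()  # caller continues immediately
    return worker


gate = threading.Event()  # holds the simulated catch-up until released
done = threading.Event()  # set once the simulated catch-up has finished


def slow_catch_up() -> None:
    gate.wait(timeout=5)
    done.set()


# Default mode: the call returns while catch-up is still pending.
worker = start_engine(slow_catch_up)
served_stale = not done.is_set()
gate.set()
worker.join(timeout=5)
caught_up = done.is_set()

# Blocking mode: catch-up has completed by the time the call returns.
log = []
blocking_result = start_engine(
    lambda: log.append("refreshed"), blocking_startup_refresh=True
)
```

In the default mode there is a window, here captured by `served_stale`, in which queries may see data restored from disk that has not been refreshed yet.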
## Exceptions
| Exception | When raised |