Fix frozen delta watermark and add error stats, lazy source, concurrent disk reads, and per-engine config
@@ -258,6 +258,7 @@ engine = CachingEngine(base_engine, in_memory=False)
- The cache can **exceed available memory**: nothing is held in RAM beyond SQLite's page cache.
- Every write **persists immediately** (WAL + `synchronous=NORMAL`), so there is no hourly backup thread, no load-into-memory step on startup, and no shutdown flush to lose.
- **Reads run concurrently**: each thread reads through its own read-only WAL connection, so a slow `SELECT` doesn't block writers (loads/upserts) or other readers.
- On open, a cache file with a mismatched schema version is wiped in place and rebuilt; `engine.reset()` drops the cached tables and `VACUUM`s the file (it does not delete the open file).

The constructor argument wins over the env var; when `in_memory` is omitted it falls back to `SQLMEM_IN_MEMORY`.
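
The concurrent-read behavior described in the list above can be illustrated with plain `sqlite3`. This is a minimal sketch, not the library's actual implementation: `open_writer`, `reader_for`, and `demo` are hypothetical names, and the only things taken from the text are WAL mode, `synchronous=NORMAL`, and one read-only connection per thread.

```python
import os
import sqlite3
import tempfile
import threading
from pathlib import Path

_local = threading.local()  # holds each thread's private reader connections


def open_writer(path: str) -> sqlite3.Connection:
    """Open the read-write connection and switch the file to WAL mode."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def reader_for(path: str) -> sqlite3.Connection:
    """Return the calling thread's own read-only connection (hypothetical helper)."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    if path not in conns:
        # mode=ro makes SQLite reject any write through this connection.
        uri = Path(path).resolve().as_uri() + "?mode=ro"
        conns[path] = sqlite3.connect(uri, uri=True)
    return conns[path]


def demo() -> list:
    """Write two rows, then count them from four reader threads."""
    path = os.path.join(tempfile.mkdtemp(), "cache.db")
    writer = open_writer(path)
    writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    writer.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    writer.commit()

    counts = []

    def read() -> None:
        conn = reader_for(path)
        counts.append(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
        conn.close()  # the thread is about to exit, so release its connection

    threads = [threading.Thread(target=read) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    writer.close()
    return counts
```

In WAL mode readers work from a consistent snapshot and do not take the write lock, which is why a slow `SELECT` on one of these connections would not hold up the writer.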
@@ -277,11 +278,15 @@ Use `reset()` after a **structural change** in the source (columns added/removed

```python
stats = engine.stats  # Stats snapshot
print(stats.hits, stats.misses, stats.refetches, stats.errors)
for name, t in stats.tables.items():
    print(name, t.rows, t.state, t.tracking, t.last_refresh)
    if t.consecutive_failures:
        print(f" {name} failing ×{t.consecutive_failures}: {t.last_error} ({t.last_error_at})")
```

`Stats.errors` is the total number of load/refresh failures since startup. Each `TableStats` also carries `last_error`, `last_error_at`, and `consecutive_failures` (reset to 0 on the next success). A delta that fails *before* streaming, which would otherwise leave `state` looking `ready`, is therefore still visible, and the table is marked `error`.
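
One way these fields might feed a health check is sketched below. The `Stats` and `TableStats` classes here are simplified stand-ins written only so the example runs on its own; they carry just the field names mentioned above, and in real use the snapshot would come from `engine.stats`. The `failing_tables` helper and its `threshold` parameter are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TableStats:
    """Stand-in with a subset of the fields named in the text (illustration only)."""
    state: str = "ready"
    consecutive_failures: int = 0
    last_error: Optional[str] = None
    last_error_at: Optional[str] = None


@dataclass
class Stats:
    """Stand-in for the snapshot returned by `engine.stats` (illustration only)."""
    errors: int = 0
    tables: Dict[str, TableStats] = field(default_factory=dict)


def failing_tables(stats: Stats, threshold: int = 3) -> List[Tuple[str, Optional[str]]]:
    """Return (table name, last error) for tables at or above the failure threshold."""
    return sorted(
        (name, t.last_error)
        for name, t in stats.tables.items()
        if t.consecutive_failures >= threshold
    )


snapshot = Stats(
    errors=4,
    tables={
        "orders": TableStats(state="error", consecutive_failures=3, last_error="timeout"),
        "users": TableStats(consecutive_failures=1, last_error="deadlock"),
        "items": TableStats(),
    },
)
alerts = failing_tables(snapshot)  # only "orders" has reached the threshold
```

Because `consecutive_failures` resets on the next success, alerting on a threshold rather than on `errors` avoids paging for a single transient failure.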

Each `TableStats` reports a live processing **state** and how the table is kept fresh (**tracking**):

| `state` | Meaning |
@@ -321,6 +326,23 @@ Set via environment variables or a `.env` file:
| `SQLMEM_REFRESH_INTERVAL` | `300` | background refresh tick (seconds): delta pulls and proactive TTL reloads |
| `SQLMEM_FETCH_BATCH` | `10000` | rows fetched per batch when loading a table; caps peak memory for huge tables |

Most of these can also be passed **per engine** to the constructor, overriding the env default. This is handy for running two engines (with separate cache files) in one process, and for tests:

```python
engine = CachingEngine(
    base_engine,
    cache_db_path="orders_cache.db",  # SQLMEM_CACHE_DB
    in_memory=False,                  # SQLMEM_IN_MEMORY
    backup_interval=3600,             # SQLMEM_BACKUP_INTERVAL
    refresh_interval=300,             # SQLMEM_REFRESH_INTERVAL
    fetch_batch=10000,                # SQLMEM_FETCH_BATCH
    dialect="tsql",                   # SQLMEM_SQL_DIALECT
    blocking_startup_refresh=False,   # block startup until caught up? (default: no)
)
```
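
The precedence this implies (constructor argument first, then the environment variable, then the built-in default) could be resolved along the following lines. This is a sketch under assumptions: `resolve_setting` is a hypothetical helper, and treating `None` as "argument omitted" is a guess at the mechanism rather than something the source states.

```python
import os
from typing import Any, Callable, Mapping, Optional


def resolve_setting(
    explicit: Any,
    env_name: str,
    default: Any,
    cast: Callable[[str], Any] = str,
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Pick a setting: explicit argument, else environment variable, else default."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        # An explicit value wins even when it is falsy (0, False, "").
        return explicit
    raw = environ.get(env_name)
    if raw is None or raw == "":
        return default
    return cast(raw)


# Two engines in one process can disagree with each other and with the environment.
env = {"SQLMEM_REFRESH_INTERVAL": "60"}
fast = resolve_setting(5, "SQLMEM_REFRESH_INTERVAL", 300, int, env)
inherited = resolve_setting(None, "SQLMEM_REFRESH_INTERVAL", 300, int, env)
batch = resolve_setting(None, "SQLMEM_FETCH_BATCH", 10000, int, env)
```

Passing the environment in as a mapping keeps the helper easy to exercise in tests without mutating `os.environ`.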

By default the **startup catch-up** (delta pulls and TTL reloads for tables restored from disk) runs on the background thread so it never blocks application startup; the cache may serve slightly stale data until the first refresh completes. Set `blocking_startup_refresh=True` to catch up synchronously before the engine starts serving.
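
The difference between the two modes can be sketched with a plain thread. The `start_engine` function and its `catch_up` callback are hypothetical and only illustrate the blocking versus background choice; the engine's real startup path is not shown in the source.

```python
import threading
from typing import Callable, Optional


def start_engine(
    catch_up: Callable[[], None],
    blocking_startup_refresh: bool = False,
) -> Optional[threading.Thread]:
    """Run the catch-up inline, or hand it to a daemon thread and return at once."""
    if blocking_startup_refresh:
        catch_up()  # the caller waits until the cache has caught up
        return None
    worker = threading.Thread(target=catch_up, name="startup-catch-up", daemon=True)
    worker.start()  # the caller continues; data may be stale until this finishes
    return worker


# Simulate a slow catch-up that only finishes once the gate is opened.
gate = threading.Event()
finished = threading.Event()


def slow_catch_up() -> None:
    gate.wait()
    finished.set()


worker = start_engine(slow_catch_up)   # returns immediately
stale_window = not finished.is_set()   # still catching up at this point
gate.set()
worker.join(timeout=5)
```

The trade-off is the one described above: the background mode keeps startup fast at the cost of a short window of possibly stale reads.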

## Exceptions

| Exception | When raised |