Wire datetime_columns through query params and reads; add db_size and vacuum guard

Jan Doubravský
2026-06-10 13:58:29 +02:00
parent 8e46ee3547
commit a68b8994e3
13 changed files with 383 additions and 20 deletions
+12 -4
@@ -297,10 +297,17 @@ engine = CachingEngine(
```
- **Opt-in per column.** Only the columns you name change; everything else keeps the default lossless `TEXT` storage.
- **Transparent in and out.** A `WHERE` on such a column accepts a `datetime` or an ISO string — the param is coerced to integer µs so the comparison matches — and `execute()` returns the column as a real `datetime` (UTC), the same type a direct source query would give. Pass `return_datetime=False` to get the raw integers instead.
- The delta watermark is handled transparently: it is persisted as integer µs and bound back to a real `datetime` for the source query, so incremental refresh keeps working.
- ⚠️ This is a **breaking on-disk change** (`SCHEMA_VERSION` 4): an existing cache is wiped and reloaded on the first start after enabling it, so plan a maintenance window if the reload is large.
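The reason a `WHERE` param on such a column has to be coerced comes from SQLite's comparison rules: an ISO string is not a number, so it stays `TEXT` even against an `INTEGER` column, and SQLite orders every `INTEGER` value before every `TEXT` value. The query does not fail; it just matches the wrong rows (for `>`, none at all). A stdlib-only sketch, where `to_epoch_us` is a hypothetical stand-in for the engine's own coercion and treating naive datetimes as UTC is an assumption:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_us(value):
    """Hypothetical coercion: datetime or ISO string -> integer µs since epoch."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:  # assumption: naive values are UTC
        value = value.replace(tzinfo=timezone.utc)
    # Exact integer arithmetic, no float rounding.
    return (value - EPOCH) // timedelta(microseconds=1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, changed INTEGER)")
stamps = ["2026-06-01T08:00:00+00:00", "2026-06-10T12:00:00+00:00"]
conn.executemany(
    "INSERT INTO events (changed) VALUES (?)", [(to_epoch_us(s),) for s in stamps]
)

since = "2026-06-05T00:00:00+00:00"
sql = "SELECT id FROM events WHERE changed > ?"
raw = conn.execute(sql, (since,)).fetchall()  # TEXT param vs INTEGER column
coerced = conn.execute(sql, (to_epoch_us(since),)).fetchall()  # integer µs param
conn.close()

print(raw, coerced)  # → [] [(2,)]
```

Both queries are valid SQL; only the coerced one returns the row changed after `since`.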
To build a `WHERE` param yourself (e.g. an HTTP `?since=` filter) without re-implementing the conversion, use the exported helper:
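```python
from datetime import datetime, timezone
from sqlmem import datetime_to_epoch_us

# `engine` is the CachingEngine configured above; assumes `events.changed` is in datetime_columns.
since = datetime(2026, 6, 1, tzinfo=timezone.utc)  # e.g. parsed from ?since=
rows = engine.execute("SELECT * FROM events WHERE changed > ?", (datetime_to_epoch_us(since),))
```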
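The conversion behind such a helper can be done with exact integer arithmetic instead of `timestamp() * 1_000_000`, which goes through a float and can round. The sketch below is illustrative only: `epoch_us` and `from_epoch_us` are hypothetical names (not `sqlmem`'s implementation), naive datetimes are assumed to be UTC, and the `?since=` value is pulled out of a URL with `urllib.parse`. The round trip is lossless at microsecond precision, which is what lets a watermark stored as an integer be bound back as an equal `datetime`.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_US = timedelta(microseconds=1)


def epoch_us(dt: datetime) -> int:
    """Hypothetical equivalent of datetime_to_epoch_us (naive = UTC is an assumption)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // ONE_US  # timedelta // timedelta -> int, no float involved


def from_epoch_us(us: int) -> datetime:
    """Inverse: integer µs -> aware UTC datetime."""
    return EPOCH + us * ONE_US


# e.g. GET /events?since=2026-06-10T13:58:29.123456%2B02:00  (%2B is an encoded "+")
query = urlsplit("/events?since=2026-06-10T13:58:29.123456%2B02:00").query
since = datetime.fromisoformat(parse_qs(query)["since"][0])
param = epoch_us(since)

print(param, from_epoch_us(param).isoformat())
```

The `+02:00` offset is normalised away: the integer identifies the instant, and decoding it yields the same instant in UTC.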
## Manual cache control
```python
@@ -316,20 +323,20 @@ Use `reset()` after a **structural change** in the source (columns added/removed
`hard_reset()` goes further than `reset()` in disk mode: it closes every connection, deletes `cache.db` (and its `-wal`/`-shm` sidecars) and reopens from scratch — the only way to change a baked-in `page_size`/`auto_vacuum`. In memory mode it falls back to `reset()`.
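The file-level half of a disk-mode hard reset can be sketched with `pathlib` and `sqlite3`. In WAL mode SQLite keeps state in `<name>-wal` and `<name>-shm` files beside the main database, so all three have to go, and only after every connection is closed. Because the reopened file is brand new, `page_size` can be set again before the first table is created. `remove_cache_files` is a hypothetical helper, not the library's code.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import List


def remove_cache_files(db_path: Path) -> List[str]:
    """Hypothetical: delete the database and its WAL sidecars. Close every connection first."""
    removed = []
    for suffix in ("", "-wal", "-shm"):
        p = db_path.with_name(db_path.name + suffix)
        if p.exists():
            p.unlink()
            removed.append(p.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "cache.db"
    conn = sqlite3.connect(db)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.close()  # never unlink files that an open connection is still using

    removed = remove_cache_files(db)
    leftovers = sorted(p.name for p in Path(tmp).iterdir())

    fresh = sqlite3.connect(db)  # reopen from scratch: a brand-new file
    fresh.execute("PRAGMA page_size=8192")  # honoured because no table exists yet
    fresh.execute("CREATE TABLE t (x INTEGER)")
    page_size = fresh.execute("PRAGMA page_size").fetchone()[0]
    fresh.close()

print(removed, leftovers, page_size)
```

SQLite usually deletes the sidecars itself when the last WAL connection closes cleanly; the helper also covers the case where they were left behind.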
`vacuum()` reclaims free pages left behind by the `INSERT OR REPLACE` churn of delta refreshes. The incremental mode (the default) is cheap and non-blocking but needs `auto_vacuum=INCREMENTAL`, which has to be set via `pragmas=` on a fresh cache (for an existing file that means a `hard_reset()`). If the cache wasn't created that way, `vacuum(incremental=True)` logs a warning and does nothing instead of silently doing nothing. `vacuum(incremental=False)` runs a full `VACUUM` that rewrites the file (needing ~2× the disk space and blocking readers), so schedule it in a maintenance window. Both are no-ops in memory mode.
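The incremental-vacuum guard can be sketched against the stdlib `sqlite3` module. Assumptions: `vacuum()` below is a hypothetical function taking a raw connection, not the engine method, and it returns `False` when it refuses. `PRAGMA auto_vacuum` reports `2` for `INCREMENTAL`, and since `execute()` only advances a statement by one step while `PRAGMA incremental_vacuum` frees one page per step, the sketch drains it with `fetchall()`.

```python
import logging
import os
import sqlite3
import tempfile

log = logging.getLogger("vacuum_sketch")
AUTO_VACUUM_INCREMENTAL = 2  # what PRAGMA auto_vacuum reports for INCREMENTAL


def vacuum(conn, incremental=True):
    """Hypothetical guard: returns False when an incremental vacuum is refused."""
    if incremental:
        mode = conn.execute("PRAGMA auto_vacuum").fetchone()[0]
        if mode != AUTO_VACUUM_INCREMENTAL:
            log.warning("auto_vacuum=%s, not INCREMENTAL: skipping incremental vacuum", mode)
            return False
        # One page per step, so step the pragma to completion.
        conn.execute("PRAGMA incremental_vacuum").fetchall()
    else:
        conn.execute("VACUUM")  # full rewrite: extra disk space while running, blocks readers
    return True


def free_pages(conn):
    return conn.execute("PRAGMA freelist_count").fetchone()[0]


def churned_db(path, auto_vacuum):
    """Create a file DB, fill it, then delete everything to leave free pages behind."""
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit: VACUUM needs no open txn
    conn.execute(f"PRAGMA auto_vacuum={auto_vacuum}")  # only effective before the first table
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO t (payload) VALUES (?)", [("x" * 500,)] * 2000)
    conn.execute("COMMIT")
    conn.execute("DELETE FROM t")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    inc = churned_db(os.path.join(tmp, "inc.db"), "INCREMENTAL")
    inc_before = free_pages(inc)
    inc_ran = vacuum(inc)  # incremental: truncates the freed pages off the file
    inc_after = free_pages(inc)
    inc.close()

    plain = churned_db(os.path.join(tmp, "plain.db"), "NONE")
    plain_ran = vacuum(plain)  # refused with a warning, nothing changes
    plain_before = free_pages(plain)
    vacuum(plain, incremental=False)  # a full VACUUM works regardless of auto_vacuum
    plain_after = free_pages(plain)
    plain.close()

print(inc_before > 0, inc_ran, inc_after, plain_ran, plain_before > 0, plain_after)
```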
## Runtime statistics
```python
stats = engine.stats # Stats snapshot
print(stats.hits, stats.misses, stats.refetches, stats.errors, stats.db_size_bytes)
for name, t in stats.tables.items():
    print(name, t.rows, t.state, t.tracking, t.last_upsert, t.last_refresh)
    if t.consecutive_failures:
        print(f" {name} failing ×{t.consecutive_failures}: {t.last_error} ({t.last_error_at})")
```
`Stats.db_size_bytes` is the size of the on-disk cache file in bytes (0 in memory mode), useful for monitoring cache growth. `Stats.errors` is the total number of load/refresh failures since start. Each `TableStats` also carries `last_error`, `last_error_at` and `consecutive_failures` (reset to 0 on the next success). A delta that fails *before* streaming would otherwise leave `state` looking `ready`; with these fields the failure is still visible, and the table is marked `error`.
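The failure bookkeeping described above can be modelled in a few lines. This is a hypothetical sketch: the field names mirror the documented `Stats`/`TableStats` attributes, but the `record_failure`/`record_success` methods and the `db_size_bytes()` helper are illustrative, not the library's code, and keeping `last_error` after a success is an assumption.

```python
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class TableStats:
    state: str = "ready"
    last_error: Optional[str] = None
    last_error_at: Optional[datetime] = None
    consecutive_failures: int = 0


@dataclass
class Stats:
    errors: int = 0  # total load/refresh failures since start; never reset
    tables: Dict[str, TableStats] = field(default_factory=dict)

    def record_failure(self, name: str, exc: Exception) -> None:
        t = self.tables.setdefault(name, TableStats())
        self.errors += 1
        t.consecutive_failures += 1
        t.last_error = repr(exc)
        t.last_error_at = datetime.now(timezone.utc)
        t.state = "error"  # set even when the delta failed before streaming any rows

    def record_success(self, name: str) -> None:
        t = self.tables.setdefault(name, TableStats())
        t.consecutive_failures = 0  # reset on the next success
        t.state = "ready"  # last_error/last_error_at are kept (an assumption)


def db_size_bytes(path: Optional[str]) -> int:
    """Hypothetical: on-disk cache file size; 0 in memory mode (path is None)."""
    return os.path.getsize(path) if path else 0


stats = Stats()
stats.record_failure("orders", TimeoutError("source timed out"))
stats.record_failure("orders", TimeoutError("source timed out"))
stats.record_success("orders")
print(stats.errors, stats.tables["orders"].consecutive_failures)  # → 2 0
```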
Two timestamps distinguish *data freshness* from *liveness*:
@@ -392,6 +399,7 @@ engine = CachingEngine(
    dialect="tsql",  # SQLMEM_SQL_DIALECT
    pragmas={"mmap_size": 32 * 1024**3, "page_size": 8192},  # disk-mode SQLite tuning
    datetime_columns={"orders": ["created_at"]},  # store these as INTEGER µs (opt-in)
    return_datetime=True,  # return datetime_columns as datetime (vs raw µs int)
    blocking_startup_refresh=False,  # block startup until caught up? (default: no)
)
```
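On the `pragmas=` entry: some SQLite pragmas only take effect when set on a fresh file, before the first table is created (`page_size`, `auto_vacuum`), while others such as `mmap_size` are per-connection and are capped by SQLite's compile-time maximum. The sketch below is a hypothetical `open_cache` helper, not the engine's code, showing that ordering with the stdlib `sqlite3` module; the `meta` table is illustrative.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical grouping: pragmas that are fixed once the first table exists.
CREATION_TIME = {"page_size", "auto_vacuum"}


def _literal(value):
    """PRAGMA values can't be bound as parameters, so accept only safe literals."""
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str) and value.isidentifier():
        return value  # keywords such as INCREMENTAL or WAL
    raise ValueError(f"unsupported pragma value: {value!r}")


def open_cache(path, pragmas):
    """Hypothetical: apply creation-time pragmas before anything writes to the file."""
    conn = sqlite3.connect(path)
    ordered = sorted(pragmas.items(), key=lambda kv: kv[0] not in CREATION_TIME)
    for name, value in ordered:
        if not name.isidentifier():
            raise ValueError(f"unsupported pragma name: {name!r}")
        conn.execute(f"PRAGMA {name}={_literal(value)}")
    conn.execute("CREATE TABLE IF NOT EXISTS meta (k TEXT PRIMARY KEY, v TEXT)")  # first write
    conn.commit()
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_cache(
        Path(tmp) / "cache.db",
        {"mmap_size": 32 * 1024**3, "page_size": 8192, "auto_vacuum": "INCREMENTAL"},
    )
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    auto_vacuum = conn.execute("PRAGMA auto_vacuum").fetchone()[0]
    conn.close()

print(page_size, auto_vacuum)  # → 8192 2
```

Sorting by "is this creation-time?" keeps the caller's dict order otherwise, since Python's sort is stable.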