Add declarative TableSpec API with preload and fail-fast; fix shared-connection race

Jan Doubravský
2026-06-11 13:39:56 +02:00
parent 46370fe651
commit 4a86b2282f
11 changed files with 500 additions and 37 deletions
+37 -1
@@ -238,6 +238,41 @@ Each value is a list of index definitions: a string is a single-column index, a
- Indexes are **recreated after every (re)load** — full loads, TTL reloads, and `invalidate()` + re-fetch all rebuild them — so they're always present, and they persist in `cache.db` across restarts.
- Delta-tracked tables already get a unique index on their key columns; secondary indexes are independent and can be combined with `delta` or `ttl`.
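The "recreated after every (re)load" behaviour can be pictured with plain `sqlite3`. This is only a rough sketch of the general pattern, not the engine's actual implementation: `reload_table`, `index_names`, the `idx_<table>_<column>` naming scheme and the `pratvalues` table are all hypothetical, and only single-column (string) index definitions are handled.

```python
import sqlite3


def reload_table(conn, table, columns, rows, indexes):
    """Replace a cached table's contents, then recreate its secondary indexes."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commit on success, roll back the open transaction on error
        # Dropping the table also drops every index defined on it.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} ({col_list})")
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        for column in indexes:
            # Hypothetical naming scheme: idx_<table>_<column>.
            conn.execute(
                f"CREATE INDEX IF NOT EXISTS idx_{table}_{column} ON {table} ({column})"
            )


def index_names(conn, table):
    """Return the names of all indexes currently defined on the table."""
    return sorted(row[1] for row in conn.execute(f"PRAGMA index_list({table})"))


conn = sqlite3.connect(":memory:")
cols = ["PRODUCT_PRODUCTNR", "PRAT_NAME"]
reload_table(conn, "pratvalues", cols, [("A1", "color")], ["PRODUCT_PRODUCTNR"])
# A second (re)load replaces the rows; the index is rebuilt along with them.
reload_table(conn, "pratvalues", cols, [("A1", "color"), ("B2", "size")], ["PRODUCT_PRODUCTNR"])
print(index_names(conn, "pratvalues"))
```

Rebuilding the index inside the same helper that reloads the rows is what keeps it from going missing between a reload and the next query.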
## Declarative initialization (`tables=`)
Instead of the lazy "learn columns from queries" mode, you can **declare every table up front** with `tables=[TableSpec(...)]` — its columns, indexes, refresh strategy and which columns are datetimes — and have the engine preload them and reject anything undeclared:
```python
from sqlmem import CachingEngine, TableSpec, Delta, TTL

engine = CachingEngine(
    base_engine,
    tables=[
        TableSpec(
            name="VW_P_PRATVALUES",
            columns=["PRODUCT_PRODUCTNR", "PRAT_NAME", "PRATVALUE", "CHANGE_DATE"],
            indexes=["PRODUCT_PRODUCTNR", "PRAT_NAME", "CHANGE_DATE"],
            refresh=Delta(change_column="CHANGE_DATE", key_columns=["PRATVALUE_ID"]),
            datetime_columns=["CHANGE_DATE"],
            preload=True,
        ),
        TableSpec(
            name="VW_PRODUCTS_ASSIGNED_E",
            columns=["PRODUCT_PRODUCTNR", "ELEMENT_NAME", "ELEMENT_ID"],
            indexes=["PRODUCT_PRODUCTNR"],
            refresh=TTL(seconds=1800),
            preload=True,
        ),
    ],
    pragmas={"mmap_size": 32 * 1024**3, "page_size": 8192},
)
```
- **Preload** — `preload=True` tables are loaded at startup (on the background thread by default, so startup isn't blocked; pass `blocking_startup_refresh=True` to load them synchronously before serving). A copy already fresh in the persistent cache is **skipped**, so a warm restart is instant. During warm-up a table reports `TableState.LOADING` in [`stats`](#runtime-statistics) — handy for gating a `503` until it's `ready`.
- **Fail-fast** — a query for a table without a `TableSpec`, or for a column outside a spec's declared `columns` (including `SELECT *` on a column-restricted table), raises `UndeclaredError` instead of silently kicking off an expensive lazy load. Use `columns=None` to cache the whole table and allow any column.
- `refresh=` takes `Delta(change_column=…, key_columns=…)` (equivalent to `DeltaConfig`), `TTL(seconds=…)`, or `None` for a static table.
- **Backward compatible** — omit `tables=` and the legacy `delta=`/`ttl=`/`indexes=`/`datetime_columns=` kwargs work exactly as before (lazy mode, no fail-fast). Passing `tables=` together with any of those legacy kwargs raises `ValueError`.
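The `503` gating mentioned under **Preload** might look roughly like the sketch below. It is self-contained and makes several assumptions: the local `TableState` enum is a stand-in for the library's (only `LOADING` is named above; the `READY` member is a guess), and `readiness_status` takes a plain dict of table name to state rather than whatever shape `stats` actually returns.

```python
from enum import Enum


class TableState(Enum):
    """Hypothetical stand-in for the library's TableState enum."""

    LOADING = "loading"
    READY = "ready"  # assumed member name


def readiness_status(table_states):
    """Return (http_status, tables_not_ready) for a readiness probe.

    `table_states` is assumed to map table name -> TableState.
    """
    not_ready = sorted(
        name for name, state in table_states.items() if state is not TableState.READY
    )
    if not_ready:
        return 503, not_ready  # still warming up: report unavailable
    return 200, []


# During warm-up one preloaded table is still loading.
print(readiness_status({
    "VW_P_PRATVALUES": TableState.LOADING,
    "VW_PRODUCTS_ASSIGNED_E": TableState.READY,
}))
# Once every table is ready the probe passes.
print(readiness_status({
    "VW_P_PRATVALUES": TableState.READY,
    "VW_PRODUCTS_ASSIGNED_E": TableState.READY,
}))
```

Returning the list of tables that are not ready alongside the status code makes it easy to log which preload is holding up startup.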
## Persistence
By default the cache lives in an **in-memory SQLite** and is persisted to `cache.db` on disk:
@@ -413,9 +448,10 @@ By default the **startup catch-up** (delta pulls and TTL reloads for tables rest
|---|---|
| `ReadOnlyError` | INSERT, UPDATE, or DELETE statement |
| `UnsupportedQueryError` | non-SELECT statement, `SELECT` without `FROM`, or an unqualified column in a multi-table query |
| `UndeclaredError` | in [declarative mode](#declarative-initialization-tables) (`tables=`): a query references a table or column that was not declared |
```python
from sqlmem import ReadOnlyError, UnsupportedQueryError, UndeclaredError
```
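One way a caller might handle these exceptions is to translate them into responses at the service boundary. The sketch below is an illustration under assumptions, not part of the library: `run_query` and the chosen status codes are hypothetical, `execute` is any callable that runs a SQL string, and the fallback classes exist only so the snippet runs without `sqlmem` installed.

```python
try:
    from sqlmem import ReadOnlyError, UnsupportedQueryError, UndeclaredError
except ImportError:
    # Stand-ins so the sketch runs without sqlmem; the real class
    # hierarchy is not documented here and may differ.
    class ReadOnlyError(Exception):
        pass

    class UnsupportedQueryError(Exception):
        pass

    class UndeclaredError(Exception):
        pass


def run_query(execute, sql):
    """Run execute(sql) and map cache errors to (status, payload).

    The status codes are illustrative choices, not library behaviour.
    """
    try:
        return 200, execute(sql)
    except ReadOnlyError:
        return 405, "the cache is read-only"
    except UndeclaredError as exc:
        return 400, f"undeclared table or column: {exc}"
    except UnsupportedQueryError as exc:
        return 400, f"unsupported query: {exc}"


def fake_execute(sql):
    """Hypothetical executor that rejects a table with no TableSpec."""
    if "UNKNOWN_TABLE" in sql:
        raise UndeclaredError("UNKNOWN_TABLE")
    return [("A1", "color")]


print(run_query(fake_execute, "SELECT PRAT_NAME FROM VW_P_PRATVALUES"))
print(run_query(fake_execute, "SELECT * FROM UNKNOWN_TABLE"))
```

Treating `UndeclaredError` as a client error rather than a server fault reflects the fail-fast intent: the query asked for something the deployment never declared.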
## Logging