Add per-table TTL refresh for tables without a change column

Jan Doubravský
2026-06-05 12:12:57 +02:00
parent 33aa126ff6
commit 85bb84a1a6
8 changed files with 240 additions and 19 deletions
+26 -3
@@ -185,7 +185,7 @@ sequenceDiagram
- **First use** of a delta table → full load; the watermark is set to the table's current `max(change_column)`.
- **On startup** → for each delta table restored from disk, a single catch-up query pulls everything changed **since the last shutdown** and upserts it, bringing the cache back in sync without a full reload.
- **While running** → a background thread repeats the delta pull every `SQLMEM_REFRESH_INTERVAL` seconds (default 5 minutes), so the cache trails the source DB by at most that interval.
- Tables **without** a `DeltaConfig` keep the current behaviour: full load on miss, never auto-refreshed.
- Tables **without** a `DeltaConfig` keep the default behaviour: full load on miss, never auto-refreshed — unless they are given a [TTL](#time-based-refresh-tables-without-a-change-column).
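The watermark mechanics described in the list above can be illustrated with a minimal, self-contained sketch. It is not SQLmem's implementation: the `items` table, its `changed_at` column, and the `make_db` / `delta_pull` helpers are hypothetical, two in-memory SQLite databases stand in for the source DB and the cache, and a pull from watermark `-1` stands in for the initial full load.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `items` table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT, changed_at INTEGER)"
    )
    return db


def delta_pull(source, cache, watermark):
    """Upsert rows changed after `watermark` and return the new watermark."""
    rows = source.execute(
        "SELECT id, value, changed_at FROM items WHERE changed_at > ?",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert keyed on the primary key.
    cache.executemany("INSERT OR REPLACE INTO items VALUES (?, ?, ?)", rows)
    # If nothing changed, the watermark stays where it was.
    return max((row[2] for row in rows), default=watermark)


source, cache = make_db(), make_db()
source.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "a", 10), (2, "b", 20)])

# First use: everything is pulled and the watermark becomes max(changed_at).
watermark = delta_pull(source, cache, watermark=-1)

# Later: one row is updated and one is added in the source.
source.execute("UPDATE items SET value = 'b2', changed_at = 30 WHERE id = 2")
source.execute("INSERT INTO items VALUES (3, 'c', 40)")

# The catch-up pull transfers only the two changed rows.
watermark = delta_pull(source, cache, watermark)
```

The same `delta_pull` call serves both the startup catch-up and the periodic background pull; only the moment it runs differs.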
### Requirements and limits of delta sync
@@ -195,6 +195,29 @@ sequenceDiagram
- **Structural changes are not covered by delta sync** — adding/removing attributes, or clearing values *without* bumping `change_column`, won't be picked up. For those, force a clean reload with [`engine.reset()`](#manual-cache-control) (or `invalidate()` for a single table).
- Hard `DELETE`s of whole rows are not detected by a change-timestamp; this workload doesn't delete rows, but if yours does, use a soft-delete flag column or `reset()`.
## Time-based refresh (tables without a change column)
Some tables can't be delta-synced because they have no change timestamp. For those you can set a **TTL** (max age in seconds): SQLmem keeps serving from cache and guarantees the cached copy is **never older than the TTL** by doing a full reload when it expires.
```python
engine = CachingEngine(
base_engine,
ttl={
"VW_LOOKUP_CODES": 300, # full-reload if the cache is older than 5 minutes
"VW_SETTINGS": 3600,
},
)
```
- **Read-time guarantee** — when a query touches a TTL table whose cache is older than its TTL, the table is fully reloaded *before* the query is answered, so a stale copy is never returned.
- **Proactive** — the background thread also full-reloads expired TTL tables every `SQLMEM_REFRESH_INTERVAL` seconds, keeping them warm so reads usually don't pay the reload latency.
- TTL age is measured from `last_refresh_at`, which is persisted in `cache.db`, so the guarantee holds across restarts (an expired table is reloaded on first use after start).
- A table may be in **either** `delta` **or** `ttl`, not both (delta already keeps it fresh) — supplying both raises `ValueError`.
```python
engine.refresh() # also reloads any expired TTL tables on demand
```
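The expiry rule in this section can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: the `TtlTracker` class, its `ensure_fresh` method, and the `reload` callback are hypothetical names, and the timestamps are passed in explicitly so the behaviour is easy to follow.

```python
import time


class TtlTracker:
    """Hypothetical helper that decides when a TTL table needs a full reload."""

    def __init__(self, ttl, delta_tables=()):
        overlap = sorted(set(ttl) & set(delta_tables))
        if overlap:
            # A table may be delta-synced or TTL-refreshed, not both.
            raise ValueError(f"tables configured for both delta and ttl: {overlap}")
        self.ttl = dict(ttl)
        self.last_refresh_at = {}  # table name -> epoch seconds of last full load

    def is_expired(self, table, now):
        last = self.last_refresh_at.get(table)
        # A table that was never loaded counts as expired.
        return last is None or now - last > self.ttl[table]

    def ensure_fresh(self, table, reload, now=None):
        """Reload `table` before a read if its cached copy is older than its TTL."""
        now = time.time() if now is None else now
        if table in self.ttl and self.is_expired(table, now):
            reload(table)
            self.last_refresh_at[table] = now
            return True
        return False


reloads = []
tracker = TtlTracker({"VW_LOOKUP_CODES": 300})
tracker.ensure_fresh("VW_LOOKUP_CODES", reloads.append, now=1000)  # never loaded: reload
tracker.ensure_fresh("VW_LOOKUP_CODES", reloads.append, now=1200)  # 200 s old: cache is fine
tracker.ensure_fresh("VW_LOOKUP_CODES", reloads.append, now=1301)  # 301 s old: reload
```

The sketch uses wall-clock time (`time.time()`) rather than `time.monotonic()` because a persisted `last_refresh_at` has to remain comparable after a process restart.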
## Persistence
The in-memory cache is persisted to `cache.db` on disk:
@@ -235,7 +258,7 @@ Set via environment variables or a `.env` file:
| `SQLMEM_CACHE_DB` | `cache.db` | Path to the on-disk persistence file |
| `SQLMEM_BACKUP_INTERVAL` | `3600` | Disk backup interval in seconds |
| `SQLMEM_SQL_DIALECT` | `tsql` | sqlglot dialect used to parse incoming SQL (e.g. `tsql`, `postgres`, `mysql`) |
| `SQLMEM_REFRESH_INTERVAL` | `300` | delta-refresh interval in seconds for delta-tracked tables |
| `SQLMEM_REFRESH_INTERVAL` | `300` | background refresh tick (seconds) — delta pulls and proactive TTL reloads |
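As a rough illustration of how a single refresh tick can drive both kinds of work, here is a minimal sketch. The `refresh_interval` and `run_ticks` helpers are hypothetical, not SQLmem's API; the sketch reads only the process environment (it does not parse a `.env` file) and runs the loop in the calling thread to stay deterministic, whereas a real background refresher would run it in a `threading.Thread`.

```python
import os
import threading


def refresh_interval(env=os.environ):
    """Read the tick length in seconds, falling back to the documented default."""
    return int(env.get("SQLMEM_REFRESH_INTERVAL", "300"))


def run_ticks(tick, interval, stop):
    """Call `tick()` every `interval` seconds until `stop` is set."""
    # Event.wait returns False on timeout and True once the event is set,
    # so the loop sleeps between ticks and exits promptly on shutdown.
    while not stop.wait(interval):
        tick()


stop = threading.Event()
ticks = []


def tick():
    # A real tick would run the delta pulls and reload expired TTL tables.
    ticks.append(len(ticks))
    if len(ticks) == 3:
        stop.set()  # end the demo after three ticks


run_ticks(tick, 0.001, stop)
```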
## Exceptions
@@ -276,7 +299,7 @@ Set `SQLMEM_DEBUG=true` in `.env` to make the default level DEBUG when no explic
- [x] **Incremental (delta) refresh** via per-table change-timestamp + key columns (see above) — the key feature for large tables.
- [x] **Primary-key auto-discovery** from the source DB (`inspect(engine).get_pk_constraint`) so `key_columns` is only needed for views.
- [x] **`engine.reset()`** — wipe RAM + `cache.db` for a clean rebuild after structural changes.
- [ ] Per-table TTL (time-to-live) expiry.
- [x] **Per-table TTL** (time-to-live) — bounded-staleness full refresh for tables without a change column.
## Dependencies