Store named datetime columns as INTEGER microseconds (datetime_columns)

Add pragmas, hard_reset, and vacuum for tuning disk-backed caches
Split last_upsert (persisted write) and last_refresh (run liveness) in stats
2026-06-09 18:18:38 +02:00 · 2026-06-09 17:58:41 +02:00 · 2026-06-09 08:48:29 +02:00 · 2026-06-08 19:35:33 +02:00 · 2026-06-08 11:39:04 +02:00 · 2026-06-05 18:17:55 +02:00
28 changed files with 3781 additions and 190 deletions
@@ -1,8 +1,6 @@
 # Python
 __pycache__/
 *.py[cod]
 *.pyo
 *.pyd
 *.egg
 *.egg-info/
 dist/
@@ -39,8 +37,12 @@ Thumbs.db
 .env
 .env.*
 # sqlmem cache (incl. WAL sidecars from disk-backed mode)
 cache.db*
 # Agents
 AGENTS.md
 CLAUDE.md
 DESIGN_DOCUMENT_MODULE.md
-.claude
+.claude/
 handover.md
@@ -6,6 +6,174 @@ All notable changes to this project will be documented in this file.
 ---
 ## [1.12.0] - 2026-06-09
 ### ⚠️ Breaking
 - **`SCHEMA_VERSION` bumped `3` → `4`** — on upgrade the existing cache is wiped automatically (disk mode wipes the file in place, in-memory discards the backup) and reloaded from the source on next use. For a large cache (e.g. a multi-hundred-million-row table) the full reload can take a while; deploy in a maintenance window.
 - **`datetime_columns` changes the public output contract for the chosen columns** — a column listed in `datetime_columns` is stored and returned as an **INTEGER (microseconds since the Unix epoch, UTC)**, not an ISO `TEXT` string. This is opt-in per column, so no table is affected unless you name its columns; consumers that read or filter such a column must adapt (compare against integer µs, or convert on read).
 ### Added
 - **`datetime_columns=` parameter on `CachingEngine` / `CacheManager`** — `datetime_columns={"VW_X": ["CHANGE_DATE"]}` stores the named datetime columns as INTEGER µs-since-epoch instead of ~28-byte ISO `TEXT`. Saves ~20 bytes per row and makes index comparisons on the column operate on native integers instead of string collation — worthwhile for a pure datetime column on a very large table (e.g. a delta change column that is also range-scanned).
  - `_coerce.to_sqlite_datetime()` converts datetimes (and ISO/`date` values) to exact integer microseconds via integer arithmetic (no float rounding); a naive datetime is treated as UTC, `None` passes through.
  - `load_table` declares those columns `INTEGER` and `upsert_rows` coerces them the same way, so full loads and delta upserts agree on the on-disk representation.
  - The delta high-watermark for such a column is the stored integer; `delta._bind_watermark(..., epoch_us=True)` reconstructs a real UTC `datetime` before binding, so the source still receives a typed timestamp (and the watermark fix from 1.8.0 keeps holding).
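The integer-microsecond representation described above can be sketched with the standard library alone. This is a minimal illustration, not the body of `_coerce.to_sqlite_datetime()`: the names `to_epoch_us` and `from_epoch_us` are hypothetical, and treating a plain `date` as midnight UTC is an assumption.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_us(value: datetime | date | str | None) -> int | None:
    """Hypothetical helper: datetime-like value -> integer microseconds since the epoch (UTC)."""
    if value is None:
        return None  # NULL passes through
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    elif not isinstance(value, datetime):
        # A plain date: assumed here to mean midnight.
        value = datetime(value.year, value.month, value.day)
    if value.tzinfo is None:
        # A naive datetime is treated as UTC.
        value = value.replace(tzinfo=timezone.utc)
    delta = value - EPOCH
    # Integer arithmetic only, so there is no float rounding.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_epoch_us(us: int) -> datetime:
    """Rebuild a timezone-aware UTC datetime from the stored integer."""
    return EPOCH + timedelta(microseconds=us)
```

A consumer that reads such a column can call something like `from_epoch_us(row["CHANGE_DATE"])` to get a `datetime` back, or compare against `to_epoch_us(...)` when filtering.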
 ### Changed
 - `pyproject.toml` — bumped version to `1.12.0`.
 - `CacheManager.max_value` / `set_last_synced_at` now accept/return `int` watermarks alongside `str` (the INTEGER-µs watermark round-trips through the `last_synced_at` TEXT column as its digit string).
 ---
 ## [1.11.0] - 2026-06-09
 ### Added
 - **`pragmas=` parameter on `CachingEngine` / `CacheManager`** — pass a dict of SQLite PRAGMAs (e.g. `mmap_size`, `cache_size`, `temp_store`, `page_size`, `auto_vacuum`) applied to the cache connection at open time, so disk-backed caches can be tuned for the host's I/O profile without bypassing `CacheManager`. Unknown/inapplicable pragmas are silently ignored by SQLite (graceful degradation, no startup crash).
  - **`page_size`** is a layout pragma: it is applied only on a *fresh* file (set before WAL / the first table). On an existing cache with a different page size the request is ignored and a one-time warning is logged — the new value takes effect only after `hard_reset()` or a rebuild.
  - **`auto_vacuum`** is set before the database header is materialized (before switching to WAL) on a fresh file, so `INCREMENTAL`/`FULL` actually stick instead of silently reverting to `NONE`.
 - **`CachingEngine.hard_reset()` / `CacheManager.hard_reset()`** — close every connection, delete the on-disk cache file (and its `-wal`/`-shm` sidecars) and reopen from scratch with all current pragmas applied. Unlike `reset()` (which drops tables but keeps the open file), this lets `page_size`/`auto_vacuum` change, since those are baked into the file at creation. Disk mode only — falls back to `reset()` in memory mode. All tables reload on next use.
 - **`CachingEngine.vacuum(incremental=True, pages=10_000)` / `CacheManager.vacuum(...)`** — run maintenance VACUUM on the on-disk cache to reclaim free pages left by delta `INSERT OR REPLACE` churn. Incremental (default) reclaims up to `pages` pages without blocking readers or extra disk (requires `auto_vacuum=INCREMENTAL`); `incremental=False` runs a full VACUUM (rewrites the file, ~2× disk, blocks readers — maintenance window only). No-op in memory mode.
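The ordering constraints above (layout pragmas first, then WAL, then tables) and the incremental vacuum can be reproduced with plain `sqlite3`. The sketch below is an illustration under assumptions: `open_tuned` and `reclaim` are hypothetical names, the page size of 8192 is arbitrary, and this is not how `CacheManager` is implemented.

```python
from __future__ import annotations

import sqlite3


def open_tuned(path: str) -> sqlite3.Connection:
    """Open a fresh database file with layout pragmas applied before anything is written."""
    conn = sqlite3.connect(path)
    # page_size and auto_vacuum are baked into the file at creation,
    # so they are set before switching to WAL and before the first table.
    conn.execute("PRAGMA page_size = 8192")
    conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.commit()
    return conn


def reclaim(conn: sqlite3.Connection, pages: int = 10_000) -> tuple[int, int]:
    """Release up to `pages` free pages; returns (free pages before, free pages after)."""
    before = conn.execute("PRAGMA freelist_count").fetchone()[0]
    # fetchall() steps the statement to completion so every page is processed.
    conn.execute(f"PRAGMA incremental_vacuum({int(pages)})").fetchall()
    after = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return before, after
```

On a file that was created without `auto_vacuum = INCREMENTAL`, the incremental pragma has nothing to work with, which is why changing it requires recreating the file.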
 ### Changed
 - `pyproject.toml` — bumped version to `1.11.0`.
 - `ColumnRegistry` gained `rebind()` so it follows the cache connection swap performed by `hard_reset()` (the registry previously captured the connection for the process lifetime).
 ---
 ## [1.10.0] - 2026-06-09
 ### Added
 - **`last_upsert` (persisted write) vs `last_refresh` (run/liveness) in `stats`** — `TableStats.last_refresh` previously came from the persisted `last_refresh_at` column, which is only written when rows are actually written (a delta cycle with `total == 0` early-returns and leaves it unchanged). A healthy delta that keeps finding no new rows therefore *looked* frozen. The single value is now split:
  - `last_upsert` — wall-clock (UTC) of the last actual data write (full load / delta with rows). Persisted, survives restarts (this is the existing `last_refresh_at` column, surfaced under a clearer name).
  - `last_refresh` — wall-clock (UTC) of the last time a refresh cycle *ran* for the table, **even when it wrote nothing**. In-memory per process (`None` until the first cycle after start), tracked like `_states`/`_errors` — so **no schema change and no cache wipe**.
  - `CacheManager` gained `mark_refresh_ran()` / `get_last_runs()`; an empty delta cycle now records a run. TTL staleness still uses the last *write* (`seconds_since_refresh` reads `last_refresh_at`), so behaviour is unchanged.
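The split between "a cycle ran" and "a cycle wrote rows" can be sketched as follows. `RefreshClock` and `record_cycle` are hypothetical names used only for illustration; in the library, `last_upsert` is persisted and `last_refresh` is held in memory per process.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RefreshClock:
    """Hypothetical per-table tracker for the two timestamps."""

    last_upsert: str | None = None   # last cycle that actually wrote rows
    last_refresh: str | None = None  # last cycle that ran at all

    def record_cycle(self, rows_written: int, now: datetime | None = None) -> None:
        stamp = (now or datetime.now(timezone.utc)).isoformat()
        # Every cycle counts as a liveness signal, even an empty one.
        self.last_refresh = stamp
        # Only a cycle that wrote data moves the write timestamp.
        if rows_written > 0:
            self.last_upsert = stamp
```

With this split, a healthy delta that keeps finding no rows shows an advancing `last_refresh` and a constant `last_upsert`.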
 ### Changed
 - `pyproject.toml` — bumped version to `1.10.0`.
 - **`TableStats.last_refresh` is now `str | None`** (was `str`) and a new required `last_upsert: str | None` field is added. Consumers reading `last_refresh` for "when did data change?" should switch to `last_upsert`.
 ---
 ## [1.8.0] - 2026-06-08
 ### Fixed
 - **Frozen delta watermark on `datetime` change columns** — the delta high-watermark is read back from the cache as an ISO `TEXT` string (e.g. `'2026-06-05T14:54:24.823000'`) and was bound straight back to the source. SQL Server then had to implicitly convert that `nvarchar` to `datetime` and **failed** (`T`-separated ISO with 6 fractional digits exceeds `datetime`'s 3 — error 241 / SQLSTATE 22007), so every delta refresh and the startup catch-up died before streaming and the watermark never advanced (the cache silently froze at the last full load). The watermark is now parsed back to a real `datetime` (`delta._bind_watermark`) so the driver sends a typed timestamp and the comparison runs natively; non-datetime change columns (e.g. integer rowversions) pass through unchanged. Regression tests added.
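The idea behind the fix can be sketched as below. This is not the code of `delta._bind_watermark`: the function name `bind_watermark` and the `is_datetime` flag are assumptions made for the example.

```python
from __future__ import annotations

from datetime import datetime


def bind_watermark(raw: object, is_datetime: bool) -> object:
    """Hypothetical helper: turn a cached ISO string back into a typed datetime before binding."""
    if is_datetime and isinstance(raw, str):
        # The driver then sends a typed timestamp, so the source does not
        # have to convert a string implicitly.
        return datetime.fromisoformat(raw)
    # Non-datetime change columns (e.g. integer rowversions) pass through unchanged.
    return raw
```

For example, `bind_watermark("2026-06-05T14:54:24.823000", True)` yields a `datetime` with 823000 microseconds, which a driver can bind as a native timestamp parameter.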
 ### Added
 - **Refresh/load failures are now visible in `stats`** — `TableStats` gained `last_error`, `last_error_at` and `consecutive_failures`, and `Stats` gained a total `errors` counter. A delta that fails *before* streaming (e.g. the watermark bug above) previously left `state = ready`, hiding the problem; it now also marks the table `error` and records the message. `consecutive_failures` resets to 0 on the next success.
 - **Per-engine configuration** — `CachingEngine` accepts `cache_db_path`, `backup_interval`, `refresh_interval`, `fetch_batch` and `dialect` (each defaults to its env var / config global when omitted), so two engines with independent cache files can run in one process and config is testable without env vars.
 - **`blocking_startup_refresh` flag** (default `False`) — the startup catch-up (deltas/TTL reloads for tables restored from disk) now runs on the background thread by default, so it never blocks application startup. Pass `blocking_startup_refresh=True` to catch up synchronously before serving.
 ### Changed
 - **SQL identifiers are quoted** — table/column names are now quoted everywhere they are interpolated into statements (SQLite double-quote for the cache, the configured dialect — e.g. T-SQL `[brackets]` — for the source), so reserved words or names with spaces work and the f-string interpolation is hardened.
 - **Source connection opened lazily** — `execute()` no longer opens a source connection on every call; a pure cache hit never touches the source (and never occupies a pool slot). The misleading `cast(sqlite3.Connection, …)` on the source handle was removed (it is a pyodbc connection in production).
 - **Concurrent reads in disk mode** — disk-backed reads now use a per-thread read-only WAL connection instead of sharing the single write connection under a lock, so a slow `SELECT` no longer blocks writers (loads/upserts) or other readers. In-memory mode is unchanged (a `:memory:` database can't be shared across connections).
 - **`add_sink` is idempotent** — calling it again for the same sink is a no-op, so a double import no longer duplicates every log line.
 - `pyproject.toml` — bumped version to `1.8.0`; added a scoped pytest `filterwarnings` for the SQLite test source's legacy datetime-adapter deprecation.
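The identifier quoting mentioned in the list above follows two well-known rules: SQLite wraps a name in double quotes and doubles any embedded double quote, and T-SQL wraps it in brackets and doubles any embedded `]`. A minimal sketch, with hypothetical helper names that are not part of SQLmem's API:

```python
import sqlite3


def quote_sqlite(name: str) -> str:
    """Quote an identifier for SQLite: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_tsql(name: str) -> str:
    """Quote an identifier for T-SQL: wrap in brackets, double embedded closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def demo() -> list:
    # A reserved word and a name with a space both work once quoted.
    conn = sqlite3.connect(":memory:")
    table, column = quote_sqlite("order"), quote_sqlite("unit price")
    conn.execute(f"CREATE TABLE {table} ({column})")
    conn.execute(f"INSERT INTO {table} ({column}) VALUES (?)", (9.5,))
    return conn.execute(f"SELECT {column} FROM {table}").fetchall()
```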
 ### Note
 - Cache type fidelity (returning real `datetime`/`Decimal`/numeric types from `execute()` instead of `TEXT` strings, and giving numeric columns proper affinity) was evaluated but **deferred** — it changes the public output contract that consumers currently rely on (and that `test_coerce.py` pins). Decimal/datetime stay stored as exact, lossless `TEXT`.
 ---
 ## [1.7.0] - 2026-06-08
 ### Added
 - **Disk-backed cache mode** — `CachingEngine(engine, in_memory=False)` (or env `SQLMEM_IN_MEMORY=false`) queries the on-disk `cache.db` directly instead of loading it into an in-memory SQLite. Every write persists immediately (no hourly backup thread, no load-on-startup copy, no `atexit`/`SIGTERM` flush needed), and the cache may exceed available RAM. The disk connection uses WAL + `synchronous=NORMAL` for write throughput. In-memory mode (backed up to disk periodically) remains the default. `in_memory` defaults to the `SQLMEM_IN_MEMORY` config when omitted.
  - On open, a disk cache with a mismatched `schema_version` is wiped in place and rebuilt.
  - `engine.reset()` in disk mode drops the cached tables and `VACUUM`s the file (it does not unlink the open file).
 - `SQLMEM_IN_MEMORY` env var (default `true`).
 ### Changed
 - `pyproject.toml` — bumped version to `1.7.0`
 - `cache.py` — `CacheManager` gained an `in_memory` flag; the cache connection (`_mem_conn` → `_conn`) is opened either on `:memory:` or directly on the on-disk file. Disk mode skips the load-on-startup copy, backup thread, and shutdown flush, and `reset()` VACUUMs in place instead of unlinking the open file.
 - `.gitignore` — ignore `cache.db` and its WAL sidecars (`cache.db-wal`, `cache.db-shm`).
 ---
 ## [1.6.0] - 2026-06-05
 ### Added
 - **Secondary indexes** — `CachingEngine(engine, indexes={"VW_X": ["col", ["a", "b"]]})` creates indexes on the in-memory cache to accelerate `WHERE`/`JOIN` lookups. Index columns are auto-loaded so the index exists from the first load, and indexes are recreated after every (re)load and persist in `cache.db`. Combines freely with `delta` and `ttl`.
 ### Changed
 - `pyproject.toml` — bumped version to `1.6.0`
 ---
 ## [1.5.0] - 2026-06-05
 ### Added
 - **Per-table processing state in `stats`** — `TableStats` now carries `state` (`loading` / `refreshing` / `ready` / `stale` / `error`) and `tracking` (`delta` / `ttl` / `static`), so callers can see whether each table is up to date or being processed. In-progress first loads and failed loads also surface in `stats.tables`.
 - `SQLMEM_FETCH_BATCH` env var (default `10000`) — rows fetched per batch when loading a table.
 ### Changed
 - `pyproject.toml` — bumped version to `1.5.0`
 - **Large-table loads are streamed in batches** — `load_table` no longer `fetchall()`s the whole table (which double-buffered every row in Python and could OOM/crash on tens of millions of rows). Rows are now fetched `SQLMEM_FETCH_BATCH` at a time into a staging table and swapped in atomically, so peak memory stays bounded, the previous copy stays queryable during a reload, and the network fetch no longer holds the cache lock. Delta catch-ups are streamed the same way.
 - Orphan staging tables left by an interrupted load (crash/backup mid-load) are dropped on startup.
 - Delta upserts compute `row_count` once per refresh instead of a full `COUNT(*)` after every batch (avoids O(rows×batches) work on large catch-ups).
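The batch-and-swap pattern described above can be sketched with two `sqlite3` connections, one standing in for the source. This is an illustration only: `load_table_streamed` and the `__staging` suffix are hypothetical, and the real `load_table` may differ in locking and naming.

```python
import sqlite3


def load_table_streamed(source, cache, table, columns, batch=10_000):
    """Copy `table` from source to cache in batches via a staging table, then swap it in."""
    cols = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    staging = f"{table}__staging"  # hypothetical naming scheme

    cache.execute(f'DROP TABLE IF EXISTS "{staging}"')
    cache.execute(f'CREATE TABLE "{staging}" ({cols})')

    cursor = source.execute(f'SELECT {cols} FROM "{table}"')
    total = 0
    while True:
        rows = cursor.fetchmany(batch)  # peak memory is bounded by one batch
        if not rows:
            break
        cache.executemany(f'INSERT INTO "{staging}" VALUES ({marks})', rows)
        total += len(rows)
    cache.commit()

    # Swap inside one transaction so readers see either the old or the new copy.
    cache.execute("BEGIN IMMEDIATE")
    try:
        cache.execute(f'DROP TABLE IF EXISTS "{table}"')
        cache.execute(f'ALTER TABLE "{staging}" RENAME TO "{table}"')
        cache.commit()
    except Exception:
        cache.rollback()
        raise
    return total
```

Until the final swap the previous copy of the table stays in place, which is what keeps it queryable during a reload.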
 ---
 ## [1.4.0] - 2026-06-05
 ### Fixed
 - **`decimal.Decimal` (and `datetime`) binding error** — `NUMERIC`/`DECIMAL`/`MONEY` columns from SQL Server (pyodbc) arrive as `decimal.Decimal`, which `sqlite3` cannot bind, crashing the cache load with `type 'decimal.Decimal' is not supported`. Values are now coerced to sqlite-bindable types (`Decimal`→`str`, `datetime`/`date`/`time`→ISO, `uuid.UUID`→`str`, `bytearray`→`bytes`) at the cache boundary — on full load, on delta upsert, and for WHERE parameters. Coercion is local (no global `sqlite3.register_adapter`), so the host application's `sqlite3` behaviour is untouched. Cache columns are `TEXT`, so the conversion is lossless and exact (no rounding).
 ### Added
 - **Incremental (delta) refresh** — `CachingEngine(engine, delta={...})` with `DeltaConfig(change_column, key_columns)`. Delta-tracked tables are kept in sync by pulling only changed rows (`WHERE change_column >= watermark`) and upserting them by key, instead of full reloads.
  - Data-driven high-watermark = `max(change_column)` cached, persisted in `cache.db`; `>=` overlap + idempotent upsert so no row is missed and boundary rows are harmlessly re-read.
  - Catch-up on startup (since last shutdown) and a background thread refreshing every `SQLMEM_REFRESH_INTERVAL` seconds (default 300); `engine.refresh()` triggers a pull on demand.
  - Primary key is auto-discovered from the source DB (`inspect(engine).get_pk_constraint`) when `key_columns` is omitted; required explicitly for views (raises `ValueError`).
 - **Per-table TTL (time-based refresh)** — `CachingEngine(engine, ttl={"VW_X": 300})` for tables with no change column that can't be delta-synced. The cached copy is guaranteed never older than the TTL: a query touching an expired table triggers a full reload before it is answered (read-time guarantee), and the background thread proactively reloads expired tables. TTL age uses the persisted `last_refresh_at`, so the bound holds across restarts. A table in both `delta` and `ttl` raises `ValueError`.
 - `DeltaConfig` exported from the public API.
 - `engine.reset()` — wipes the whole cache (RAM + `cache.db`) for a clean rebuild after structural source changes.
 - `SQLMEM_REFRESH_INTERVAL` env var (default `300`) — background refresh tick for delta pulls and proactive TTL reloads.
 ### Changed
 - `pyproject.toml` — bumped version to `1.4.0`
 - `cache.py` — schema version bumped to `3`; `_sqlmem_tables` gained a `last_synced_at` watermark column. New methods: `execute_in_memory` (lock-serialized read), `get_table_columns`, `create_unique_index`, `get/set_last_synced_at`, `max_value`, `upsert_rows`, `seconds_since_refresh`, `reset`. Existing on-disk caches are discarded and rebuilt on load.
 - `executor.py` — delta-tracked tables augment their column set with key/change columns (unique key index + initial watermark); TTL-tracked tables full-reload at read time when expired; in-memory reads go through the cache lock.
 ---
 ## [1.2.0] - 2026-06-04
 ### Added
 - **Parametrized queries (R1)** — `execute(sql, params)` accepts positional (`?` tuple/list) and named (`:name` dict) parameters; passed straight to SQLite during in-memory filtering. Cache loads still fetch the full table (parameters are not applied to source fetches).
 - **JOIN support (R2)** — multi-table SELECTs are parsed into per-table column sets; each table is cached independently and the JOIN runs in the in-memory SQLite. Columns in a multi-table query must be qualified by table or alias.
 - **`SELECT *` support (R3)** — wildcard (and `alias.*`) queries discover all columns from the source DB, cache the whole table, and mark it `is_full` so later column queries are guaranteed cache hits without re-fetch.
 - **Three-part table names (R4)** — `[catalog].[schema].[table]` is parsed to its base name for caching; the in-memory query is rewritten to strip catalog/schema prefixes so it runs under SQLite.
 - `SQLMEM_SQL_DIALECT` env var (default `tsql`) — sqlglot dialect used to parse incoming SQL; T-SQL also accepts ANSI SQL and MSSQL bracket quoting.
 - `CacheManager.discover_columns()` and `CacheManager.is_table_full()`; `load_table()` gained a `full` flag.
 ### Changed
 - `pyproject.toml` — bumped version to `1.2.0`
 - `parser.py` — `ParsedQuery.table: str` replaced by `tables: list[str]` plus `columns_by_table`, `sqlite_sql`, `params`, and `wildcard_tables`; SQL is parsed with the configured dialect and rendered to SQLite for execution.
 - `executor.py` — loads each referenced table independently and applies query parameters during in-memory execution.
 - `cache.py` — schema version bumped to `2`; `_sqlmem_tables` gained an `is_full` column (existing on-disk caches are discarded and rebuilt on load).
 ---
 ## [1.1.0] - 2026-06-03
 ### Added
 - `Stats` and `TableStats` frozen dataclasses — snapshot of runtime cache statistics (hit/miss/refetch counts, per-table row count, columns, last refresh timestamp)
 - `StatsCollector` — internal thread-safe counter; increments on every cache hit, miss, and re-fetch
 - `engine.stats` property — returns a `Stats` snapshot at any point in time
 - `Stats` and `TableStats` exported from the public API
 ### Changed
 - `pyproject.toml` — bumped version to `1.1.0`
 ---
 ## [1.0.0] - 2026-06-03
 ### Changed
 - `pyproject.toml` — bumped version to `1.0.0`
 ---
 ## [0.4.0] - 2026-06-03
 ### Added
@@ -1,26 +1,70 @@
 # SQLmem
-Transparent in-memory cache layer between SQLAlchemy and your database. Drop it in front of any SQLAlchemy engine — SELECT queries are served from a fast in-memory SQLite cache, writes pass through unchanged.
+Transparent in-memory cache layer between SQLAlchemy and your database. Drop it in front of any SQLAlchemy engine — `SELECT` queries are served from a fast in-memory SQLite cache, writes are rejected (read-only cache).
 ## Goals
 SQLmem sits **between your application and the database** and behaves like a normal SQLAlchemy connection. It transparently:
 1. **Intercepts every query** that passes through it and learns, from the SQL itself, **which tables and which columns** the application actually uses.
 2. **Holds exactly those tables/columns locally in SQLite** — primarily in **RAM**, secondarily persisted to **disk** (`cache.db`) at regular intervals and on shutdown.
 3. **Serves repeated queries from RAM** with no database round-trip.
 4. **Stays in sync incrementally** (see [Incremental refresh](#incremental-delta-refresh)): for large tables you declare a *change-timestamp* column, and SQLmem only re-fetches rows that changed in the last few minutes (or since the last shutdown) instead of reloading tens of millions of rows on every start.
 The application keeps calling SQL as usual — the cache is an implementation detail behind the interface.
 ## How it works
-```
+```mermaid
-Application (SQLAlchemy)
+flowchart TB
-        │
+    App["Application (SQLAlchemy code)"]
-        ▼
+    DB[("Source database")]
-  [ SQLmem Proxy ]
+
-  ┌──────────────────────────────┐
+    subgraph SM["SQLmem - transparent cache layer"]
-  │  SQL Parser                  │  → detects SELECT vs. write
+        direction TB
-  │  Column Registry             │  → tracks which columns are cached per table
+        P["SQL Parser (sqlglot)<br/>detect SELECT vs write<br/>extract tables + columns"]
-  │  Cache Manager (SQLite RAM)  │  → stores data in memory
+        R["Column Registry<br/>tracks tables + columns in cache"]
-  │  Query Executor              │  → cache hit / miss logic
+        QE["Query Executor<br/>cache hit / miss / refetch"]
-  └──────────────────────────────┘
+        MEM[("In-memory SQLite - PRIMARY")]
-        │
+        DISK[("cache.db on disk - SECONDARY")]
-        ▼
+        P --> R --> QE --> MEM
-  Database (via original SQLAlchemy engine)
+        MEM -->|"backup every N s + on shutdown"| DISK
        DISK -->|"load on startup"| MEM
    end
    App -->|"execute(sql, params)"| P
    QE -->|"cache miss / delta refresh only"| DB
    DB -->|"rows"| MEM
    MEM -->|"list of dicts"| App
 ```
-On the first SELECT for a table, SQLmem fetches the required rows from the database and stores them in an in-memory SQLite instance. Subsequent queries for the same columns hit the in-memory cache with no database round-trip. When a query requests a column not yet in cache, SQLmem re-fetches the table with the expanded column set.
+On the first `SELECT` touching a table, SQLmem fetches the required rows from the database and stores them in the in-memory SQLite. Subsequent queries for the same columns hit RAM with no database round-trip. When a query requests a column not yet cached, SQLmem re-fetches the table with the expanded column set. Parametrized queries, JOINs and `SELECT *` are all supported; each table in a JOIN is cached independently and the JOIN runs inside the in-memory SQLite.
 ### Query lifecycle
 ```mermaid
 sequenceDiagram
    participant App
    participant SQLmem
    participant Mem as In-memory SQLite
    participant DB as Source DB
    App->>SQLmem: execute(SELECT a, b FROM t WHERE id = ?, params)
    SQLmem->>SQLmem: parse -> table = t, columns = {a, b, id}
    alt columns already cached
        SQLmem->>Mem: run query in RAM (with params)
        Mem-->>SQLmem: rows
    else cache miss or new column
        SQLmem->>DB: SELECT a, b, id FROM t   (whole table, no WHERE)
        DB-->>SQLmem: rows
        SQLmem->>Mem: store / expand table
        SQLmem->>Mem: run query in RAM (with params)
        Mem-->>SQLmem: rows
    end
    SQLmem-->>App: list[dict]
 ```
 Note: query **parameters are applied only to the in-memory query**, never to the source fetch — a cache load always pulls the full table so the cache can answer any later `WHERE` on those columns.
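The lifecycle above can be approximated in a few lines with two `sqlite3` connections, one playing the source database. This is a simplified sketch: `execute_cached` is a hypothetical function, the column set is passed in by hand instead of being parsed from the SQL, and only a single table is handled.

```python
import sqlite3


def execute_cached(source, cache, table, columns, sql, params=()):
    """Serve `sql` from the cache, loading or expanding `table` from the source first if needed."""
    cached = {row[1] for row in cache.execute(f'PRAGMA table_info("{table}")')}
    if not set(columns) <= cached:
        # Cache miss or new column: fetch the whole table (no WHERE, no params).
        wanted = sorted(cached | set(columns))
        cols = ", ".join(f'"{c}"' for c in wanted)
        marks = ", ".join("?" for _ in wanted)
        rows = source.execute(f'SELECT {cols} FROM "{table}"').fetchall()
        cache.execute(f'DROP TABLE IF EXISTS "{table}"')
        cache.execute(f'CREATE TABLE "{table}" ({cols})')
        cache.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
        cache.commit()
    # Parameters are applied only here, against the cached copy.
    cursor = cache.execute(sql, params)
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]
```

Because the load ignores the `WHERE` clause, a later query with different parameters on the same columns is answered from the cache without another fetch.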
 ## Installation
@@ -36,7 +80,7 @@ Requires Python 3.14.
 ```python
 from sqlmem import CachingEngine
-from sqlalchemy import create_engine, text
+from sqlalchemy import create_engine
 base_engine = create_engine("postgresql://user:pass@host/db")
 engine = CachingEngine(base_engine)
@@ -45,9 +89,25 @@ engine = CachingEngine(base_engine)
 results = engine.execute("SELECT id, name FROM users WHERE status = 'active'")
 for row in results:
    print(row["id"], row["name"])
 # Positional parameters (?):
 engine.execute("SELECT id, name FROM users WHERE id = ?", ("42",))
 # Named parameters (:name):
 engine.execute("SELECT id, name FROM users WHERE id = :id", {"id": "42"})
 # JOINs — each table is cached independently:
 engine.execute(
    "SELECT u.name, o.total FROM users u "
    "JOIN orders o ON o.user_id = u.id WHERE u.id = ?",
    ("42",),
 )
 # SELECT * — loads and caches the whole table:
 engine.execute("SELECT * FROM users")
 ```
-`execute()` returns a list of dicts. Results are compatible with standard iteration patterns.
+`execute()` returns a list of dicts. Parameters are passed straight through to SQLite, so positional (`?`) and named (`:name`) styles both work.
 ## Cache behaviour
@@ -57,29 +117,254 @@ for row in results:
 Query 1: SELECT a, b FROM orders   → cache miss → fetch orders(a, b) from DB
 Query 2: SELECT a, d FROM orders   → new column d → re-fetch orders(a, b, d)
 Query 3: SELECT b FROM orders      → cache hit, no DB query
-Query 4: SELECT * FROM orders      → UnsupportedQueryError (wildcard not supported)
+Query 4: SELECT * FROM orders      → fetches all columns, marks the table fully cached
-Query 5: SELECT a FROM orders JOIN … → UnsupportedQueryError (JOIN not supported)
+Query 5: SELECT a FROM orders      → cache hit (table already full)
 ```
 **`SELECT *`** loads every column and marks the table as fully cached, so any later column query is a guaranteed cache hit with no re-fetch.
 **Writes are blocked** — INSERT, UPDATE, and DELETE raise `ReadOnlyError`. SQLmem is a read-only cache.
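The hit / miss / re-fetch decisions in the walkthrough above can be modelled with a small tracker. This is a sketch of the decision logic only: `ColumnTracker` is a hypothetical class, not the library's `ColumnRegistry`, and labelling the `SELECT *` case as a re-fetch is an assumption.

```python
from __future__ import annotations


class ColumnTracker:
    """Hypothetical model of which columns are cached per table."""

    def __init__(self) -> None:
        self._columns: dict[str, set[str]] = {}
        self._full: set[str] = set()

    def touch(self, table: str, columns=(), wildcard: bool = False) -> str:
        known = table in self._columns
        cached = self._columns.setdefault(table, set())
        if table in self._full:
            return "hit"  # a fully cached table answers any column query
        if wildcard:
            self._full.add(table)  # SELECT * loads every column
            return "refetch" if known else "miss"
        requested = set(columns)
        if known and requested <= cached:
            return "hit"
        cached |= requested  # the next fetch uses the expanded column set
        return "refetch" if known else "miss"
```

Feeding it the five example queries in order gives miss, refetch, hit, refetch, hit.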
-## Persistence
+## Incremental (delta) refresh
-The in-memory cache is optionally persisted to `cache.db` on disk:
+Reloading a table with tens of millions of rows on every startup is unacceptable. To avoid it, SQLmem keeps the cache in sync by pulling **only changed rows**. For each delta-tracked table you declare its **last-change timestamp** column and the **key column(s)** that identify a row:
-- **On startup**: if `cache.db` exists, it is loaded into memory.
-- **Hourly**: a background thread writes a snapshot to disk.
-- **On shutdown**: a final flush via `atexit` and SIGTERM handler.
-Schema version is checked on load — if it does not match, the stale file is discarded and the cache is rebuilt from the database.
-## Manual cache invalidation
 ```python
-engine.invalidate("orders")   # drops the table from cache; next query re-fetches from DB
+from sqlmem import CachingEngine, DeltaConfig
 engine = CachingEngine(
    base_engine,
    delta={
        "VW_P_PRATVALUES": DeltaConfig(
            change_column="LAST_CHANGE_DATE",   # required — the row's change timestamp
            key_columns=["PRODUCT_PRODUCTNR"],  # optional for base tables (auto-discovered)
        ),
    },
 )
 ```
 **What you must configure, and what is automatic:**
 | Item | Source |
 |---|---|
 | which **tables / columns** to cache | **automatic** — learned from the queries that pass through |
 | `change_column` (timestamp) | **manual, always** — its meaning can't be inferred from the column type<sup>*</sup> |
 | `key_columns` (primary key) | **auto-discovered** for real tables (`inspect(engine).get_pk_constraint`); **manual** for views, which carry no key in the DB catalog |
 <sup>*</sup> The one exception is a true MSSQL `rowversion`/`timestamp`-typed column, which is unique per table and auto-maintained — that could be detected automatically. A plain `DATETIME` like `LAST_CHANGE_DATE` cannot.
 If `key_columns` is omitted, SQLmem tries to read the primary key from the source DB on startup and raises a clear error if it can't (e.g. for a view) so you can supply it explicitly.
 ### How sync works
 The boundary of "what changed since last time" is a **data-driven watermark**, not a wall-clock window. SQLmem persists, per delta-tracked table, `last_synced_at` = the **maximum `change_column` value** actually present in the cache after the previous sync (stored in `cache.db`, so it survives restarts). The next sync pulls `WHERE change_column >= last_synced_at`.
 Why a watermark and not `now − 5 min`:
 - **No clock dependency** — it compares DB values to DB values, so app-server vs database clock skew is irrelevant.
 - **Survives downtime for free** — after hours offline, `>= watermark` pulls *everything* since then; "catch up since last shutdown" needs no special case.
 - **Never misses late commits** — a wall-clock window can drop a row whose timestamp falls outside the window by the time it commits.
 The filter is `>=` (not `>`) so rows sharing the exact boundary timestamp are re-read; combined with **idempotent upsert by `key_columns`**, re-reading a handful of boundary rows each tick is harmless (they overwrite themselves), and no row is ever skipped. The 5-minute interval is only the **polling cadence**, never the filter boundary.
 ```mermaid
 sequenceDiagram
    participant Trigger as Startup / every 5 min
    participant SQLmem
    participant Mem as In-memory SQLite
    participant DB as Source DB
    Trigger->>SQLmem: refresh delta-tracked tables
    SQLmem->>Mem: read last_synced_at for table
    SQLmem->>DB: SELECT * FROM t WHERE LAST_CHANGE_DATE >= last_synced_at
    DB-->>SQLmem: only rows changed since the watermark
    SQLmem->>Mem: upsert rows by key_columns (INSERT OR REPLACE)
    SQLmem->>Mem: last_synced_at = max(LAST_CHANGE_DATE)
 ```
 - **First use** of a delta table → full load; the watermark is set to the table's current `max(change_column)`.
 - **On startup** → for each delta table restored from disk, a single catch-up query pulls everything changed **since the last shutdown** and upserts it, bringing the cache back in sync without a full reload.
 - **While running** → a background thread repeats the delta pull every `SQLMEM_REFRESH_INTERVAL` seconds (default `300`, i.e. 5 minutes), so the cache trails the source DB by at most that interval.
 - Tables **without** a `DeltaConfig` keep the default behaviour: full load on miss, never auto-refreshed — unless they are given a [TTL](#time-based-refresh-tables-without-a-change-column).
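A single delta tick, as described above, reduces to three statements: pull with `>=`, upsert by key, recompute the watermark. The sketch below uses two `sqlite3` connections and a hypothetical `delta_pull` function; it assumes the cached table already has a unique key so that `INSERT OR REPLACE` overwrites rows in place.

```python
import sqlite3


def delta_pull(source, cache, table, columns, change_column, watermark):
    """Pull rows changed at or after `watermark`, upsert them, return (rows pulled, new watermark)."""
    cols = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    # ">=" re-reads rows sharing the boundary value; the upsert makes that harmless.
    rows = source.execute(
        f'SELECT {cols} FROM "{table}" WHERE "{change_column}" >= ?', (watermark,)
    ).fetchall()
    cache.executemany(
        f'INSERT OR REPLACE INTO "{table}" ({cols}) VALUES ({marks})', rows
    )
    cache.commit()
    # The watermark is data-driven: the maximum value actually present in the cache.
    new_mark = cache.execute(f'SELECT MAX("{change_column}") FROM "{table}"').fetchone()[0]
    return len(rows), new_mark
```

The example uses an integer change column for brevity; a timestamp column follows the same pattern as long as the bound value is of a type the source can compare natively.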
 ### Requirements and limits of delta sync
 - The `change_column` must be **set by the source DB on every insert/update** and be non-decreasing (e.g. a `DATETIME`/`rowversion`/`timestamp` maintained by a trigger or the application).
 - `key_columns` must uniquely identify a row — they are used to upsert changed rows in place.
 - **Updates, including "deletes by nulling"** (a row that keeps its identity but has values cleared), are handled automatically: the change timestamp bumps, the row is re-pulled and overwritten in place.
 - **Structural changes are not covered by delta sync** — adding/removing attributes, or clearing values *without* bumping `change_column`, won't be picked up. For those, force a clean reload with [`engine.reset()`](#manual-cache-control) (or `invalidate()` for a single table).
 - Hard `DELETE`s of whole rows are not detected by a change-timestamp; this workload doesn't delete rows, but if yours does, use a soft-delete flag column or `reset()`.
 ## Time-based refresh (tables without a change column)
 Some tables can't be delta-synced because they have no change timestamp. For those you can set a **TTL** (max age in seconds): SQLmem keeps serving from cache and guarantees the cached copy is **never older than the TTL** by doing a full reload when it expires.
 ```python
 engine = CachingEngine(
    base_engine,
    ttl={
        "VW_LOOKUP_CODES": 300,   # full-reload if the cache is older than 5 minutes
        "VW_SETTINGS": 3600,
    },
 )
 ```
 - **Read-time guarantee** — when a query touches a TTL table whose cache is older than its TTL, the table is fully reloaded *before* the query is answered, so a stale copy is never returned.
 - **Proactive** — the background thread also full-reloads expired TTL tables every `SQLMEM_REFRESH_INTERVAL` seconds, keeping them warm so reads usually don't pay the reload latency.
 - TTL age is measured from `last_refresh_at`, which is persisted in `cache.db`, so the guarantee holds across restarts (an expired table is reloaded on first use after start).
 - A table may be in **either** `delta` **or** `ttl`, not both (delta already keeps it fresh) — supplying both raises `ValueError`.
 ```python
 engine.refresh()   # also reloads any expired TTL tables on demand
 ```
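The expiry test behind the read-time guarantee is a plain age comparison. A minimal sketch, assuming `last_refresh_at` is stored as an ISO-8601 string; `is_expired` is a hypothetical name, and treating a naive timestamp as UTC is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timezone


def is_expired(last_refresh_at: str | None, ttl_seconds: float, now: datetime | None = None) -> bool:
    """Return True when the cached copy is older than its TTL (or was never loaded)."""
    if last_refresh_at is None:
        return True
    refreshed = datetime.fromisoformat(last_refresh_at)
    if refreshed.tzinfo is None:
        refreshed = refreshed.replace(tzinfo=timezone.utc)  # assumption: stored in UTC
    now = now or datetime.now(timezone.utc)
    return (now - refreshed).total_seconds() > ttl_seconds
```

Because the input is the persisted timestamp and not a process-local timer, the same check gives the right answer on the first query after a restart.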
 ## Secondary indexes
 To accelerate lookups, you can declare **secondary indexes** per table — they are created on the in-memory SQLite copy so `WHERE`/`JOIN` filters on those columns run as indexed searches instead of full scans:
 ```python
 engine = CachingEngine(
    base_engine,
    indexes={
        "VW_P_PRATVALUES": ["PRODUCT_PRODUCTNR"],   # single-column index
        "VW_ELEMENTS": [["ELEMENT_ID", "ELEMENTVARIANT_ID"], "ELEMENTVARIANT_NAME"],
    },
 )
 ```
 Each value is a list of index definitions: a string is a single-column index, a nested list is a composite index.
 - **Index columns are pulled into the cache automatically** (like delta key columns), so the index exists from the first load even if your queries don't select those columns.
 - Indexes are **recreated after every (re)load** — full loads, TTL reloads, and `invalidate()` + re-fetch all rebuild them — so they're always present, and they persist in `cache.db` across restarts.
 - Delta-tracked tables already get a unique index on their key columns; secondary indexes are independent and can be combined with `delta` or `ttl`.
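The string-versus-nested-list convention can be sketched as a small normalizer that emits `CREATE INDEX` statements. This is an illustration only: `index_statements` and the `ix_<table>_<columns>` naming scheme are assumptions, not SQLmem's actual internals.

```python
import sqlite3


def quote(name: str) -> str:
    """Double-quote an identifier for SQLite, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def index_statements(table: str, definitions: list) -> list[str]:
    """Turn index definitions into CREATE INDEX statements (hypothetical naming)."""
    statements = []
    for definition in definitions:
        # A bare string is a single-column index; a nested list is a composite index.
        columns = [definition] if isinstance(definition, str) else list(definition)
        name = "ix_" + table + "_" + "_".join(columns)
        column_sql = ", ".join(quote(c) for c in columns)
        statements.append(
            f"CREATE INDEX IF NOT EXISTS {quote(name)} ON {quote(table)} ({column_sql})"
        )
    return statements


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "VW_ELEMENTS" ("ELEMENT_ID", "ELEMENTVARIANT_ID", "ELEMENTVARIANT_NAME")'
)
for sql in index_statements(
    "VW_ELEMENTS", [["ELEMENT_ID", "ELEMENTVARIANT_ID"], "ELEMENTVARIANT_NAME"]
):
    conn.execute(sql)

# Column 1 of PRAGMA index_list is the index name.
index_names = {row[1] for row in conn.execute('PRAGMA index_list("VW_ELEMENTS")')}
```

`IF NOT EXISTS` keeps the statements safe to rerun after every (re)load.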
 ## Persistence
 By default the cache lives in an **in-memory SQLite** database and is persisted to `cache.db` on disk:
 - **On startup**: if `cache.db` exists, it is loaded into memory.
 - **Periodically**: a background thread writes a snapshot to disk every `SQLMEM_BACKUP_INTERVAL` seconds.
 - **On shutdown**: a final flush runs via `atexit` and a SIGTERM handler.
 The schema version is checked on load — if it does not match, the stale file is discarded and the cache is rebuilt from the database.
 ### Disk-backed cache (no RAM copy)
 Set `in_memory=False` (or `SQLMEM_IN_MEMORY=false`) to query the on-disk `cache.db` **directly** instead of mirroring it in RAM:
 ```python
 engine = CachingEngine(base_engine, in_memory=False)
 ```
 - The cache can **exceed available memory** — nothing is held in RAM beyond SQLite's page cache.
 - Every write **persists immediately** (WAL + `synchronous=NORMAL`), so there is no periodic backup thread, no load-into-memory step on startup, and no shutdown flush that could be lost.
 - **Reads run concurrently** — each thread reads through its own read-only WAL connection, so a slow `SELECT` doesn't block writers (loads/upserts) or other readers.
 - On open, a cache file with a mismatched schema version is wiped in place and rebuilt; `engine.reset()` drops the cached tables and `VACUUM`s the file (it does not delete the open file).
 The constructor argument wins over the env var; when `in_memory` is omitted it falls back to `SQLMEM_IN_MEMORY`.
 #### Tuning the SQLite layer (`pragmas=`)
 For a large disk-backed cache, pass SQLite PRAGMAs to tune the read path and on-disk layout without bypassing SQLmem:
 ```python
 engine = CachingEngine(
    base_engine,
    in_memory=False,
    pragmas={
        "mmap_size": 32 * 1024**3,   # map the DB into the address space (32 GB)
        "cache_size": -262144,       # 256 MB page cache (negative = KiB)
        "temp_store": 2,             # ORDER BY / GROUP BY scratch in RAM
        "page_size": 8192,           # larger pages → fewer reads on range scans
        "auto_vacuum": "INCREMENTAL",# reclaim free pages with vacuum() (see below)
    },
 )
 ```
 - Every entry is applied as `PRAGMA <key> = <value>` when the cache connection opens. **Unknown or inapplicable pragmas are silently ignored** by SQLite, so a bad value degrades gracefully instead of crashing startup.
 - **`page_size` and `auto_vacuum` are layout pragmas** — they only take effect on a *fresh* file (set before the first table). On an existing cache, `page_size` is ignored with a one-time warning; use [`hard_reset()`](#manual-cache-control) to rebuild the file with the new value.
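The fresh-file rule for layout pragmas can be reproduced with plain `sqlite3`. A minimal sketch, assuming a throwaway file in a temporary directory; `open_cache` is a hypothetical helper that follows the rule described above and is not SQLmem's own code:

```python
import sqlite3
import tempfile
from pathlib import Path


def open_cache(path: Path, page_size: int) -> tuple[sqlite3.Connection, int]:
    """Open *path*, applying page_size only when the file is brand new."""
    existed = path.exists() and path.stat().st_size > 0
    conn = sqlite3.connect(str(path))
    if not existed:
        # Fresh file: the layout pragma must run before the first table is created.
        conn.execute(f"PRAGMA page_size = {int(page_size)}")
    actual = conn.execute("PRAGMA page_size").fetchone()[0]
    return conn, actual


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "cache.db"

    conn, first = open_cache(db, 8192)    # fresh file: the requested size is accepted
    conn.execute("CREATE TABLE t (x)")    # writes the file with that layout
    conn.commit()
    conn.close()

    conn, second = open_cache(db, 16384)  # existing file: the request is not applied
    conn.close()
```

Comparing the requested value with `actual` on an existing file is the natural place to log a warning.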
 #### INTEGER datetime columns (`datetime_columns=`)
 A pure datetime column stored as an ISO `TEXT` string costs ~28 bytes per row and compares by string collation. For a large table you can store named datetime columns as **INTEGER microseconds since the Unix epoch** instead — 8 bytes, native integer comparison:
 ```python
 engine = CachingEngine(
    base_engine,
    delta={"VW_P_PRATVALUES": DeltaConfig("CHANGE_DATE", ["PRATVALUE_ID"])},
    datetime_columns={"VW_P_PRATVALUES": ["CHANGE_DATE"]},
 )
 ```
 - **Opt-in per column.** Only the columns you name change; everything else keeps the default lossless `TEXT` storage.
 - ⚠️ **It changes the output contract for those columns** — `execute()` returns them as `int` (µs since epoch), not ISO strings, and a `WHERE` on them must compare against integer µs. Don't list a column your callers read as a string.
 - The delta watermark is handled transparently: it is persisted as the integer and bound back to a real `datetime` for the source query, so incremental refresh keeps working.
 - ⚠️ This is a **breaking on-disk change** (`SCHEMA_VERSION` 4): an existing cache is wiped and reloaded on first start after enabling it — schedule a maintenance window for a large reload.
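Callers that filter or display such a column need to convert between `datetime` and integer microseconds. A minimal consumer-side sketch; `to_epoch_us` and `from_epoch_us` are hypothetical helpers (not exported by SQLmem), and naive datetimes are assumed to be UTC:

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_us(value: datetime) -> int:
    """Convert a datetime to integer microseconds since the Unix epoch (UTC)."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    # timedelta // timedelta is exact integer arithmetic, so no float rounding.
    return (value - _EPOCH) // timedelta(microseconds=1)


def from_epoch_us(us: int) -> datetime:
    """Convert integer microseconds since the epoch back to an aware UTC datetime."""
    return _EPOCH + timedelta(microseconds=us)


# Bind `cutoff` as the parameter of a filter such as: ... WHERE CHANGE_DATE >= ?
cutoff = to_epoch_us(datetime(2026, 6, 1, tzinfo=timezone.utc))
round_trip = from_epoch_us(cutoff)
```

Converting on read with `from_epoch_us` restores a `datetime` for code that previously parsed the ISO string.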
 ## Manual cache control
 ```python
 engine.invalidate("orders")   # drop one table from cache; next query re-fetches it from DB
 engine.reset()                # wipe the whole cache (RAM + cache.db) — full clean slate
 engine.hard_reset()           # disk mode: delete the file and reopen with current pragmas/page_size
 engine.vacuum()               # disk mode: incremental VACUUM (reclaim free pages from delta churn)
 engine.refresh()              # pull deltas for delta-tracked tables and reload expired TTL tables now
 engine.close()                # flush to disk and shut down background thread
 ```
 Use `reset()` after a **structural change** in the source (columns added/removed, values cleared in bulk without bumping the change timestamp) so the cache rebuilds from scratch. `invalidate(table)` is the targeted version for a single table.
 `hard_reset()` goes further than `reset()` in disk mode: it closes every connection, deletes `cache.db` (and its `-wal`/`-shm` sidecars) and reopens from scratch — the only way to change a baked-in `page_size`/`auto_vacuum`. In memory mode it falls back to `reset()`.
 `vacuum()` reclaims free pages left behind by delta `INSERT OR REPLACE` churn. Incremental (the default) is cheap and non-blocking but needs `auto_vacuum=INCREMENTAL`; `vacuum(incremental=False)` runs a full VACUUM that rewrites the file (~2× disk, blocks readers) — schedule it in a maintenance window. Both are no-ops in memory mode.
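What an incremental vacuum does at the SQLite level can be seen with plain `sqlite3`. This sketch uses a throwaway file and standard pragmas; it illustrates the mechanism and is not SQLmem's own `vacuum()` implementation:

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(str(Path(tmp) / "cache.db"))
    # Layout pragma: must be set on the fresh file, before the first table exists.
    conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
    conn.execute("CREATE TABLE t (payload BLOB)")
    conn.executemany("INSERT INTO t VALUES (?)", [(b"x" * 1000,) for _ in range(2000)])
    conn.commit()

    conn.execute("DELETE FROM t")  # frees the pages but leaves them inside the file
    conn.commit()
    free_before = conn.execute("PRAGMA freelist_count").fetchone()[0]

    # fetchall() makes sure the statement is stepped to completion.
    conn.execute("PRAGMA incremental_vacuum").fetchall()
    free_after = conn.execute("PRAGMA freelist_count").fetchone()[0]
    conn.close()
```

Without `auto_vacuum = INCREMENTAL` on the file, the `incremental_vacuum` pragma has nothing to work with, which is why the setting has to be in place before the first table is created.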
 ## Runtime statistics
 ```python
 stats = engine.stats          # Stats snapshot
 print(stats.hits, stats.misses, stats.refetches, stats.errors)
 for name, t in stats.tables.items():
    print(name, t.rows, t.state, t.tracking, t.last_upsert, t.last_refresh)
    if t.consecutive_failures:
        print(f"  {name} failing ×{t.consecutive_failures}: {t.last_error} ({t.last_error_at})")
 ```
 `Stats.errors` is the total number of load/refresh failures since start. Each `TableStats` also carries `last_error`, `last_error_at` and `consecutive_failures` (reset to 0 on the next success) — so a delta that fails *before* streaming (which otherwise leaves `state` looking `ready`) is still visible, and the table is marked `error`.
 Two timestamps distinguish *data freshness* from *liveness*:
 | Field | Meaning |
 |---|---|
 | `last_upsert` | wall-clock (UTC) of the last actual **data write** — full load or a delta cycle that wrote rows. Persisted, survives restarts. Answers *"when did the data last change?"* |
 | `last_refresh` | wall-clock (UTC) of the last time a **refresh cycle ran** for the table — bumped **even when it wrote nothing**. In-memory per process (`None` until the first cycle runs after start). Answers *"is the refresh loop alive?"* |
 A delta table that runs every cycle but finds no new rows keeps `last_refresh` ticking while `last_upsert` stays put — that's healthy, not stuck. (Both are UTC ISO strings; the default log timestamps are local time, so expect an offset.)
 Each `TableStats` reports a live processing **state** and how the table is kept fresh (**tracking**):
 | `state` | Meaning |
 |---|---|
 | `loading` | a full load is in progress |
 | `refreshing` | an incremental (delta) refresh is in progress |
 | `ready` | cached and idle (up to date) |
 | `stale` | a TTL table whose cache has expired; reloads on next access |
 | `error` | the last load failed |
 | `tracking` | Meaning |
 |---|---|
 | `delta` | kept in sync incrementally via a change column |
 | `ttl` | full-reloaded when older than its TTL |
 | `static` | loaded on demand, never auto-refreshed |
 ## Memory and very large tables
 By default the cache is **in-memory SQLite**, so a cached table lives in RAM — it must fit in available memory. To keep huge tables manageable:
 - **Use [disk-backed mode](#disk-backed-cache-no-ram-copy)** (`in_memory=False`) when the working set simply doesn't fit in RAM — queries then run against `cache.db` on disk instead of a memory copy.
 - **Loads are streamed in batches** (`SQLMEM_FETCH_BATCH` rows at a time, default 10 000) into a staging table and swapped in atomically. A multi-million-row table never gets fully materialized in Python at once, so the load doesn't spike memory or crash the process, and readers keep seeing the previous copy until the swap completes.
 - Use **[delta refresh](#incremental-delta-refresh)** for large tables that have a change column — after the first load only changed rows are pulled, so restarts and refreshes don't re-read the whole table.
 - A **single query that returns a huge result set** (e.g. `SELECT *` over a multi-million-row cached table) still materializes that result as a list of dicts; bound it with a `WHERE`/`LIMIT` rather than selecting everything.
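The batch-and-swap load described above can be sketched with plain `sqlite3`. This is an illustration under stated assumptions: `load_table`, `batched` and the `__staging` suffix are hypothetical names, and a generator stands in for the source cursor.

```python
import sqlite3
from collections.abc import Iterable, Iterator
from itertools import islice


def batched(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield lists of at most *size* rows without materializing the whole input."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def load_table(conn: sqlite3.Connection, table: str, columns: list[str],
               rows: Iterable[tuple], batch_size: int = 10_000) -> int:
    """Stream *rows* into a staging table, then swap it in within one transaction."""
    staging = f"{table}__staging"  # hypothetical staging-table name
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{staging}"')
    conn.execute(f'CREATE TABLE "{staging}" ({column_sql})')
    total = 0
    for batch in batched(rows, batch_size):  # only one batch is held in memory
        conn.executemany(f'INSERT INTO "{staging}" VALUES ({placeholders})', batch)
        total += len(batch)
    conn.commit()
    # Explicit transaction so the drop and the rename become visible together.
    conn.execute("BEGIN")
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'ALTER TABLE "{staging}" RENAME TO "{table}"')
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")
source = ((i, f"name-{i}") for i in range(25_000))  # stands in for a source cursor
loaded = load_table(conn, "T3", ["A", "B"], source, batch_size=10_000)
count = conn.execute('SELECT COUNT(*) FROM "T3"').fetchone()[0]
```

Peak memory is bounded by `batch_size` rows rather than by the table size, which is the point of streaming the load.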
 ## Configuration
 Set via environment variables or a `.env` file:
@@ -88,14 +373,37 @@ Set via environment variables or a `.env` file:
 |---|---|---|
 | `SQLMEM_DEBUG` | `false` | `true` enables DEBUG-level logging |
 | `SQLMEM_CACHE_DB` | `cache.db` | Path to the on-disk persistence file |
-| `SQLMEM_BACKUP_INTERVAL` | `3600` | Backup interval in seconds |
+| `SQLMEM_IN_MEMORY` | `true` | `false` queries `cache.db` on disk directly (no RAM copy); overridden by the `in_memory` constructor arg |
 | `SQLMEM_BACKUP_INTERVAL` | `3600` | Disk backup interval in seconds (in-memory mode only) |
 | `SQLMEM_SQL_DIALECT` | `tsql` | sqlglot dialect used to parse incoming SQL (e.g. `tsql`, `postgres`, `mysql`) |
 | `SQLMEM_REFRESH_INTERVAL` | `300` | Background refresh tick in seconds (delta pulls and proactive TTL reloads) |
 | `SQLMEM_FETCH_BATCH` | `10000` | Rows fetched per batch when loading a table; caps peak memory for huge tables |
 Most of these can also be passed **per engine** to the constructor, overriding the env default — handy for running two engines (with separate cache files) in one process, and for tests:
 ```python
 engine = CachingEngine(
    base_engine,
    cache_db_path="orders_cache.db",   # SQLMEM_CACHE_DB
    in_memory=False,                   # SQLMEM_IN_MEMORY
    backup_interval=3600,              # SQLMEM_BACKUP_INTERVAL
    refresh_interval=300,              # SQLMEM_REFRESH_INTERVAL
    fetch_batch=10000,                 # SQLMEM_FETCH_BATCH
    dialect="tsql",                    # SQLMEM_SQL_DIALECT
    pragmas={"mmap_size": 32 * 1024**3, "page_size": 8192},  # disk-mode SQLite tuning
    datetime_columns={"orders": ["created_at"]},  # store these as INTEGER µs (opt-in)
    blocking_startup_refresh=False,    # block startup until caught up? (default: no)
 )
 ```
 By default the **startup catch-up** (delta pulls and TTL reloads for tables restored from disk) runs on the background thread so it never blocks application startup; the cache may serve slightly stale data until the first refresh completes. Set `blocking_startup_refresh=True` to catch up synchronously before the engine starts serving.
 ## Exceptions
 | Exception | When raised |
 |---|---|
 | `ReadOnlyError` | INSERT, UPDATE, or DELETE statement |
-| `UnsupportedQueryError` | `SELECT *` or any JOIN |
+| `UnsupportedQueryError` | non-SELECT statement, `SELECT` without `FROM`, or an unqualified column in a multi-table query |
 ```python
 from sqlmem import ReadOnlyError, UnsupportedQueryError
@@ -103,15 +411,34 @@ from sqlmem import ReadOnlyError, UnsupportedQueryError
 ## Logging
-SQLmem uses [loguru](https://github.com/Delgan/loguru). Set `SQLMEM_DEBUG=true` for verbose output (every query, cache hit/miss, backup events). Default level is INFO.
+SQLmem is silent by default. Call `add_sink()` to opt in:
 ```python
 import sys
 from sqlmem import add_sink
 add_sink(sys.stderr)                      # INFO by default
 add_sink(sys.stderr, level="DEBUG")       # verbose: every query, cache hit/miss, backup
 add_sink("sqlmem.log", rotation="10 MB")  # to a file
 ```
 Set `SQLMEM_DEBUG=true` in `.env` to make the default level DEBUG when no explicit `level` is passed to `add_sink()`.
 ## Limitations
- `SELECT *` and JOIN queries are not supported.
+- In a multi-table (JOIN) query, every column must be qualified with its table or alias; unqualified columns raise `UnsupportedQueryError`.
 - Tables are keyed by their base name — two tables with the same name in different schemas share one cache entry.
 - No distributed cache backend (Redis etc.).
- No transactional consistency guarantees.
+- No transactional consistency guarantees; the cache trails the source DB.
 - Write operations (INSERT/UPDATE/DELETE) are always blocked.
 ## Roadmap
 - [x] **Incremental (delta) refresh** via per-table change-timestamp + key columns (see above) — the key feature for large tables.
 - [x] **Primary-key auto-discovery** from the source DB (`inspect(engine).get_pk_constraint`) so `key_columns` is only needed for views.
 - [x] **`engine.reset()`** — wipe RAM + `cache.db` for a clean rebuild after structural changes.
 - [x] **Per-table TTL** (time-to-live) — bounded-staleness full refresh for tables without a change column.
 ## Dependencies
 | Layer | Library |
@@ -47,23 +47,35 @@ with engine.connect() as conn:
 ## Cache backend
 Two modes (selected via `CachingEngine(engine, in_memory=...)` or the `SQLMEM_IN_MEMORY` env var):
 **In-memory (default, `in_memory=True`)**
 - **In-memory SQLite** as the primary store: all queries run in RAM.
 - **Persistence to disk** (`cache.db`) in three situations:
  - **On startup**: if the file exists, it is loaded into memory (`ATTACH` + copy).
  - **Periodically, every hour**: a snapshot of the in-memory SQLite is written to disk (backup API).
  - **On shutdown**: a final write to disk before exit (signal handler + context manager).
- Whole tables are loaded from the database on a cache miss and kept in memory.
+
 **Disk-backed (`in_memory=False`)**
 - Queries run directly against the on-disk `cache.db` file with **no RAM copy**; the cache can exceed available memory.
 - Every write goes straight to disk (WAL + `synchronous=NORMAL`); the hourly backup thread and the load-into-memory step on startup are no longer needed.
 - On open, a cache with a mismatched `schema_version` is discarded and rebuilt; `engine.reset()` drops the tables and runs `VACUUM` (it does not unlink the file).
 Whole tables are loaded from the database on a cache miss (in both modes).
 ---
 ## Components
 ### 1. SQL Parser
- Detects the query type (SELECT / write).
+- Detects the query type (SELECT / write); writes raise `ReadOnlyError`.
- Extracts table names from the FROM and JOIN clauses.
+- Extracts table names from the FROM and JOIN clauses (multi-table support).
- Extracts the list of requested columns.
+- Maps requested columns to tables via aliases (`columns_by_table`).
- Detects `SELECT *` (wildcard) and JOIN, and raises `UnsupportedQueryError`.
+- Detects `SELECT *` and `alias.*` → the whole table is loaded (`wildcard_tables`).
- Decides whether the query can be served from the cache.
+- Parses using the `SQLMEM_SQL_DIALECT` dialect (default `tsql`) and renders the in-memory query for SQLite (stripping catalog/schema prefixes).
 - Passes parameters (`?` / `:name`) unchanged to the in-memory SQLite.
 ### 2. Column Registry
@@ -71,12 +83,12 @@ Modul se **za běhu učí**, jaké sloupce z každé tabulky aplikace potřebuje
 **Logic for every incoming query:**
-1. The parser detects `SELECT *` or a JOIN → raises `UnsupportedQueryError` (not implemented).
+1. The parser extracts `(table, columns)` for every table in the query (including across JOINs).
-2. The parser extracts `(table, columns)` from the query.
+2. The registry takes the **union** of the newly requested columns and the already known ones.
-3. The registry takes the **union** of the newly requested columns and the already known ones.
+3. The Cache Manager checks whether the cache for the table contains all required columns:
 4. The Cache Manager checks whether the cache for the table contains all required columns:
   - **Yes** → the query goes straight to the in-RAM SQLite (cache hit).
   - **No** → the table is re-fetched from the DB with the widened column set → the cache is overwritten → the query runs against the in-RAM SQLite.
 4. `SELECT *` loads the whole table and marks it `is_full` → subsequent queries on any column are cache hits.
 **Example of column accumulation:**
@@ -84,8 +96,8 @@ Modul se **za běhu učí**, jaké sloupce z každé tabulky aplikace potřebuje
 Query 1: SELECT A, B FROM T3   → Registry: T3 = {A, B}   → fetch T3(A,B) from DB
 Query 2: SELECT A, D FROM T3   → Registry: T3 = {A, B, D} → re-fetch T3(A,B,D) from DB
 Query 3: SELECT B FROM T3      → cache hit, no DB query
-Query 4: SELECT * FROM T3      → UnsupportedQueryError (wildcard is not supported)
+Query 4: SELECT * FROM T3      → full load of all columns, table marked is_full
-Query 5: SELECT A FROM T3 JOIN T4 ... → UnsupportedQueryError (JOIN is not supported)
+Query 5: SELECT A FROM T3 JOIN T4 ON … → each table cached separately, JOIN runs in RAM
 ```
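The accumulation in queries 1 to 3 can be sketched as a set union plus a subset check. This is an illustration only: the `ColumnRegistry` class below and its return values are hypothetical and do not reproduce the module's real registry or its `_sqlmem_columns` persistence.

```python
class ColumnRegistry:
    """Tracks which columns each table needs and which are already cached."""

    def __init__(self) -> None:
        self._known: dict[str, set[str]] = {}   # every column ever requested
        self._cached: dict[str, set[str]] = {}  # columns present in the cache

    def request(self, table: str, columns: set[str]) -> str:
        known = self._known.setdefault(table, set())
        known |= columns  # union of the new request with what is already known
        cached = self._cached.get(table)
        if cached is not None and columns <= cached:
            return "hit"  # everything needed is already cached
        # Stand-in for the (re-)fetch from the source DB with the widened set.
        self._cached[table] = set(known)
        return "fetch" if cached is None else "refetch"


registry = ColumnRegistry()
outcomes = [
    registry.request("T3", {"A", "B"}),  # query 1
    registry.request("T3", {"A", "D"}),  # query 2
    registry.request("T3", {"B"}),       # query 3
]
```

Re-fetching with the full known set, rather than only the missing columns, keeps each table a single consistent snapshot.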
 **Metadata table `_sqlmem_columns`** (stored in SQLite):
@@ -184,10 +196,33 @@ SQLMEM_DEBUG=true   # DEBUG level — podrobný výpis každého dotazu, cache o
 ---
 ## Completed features (formerly TODO)
 - [x] **Parameterized queries**: `execute(sql, params)` with positional `?` and named `:name` parameters.
 - [x] **`SELECT *` (wildcard) support**: loads the whole table into the cache and marks it `is_full`, so subsequent queries on any column are always cache hits with no re-fetch.
 - [x] **JOIN support**: the parser extracts the columns of each joined table separately and the Column Registry tracks them independently. The Cache Manager makes sure all required tables are in memory before the query runs.
 - [x] **Three-part table names**: `[catalog].[schema].[table]` is cached under its base name; the in-memory query strips the prefix.
 - [x] **Incremental (delta) refresh**: per-table `DeltaConfig(change_column, key_columns)` syncs only changed rows via the data watermark `max(change_column)` (`>=` + idempotent upsert by key), with a catch-up on startup plus a background thread (`SQLMEM_REFRESH_INTERVAL`, default 300 s). The PK is auto-discovered from the source DB; for views it must be given manually.
 - [x] **`engine.reset()`**: wipes the whole cache (RAM + `cache.db`) for a clean rebuild after a structural change.
 - [x] **Secondary indexes**: `indexes={"VW_X": ["col", ["a","b"]]}` creates indexes on the in-memory cache to speed up `WHERE`/`JOIN`; index columns are pulled in automatically and indexes are recreated after every (re)load.
 - [x] **Per-table TTL**: `ttl={"VW_X": 300}` for tables without a timestamp column. Guarantees the cache is never older than the interval (full reload on read after expiry, plus proactive reloads in the background).
 - [x] **Disk-backed cache**: `in_memory=False` (or `SQLMEM_IN_MEMORY=false`) runs queries directly against the on-disk `cache.db` (WAL) with no RAM copy; the cache can exceed memory and writes persist immediately.
  - In disk mode, reads go through a **per-thread read-only WAL connection** → concurrent reads block neither writes nor other readers.
 - [x] **Refresh/load errors in `stats`**: `TableStats.last_error` / `last_error_at` / `consecutive_failures` + `Stats.errors`. A delta that fails before streaming marks the table as `error` (it used to stay `ready`).
 - [x] **`last_upsert` vs `last_refresh`**: `last_upsert` = persisted time of the last data write (survives restarts); `last_refresh` = in-memory time of the last refresh-cycle run (liveness: it ticks even when the cycle wrote nothing, and is `None` until the first run). An empty delta cycle advances `last_refresh`, not `last_upsert`.
 - [x] **Per-engine configuration**: `CachingEngine(..., cache_db_path=, backup_interval=, refresh_interval=, fetch_batch=, dialect=)`, where every parameter defaults to env/config; two engines with their own cache files can run in one process.
 - [x] **Non-blocking startup catch-up**: the default behaviour. The startup catch-up (delta/TTL after a restart) runs in the background and does not block application startup. Use `blocking_startup_refresh=True` to catch up synchronously before serving.
 - [x] **Identifier quoting**: table/column names are quoted (SQLite `"x"` for the cache; the source dialect, e.g. T-SQL `[x]`, for the source), so reserved words and spaces work.
 - [x] **Lazy source connection**: `execute()` does not open a source connection on a cache hit (it does not occupy a pool slot).
 - [x] **Idempotent `add_sink`**: a repeated call for the same sink is a no-op (no duplicate logs).
 - [x] **Tuning the SQLite layer (`pragmas=`)**: `CachingEngine(..., pragmas={...})` applies arbitrary PRAGMAs to the cache connection (`mmap_size`, `cache_size`, `temp_store`, `page_size`, `auto_vacuum`). `page_size` and `auto_vacuum` are layout pragmas: they only take effect on a fresh file (`page_size` on an existing cache is ignored with a warning). SQLite silently ignores unknown pragmas.
 - [x] **`hard_reset()`**: deletes the on-disk file (+ WAL/SHM) and opens a new one with the current pragmas; unlike `reset()`, it lets you change `page_size`/`auto_vacuum`. Disk mode only (falls back to `reset()` in memory mode).
 - [x] **`vacuum(incremental=, pages=)`**: maintenance VACUUM of the cache file, either incremental (releases free pages left by delta `INSERT OR REPLACE`; requires `auto_vacuum=INCREMENTAL`) or full (rewrites the file; maintenance window only). No-op in memory mode.
 - [x] **Native INTEGER storage of datetime columns (`datetime_columns=`)**: with `datetime_columns={"VW_X": ["CHANGE_DATE"]}` the listed datetime columns are stored as INTEGER µs-since-epoch instead of ~28 B ISO TEXT (space saving + native integer index comparison). Opt-in per column → the output contract changes only for the chosen columns (they return int, not an ISO string). Breaking: `SCHEMA_VERSION` 3→4; the cache is wiped and reloaded on upgrade. The watermark is persisted as an int and `_bind_watermark(epoch_us=True)` reconstructs it back to a `datetime` for the source.
 ## TODO: future features
- **`SELECT *` (wildcard) support**: loads the whole table into the cache and marks it `full`, so subsequent queries on any column are always cache hits with no re-fetch.
+- _(no open items yet)_
 - **JOIN support**: the parser extracts the columns of each joined table separately and the Column Registry tracks them independently. The Cache Manager makes sure all required tables are in memory before the query runs.
 ---
@@ -1,6 +1,6 @@
 [project]
 name = "sqlmem"
-version = "0.4.0"
+version = "1.12.0"
 description = ""
 authors = [
    {name = "jan.doubravsky@gmail.com"}
@@ -25,3 +25,11 @@ dev = [
    "ruff (>=0.15.15,<0.16.0)",
    "mypy (>=2.1.0,<3.0.0)"
 ]
 [tool.pytest.ini_options]
 filterwarnings = [
    # The SQLite test source binds the delta watermark as a real datetime via
    # sqlite3's legacy adapter (deprecated in 3.12). Production sources are
    # pyodbc, which binds datetimes natively, so this only affects the tests.
    "ignore:The default datetime adapter is deprecated:DeprecationWarning",
 ]
@@ -1,10 +1,13 @@
 from pathlib import Path
 from typing import Any
 from loguru import logger
 from .config import DEBUG
 from .delta import DeltaConfig
 from .engine import CachingEngine
 from .exceptions import ReadOnlyError, UnsupportedQueryError
 from .stats import Stats, TableStats
 _DEFAULT_FORMAT = (
    "<green>{time:YYYY-MM-DD HH:mm:ss}</green> | "
@@ -13,13 +16,25 @@ _DEFAULT_FORMAT = (
    "<level>{message}</level>"
 )
 # Sinks already registered, keyed by a stable identity, so a repeated call (e.g.
 # a double import) doesn't add a second handler and duplicate every log line.
 _added_sinks: dict[object, int] = {}
 def _sink_key(sink: Any) -> object:
    """A stable identity for *sink* so the same destination isn't added twice."""
    if isinstance(sink, (str, Path)):
        return ("path", str(Path(sink).resolve()))
    return ("obj", id(sink))
 def add_sink(sink: Any, *, level: str | None = None, **kwargs: Any) -> None:
-    """Route sqlmem log records to *sink*.
+    """Route sqlmem log records to *sink* (idempotent).
    Accepts any sink supported by loguru (file path, stream, callable, …).
    *level* defaults to ``DEBUG`` when ``SQLMEM_DEBUG=true``, otherwise ``INFO``.
-    Extra keyword arguments are forwarded to :func:`loguru.logger.add`.
+    Extra keyword arguments are forwarded to :func:`loguru.logger.add`. Calling it
    again for the same sink is a no-op, so a double import won't duplicate logs.
    Example::
@@ -29,9 +44,23 @@ def add_sink(sink: Any, *, level: str | None = None, **kwargs: Any) -> None:
        add_sink("sqlmem.log", rotation="10 MB")
    """
    logger.enable("sqlmem")
    key = _sink_key(sink)
    if key in _added_sinks:
        return
    kwargs.setdefault("format", _DEFAULT_FORMAT)
    kwargs.setdefault("colorize", True)
-    logger.add(sink, level=level or ("DEBUG" if DEBUG else "INFO"), filter="sqlmem", **kwargs)
+    handler_id = logger.add(
        sink, level=level or ("DEBUG" if DEBUG else "INFO"), filter="sqlmem", **kwargs
    )
    _added_sinks[key] = handler_id
-__all__ = ["CachingEngine", "ReadOnlyError", "UnsupportedQueryError", "add_sink"]
+__all__ = [
    "CachingEngine",
    "DeltaConfig",
    "ReadOnlyError",
    "UnsupportedQueryError",
    "Stats",
    "TableStats",
    "add_sink",
 ]
@@ -0,0 +1,65 @@
 """Coerce source-DB values into types ``sqlite3`` can bind.
 pyodbc returns ``NUMERIC``/``DECIMAL``/``MONEY`` as :class:`decimal.Decimal` and
 date/time columns as :mod:`datetime` objects, none of which ``sqlite3`` binds
 natively. Cache columns are ``TEXT`` by default, so stringifying is lossless and consistent
 with how the data is stored. This is done **locally** — never via a global
 ``sqlite3.register_adapter`` — so the host application's ``sqlite3`` behaviour is
 left untouched.
 """
 import datetime
 import decimal
 import uuid
 from typing import Any
 Params = tuple | list | dict | None
 _EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
 def to_sqlite(value: Any) -> Any:
    if isinstance(value, decimal.Decimal):
        return str(value)
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, bytearray):
        return bytes(value)
    return value
 def to_sqlite_datetime(value: Any) -> int | None:
    """Store a datetime as INTEGER microseconds since the Unix epoch (UTC).
    Used for columns the caller marks via ``datetime_columns``: 8 bytes as an
    INTEGER instead of a ~28-byte ISO ``TEXT`` string, and integer comparison on
    the change column instead of string collation. ``None`` passes through; a
    naive datetime is treated as UTC. A non-datetime value is parsed from its ISO
    string form (so ``date``/ISO-``str`` inputs work too); anything unparseable
    becomes ``None``.
    """
    if value is None:
        return None
    if isinstance(value, datetime.datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=datetime.timezone.utc)
        delta = value - _EPOCH  # exact integer arithmetic (no float rounding)
        return delta.days * 86_400_000_000 + delta.seconds * 1_000_000 + delta.microseconds
    try:
        return to_sqlite_datetime(datetime.datetime.fromisoformat(str(value)))
    except (TypeError, ValueError):
        return None
 def coerce_row(row: tuple) -> tuple:
    return tuple(to_sqlite(v) for v in row)
 def coerce_params(params: Params) -> tuple | dict | None:
    if params is None:
        return None
    if isinstance(params, dict):
        return {key: to_sqlite(val) for key, val in params.items()}
    return tuple(to_sqlite(val) for val in params)
@@ -0,0 +1,27 @@
 """SQL identifier quoting.
 Table and column names are interpolated into statements as raw strings, so a
 name with a space, a reserved word, or an embedded quote would break the query
 (and is a latent injection vector). These helpers quote identifiers safely. The
 in-memory cache is SQLite, so it uses double-quote style; the source DB is quoted
 in its configured dialect (e.g. T-SQL ``[brackets]``).
 """
 from collections.abc import Iterable
 from sqlglot import exp
 def quote(name: str) -> str:
    """Quote an identifier for the in-memory SQLite cache."""
    return '"' + name.replace('"', '""') + '"'
 def quote_list(names: Iterable[str]) -> str:
    """Comma-join SQLite-quoted identifiers."""
    return ", ".join(quote(n) for n in names)
 def quote_source(name: str, dialect: str) -> str:
    """Quote an identifier for the source DB in its dialect (e.g. T-SQL ``[x]``)."""
    return exp.to_identifier(name, quoted=True).sql(dialect=dialect)
@@ -2,37 +2,142 @@ import atexit
 import signal
 import sqlite3
 import threading
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
 from loguru import logger
 import sqlmem._meta as _meta
 from ._coerce import coerce_params, coerce_row, to_sqlite, to_sqlite_datetime
 from ._sql import quote, quote_list, quote_source
 from .config import FETCH_BATCH_SIZE, SQL_DIALECT
 from .stats import TableState
-SCHEMA_VERSION = 1
+SCHEMA_VERSION = 4
@dataclass(frozen=True)
 class _Index:
    name: str
    columns: tuple[str, ...]
@dataclass(frozen=True)
 class TableError:
    """Most recent load/refresh failure for a table (see ``CacheManager.get_errors``)."""
    message: str
    at: str
    consecutive: int
 class CacheManager:
-    def __init__(self, db_path: Path, backup_interval: int) -> None:
+    def __init__(
        self,
        db_path: Path,
        backup_interval: int,
        in_memory: bool = True,
        dialect: str = SQL_DIALECT,
        fetch_batch: int = FETCH_BATCH_SIZE,
        pragmas: dict[str, str | int] | None = None,
        datetime_columns: dict[str, list[str]] | None = None,
    ) -> None:
        self._db_path = db_path
        self._backup_interval = backup_interval
-        self._mem_conn = sqlite3.connect(":memory:", check_same_thread=False)
+        self._in_memory = in_memory
-        self._lock = threading.Lock()
+        self._dialect = dialect              # source-DB dialect, for identifier quoting
        self._fetch_batch = fetch_batch      # rows fetched per source batch
        self._pragmas = dict(pragmas or {})  # extra read/layout PRAGMAs (disk mode)
        # table → columns stored as INTEGER µs-since-epoch instead of ISO TEXT
        self._datetime_columns = {t: list(c) for t, c in (datetime_columns or {}).items()}
        self._lock = threading.Lock()       # serializes connection access
        self._load_lock = threading.Lock()  # serializes full table loads
        self._states: dict[str, str] = {}   # table → live processing state
        self._errors: dict[str, TableError] = {}  # table → last load/refresh failure
        self._error_total = 0                # process-wide failure counter
        self._last_run: dict[str, str] = {}  # table → last refresh-cycle run (this process)
        self._index_defs: dict[str, list[_Index]] = {}  # table → secondary indexes
        self._read_local = threading.local()  # per-thread read conn (disk mode)
        self._read_conns: list[sqlite3.Connection] = []  # read conns, for cleanup
        self._closed = False
-        self._ensure_meta_tables()
+        if in_memory:
-        self._load_from_disk()
+            self._conn = sqlite3.connect(":memory:", check_same_thread=False)
-        self._start_backup_thread()
+            self._apply_pragmas(self._conn)
        else:
            # Disk-backed: query the on-disk file directly — no RAM copy, every
            # write persists immediately, and the cache can exceed available RAM.
            db_existed = db_path.exists() and db_path.stat().st_size > 0
            self._conn = self._open_disk_connection(db_existed)
            self._discard_if_schema_mismatch()
-        atexit.register(self._backup_to_disk)
+        self._ensure_meta_tables()
-        signal.signal(signal.SIGTERM, self._on_sigterm)
+        if in_memory:
            self._load_from_disk()
        self._drop_orphan_staging()
        if in_memory:
            self._start_backup_thread()
            atexit.register(self._backup_to_disk)
            signal.signal(signal.SIGTERM, self._on_sigterm)
        else:
            atexit.register(self.close)
    @property
    def connection(self) -> sqlite3.Connection:
-        return self._mem_conn
+        return self._conn
    def _open_disk_connection(self, db_existed: bool) -> sqlite3.Connection:
        """Open the on-disk cache connection with WAL + the configured pragmas.
        ``page_size`` and ``auto_vacuum`` are layout pragmas that only take
        effect on a *fresh* file (before the first table exists), so they are
        applied conditionally on ``db_existed``; everything else is applied
        unconditionally. Used by both ``__init__`` and :meth:`hard_reset`.
        """
        conn = sqlite3.connect(str(self._db_path), check_same_thread=False)
        # page_size must be set before WAL/the first table on a brand-new file;
        # on an existing file it is silently ignored until the next VACUUM.
        if "page_size" in self._pragmas:
            wanted = int(self._pragmas["page_size"])
            if db_existed:
                actual = conn.execute("PRAGMA page_size").fetchone()[0]
                if actual != wanted:
                    logger.warning(
                        f"page_size={wanted} requested but the cache file already "
                        f"exists with page_size={actual}; the new value takes "
                        "effect only after the cache is wiped (hard_reset()) or "
                        "rebuilt from scratch."
                    )
            else:
                conn.execute(f"PRAGMA page_size = {wanted}")
        # auto_vacuum must be set before the database header is materialized,
        # i.e. before switching to WAL (which writes the header) — otherwise the
        # value silently reverts to 0/NONE and only a full VACUUM could apply it.
        if not db_existed and "auto_vacuum" in self._pragmas:
            conn.execute(f"PRAGMA auto_vacuum = {self._pragmas['auto_vacuum']}")
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA synchronous=NORMAL")
        self._apply_pragmas(conn, exclude={"page_size", "auto_vacuum"})
        return conn
    def _apply_pragmas(
        self, conn: sqlite3.Connection, exclude: set[str] | None = None
    ) -> None:
        """Apply the user-supplied PRAGMAs to *conn*, skipping *exclude*.
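        SQLite silently ignores unknown pragma names and settings it cannot apply
        (e.g. mmap unsupported), so most misconfigurations degrade gracefully
        rather than crashing startup; a value that is not valid SQL syntax still
        raises ``sqlite3.OperationalError``.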
        """
        skip = exclude or set()
        for key, value in self._pragmas.items():
            if key in skip:
                continue
            conn.execute(f"PRAGMA {key} = {value}")
    def _ensure_meta_tables(self) -> None:
-        self._mem_conn.executescript("""
+        self._conn.executescript("""
            CREATE TABLE IF NOT EXISTS _sqlmem_meta (
                key   TEXT PRIMARY KEY,
                value TEXT NOT NULL
@@ -40,7 +145,9 @@ class CacheManager:
            CREATE TABLE IF NOT EXISTS _sqlmem_tables (
                table_name      TEXT PRIMARY KEY,
                last_refresh_at TEXT NOT NULL,
-                row_count       INTEGER
+                row_count       INTEGER,
                is_full         INTEGER NOT NULL DEFAULT 0,
                last_synced_at  TEXT
            );
            CREATE TABLE IF NOT EXISTS _sqlmem_columns (
                table_name  TEXT NOT NULL,
@@ -48,19 +155,52 @@ class CacheManager:
                PRIMARY KEY (table_name, column_name)
            );
        """)
-        self._mem_conn.execute(
+        self._conn.execute(
            "INSERT OR IGNORE INTO _sqlmem_meta (key, value) VALUES (?, ?)",
            ("app_version", _meta.__version__),
        )
-        self._mem_conn.execute(
+        self._conn.execute(
            "INSERT OR IGNORE INTO _sqlmem_meta (key, value) VALUES (?, ?)",
            ("schema_version", str(SCHEMA_VERSION)),
        )
-        self._mem_conn.execute(
+        self._conn.execute(
            "INSERT OR IGNORE INTO _sqlmem_meta (key, value) VALUES (?, ?)",
            ("created_at", _now()),
        )
-        self._mem_conn.commit()
+        self._conn.commit()
    def _discard_if_schema_mismatch(self) -> None:
        """Disk mode: wipe an existing cache file written by an incompatible schema.
        In memory mode the equivalent check lives in :meth:`_load_from_disk`; here
        we operate on the live on-disk connection, dropping every table so the
        meta tables are recreated fresh by :meth:`_ensure_meta_tables`.
        """
        meta_exists = self._conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = '_sqlmem_meta'"
        ).fetchone()
        if not meta_exists:
            return  # fresh file — nothing to validate
        row = self._conn.execute(
            "SELECT value FROM _sqlmem_meta WHERE key = 'schema_version'"
        ).fetchone()
        if row is not None and int(row[0]) == SCHEMA_VERSION:
            return
        logger.warning(
            "Cache schema version mismatch — wiping on-disk cache, starting fresh."
        )
        names = [
            r[0]
            for r in self._conn.execute(
                r"SELECT name FROM sqlite_master WHERE type = 'table' "
                r"AND name NOT LIKE 'sqlite\_%' ESCAPE '\'"
            ).fetchall()
        ]
        for name in names:
            self._conn.execute(f"DROP TABLE IF EXISTS {quote(name)}")
        self._conn.commit()
    def _load_from_disk(self) -> None:
        if not self._db_path.exists():
@@ -78,21 +218,41 @@ class CacheManager:
                disk_conn.close()
                return
-            disk_conn.backup(self._mem_conn)
+            disk_conn.backup(self._conn)
            logger.info("Cache loaded from disk successfully.")
        except Exception as e:
            logger.error(f"Failed to load cache from disk: {e} — starting fresh.")
        finally:
            disk_conn.close()
    def _drop_orphan_staging(self) -> None:
        """Drop staging tables left by a load that was interrupted (e.g. crash mid-load)."""
        orphans = [
            r[0]
            for r in self._conn.execute(
                r"SELECT name FROM sqlite_master "
                r"WHERE type = 'table' AND name LIKE '%\_\_sqlmem\_load' ESCAPE '\'"
            ).fetchall()
        ]
        for name in orphans:
            logger.warning(f"Dropping orphan staging table {name!r} from a previous interrupted load.")
            self._conn.execute(f"DROP TABLE IF EXISTS {quote(name)}")
        if orphans:
            self._conn.commit()
    def _backup_to_disk(self) -> None:
        if self._closed:
            return
        if not self._in_memory:
            # Disk-backed: every write already lands on disk; just commit any
            # transaction that is still open.
            with self._lock:
                self._conn.commit()
            return
        logger.info(f"Backing up cache to {self._db_path}")
        try:
            with self._lock:
                disk_conn = sqlite3.connect(self._db_path)
-                self._mem_conn.backup(disk_conn)
+                self._conn.backup(disk_conn)
                disk_conn.close()
            logger.info("Cache backup complete.")
        except Exception as e:
@@ -112,46 +272,431 @@ class CacheManager:
        logger.info("SIGTERM received — flushing cache to disk.")
        self._backup_to_disk()
-    def mark_table_refreshed(self, table: str, row_count: int) -> None:
+    def mark_table_refreshed(self, table: str, row_count: int, full: bool = False) -> None:
        ts = _now()
        with self._lock:
-            self._mem_conn.execute(
+            self._conn.execute(
                """
-                INSERT INTO _sqlmem_tables (table_name, last_refresh_at, row_count)
+                INSERT INTO _sqlmem_tables (table_name, last_refresh_at, row_count, is_full)
-                VALUES (?, ?, ?)
+                VALUES (?, ?, ?, ?)
                ON CONFLICT(table_name) DO UPDATE SET
                    last_refresh_at = excluded.last_refresh_at,
-                    row_count = excluded.row_count
+                    row_count = excluded.row_count,
                    is_full = excluded.is_full
                """,
-                (table, _now(), row_count),
+                (table, ts, row_count, int(full)),
            )
-            self._mem_conn.commit()
+            self._conn.commit()
        self._last_run[table] = ts  # a write is also a refresh-cycle run
    def mark_refresh_ran(self, table: str) -> None:
        """Record that a refresh cycle ran for *table* now, even if it wrote nothing.
        In-memory only (like states/errors) — never persisted, never touches the
        schema. This is the liveness signal surfaced as ``TableStats.last_refresh``,
        distinct from the persisted last *write* time (``last_upsert``).
        """
        self._last_run[table] = _now()
    def get_last_runs(self) -> dict[str, str]:
        return dict(self._last_run)
    def is_table_cached(self, table: str) -> bool:
-        row = self._mem_conn.execute(
+        row = self._conn.execute(
            "SELECT 1 FROM _sqlmem_tables WHERE table_name = ?", (table,)
        ).fetchone()
        return row is not None
-    def load_table(self, table: str, columns: list[str], source_conn: sqlite3.Connection) -> None:
+    def is_table_full(self, table: str) -> bool:
-        cols = ", ".join(columns)
+        """True if the whole table (all columns) is cached — a SELECT * cache hit."""
-        logger.info(f"Fetching {table!r} columns [{cols}] from source DB")
+        row = self._conn.execute(
-        rows = source_conn.execute(f"SELECT {cols} FROM {table}").fetchall()
+            "SELECT is_full FROM _sqlmem_tables WHERE table_name = ?", (table,)
        ).fetchone()
        return bool(row and row[0])
    def seconds_since_refresh(self, table: str) -> float | None:
        """Age of a cached table in seconds, or None if it is not cached."""
        row = self._conn.execute(
            "SELECT last_refresh_at FROM _sqlmem_tables WHERE table_name = ?", (table,)
        ).fetchone()
        if not row or not row[0]:
            return None
        last = datetime.fromisoformat(row[0])
        return (datetime.now(timezone.utc) - last).total_seconds()
    def discover_columns(self, table: str, source_conn: sqlite3.Connection) -> list[str]:
        """Return all column names of *table* from the source DB without fetching rows."""
        logger.debug(f"Discovering columns of {table!r} from source DB")
        cursor = source_conn.execute(
            f"SELECT * FROM {quote_source(table, self._dialect)} WHERE 1 = 0"
        )
        columns = [desc[0] for desc in cursor.description]
        logger.debug(f"{table!r} has columns: {columns}")
        return columns
    def set_state(self, table: str, state: str) -> None:
        self._states[table] = state
    def get_states(self) -> dict[str, str]:
        return dict(self._states)
    def clear_state(self, table: str) -> None:
        self._states.pop(table, None)
        self._errors.pop(table, None)
        self._last_run.pop(table, None)
    def record_error(self, table: str, message: str) -> None:
        """Record a load/refresh failure for *table* (increments its failure streak)."""
        prev = self._errors.get(table)
        streak = (prev.consecutive if prev else 0) + 1
        self._errors[table] = TableError(message=message, at=_now(), consecutive=streak)
        self._error_total += 1
        logger.debug(f"Recorded error for {table!r} (streak {streak}): {message}")
    def record_success(self, table: str) -> None:
        """Reset *table*'s failure streak to 0 after a successful load/refresh."""
        prev = self._errors.get(table)
        if prev and prev.consecutive:
            self._errors[table] = TableError(prev.message, prev.at, 0)
    def get_errors(self) -> dict[str, TableError]:
        return dict(self._errors)
    @property
    def error_total(self) -> int:
        return self._error_total
    def add_index(self, table: str, columns: list[str]) -> None:
        """Register a secondary index to (re)create on *columns* after each load."""
        name = "sqlmem_idx_" + "_".join([table, *columns])
        defs = self._index_defs.setdefault(table, [])
        if all(d.name != name for d in defs):
            defs.append(_Index(name=name, columns=tuple(columns)))
    def _create_indexes(self, table: str, available: list[str]) -> None:
        """Create the registered secondary indexes whose columns are all cached."""
        available_set = set(available)
        for idx in self._index_defs.get(table, []):
            if not set(idx.columns) <= available_set:
                logger.warning(
                    f"Skipping index {idx.name!r}: columns {idx.columns} not all cached."
                )
                continue
            cols = quote_list(idx.columns)
            with self._lock:
                self._conn.execute(
                    f"CREATE INDEX IF NOT EXISTS {quote(idx.name)} ON {quote(table)} ({cols})"
                )
                self._conn.commit()
            logger.debug(f"Index {idx.name!r} ready on {table} ({cols})")
    def _row_coercer(self, table: str, columns: list[str]):
        """Return a per-row coercer for *columns* in source order.
        Columns registered in ``datetime_columns`` for *table* are coerced to
        INTEGER µs-since-epoch (``to_sqlite_datetime``); everything else keeps the
        default stringifying coercion (``to_sqlite``). With no datetime columns it
        is the plain :func:`coerce_row`, so the common path is unchanged.
        """
        dt_cols = set(self._datetime_columns.get(table, ()))
        dt_idx = {i for i, c in enumerate(columns) if c in dt_cols}
        if not dt_idx:
            return coerce_row
        def coerce(row: tuple) -> tuple:
            return tuple(
                to_sqlite_datetime(v) if i in dt_idx else to_sqlite(v)
                for i, v in enumerate(row)
            )
        return coerce
    def load_table(
        self,
        table: str,
        columns: list[str],
        source_conn: sqlite3.Connection,
        full: bool = False,
    ) -> None:
        """Stream the source table into the cache in batches.
        Rows are fetched ``FETCH_BATCH_SIZE`` at a time into a private staging
        table and swapped in atomically, so peak memory stays bounded (no
        ``fetchall`` of a huge table) and readers keep seeing the previous copy
        until the swap. Concurrent loads are serialized by ``_load_lock``; the
        connection lock is only held for the brief per-batch inserts and the swap.
        """
        src_cols = ", ".join(quote_source(c, self._dialect) for c in columns)
        dt_cols = set(self._datetime_columns.get(table, ()))
        col_defs = ", ".join(
            f"{quote(c)} {'INTEGER' if c in dt_cols else 'TEXT'}" for c in columns
        )
        coerce = self._row_coercer(table, columns)
        placeholders = ", ".join("?" * len(columns))
        staging = f"{table}__sqlmem_load"
        q_staging = quote(staging)
        q_table = quote(table)
        with self._load_lock:
            self.set_state(table, TableState.LOADING)
            logger.info(f"Fetching {table!r} columns {columns} from source DB (batch={self._fetch_batch})")
            try:
                cursor = source_conn.execute(
                    f"SELECT {src_cols} FROM {quote_source(table, self._dialect)}"
                )
                with self._lock:
                    self._conn.execute(f"DROP TABLE IF EXISTS {q_staging}")
                    self._conn.execute(f"CREATE TABLE {q_staging} ({col_defs})")
                    self._conn.commit()
                total = 0
                insert_sql = f"INSERT INTO {q_staging} VALUES ({placeholders})"
                while True:
                    batch = cursor.fetchmany(self._fetch_batch)  # network outside _lock
                    if not batch:
                        break
                    clean = [coerce(row) for row in batch]
                    with self._lock:
                        self._conn.executemany(insert_sql, clean)
                        self._conn.commit()
                    total += len(batch)
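                with self._lock:  # atomic swap: readers see old or new, never partial
                    # sqlite3 opens no implicit transaction for DDL, so begin one
                    # explicitly; otherwise DROP and RENAME commit separately and a
                    # disk-mode reader could briefly find the table missing.
                    if not self._conn.in_transaction:
                        self._conn.execute("BEGIN IMMEDIATE")
                    self._conn.execute(f"DROP TABLE IF EXISTS {q_table}")
                    self._conn.execute(f"ALTER TABLE {q_staging} RENAME TO {q_table}")
                    self._conn.commit()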
            except BaseException as exc:
                with self._lock:
                    self._conn.execute(f"DROP TABLE IF EXISTS {q_staging}")
                    self._conn.commit()
                self.set_state(table, TableState.ERROR)
                self.record_error(table, f"{type(exc).__name__}: {exc}")
                raise
            self._create_indexes(table, columns)
            self.mark_table_refreshed(table, total, full)
            self.set_state(table, TableState.READY)
            self.record_success(table)
            logger.info(f"Table {table!r} cached ({total} rows, columns: {columns})")
    def _read_conn(self) -> sqlite3.Connection:
        """A per-thread, read-only connection used for cache reads in disk mode.
        Disk mode runs in WAL, which allows many concurrent readers alongside one
        writer. Giving each thread its own read connection (rather than sharing the
        single write connection under ``_lock``) means a slow ``SELECT`` no longer
        blocks writers (loads/upserts) or other readers. In-memory mode can't do
        this — each ``:memory:`` connection is a separate database — so it keeps
        using the single locked connection.
        """
        conn = getattr(self._read_local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(str(self._db_path), check_same_thread=False)
            conn.execute("PRAGMA query_only=ON")  # read-only guard
            self._read_local.conn = conn
            with self._lock:
                self._read_conns.append(conn)
        return conn
    def execute_in_memory(
        self, sql: str, params: tuple | list | dict | None = None
    ) -> tuple[list[str], list[tuple]]:
        """Run a read query against the cache.
        In-memory mode serializes with writers on the single connection. Disk mode
        reads from a per-thread WAL connection, so reads run concurrently with
        writers and each other (see :meth:`_read_conn`).
        """
        bound = coerce_params(params)
        if self._in_memory:
            with self._lock:
                cursor = (
                    self._conn.execute(sql)
                    if bound is None
                    else self._conn.execute(sql, bound)
                )
                col_names = [desc[0] for desc in cursor.description]
                rows = cursor.fetchall()
            return col_names, rows
        conn = self._read_conn()
        cursor = conn.execute(sql) if bound is None else conn.execute(sql, bound)
        col_names = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
        return col_names, rows
    # --- delta refresh support ---------------------------------------------
    def get_table_columns(self, table: str) -> list[str]:
        """Authoritative ordered column list of a cached table (via PRAGMA)."""
        rows = self._conn.execute(f"PRAGMA table_info({quote(table)})").fetchall()
        return [r[1] for r in rows]
    def create_unique_index(self, table: str, key_columns: list[str]) -> None:
        """Create the unique index on *key_columns* that makes upsert-by-key work."""
        cols = quote_list(key_columns)
        index = quote(f"idx_{table}_pk")
        with self._lock:
-            self._mem_conn.execute(f"DROP TABLE IF EXISTS {table}")
+            self._conn.execute(
-            col_defs = ", ".join(f"{c} TEXT" for c in columns)
+                f"CREATE UNIQUE INDEX IF NOT EXISTS {index} ON {quote(table)} ({cols})"
-            self._mem_conn.execute(f"CREATE TABLE {table} ({col_defs})")
+            )
-            placeholders = ", ".join("?" * len(columns))
+            self._conn.commit()
-            self._mem_conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
-            self._mem_conn.commit()
-        self.mark_table_refreshed(table, len(rows))
+    def get_last_synced_at(self, table: str) -> str | None:
-        logger.info(f"Table {table!r} cached ({len(rows)} rows, columns: {columns})")
+        row = self._conn.execute(
            "SELECT last_synced_at FROM _sqlmem_tables WHERE table_name = ?", (table,)
        ).fetchone()
        # Stored in a TEXT column: an INTEGER-µs watermark (datetime_columns) comes
        # back as its digit string; delta._bind_watermark reconstructs the datetime.
        return row[0] if row else None
    def set_last_synced_at(self, table: str, value: str | int | None) -> None:
        with self._lock:
            self._conn.execute(
                "UPDATE _sqlmem_tables SET last_synced_at = ? WHERE table_name = ?",
                (value, table),
            )
            self._conn.commit()
    def max_value(self, table: str, column: str) -> str | int | None:
        """Maximum value of *column* across cached rows (the delta watermark).
        Returns an ``int`` for a datetime column stored as INTEGER µs, else the
        ISO ``TEXT`` string."""
        row = self._conn.execute(
            f"SELECT MAX({quote(column)}) FROM {quote(table)}"
        ).fetchone()
        return row[0] if row else None
    def upsert_rows(self, table: str, columns: list[str], rows: list[tuple]) -> None:
        """Insert-or-replace one batch of *rows* by the table's unique key."""
        col_list = quote_list(columns)
        placeholders = ", ".join("?" * len(columns))
        coerce = self._row_coercer(table, columns)
        clean_rows = [coerce(row) for row in rows]
        with self._lock:
            self._conn.executemany(
                f"INSERT OR REPLACE INTO {quote(table)} ({col_list}) VALUES ({placeholders})",
                clean_rows,
            )
            self._conn.commit()
    def count_rows(self, table: str) -> int:
        row = self._conn.execute(f"SELECT COUNT(*) FROM {quote(table)}").fetchone()
        return int(row[0]) if row else 0
    def reset(self) -> None:
        """Wipe the entire cache — every cached table plus the on-disk data
        (the file is deleted in memory mode, VACUUMed in place in disk mode)."""
        logger.info("Resetting cache — dropping all cached tables.")
        with self._lock:
            user_tables = [
                r[0]
                for r in self._conn.execute(
                    "SELECT name FROM sqlite_master "
                    r"WHERE type = 'table' AND name NOT LIKE 'sqlite\_%' ESCAPE '\' "
                    r"AND name NOT LIKE '\_sqlmem\_%' ESCAPE '\'"
                ).fetchall()
            ]
            for name in user_tables:
                self._conn.execute(f"DROP TABLE IF EXISTS {quote(name)}")
            self._conn.execute("DELETE FROM _sqlmem_tables")
            self._conn.execute("DELETE FROM _sqlmem_columns")
            self._conn.commit()
        self._states.clear()
        self._last_run.clear()
        if self._in_memory:
            try:
                if self._db_path.exists():
                    self._db_path.unlink()
            except OSError as e:
                logger.error(f"Failed to delete cache file {self._db_path}: {e}")
        else:
            # The open connection *is* the file — drop tables persisted the wipe;
            # VACUUM reclaims the freed pages on disk.
            try:
                with self._lock:
                    self._conn.execute("VACUUM")
            except sqlite3.Error as e:
                logger.error(f"Failed to VACUUM cache file {self._db_path}: {e}")
    def hard_reset(self) -> None:
        """Delete the on-disk cache file and reopen it from scratch.
        Unlike :meth:`reset` (which drops tables but keeps the open file, so the
        baked-in ``page_size``/``auto_vacuum`` cannot change), this closes every
        connection, removes the file plus its WAL/SHM sidecars, and reopens with
        all current pragmas applied — so layout pragmas take effect on the fresh
        file. Disk mode only; in memory mode it falls back to :meth:`reset`.
        Any read in flight on another thread will see its connection closed from
        under it; treat this as a maintenance operation.
        """
        if self._in_memory:
            self.reset()
            return
        logger.info(f"Hard reset: closing connections and deleting {self._db_path}")
        with self._lock:
            for conn in self._read_conns:
                try:
                    conn.close()
                except sqlite3.Error:
                    pass
            self._read_conns.clear()
            self._read_local = threading.local()  # force every thread to reopen
            self._conn.close()
            for suffix in ("", "-wal", "-shm"):
                p = Path(str(self._db_path) + suffix)
                if p.exists():
                    p.unlink()
            # Reopen fresh — page_size/auto_vacuum apply to the new empty file.
            self._conn = self._open_disk_connection(db_existed=False)
            self._ensure_meta_tables()
        self._states.clear()
        self._errors.clear()
        self._last_run.clear()
        self._error_total = 0
        logger.info(f"Hard reset complete — cache recreated at {self._db_path}.")
    def vacuum(self, incremental: bool = True, pages: int = 10_000) -> None:
        """Run maintenance VACUUM on the on-disk cache (no-op in memory mode).
        ``incremental=True`` (default) reclaims up to *pages* free pages without
        blocking readers or needing extra disk space — but requires the cache to
        have been created with ``auto_vacuum=INCREMENTAL`` (otherwise it is a
        no-op). ``incremental=False`` runs a full ``VACUUM``: it rewrites the
        whole file (needs ~2× disk space, blocks readers) — use only in a
        maintenance window.
        """
        if self._in_memory:
            logger.debug("vacuum() called in memory mode — no-op.")
            return
        if incremental:
            with self._lock:
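                # incremental_vacuum frees pages as the statement is stepped, so
                # drain the cursor to make sure every requested page is reclaimed.
                self._conn.execute(f"PRAGMA incremental_vacuum({pages})").fetchall()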
                self._conn.commit()
            logger.info(f"Incremental vacuum: reclaimed up to {pages} pages.")
        else:
            logger.info("Full VACUUM started — this may take several minutes.")
            with self._lock:
                self._conn.execute("VACUUM")
            logger.info("Full VACUUM complete.")
    def close(self) -> None:
        self._backup_to_disk()
        self._closed = True
-        self._mem_conn.close()
+        with self._lock:
            for conn in self._read_conns:
                try:
                    conn.close()
                except sqlite3.Error:
                    pass
            self._read_conns.clear()
        self._conn.close()
 def _now() -> str:
@@ -8,7 +8,17 @@ load_dotenv()
 DEBUG = os.getenv("SQLMEM_DEBUG", "false").lower() == "true"
 CACHE_DB_PATH = Path(os.getenv("SQLMEM_CACHE_DB", "cache.db"))
 # Cache backend: in-memory SQLite (default) backed up to disk periodically, or
 # query the on-disk SQLite file directly (no RAM copy, every write persists).
 IN_MEMORY = os.getenv("SQLMEM_IN_MEMORY", "true").lower() == "true"
 BACKUP_INTERVAL_SECONDS = int(os.getenv("SQLMEM_BACKUP_INTERVAL", "3600"))
 # How often (seconds) the background thread pulls deltas for delta-tracked tables.
 REFRESH_INTERVAL_SECONDS = int(os.getenv("SQLMEM_REFRESH_INTERVAL", "300"))
 # Rows fetched per batch when loading a table — caps peak memory for huge tables.
 FETCH_BATCH_SIZE = int(os.getenv("SQLMEM_FETCH_BATCH", "10000"))
 # Dialect used by sqlglot to parse incoming SQL. Defaults to T-SQL (SQL Server),
 # which also accepts ANSI SQL. In-memory queries are always rendered to SQLite.
 SQL_DIALECT = os.getenv("SQLMEM_SQL_DIALECT", "tsql")
 # Silent by default — callers opt in via add_sink().
 logger.disable("sqlmem")
@@ -0,0 +1,142 @@
 from dataclasses import dataclass, field
 from datetime import datetime, timedelta, timezone
 from typing import Any
 from loguru import logger
 from ._sql import quote_source
 from .cache import CacheManager
 from .stats import TableState
 _EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
 def _bind_watermark(watermark: str | int, epoch_us: bool = False) -> datetime | str:
    """Bind the delta watermark back to the source in its native type.
    The cache stores the change column as an ISO ``TEXT`` string (see
    ``_coerce.to_sqlite``), so ``max(change_column)`` comes back as a string such
    as ``'2026-06-05T14:54:24.823000'``. Sending that straight back to the source
    as an ``nvarchar`` makes SQL Server do an implicit ``varchar -> datetime``
    conversion, which **fails** on the ``T``-separated, 6-digit-microsecond ISO
    form (error 241 / SQLSTATE 22007 — ``datetime`` accepts at most 3 fractional
    digits). Parsing it back to a real :class:`~datetime.datetime` makes the
    driver send a typed timestamp, so the comparison happens natively with no
    string conversion. Non-datetime change columns (e.g. an integer rowversion)
    don't parse and are passed through unchanged.
    When the change column is stored as INTEGER µs-since-epoch (``datetime_columns``)
    *epoch_us* is set: the watermark is a microsecond count (an ``int`` or its digit
    string, since it round-trips through a TEXT column) and is reconstructed into a
    UTC :class:`~datetime.datetime` so the source still receives a typed timestamp.
    """
    if epoch_us:
        try:
            return _EPOCH + timedelta(microseconds=int(watermark))
        except (TypeError, ValueError):
            return watermark if isinstance(watermark, str) else str(watermark)
    try:
        return datetime.fromisoformat(watermark)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return watermark if isinstance(watermark, str) else str(watermark)
@dataclass(frozen=True)
 class DeltaConfig:
    """Per-table configuration for incremental (delta) refresh.
    *change_column* is the column the source DB updates on every insert/update
    (a non-decreasing timestamp / rowversion). *key_columns* uniquely identify a
    row and are used to upsert changed rows in place; leave them empty to let the
    engine auto-discover the primary key from the source DB (works for real
    tables, not views).
    """
    change_column: str
    key_columns: list[str] = field(default_factory=list)
@dataclass(frozen=True)
 class ResolvedDelta:
    """A :class:`DeltaConfig` with ``key_columns`` resolved to concrete columns."""
    change_column: str
    key_columns: list[str]
 class DeltaRefresher:
    """Pulls only changed rows for delta-tracked tables and upserts them.
    Uses a data-driven high-watermark (``max(change_column)`` actually cached)
    with a ``>=`` overlap and idempotent upsert by key, so no row is ever missed
    and boundary rows are harmlessly re-read.
    """
    def __init__(self, cache: CacheManager, delta: dict[str, ResolvedDelta]) -> None:
        self._cache = cache
        self._delta = delta
    def refresh(self, source_conn: Any) -> None:
        for table, cfg in self._delta.items():
            if not self._cache.is_table_cached(table):
                continue
            try:
                self._refresh_table(table, cfg, source_conn)
                self._cache.record_success(table)
            except Exception as e:  # one bad table must not stop the others
                logger.error(f"Delta refresh failed for {table!r}: {e}")
                # A delta can fail before streaming starts (e.g. a watermark the
                # source rejects), leaving state misleadingly READY — mark it and
                # record the error so stats reveal the stuck table.
                self._cache.set_state(table, TableState.ERROR)
                self._cache.record_error(table, f"{type(e).__name__}: {e}")
    def _refresh_table(
        self, table: str, cfg: ResolvedDelta, source_conn: Any
    ) -> None:
        columns = self._cache.get_table_columns(table)
        watermark = self._cache.get_last_synced_at(table)
        dialect = self._cache._dialect
        col_list = ", ".join(quote_source(c, dialect) for c in columns)
        q_table = quote_source(table, dialect)
        if watermark is None:
            cursor = source_conn.execute(f"SELECT {col_list} FROM {q_table}")
        else:
            change_col = quote_source(cfg.change_column, dialect)
            epoch_us = cfg.change_column in self._cache._datetime_columns.get(table, ())
            cursor = source_conn.execute(
                f"SELECT {col_list} FROM {q_table} WHERE {change_col} >= ?",
                (_bind_watermark(watermark, epoch_us),),
            )
        # Stream the delta in batches so a large catch-up never materializes at once.
        total = 0
        self._cache.set_state(table, TableState.REFRESHING)
        try:
            while True:
                batch = cursor.fetchmany(self._cache._fetch_batch)
                if not batch:
                    break
                self._cache.upsert_rows(table, columns, batch)
                total += len(batch)
        finally:
            self._cache.set_state(table, TableState.READY)
        if total == 0:
            # The cycle ran but wrote nothing — record liveness (last_refresh) without
            # touching the persisted last-write time (last_upsert).
            self._cache.mark_refresh_ran(table)
            logger.debug(f"Delta refresh {table!r}: no changes since {watermark!r}")
            return
        # Update row_count / last_refresh once (not per batch) and advance the watermark.
        self._cache.mark_table_refreshed(
            table, self._cache.count_rows(table), self._cache.is_table_full(table)
        )
        new_watermark = self._cache.max_value(table, cfg.change_column)
        self._cache.set_last_synced_at(table, new_watermark)
        logger.info(
            f"Delta refresh {table!r}: {total} row(s) upserted, "
            f"watermark {watermark!r} → {new_watermark!r}"
        )
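The watermark scheme used by `DeltaRefresher` can be sketched in isolation with two in-memory SQLite databases. This is an illustration under assumptions, not the project's code: the `items` table, its columns, and the `delta_refresh` helper are made up for the example.

```python
import sqlite3

# Hypothetical source and cache, both holding the same made-up table.
source = sqlite3.connect(":memory:")
cache = sqlite3.connect(":memory:")
ddl = "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, changed INTEGER)"
source.execute(ddl)
cache.execute(ddl)


def delta_refresh(batch_size: int = 2) -> int:
    """Copy rows with changed >= the cached max(changed); return rows upserted."""
    (watermark,) = cache.execute("SELECT max(changed) FROM items").fetchone()
    if watermark is None:
        # Nothing cached yet: fall back to a full load.
        cursor = source.execute("SELECT id, name, changed FROM items")
    else:
        # '>=' re-reads boundary rows so same-timestamp writes are not lost.
        cursor = source.execute(
            "SELECT id, name, changed FROM items WHERE changed >= ?", (watermark,)
        )
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)  # stream in bounded batches
        if not batch:
            break
        # Upsert by primary key, so re-reading a row is idempotent.
        cache.executemany("INSERT OR REPLACE INTO items VALUES (?, ?, ?)", batch)
        total += len(batch)
    cache.commit()
    return total


source.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "a", 10), (2, "b", 20)])
first = delta_refresh()   # full load of both rows
source.execute("UPDATE items SET name = 'b2', changed = 20 WHERE id = 2")
source.execute("INSERT INTO items VALUES (3, 'c', 20)")
second = delta_refresh()  # picks up ids 2 and 3, which share the boundary value
```

Calling `delta_refresh()` again with no source changes still re-reads the boundary rows. That is the cost of the `>=` overlap and the reason the upsert has to be idempotent.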
@@ -1,35 +1,262 @@
-import sqlite3
+import threading
 from dataclasses import replace
 from pathlib import Path
 from typing import Any
 from loguru import logger
-from sqlalchemy.engine import Engine
+from sqlalchemy import inspect
 from sqlalchemy.engine import Connection, Engine
-from .cache import CacheManager
+from ._sql import quote
-from .config import BACKUP_INTERVAL_SECONDS, CACHE_DB_PATH
+from .cache import CacheManager, TableError
 from .config import (
    BACKUP_INTERVAL_SECONDS,
    CACHE_DB_PATH,
    FETCH_BATCH_SIZE,
    IN_MEMORY,
    REFRESH_INTERVAL_SECONDS,
    SQL_DIALECT,
 )
 from .delta import DeltaConfig, DeltaRefresher, ResolvedDelta
 from .executor import QueryExecutor
-from .parser import parse
+from .parser import Params, parse
 from .registry import ColumnRegistry
 from .stats import Stats, StatsCollector, TableState, TableStats
 class _LazySource:
    """A source connection opened on first ``execute`` and shared across one query.
    Most queries are cache hits that never touch the source, so opening it (and
    occupying a connection-pool slot) eagerly is wasteful. This proxy forwards
    ``execute`` to a real connection opened on demand, then released by ``close``.
    """
    def __init__(self, source_engine: Engine) -> None:
        self._source_engine = source_engine
        self._sa_conn: Connection | None = None
        self._raw: Any = None
    def execute(self, *args: Any, **kwargs: Any) -> Any:
        if self._raw is None:
            self._sa_conn = self._source_engine.connect()
            self._raw = self._sa_conn.connection.dbapi_connection
        return self._raw.execute(*args, **kwargs)
    def close(self) -> None:
        if self._sa_conn is not None:
            self._sa_conn.close()
            self._sa_conn = None
            self._raw = None
 class CachingEngine:
    """Transparent SQLAlchemy-compatible cache layer."""
-    def __init__(self, source_engine: Engine) -> None:
+    def __init__(
        self,
        source_engine: Engine,
        delta: dict[str, DeltaConfig] | None = None,
        ttl: dict[str, int] | None = None,
        indexes: dict[str, list[str | list[str]]] | None = None,
        in_memory: bool | None = None,
        cache_db_path: str | Path | None = None,
        backup_interval: int | None = None,
        refresh_interval: int | None = None,
        fetch_batch: int | None = None,
        dialect: str | None = None,
        pragmas: dict[str, str | int] | None = None,
        datetime_columns: dict[str, list[str]] | None = None,
        blocking_startup_refresh: bool = False,
    ) -> None:
        self._source_engine = source_engine
-        self._cache = CacheManager(CACHE_DB_PATH, BACKUP_INTERVAL_SECONDS)
+        use_memory = IN_MEMORY if in_memory is None else in_memory
        self._dialect = dialect if dialect is not None else SQL_DIALECT
        self._refresh_interval = (
            refresh_interval if refresh_interval is not None else REFRESH_INTERVAL_SECONDS
        )
        self._cache = CacheManager(
            Path(cache_db_path) if cache_db_path is not None else CACHE_DB_PATH,
            backup_interval if backup_interval is not None else BACKUP_INTERVAL_SECONDS,
            in_memory=use_memory,
            dialect=self._dialect,
            fetch_batch=fetch_batch if fetch_batch is not None else FETCH_BATCH_SIZE,
            pragmas=pragmas,
            datetime_columns=datetime_columns,
        )
        self._registry = ColumnRegistry(self._cache.connection)
        self._stats = StatsCollector()
        self._delta = self._resolve_delta(delta or {})
        self._ttl = dict(ttl or {})
        self._index_columns = self._register_indexes(indexes or {})
        self._refresher = DeltaRefresher(self._cache, self._delta)
        overlap = set(self._delta) & set(self._ttl)
        if overlap:
            raise ValueError(
                f"Tables {sorted(overlap)} are in both delta and ttl — a table is "
                "either delta-refreshed (has a change column) or TTL-refreshed (full "
                "reload), not both."
            )
        if self._delta or self._ttl:
            # The startup catch-up (deltas/TTL reloads for tables restored from
            # disk) can take a while on a cold start. By default it runs on the
            # background thread so it never blocks application startup; callers
            # who need the cache fully fresh before serving can opt back in.
            if blocking_startup_refresh:
                self._run_refresh()
            self._start_refresh_thread(initial_catch_up=not blocking_startup_refresh)
        logger.info("CachingEngine initialized.")
-    def execute(self, sql: str) -> list[dict]:
+    def _register_indexes(
-        parsed = parse(sql)
+        self, indexes: dict[str, list[str | list[str]]]
-        with self._source_engine.connect() as sa_conn:
+    ) -> dict[str, list[str]]:
-            raw_conn: sqlite3.Connection = sa_conn.connection.dbapi_connection
+        """Register secondary indexes on the cache; return columns to load per table."""
-            executor = QueryExecutor(self._cache, self._registry, raw_conn)
+        index_columns: dict[str, list[str]] = {}
        for table, specs in indexes.items():
            wanted: list[str] = []
            for spec in specs:
                columns = [spec] if isinstance(spec, str) else list(spec)
                self._cache.add_index(table, columns)
                for col in columns:
                    if col not in wanted:
                        wanted.append(col)
            index_columns[table] = wanted
        return index_columns
    def _resolve_delta(self, delta: dict[str, DeltaConfig]) -> dict[str, ResolvedDelta]:
        """Resolve each DeltaConfig, auto-discovering the primary key when omitted."""
        resolved: dict[str, ResolvedDelta] = {}
        inspector = None
        for table, cfg in delta.items():
            keys = list(cfg.key_columns)
            if not keys:
                inspector = inspector or inspect(self._source_engine)
                pk = inspector.get_pk_constraint(table)
                keys = list(pk.get("constrained_columns") or [])
                if not keys:
                    raise ValueError(
                        f"No primary key found for {table!r} in the source DB "
                        "(views have none) — set key_columns in its DeltaConfig."
                    )
                logger.info(f"Delta {table!r}: auto-discovered key columns {keys}")
            resolved[table] = ResolvedDelta(change_column=cfg.change_column, key_columns=keys)
        return resolved
    @property
    def stats(self) -> Stats:
        states = self._cache.get_states()
        errors = self._cache.get_errors()
        last_runs = self._cache.get_last_runs()
        with self._cache._lock:
            base = self._stats.snapshot(self._cache.connection, states)
        base = replace(base, errors=self._cache.error_total)
        return replace(
            base,
            tables={n: self._enrich(n, t, errors, last_runs) for n, t in base.tables.items()},
        )
    def _enrich(
        self,
        name: str,
        table_stats: TableStats,
        errors: dict[str, TableError],
        last_runs: dict[str, str],
    ) -> TableStats:
        """Annotate a TableStats with refresh tracking, TTL staleness, errors and run time."""
        if name in self._delta:
            tracking = "delta"
        elif name in self._ttl:
            tracking = "ttl"
        else:
            tracking = "static"
        state = table_stats.state
        if state == TableState.READY and name in self._ttl:
            age = self._cache.seconds_since_refresh(name)
            if age is not None and age > self._ttl[name]:
                state = TableState.STALE
        last_refresh = last_runs.get(name)
        err = errors.get(name)
        if err is not None:
            return replace(
                table_stats,
                tracking=tracking,
                state=state,
                last_refresh=last_refresh,
                last_error=err.message,
                last_error_at=err.at,
                consecutive_failures=err.consecutive,
            )
        return replace(table_stats, tracking=tracking, state=state, last_refresh=last_refresh)
    def execute(self, sql: str, params: Params = None) -> list[dict]:
        parsed = parse(sql, params, dialect=self._dialect)
        # The source connection is opened lazily — a pure cache hit never touches
        # the source and never occupies a pool slot.
        source = _LazySource(self._source_engine)
        try:
            executor = QueryExecutor(
                self._cache,
                self._registry,
                source,
                self._stats,
                self._delta,
                self._ttl,
                self._index_columns,
            )
            return executor.execute(parsed)
        finally:
            source.close()
    def refresh(self) -> None:
        """Pull deltas and reload expired TTL tables now (the same cycle also runs on a timer)."""
        self._run_refresh()
    def _run_refresh(self) -> None:
        try:
            with self._source_engine.connect() as sa_conn:
                raw_conn = sa_conn.connection.dbapi_connection
                self._refresher.refresh(raw_conn)
                self._refresh_ttl(raw_conn)
        except Exception as e:
            logger.error(f"Refresh cycle failed: {e}")
    def _refresh_ttl(self, source_conn: Any) -> None:
        """Proactively full-reload TTL-tracked tables whose cache has expired."""
        for table, ttl in self._ttl.items():
            if not self._cache.is_table_cached(table):
                continue
            age = self._cache.seconds_since_refresh(table)
            if age is None or age <= ttl:
                continue
            try:
                columns = self._cache.get_table_columns(table)
                full = self._cache.is_table_full(table)
                self._cache.load_table(table, columns, source_conn, full=full)
                logger.info(f"TTL refresh {table!r}: reloaded (age {age:.0f}s > {ttl}s)")
            except Exception as e:
                logger.error(f"TTL refresh failed for {table!r}: {e}")
    def _start_refresh_thread(self, initial_catch_up: bool = True) -> None:
        def loop() -> None:
            if initial_catch_up:
                self._run_refresh()  # off-main-thread startup catch-up
            event = threading.Event()
            while not event.wait(self._refresh_interval):
                self._run_refresh()
        t = threading.Thread(target=loop, daemon=True, name="sqlmem-delta")
        t.start()
        logger.debug(f"Delta refresh thread started (interval={self._refresh_interval}s)")
    def invalidate(self, table: str) -> None:
        logger.info(f"Manually invalidating cache for table {table!r}")
        with self._cache._lock:
-            self._cache.connection.execute(f"DROP TABLE IF EXISTS {table}")
+            self._cache.connection.execute(f"DROP TABLE IF EXISTS {quote(table)}")
            self._cache.connection.execute(
                "DELETE FROM _sqlmem_tables WHERE table_name = ?", (table,)
            )
@@ -37,6 +264,34 @@ class CachingEngine:
                "DELETE FROM _sqlmem_columns WHERE table_name = ?", (table,)
            )
            self._cache.connection.commit()
        self._cache.clear_state(table)
    def reset(self) -> None:
        """Wipe the whole cache (RAM + cache.db). Use after structural source changes."""
        self._cache.reset()
        logger.info("Cache reset — all tables will be reloaded on next use.")
    def hard_reset(self) -> None:
        """Delete the on-disk cache file and reopen with current pragmas/page_size.
        Disk mode only (falls back to :meth:`reset` in memory mode). Use when a
        layout pragma — ``page_size`` or ``auto_vacuum`` — must change, since
        those are baked into the file at creation and :meth:`reset` keeps it.
        All tables reload on next use.
        """
        self._cache.hard_reset()
        # hard_reset swaps the cache connection — re-point the registry at it.
        self._registry.rebind(self._cache.connection)
        logger.info("Cache hard reset — file recreated; all tables reload on next use.")
    def vacuum(self, incremental: bool = True, pages: int = 10_000) -> None:
        """Run maintenance VACUUM on the on-disk cache (incremental by default).
        Incremental reclaims free pages left by delta ``INSERT OR REPLACE`` churn
        cheaply (requires ``auto_vacuum=INCREMENTAL``); a full VACUUM rewrites the
        whole file and should run only in a maintenance window.
        """
        self._cache.vacuum(incremental=incremental, pages=pages)
    def close(self) -> None:
        self._cache.close()
@@ -1,42 +1,122 @@
-import sqlite3
+from typing import Any
 from loguru import logger
 from .cache import CacheManager
 from .delta import ResolvedDelta
 from .parser import ParsedQuery
 from .registry import ColumnRegistry
 from .stats import StatsCollector
 class QueryExecutor:
-    def __init__(self, cache: CacheManager, registry: ColumnRegistry, source_conn: sqlite3.Connection) -> None:
+    def __init__(
        self,
        cache: CacheManager,
        registry: ColumnRegistry,
        source_conn: Any,  # raw DBAPI connection (pyodbc/sqlite3/…) — only .execute() is used
        stats: StatsCollector,
        delta: dict[str, ResolvedDelta] | None = None,
        ttl: dict[str, int] | None = None,
        index_columns: dict[str, list[str]] | None = None,
    ) -> None:
        self._cache = cache
        self._registry = registry
        self._source_conn = source_conn
        self._stats = stats
        self._delta = delta or {}
        self._ttl = ttl or {}
        self._index_columns = index_columns or {}
    def _ttl_expired(self, table: str) -> bool:
        """True if *table* has a TTL and its cached copy is older than that TTL."""
        ttl = self._ttl.get(table)
        if ttl is None:
            return False
        age = self._cache.seconds_since_refresh(table)
        return age is not None and age > ttl
    def execute(self, parsed: ParsedQuery) -> list[dict]:
-        table = parsed.table
+        for table in parsed.tables:
-        columns = parsed.columns
+            self._ensure_table(table, parsed)
-        missing = self._registry.needs_refetch(table, columns)
-        table_cached = self._cache.is_table_cached(table)
-        if missing or not table_cached:
-            if table_cached and missing:
-                logger.warning(
-                    f"Re-fetching {table!r} — new columns requested: {missing}. "
-                    f"Expanding cache from {self._registry.get_columns(table)} + {missing}"
-                )
-            all_columns = list(self._registry.get_columns(table)) + missing
-            self._cache.load_table(table, all_columns, self._source_conn)
-            self._registry.update(table, all_columns)
-        else:
-            logger.debug(f"Cache hit: {table!r} columns={columns}")
        return self._run_in_memory(parsed)
    def _ensure_table(self, table: str, parsed: ParsedQuery) -> None:
        if table in parsed.wildcard_tables:
            self._ensure_full(table)
        else:
            self._ensure_columns(table, parsed.columns_by_table[table])
    def _ensure_full(self, table: str) -> None:
        """Load every column of *table* (SELECT * / t.*), refetching unless already full."""
        cached = self._cache.is_table_cached(table)
        stale = cached and self._ttl_expired(table)
        if cached and self._cache.is_table_full(table) and not stale:
            logger.debug(f"Cache hit (full): {table!r}")
            self._stats.record_hit()
            return
        if cached and stale:
            logger.info(f"Cache expired (ttl) — reloading {table!r} in full.")
            self._stats.record_refetch()
        elif cached:
            logger.warning(f"Re-fetching {table!r} in full — SELECT * requested.")
            self._stats.record_refetch()
        else:
            self._stats.record_miss()
        columns = self._cache.discover_columns(table, self._source_conn)
        self._load(table, columns, full=True)
    def _ensure_columns(self, table: str, columns: list[str]) -> None:
        """Load *table* with at least *columns*, refetching on new columns or TTL expiry."""
        missing = self._registry.needs_refetch(table, columns)
        table_cached = self._cache.is_table_cached(table)
        stale = table_cached and self._ttl_expired(table)
        if table_cached and not missing and not stale:
            logger.debug(f"Cache hit: {table!r} columns={columns}")
            self._stats.record_hit()
            return
        if stale:
            logger.info(f"Cache expired (ttl) — reloading {table!r}.")
            self._stats.record_refetch()
        elif table_cached and missing:
            logger.warning(
                f"Re-fetching {table!r} — new columns requested: {missing}. "
                f"Expanding cache from {self._registry.get_columns(table)} + {missing}"
            )
            self._stats.record_refetch()
        else:
            self._stats.record_miss()
        all_columns = list(self._registry.get_columns(table)) + missing
        # Preserve a fully-cached table's status across a TTL reload.
        full = table_cached and self._cache.is_table_full(table)
        self._load(table, all_columns, full=full)
    def _load(self, table: str, columns: list[str], full: bool) -> None:
        """Fetch *table* into cache, adding delta key/timestamp and index columns."""
        cfg = self._delta.get(table)
        extra = list(self._index_columns.get(table, []))
        if cfg:
            # The cache must always hold the key (to upsert) and the change column
            # (to compute the watermark), even if no query referenced them.
            extra += [*cfg.key_columns, cfg.change_column]
        if extra:
            columns = list(dict.fromkeys([*columns, *extra]))
        self._cache.load_table(table, columns, self._source_conn, full=full)
        self._registry.update(table, columns)
        if cfg:
            self._cache.create_unique_index(table, cfg.key_columns)
            watermark = self._cache.max_value(table, cfg.change_column)
            self._cache.set_last_synced_at(table, watermark)
    def _run_in_memory(self, parsed: ParsedQuery) -> list[dict]:
-        logger.debug(f"Executing in SQLite RAM: {parsed.original_sql!r}")
+        logger.debug(f"Executing in SQLite RAM: {parsed.sqlite_sql!r} params={parsed.params!r}")
-        cursor = self._cache.connection.execute(parsed.original_sql)
+        col_names, rows = self._cache.execute_in_memory(parsed.sqlite_sql, parsed.params)
-        col_names = [desc[0] for desc in cursor.description]
-        rows = cursor.fetchall()
        return [dict(zip(col_names, row)) for row in rows]
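The column merge in `_load` relies on `dict.fromkeys` keeping first-seen order while dropping duplicates. A small standalone illustration, with made-up column names:

```python
# Hypothetical inputs: columns a query asked for, plus the extras _load appends.
requested = ["name", "email"]
index_columns = ["email", "country"]
delta_columns = ["id", "CHANGE_DATE"]  # key column(s) followed by the change column

# dict keys are unique and keep insertion order, so this is an ordered de-duplication.
merged = list(dict.fromkeys([*requested, *index_columns, *delta_columns]))
print(merged)  # ['name', 'email', 'country', 'id', 'CHANGE_DATE']
```

Keeping the requested columns first means the cached table's column order stays stable when index or delta columns are added later.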
@@ -1,25 +1,34 @@
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 import sqlglot
 import sqlglot.expressions as exp
 from loguru import logger
 from .config import SQL_DIALECT
 from .exceptions import ReadOnlyError, UnsupportedQueryError
 WRITE_TYPES = (exp.Insert, exp.Update, exp.Delete)
 SQLITE_DIALECT = "sqlite"
 # Parameters accepted by execute(): positional (tuple/list of ``?``) or named (dict of ``:name``).
 Params = tuple | list | dict | None
@dataclass
 class ParsedQuery:
-    table: str
+    tables: list[str]
-    columns: list[str]
+    columns_by_table: dict[str, list[str]]
    sqlite_sql: str
    original_sql: str
    params: Params = None
    # Tables that must be loaded in full (SELECT * / t.* / referenced without explicit columns).
    wildcard_tables: set[str] = field(default_factory=set)
-def parse(sql: str) -> ParsedQuery:
+def parse(sql: str, params: Params = None, dialect: str = SQL_DIALECT) -> ParsedQuery:
    logger.debug(f"Parsing SQL: {sql!r}")
-    statement = sqlglot.parse_one(sql)
+    statement = sqlglot.parse_one(sql, dialect=dialect)
    if isinstance(statement, WRITE_TYPES):
        raise ReadOnlyError(
@@ -29,47 +38,104 @@ def parse(sql: str) -> ParsedQuery:
    if not isinstance(statement, exp.Select):
        raise UnsupportedQueryError(f"Only SELECT statements are supported, got: {sql!r}")
-    _check_joins(statement)
+    tables, alias_map = _extract_tables(statement)
-    _check_wildcard(statement)
+    if not tables:
        raise UnsupportedQueryError("SELECT without FROM is not supported.")
-    table = _extract_table(statement)
+    wildcard_tables = _extract_wildcards(statement, tables, alias_map)
-    columns = _extract_columns(statement)
+    columns_by_table = _extract_columns(statement, tables, alias_map, wildcard_tables)
-    logger.debug(f"Parsed → table={table!r}, columns={columns}")
+    # A table that appears in FROM/JOIN but contributes no explicit column must
-    return ParsedQuery(table=table, columns=columns, original_sql=sql)
+    # still be present for the in-memory query — load it in full.
    for table in tables:
        if table not in wildcard_tables and not columns_by_table.get(table):
            wildcard_tables.add(table)
            columns_by_table.pop(table, None)
    sqlite_sql = _to_sqlite(statement)
    logger.debug(
        f"Parsed → tables={tables}, columns={columns_by_table}, "
        f"wildcard={wildcard_tables}, params={params!r}"
    )
    return ParsedQuery(
        tables=tables,
        columns_by_table=columns_by_table,
        sqlite_sql=sqlite_sql,
        original_sql=sql,
        params=params,
        wildcard_tables=wildcard_tables,
    )
-def _check_joins(statement: exp.Select) -> None:
+def _extract_tables(statement: exp.Select) -> tuple[list[str], dict[str, str]]:
-    if statement.find(exp.Join):
+    """Return real table names (first-seen order) and an alias→real-name map."""
-        raise UnsupportedQueryError("JOIN is not supported yet. Use simple single-table SELECT.")
+    real_names: list[str] = []
    alias_map: dict[str, str] = {}
    for table in statement.find_all(exp.Table):
        name = table.name
        if name not in real_names:
            real_names.append(name)
        alias_map[name] = name
        if table.alias:
            alias_map[table.alias] = name
    return real_names, alias_map
-def _check_wildcard(statement: exp.Select) -> None:
+def _extract_wildcards(
    statement: exp.Select, tables: list[str], alias_map: dict[str, str]
 ) -> set[str]:
    """Detect ``SELECT *`` (all tables) and ``alias.*`` (one table) in the projection."""
    wildcard: set[str] = set()
    for projection in statement.expressions:
        if isinstance(projection, exp.Star):
            return set(tables)
        if isinstance(projection, exp.Column) and isinstance(projection.this, exp.Star):
            qualifier = projection.table
            wildcard.add(alias_map.get(qualifier, qualifier))
    return wildcard
 def _extract_columns(
    statement: exp.Select,
    tables: list[str],
    alias_map: dict[str, str],
    wildcard_tables: set[str],
 ) -> dict[str, list[str]]:
    """Map each table to the deduplicated columns referenced anywhere in the query."""
    single = tables[0] if len(tables) == 1 else None
    columns: dict[str, list[str]] = {}
    seen: dict[str, set[str]] = {}
    for col in statement.find_all(exp.Column):
        if isinstance(col.this, exp.Star):
-            raise UnsupportedQueryError("SELECT * is not supported yet. Specify columns explicitly.")
+            continue
-    if statement.find(exp.Star):
+        qualifier = col.table
-        raise UnsupportedQueryError("SELECT * is not supported yet. Specify columns explicitly.")
+        if qualifier:
            table = alias_map.get(qualifier, qualifier)
        elif single is not None:
            table = single
        else:
            raise UnsupportedQueryError(
                f"Unqualified column {col.name!r} is ambiguous in a multi-table query; "
                "qualify it with its table or alias."
            )
        if table in wildcard_tables:
            continue
        bucket = seen.setdefault(table, set())
        if col.name not in bucket:
            bucket.add(col.name)
            columns.setdefault(table, []).append(col.name)
-def _extract_table(statement: exp.Select) -> str:
-    from_clause = statement.find(exp.From)
-    if not from_clause:
-        raise UnsupportedQueryError("SELECT without FROM is not supported.")
-    table = from_clause.find(exp.Table)
-    if not table:
-        raise UnsupportedQueryError("Could not extract table name from query.")
-    return table.name
-def _extract_columns(statement: exp.Select) -> list[str]:
-    seen: set[str] = set()
-    columns: list[str] = []
-    for col in statement.find_all(exp.Column):
-        name = col.name
-        if name not in seen:
-            seen.add(name)
-            columns.append(name)
-    if not columns:
-        raise UnsupportedQueryError("Could not extract column names from query.")
    return columns
 def _to_sqlite(statement: exp.Select) -> str:
    """Render the statement as SQLite SQL, stripping catalog/schema prefixes.
    Mutates *statement* in place; callers must extract metadata beforehand.
    """
    for table in statement.find_all(exp.Table):
        table.set("db", None)
        table.set("catalog", None)
    return statement.sql(dialect=SQLITE_DIALECT)
@@ -12,6 +12,16 @@ class ColumnRegistry:
        self._lock = Lock()
        self._ensure_table()
    def rebind(self, mem_conn: sqlite3.Connection) -> None:
        """Point the registry at a new cache connection (after a hard reset).
        ``CacheManager.hard_reset`` closes and reopens the cache connection, so the
        connection object the registry captured at construction becomes invalid.
        """
        with self._lock:
            self._conn = mem_conn
            self._ensure_table()
    def _ensure_table(self) -> None:
        self._conn.execute("""
            CREATE TABLE IF NOT EXISTS _sqlmem_columns (
@@ -0,0 +1,98 @@
 import sqlite3
 import threading
 from dataclasses import dataclass
 class TableState:
    """Live processing state of a cached table (value of ``TableStats.state``)."""
    LOADING = "loading"        # a full load is in progress
    REFRESHING = "refreshing"  # an incremental (delta) refresh is in progress
    READY = "ready"            # cached and idle
    STALE = "stale"            # TTL expired — will reload on next access
    ERROR = "error"            # the last load or delta refresh failed
@dataclass(frozen=True)
 class TableStats:
    rows: int
    columns: list[str]
    # Persisted wall-clock of the last actual data write (full load / delta with rows).
    # Survives restarts. Answers "when did the data last change?".
    last_upsert: str | None
    # In-memory (this process) wall-clock of the last time a refresh cycle ran for the
    # table — bumped even when the cycle wrote nothing. Liveness signal; ``None`` until
    # the first cycle runs after start. Answers "is the refresh loop alive?".
    last_refresh: str | None = None
    state: str = TableState.READY
    tracking: str = "static"  # "delta" | "ttl" | "static"
    # Most recent load/refresh failure for this table, if any. ``consecutive_failures``
    # resets to 0 on the next success, so > 0 means the table is currently failing.
    last_error: str | None = None
    last_error_at: str | None = None
    consecutive_failures: int = 0
@dataclass(frozen=True)
 class Stats:
    hits: int
    misses: int
    refetches: int
    tables: dict[str, TableStats]
    errors: int = 0  # total load/refresh failures since start
 class StatsCollector:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0
        self.refetches = 0
    def record_hit(self) -> None:
        with self._lock:
            self.hits += 1
    def record_miss(self) -> None:
        with self._lock:
            self.misses += 1
    def record_refetch(self) -> None:
        with self._lock:
            self.refetches += 1
    def snapshot(
        self, conn: sqlite3.Connection, states: dict[str, str] | None = None
    ) -> Stats:
        states = states or {}
        with self._lock:
            hits, misses, refetches = self.hits, self.misses, self.refetches
        tables: dict[str, TableStats] = {}
        cached: set[str] = set()
        for table_name, row_count, last_upsert in conn.execute(
            "SELECT table_name, row_count, last_refresh_at FROM _sqlmem_tables"
        ).fetchall():
            cached.add(table_name)
            columns = [
                r[0]
                for r in conn.execute(
                    "SELECT column_name FROM _sqlmem_columns WHERE table_name = ? ORDER BY column_name",
                    (table_name,),
                ).fetchall()
            ]
            # last_refresh (run/liveness) is filled in by the engine from the
            # in-memory last-run map; only the persisted write time is read here.
            tables[table_name] = TableStats(
                rows=row_count or 0,
                columns=columns,
                last_upsert=last_upsert,
                state=states.get(table_name, TableState.READY),
            )
        # Surface tables that are mid-first-load (not yet in _sqlmem_tables) or failed.
        for name, state in states.items():
            if name not in cached and state in (TableState.LOADING, TableState.ERROR):
                tables[name] = TableStats(rows=0, columns=[], last_upsert=None, state=state)
        # ``errors`` is not passed here, so it keeps its default of 0 in this snapshot.
        return Stats(hits=hits, misses=misses, refetches=refetches, tables=tables)
@@ -1,5 +1,5 @@
 import sqlite3
-from pathlib import Path
+import threading
 import pytest
@@ -59,3 +59,232 @@ def test_backup_and_reload(tmp_path, source_conn):
    c2 = CacheManager(db_path=db_path, backup_interval=9999)
    assert c2.is_table_cached("users") is True
    c2.close()
 # ---------------------------------------------------------------------------
 # Disk-backed mode (in_memory=False)
 # ---------------------------------------------------------------------------
 def test_disk_mode_persists_without_backup(tmp_path, source_conn):
    """Disk mode writes straight to the file — no explicit backup/close needed."""
    db_path = tmp_path / "cache.db"
    c = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    c.load_table("users", ["name"], source_conn)
    # Data is already on disk; a brand-new disk-mode manager sees it.
    c2 = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    assert c2.is_table_cached("users") is True
    c2.close()
    c.close()
 def test_disk_mode_file_created_immediately(tmp_path, source_conn):
    """The cache file exists on disk right after a load, before close()."""
    db_path = tmp_path / "cache.db"
    c = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    c.load_table("users", ["name"], source_conn)
    assert db_path.exists()
    c.close()
 def test_disk_mode_reload_in_new_instance(tmp_path, source_conn):
    """Rows loaded by one disk-mode manager are readable from a fresh instance."""
    db_path = tmp_path / "cache.db"
    c1 = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    c1.load_table("users", ["name", "email"], source_conn)
    c1.close()
    c2 = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    rows = c2.connection.execute("SELECT name FROM users").fetchall()
    assert {r[0] for r in rows} == {"alice", "bob"}
    c2.close()
 def test_quoted_reserved_and_spaced_identifiers(tmp_path):
    """Table/column names that are reserved words or contain spaces must work."""
    src = sqlite3.connect(":memory:")
    src.execute('CREATE TABLE "weird tbl" ("order" TEXT, "group by" TEXT)')
    src.executemany('INSERT INTO "weird tbl" VALUES (?, ?)', [("1", "a"), ("2", "b")])
    src.commit()
    c = CacheManager(db_path=tmp_path / "c.db", backup_interval=9999)
    c.load_table("weird tbl", ["order", "group by"], src)
    assert c.is_table_cached("weird tbl") is True
    _, rows = c.execute_in_memory('SELECT "order", "group by" FROM "weird tbl"')
    assert ("1", "a") in rows
    c.close()
    src.close()
 def test_disk_mode_uses_separate_read_connection(tmp_path, source_conn):
    """Disk-mode reads go through a per-thread read connection, not the writer."""
    c = CacheManager(db_path=tmp_path / "c.db", backup_interval=9999, in_memory=False)
    c.load_table("users", ["name", "email"], source_conn)
    _, rows = c.execute_in_memory("SELECT name FROM users ORDER BY name")
    assert [r[0] for r in rows] == ["alice", "bob"]
    assert len(c._read_conns) == 1
    assert c._read_conns[0] is not c.connection  # dedicated read conn
    c.close()
 def test_disk_mode_concurrent_reads(tmp_path, source_conn):
    """Several reader threads each get their own connection and correct results."""
    c = CacheManager(db_path=tmp_path / "c.db", backup_interval=9999, in_memory=False)
    c.load_table("users", ["name"], source_conn)
    results: list[int] = []
    errors: list[Exception] = []
    def reader() -> None:
        try:
            _, rows = c.execute_in_memory("SELECT name FROM users")
            results.append(len(rows))
        except Exception as e:  # noqa: BLE001
            errors.append(e)
    threads = [threading.Thread(target=reader) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(5)
    assert not errors
    assert results == [2] * 5
    assert len(c._read_conns) == 5  # one read connection per reader thread
    c.close()
 def test_memory_mode_uses_shared_connection(cache, source_conn):
    """In-memory mode can't share :memory: across connections — no read conns."""
    cache.load_table("users", ["name"], source_conn)
    cache.execute_in_memory("SELECT name FROM users")
    assert cache._read_conns == []
 def test_disk_mode_reset_keeps_file(tmp_path, source_conn):
    db_path = tmp_path / "cache.db"
    c = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    c.load_table("users", ["name"], source_conn)
    c.reset()
    # File stays (the connection is still open) but the table is gone.
    assert db_path.exists()
    assert c.is_table_cached("users") is False
    c.close()
 # ---------------------------------------------------------------------------
 # Pragmas / layout tuning (1.11.0)
 # ---------------------------------------------------------------------------
 def test_pragmas_applied_on_fresh_disk_cache(tmp_path):
    """page_size, auto_vacuum and a generic pragma all take effect on a new file."""
    c = CacheManager(
        db_path=tmp_path / "cache.db",
        backup_interval=9999,
        in_memory=False,
        pragmas={"page_size": 8192, "auto_vacuum": "INCREMENTAL", "cache_size": -2000},
    )
    assert c.connection.execute("PRAGMA page_size").fetchone()[0] == 8192
    assert c.connection.execute("PRAGMA auto_vacuum").fetchone()[0] == 2  # INCREMENTAL
    assert c.connection.execute("PRAGMA cache_size").fetchone()[0] == -2000
    c.close()
 def test_page_size_ignored_on_existing_file_warns(tmp_path):
    """A page_size that differs from the existing file is ignored, with a warning."""
    db_path = tmp_path / "cache.db"
    c1 = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    assert c1.connection.execute("PRAGMA page_size").fetchone()[0] == 4096  # default
    c1.close()
    c2 = CacheManager(
        db_path=db_path,
        backup_interval=9999,
        in_memory=False,
        pragmas={"page_size": 16384},
    )
    # File keeps its original page size; the request is ignored (not an error).
    assert c2.connection.execute("PRAGMA page_size").fetchone()[0] == 4096
    c2.close()
 def test_unknown_pragma_does_not_crash(tmp_path):
    """SQLite ignores unknown/inapplicable pragmas — startup must not fail."""
    c = CacheManager(
        db_path=tmp_path / "cache.db",
        backup_interval=9999,
        in_memory=False,
        pragmas={"this_is_not_a_pragma": 1, "mmap_size": 1024 * 1024},
    )
    # Assumes a SQLite build with memory-mapped I/O enabled (SQLITE_MAX_MMAP_SIZE > 0).
    assert c.connection.execute("PRAGMA mmap_size").fetchone()[0] == 1024 * 1024
    c.close()
 # ---------------------------------------------------------------------------
 # hard_reset / vacuum (1.11.0)
 # ---------------------------------------------------------------------------
 def test_hard_reset_recreates_file_and_clears_tables(tmp_path, source_conn):
    db_path = tmp_path / "cache.db"
    c = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    c.load_table("users", ["name"], source_conn)
    assert c.is_table_cached("users") is True
    c.hard_reset()
    assert db_path.exists()  # reopened fresh
    assert c.is_table_cached("users") is False
    # The connection is usable again after the swap.
    c.load_table("users", ["name"], source_conn)
    assert c.is_table_cached("users") is True
    c.close()
 def test_hard_reset_applies_new_page_size(tmp_path, source_conn):
    """page_size can't change via reset() but does via hard_reset() (fresh file)."""
    db_path = tmp_path / "cache.db"
    # Existing file at the default 4096; request 8192 — ignored on open.
    CacheManager(db_path=db_path, backup_interval=9999, in_memory=False).close()
    c = CacheManager(
        db_path=db_path,
        backup_interval=9999,
        in_memory=False,
        pragmas={"page_size": 8192},
    )
    assert c.connection.execute("PRAGMA page_size").fetchone()[0] == 4096
    c.hard_reset()  # deletes the file → recreated with the requested page size
    assert c.connection.execute("PRAGMA page_size").fetchone()[0] == 8192
    c.close()
 def test_hard_reset_in_memory_falls_back_to_reset(tmp_path, source_conn):
    c = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    c.load_table("users", ["name"], source_conn)
    c.hard_reset()  # memory mode → reset()
    assert c.is_table_cached("users") is False
    c.close()
 def test_full_vacuum_runs_on_disk(tmp_path, source_conn):
    db_path = tmp_path / "cache.db"
    c = CacheManager(db_path=db_path, backup_interval=9999, in_memory=False)
    c.load_table("users", ["name"], source_conn)
    c.vacuum(incremental=False)  # must not raise
    assert c.is_table_cached("users") is True
    c.close()
 def test_incremental_vacuum_runs_with_auto_vacuum(tmp_path, source_conn):
    c = CacheManager(
        db_path=tmp_path / "cache.db",
        backup_interval=9999,
        in_memory=False,
        pragmas={"auto_vacuum": "INCREMENTAL"},
    )
    c.load_table("users", ["name"], source_conn)
    c.vacuum(incremental=True, pages=100)  # must not raise
    assert c.is_table_cached("users") is True
    c.close()
 def test_vacuum_in_memory_is_noop(cache, source_conn):
    cache.load_table("users", ["name"], source_conn)
    cache.vacuum(incremental=False)  # no-op, no error
    assert cache.is_table_cached("users") is True
@@ -0,0 +1,191 @@
 import datetime
 import decimal
 import uuid
 import pytest
 from sqlmem._coerce import coerce_params, to_sqlite, to_sqlite_datetime
 from sqlmem.cache import CacheManager
 class _FakeCursor:
    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0
        self.description = None
    def fetchall(self):
        out = self._rows[self._pos :]
        self._pos = len(self._rows)
        return out
    def fetchmany(self, size):
        out = self._rows[self._pos : self._pos + size]
        self._pos += len(out)
        return out
 class FakeSource:
    """Stand-in for a pyodbc connection that returns non-sqlite-native types."""
    def __init__(self, rows):
        self._rows = rows
    def execute(self, sql, *args):
        return _FakeCursor(self._rows)
@pytest.fixture
 def cache(tmp_path):
    c = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    yield c
    c.close()
 # --- to_sqlite / coerce_params unit tests -----------------------------------
 def test_decimal_to_str():
    assert to_sqlite(decimal.Decimal("9.99")) == "9.99"
 def test_decimal_keeps_precision():
    assert to_sqlite(decimal.Decimal("123456789.123456789")) == "123456789.123456789"
 def test_datetime_to_iso():
    assert to_sqlite(datetime.datetime(2026, 6, 1, 10, 0, 0)) == "2026-06-01T10:00:00"
 def test_date_to_iso():
    assert to_sqlite(datetime.date(2026, 6, 1)) == "2026-06-01"
 def test_time_to_iso():
    assert to_sqlite(datetime.time(10, 30, 0)) == "10:30:00"
 def test_uuid_to_str():
    u = uuid.uuid4()
    assert to_sqlite(u) == str(u)
 def test_bytearray_to_bytes():
    assert to_sqlite(bytearray(b"abc")) == b"abc"
@pytest.mark.parametrize("value", [1, 1.5, "text", None, b"blob", True])
 def test_native_values_pass_through(value):
    assert to_sqlite(value) == value
 def test_coerce_params_tuple():
    assert coerce_params((decimal.Decimal("1.5"), "x")) == ("1.5", "x")
 def test_coerce_params_dict():
    assert coerce_params({"p": decimal.Decimal("2")}) == {"p": "2"}
 def test_coerce_params_none():
    assert coerce_params(None) is None
 # --- to_sqlite_datetime (INTEGER µs storage, 1.12.0) ------------------------
 def test_datetime_to_epoch_micros():
    # 2026-06-01T10:00:00Z -> microseconds since epoch
    dt = datetime.datetime(2026, 6, 1, 10, 0, 0, tzinfo=datetime.timezone.utc)
    expected = int(dt.timestamp() * 1_000_000)
    assert to_sqlite_datetime(dt) == expected
 def test_datetime_naive_treated_as_utc():
    naive = datetime.datetime(2026, 6, 1, 10, 0, 0)
    aware = naive.replace(tzinfo=datetime.timezone.utc)
    assert to_sqlite_datetime(naive) == to_sqlite_datetime(aware)
 def test_datetime_micros_are_exact():
    dt = datetime.datetime(2026, 6, 5, 14, 54, 24, 823000, tzinfo=datetime.timezone.utc)
    us = to_sqlite_datetime(dt)
    # round-trips back to the same instant with no rounding loss
    back = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc) + datetime.timedelta(
        microseconds=us
    )
    assert back == dt
 def test_datetime_none_passes_through():
    assert to_sqlite_datetime(None) is None
 def test_datetime_iso_string_parsed():
    assert to_sqlite_datetime("2026-06-01T10:00:00+00:00") == to_sqlite_datetime(
        datetime.datetime(2026, 6, 1, 10, 0, 0, tzinfo=datetime.timezone.utc)
    )
 def test_datetime_unparseable_is_none():
    assert to_sqlite_datetime("not a date") is None
 # --- integration: datetime_columns are stored as INTEGER --------------------
 def test_datetime_column_stored_as_integer(tmp_path):
    c = CacheManager(
        db_path=tmp_path / "cache.db",
        backup_interval=9999,
        datetime_columns={"t": ["changed"]},
    )
    dt = datetime.datetime(2026, 6, 1, 10, 0, 0, tzinfo=datetime.timezone.utc)
    c.load_table("t", ["id", "changed"], FakeSource([("1", dt)]))
    # Column declared INTEGER, value stored as µs-since-epoch.
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    table_info = c.connection.execute("PRAGMA table_info(t)").fetchall()
    types = {row[1]: row[2] for row in table_info}
    assert types["changed"] == "INTEGER"
    assert types["id"] == "TEXT"
    _, out = c.execute_in_memory("SELECT changed FROM t")
    assert out == [(to_sqlite_datetime(dt),)]
    c.close()
 def test_non_datetime_columns_unaffected_by_datetime_columns(tmp_path):
    c = CacheManager(
        db_path=tmp_path / "cache.db",
        backup_interval=9999,
        datetime_columns={"t": ["changed"]},
    )
    c.load_table("t", ["id", "price"], FakeSource([("1", decimal.Decimal("9.99"))]))
    _, out = c.execute_in_memory("SELECT id, price FROM t")
    assert out == [("1", "9.99")]  # unlisted columns keep the default TEXT coercion
    c.close()
 # --- integration: values reach the cache through coercion -------------------
 def test_load_table_coerces_decimal_and_datetime(cache):
    rows = [("1", decimal.Decimal("9.99"), datetime.datetime(2026, 6, 1, 10, 0, 0))]
    cache.load_table("t", ["id", "price", "changed"], FakeSource(rows))
    _, out = cache.execute_in_memory("SELECT id, price, changed FROM t")
    assert out == [("1", "9.99", "2026-06-01T10:00:00")]
 def test_decimal_where_param_matches_text_value(cache):
    cache.load_table("t", ["price"], FakeSource([("9.99",)]))
    _, out = cache.execute_in_memory(
        "SELECT price FROM t WHERE price = ?", (decimal.Decimal("9.99"),)
    )
    assert out == [("9.99",)]
 def test_upsert_rows_coerces_decimal(cache):
    cache.load_table("t", ["id", "price"], FakeSource([("1", "0")]))
    cache.create_unique_index("t", ["id"])
    cache.upsert_rows("t", ["id", "price"], [("1", decimal.Decimal("12.50"))])
    _, out = cache.execute_in_memory("SELECT price FROM t WHERE id = '1'")
    assert out == [("12.50",)]
@@ -1,6 +1,5 @@
 import importlib
 import pytest
 import sqlmem.config as cfg
@@ -0,0 +1,396 @@
 import sqlite3
 import threading
 from datetime import datetime, timezone
 from types import SimpleNamespace
 import pytest
 from sqlalchemy import create_engine
 import sqlmem.engine as eng_mod
 from sqlmem import CachingEngine, DeltaConfig
 from sqlmem.cache import CacheManager
 from sqlmem.delta import DeltaRefresher, ResolvedDelta, _bind_watermark
 from sqlmem.executor import QueryExecutor
 from sqlmem.parser import parse
 from sqlmem.registry import ColumnRegistry
 from sqlmem.stats import StatsCollector
 def cached_rows(cache, sql):
    cols, rows = cache.execute_in_memory(sql)
    return [dict(zip(cols, row)) for row in rows]
 # ---------------------------------------------------------------------------
 # Refresher unit tests (in-memory source connection)
 # ---------------------------------------------------------------------------
@pytest.fixture
 def source_conn():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT, price TEXT, changed TEXT);
        INSERT INTO products VALUES
            ('1', 'Widget', '9.99',  '2026-06-01 10:00:00'),
            ('2', 'Gadget', '19.99', '2026-06-01 10:05:00');
        """
    )
    conn.commit()
    yield conn
    conn.close()
@pytest.fixture
 def env(tmp_path, source_conn):
    cache = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    registry = ColumnRegistry(cache.connection)
    stats = StatsCollector()
    delta = {"products": ResolvedDelta(change_column="changed", key_columns=["id"])}
    executor = QueryExecutor(cache, registry, source_conn, stats, delta)
    refresher = DeltaRefresher(cache, delta)
    # Initial load — caches id, name, price (+ augmented key/change columns).
    executor.execute(parse("SELECT id, name, price FROM products"))
    yield SimpleNamespace(cache=cache, source=source_conn, refresher=refresher)
    cache.close()
 def test_load_augments_key_and_change_columns(env):
    cols = env.cache.get_table_columns("products")
    assert {"id", "name", "price", "changed"}.issubset(set(cols))
 def test_initial_watermark_is_max_change(env):
    assert env.cache.get_last_synced_at("products") == "2026-06-01 10:05:00"
 def test_refresh_applies_updates(env):
    env.source.execute(
        "UPDATE products SET price = '7.77', changed = '2026-06-01 10:10:00' WHERE id = '1'"
    )
    env.source.commit()
    env.refresher.refresh(env.source)
    rows = {r["id"]: r for r in cached_rows(env.cache, "SELECT id, price FROM products")}
    assert rows["1"]["price"] == "7.77"
    assert env.cache.get_last_synced_at("products") == "2026-06-01 10:10:00"
 def test_refresh_inserts_new_rows(env):
    env.source.execute(
        "INSERT INTO products VALUES ('3', 'Sprocket', '5.00', '2026-06-01 10:20:00')"
    )
    env.source.commit()
    env.refresher.refresh(env.source)
    ids = {r["id"] for r in cached_rows(env.cache, "SELECT id FROM products")}
    assert ids == {"1", "2", "3"}
 def test_boundary_timestamp_not_missed_and_idempotent(env):
    # New row sharing the exact watermark timestamp must still be picked up (>=),
    # and the row already at that timestamp must not be duplicated.
    env.source.execute(
        "INSERT INTO products VALUES ('3', 'Sprocket', '5.00', '2026-06-01 10:05:00')"
    )
    env.source.commit()
    env.refresher.refresh(env.source)
    env.refresher.refresh(env.source)  # idempotent — running twice changes nothing
    rows = cached_rows(env.cache, "SELECT id FROM products")
    assert sorted(r["id"] for r in rows) == ["1", "2", "3"]
 def test_delete_by_nulling(env):
    """A source column set to NULL (with a newer change timestamp) becomes NULL in the cache."""
    env.source.execute(
        "UPDATE products SET name = NULL, changed = '2026-06-01 10:30:00' WHERE id = '1'"
    )
    env.source.commit()
    env.refresher.refresh(env.source)
    rows = {r["id"]: r for r in cached_rows(env.cache, "SELECT id, name FROM products")}
    assert rows["1"]["name"] is None
 def test_refresh_without_changes_is_noop(env):
    before = cached_rows(env.cache, "SELECT id, name, price FROM products")
    env.refresher.refresh(env.source)
    after = cached_rows(env.cache, "SELECT id, name, price FROM products")
    assert before == after
 # ---------------------------------------------------------------------------
 # Watermark binding — regression for the datetime-as-string delta bug
 # (SQL Server error 241: 'T'-separated 6-digit-microsecond ISO string can't be
 #  implicitly converted varchar->datetime, freezing the delta watermark).
 # ---------------------------------------------------------------------------
 def test_bind_watermark_parses_iso_datetime():
    assert _bind_watermark("2026-06-05T14:54:24.823000") == datetime(
        2026, 6, 5, 14, 54, 24, 823000
    )
 def test_bind_watermark_parses_space_separated():
    assert _bind_watermark("2026-06-01 10:05:00") == datetime(2026, 6, 1, 10, 5, 0)
 def test_bind_watermark_passes_through_non_datetime():
    # Integer rowversion / non-datetime change column — left untouched.
    assert _bind_watermark("12345") == "12345"
 # --- INTEGER µs watermark binding (datetime_columns, 1.12.0) ----------------
 def test_bind_watermark_epoch_us_reconstructs_datetime():
    dt = datetime(2026, 6, 5, 14, 54, 24, 823000, tzinfo=timezone.utc)
    us = int(dt.timestamp() * 1_000_000)
    # Whether the watermark is an int or its digit string (it round-trips through
    # the TEXT last_synced_at column), it binds back to the same UTC datetime.
    assert _bind_watermark(us, epoch_us=True) == dt
    assert _bind_watermark(str(us), epoch_us=True) == dt
 class _SpyCursor:
    def __init__(self, rows):
        self._rows = list(rows)
    def fetchmany(self, n):
        batch, self._rows = self._rows[:n], self._rows[n:]
        return batch
 class _SpySource:
    """Records the parameters bound to each query (stands in for the pyodbc source)."""
    def __init__(self, rows):
        self._rows = rows
        self.bound = []
    def execute(self, sql, params=()):
        self.bound.append((sql, params))
        return _SpyCursor(self._rows)
 def test_refresh_binds_watermark_as_datetime(env):
    """The watermark must reach the source as a real datetime, not a raw ISO
    string — otherwise SQL Server raises error 241 and the delta freezes."""
    env.cache.set_last_synced_at("products", "2026-06-05T14:54:24.823000")
    spy = _SpySource(rows=[("1", "Widget", "9.99", "2026-06-05T14:54:24.823000")])
    env.refresher.refresh(spy)
    assert spy.bound, "source query was never issued"
    _, params = spy.bound[-1]
    assert params == (datetime(2026, 6, 5, 14, 54, 24, 823000),)
 class _RowSource:
    """Returns fixed rows for any query (for loading datetime-typed source data)."""
    def __init__(self, rows):
        self._rows = rows
    def execute(self, sql, params=()):
        return _SpyCursor(self._rows)
 def test_datetime_column_watermark_stored_as_int_and_bound_back(tmp_path):
    """A change column declared in datetime_columns is stored as INTEGER µs; the
    watermark is bound back to a real datetime for the source query."""
    cache = CacheManager(
        db_path=tmp_path / "c.db",
        backup_interval=9999,
        datetime_columns={"products": ["changed"]},
    )
    dt1 = datetime(2026, 6, 1, 10, 0, 0, tzinfo=timezone.utc)
    dt2 = datetime(2026, 6, 1, 10, 5, 0, tzinfo=timezone.utc)
    cache.load_table("products", ["id", "changed"], _RowSource([("1", dt1), ("2", dt2)]))
    cache.create_unique_index("products", ["id"])
    cache.set_last_synced_at("products", cache.max_value("products", "changed"))
    # Watermark persisted as the max INTEGER µs (digit string out of the TEXT col).
    wm = cache.get_last_synced_at("products")
    assert wm == str(int(dt2.timestamp() * 1_000_000))
    refresher = DeltaRefresher(
        cache, {"products": ResolvedDelta("changed", ["id"])}
    )
    spy = _SpySource(rows=[])  # no new rows — just capture the bound watermark
    refresher.refresh(spy)
    assert spy.bound, "source query was never issued"
    _, params = spy.bound[-1]
    assert params == (dt2,)  # bound back as datetime, not an int/string
    cache.close()
 # ---------------------------------------------------------------------------
 # Refresh failures are recorded (4.3) so a stuck delta is visible in stats
 # ---------------------------------------------------------------------------
 class _RaisingSource:
    def execute(self, sql, params=()):
        raise RuntimeError("boom 241")
 def test_failed_delta_refresh_records_error(env):
    env.refresher.refresh(_RaisingSource())
    err = env.cache.get_errors()["products"]
    assert err.consecutive == 1
    assert "boom 241" in err.message
    assert env.cache.error_total == 1
    # State is marked error even though the cache still holds the last-good data.
    assert env.cache.get_states()["products"] == "error"
 def test_delta_success_resets_failure_streak(env):
    env.refresher.refresh(_RaisingSource())
    assert env.cache.get_errors()["products"].consecutive == 1
    env.refresher.refresh(env.source)  # real source — succeeds
    assert env.cache.get_errors()["products"].consecutive == 0
 # ---------------------------------------------------------------------------
 # last_upsert (persisted write) vs last_refresh (in-memory run/liveness)
 # ---------------------------------------------------------------------------
 def _persisted_last_upsert(cache, table):
    row = cache.connection.execute(
        "SELECT last_refresh_at FROM _sqlmem_tables WHERE table_name = ?", (table,)
    ).fetchone()
    return row[0] if row else None
 def test_empty_delta_records_run_but_not_write(env):
    """An empty delta cycle bumps last_refresh (liveness) but not the persisted
    last_upsert (write time)."""
    before = _persisted_last_upsert(env.cache, "products")
    # Push the watermark past every source row so the next cycle returns 0 rows.
    env.cache.set_last_synced_at("products", "2099-01-01 00:00:00")
    env.refresher.refresh(env.source)
    # No rows written → persisted write time unchanged.
    assert _persisted_last_upsert(env.cache, "products") == before
    # But the cycle ran → in-memory run time recorded (and at/after the last write).
    runs = env.cache.get_last_runs()
    assert runs["products"] is not None
    assert runs["products"] >= before
 # ---------------------------------------------------------------------------
 # Engine-level: PK auto-discovery, reset, end-to-end refresh
 # ---------------------------------------------------------------------------
@pytest.fixture
 def source_db(tmp_path):
    db_path = tmp_path / "source.db"
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT, changed TEXT);
        INSERT INTO products VALUES ('1', 'Widget', '2026-06-01 10:00:00');
        CREATE VIEW vw_products AS SELECT id, name FROM products;
        """
    )
    conn.commit()
    conn.close()
    return db_path
@pytest.fixture
 def source_engine(source_db):
    engine = create_engine(f"sqlite:///{source_db}")
    yield engine
    engine.dispose()
@pytest.fixture
 def patched_cache(tmp_path, monkeypatch):
    monkeypatch.setattr(eng_mod, "CACHE_DB_PATH", tmp_path / "cache.db")
    monkeypatch.setattr(eng_mod, "BACKUP_INTERVAL_SECONDS", 9999)
 def test_pk_auto_discovery(source_engine, patched_cache):
    engine = CachingEngine(source_engine, delta={"products": DeltaConfig(change_column="changed")})
    assert engine._delta["products"].key_columns == ["id"]
    engine.close()
 def test_view_without_key_raises(source_engine, patched_cache):
    with pytest.raises(ValueError):
        CachingEngine(source_engine, delta={"vw_products": DeltaConfig(change_column="name")})
 def test_engine_reset(source_engine, patched_cache):
    engine = CachingEngine(source_engine)
    engine.execute("SELECT id, name FROM products")
    assert engine._cache.is_table_cached("products") is True
    engine.reset()
    assert engine._cache.is_table_cached("products") is False
    engine.close()
 def test_startup_catch_up_is_non_blocking_by_default(source_engine, patched_cache, monkeypatch):
    """By default the startup catch-up runs on the background thread, not the
    main thread, so it never blocks application startup."""
    threads: list[str] = []
    started = threading.Event()
    real = eng_mod.CachingEngine._run_refresh
    def spy(self):
        threads.append(threading.current_thread().name)
        started.set()
        return real(self)
    monkeypatch.setattr(eng_mod.CachingEngine, "_run_refresh", spy)
    engine = CachingEngine(
        source_engine, delta={"products": DeltaConfig("changed", ["id"])}
    )
    # __init__ has returned; the main thread must not have run the catch-up.
    assert "MainThread" not in threads
    assert started.wait(2), "background catch-up never ran"
    assert threads == ["sqlmem-delta"]
    engine.close()
 def test_blocking_startup_refresh_runs_synchronously(source_engine, patched_cache, monkeypatch):
    threads: list[str] = []
    real = eng_mod.CachingEngine._run_refresh
    def spy(self):
        threads.append(threading.current_thread().name)
        return real(self)
    monkeypatch.setattr(eng_mod.CachingEngine, "_run_refresh", spy)
    engine = CachingEngine(
        source_engine,
        delta={"products": DeltaConfig("changed", ["id"])},
        blocking_startup_refresh=True,
    )
    # Opt-in: the catch-up ran on the main thread before __init__ returned.
    assert "MainThread" in threads
    engine.close()
 def test_engine_delta_refresh_end_to_end(source_engine, source_db, patched_cache):
    engine = CachingEngine(
        source_engine, delta={"products": DeltaConfig(change_column="changed", key_columns=["id"])}
    )
    engine.execute("SELECT id, name FROM products")  # caches, watermark = 10:00
    conn = sqlite3.connect(source_db)
    conn.execute("UPDATE products SET name = 'Widget2', changed = '2026-06-01 10:06:00' WHERE id = '1'")
    conn.execute("INSERT INTO products VALUES ('2', 'Gadget', '2026-06-01 10:05:00')")
    conn.commit()
    conn.close()
    engine.refresh()
    rows = {r["id"]: r for r in engine.execute("SELECT id, name FROM products")}
    assert rows["1"]["name"] == "Widget2"
    assert rows["2"]["name"] == "Gadget"
    engine.close()
@@ -1,5 +1,4 @@
 import sqlite3
 from pathlib import Path
 import pytest
 from sqlalchemy import create_engine
@@ -125,6 +124,22 @@ def test_second_query_same_columns_is_cache_hit(engine):
    assert len(rows) == 3
 def test_cache_hit_does_not_open_source(engine, source_engine, monkeypatch):
    """A pure cache hit must not open a source connection (lazy source)."""
    engine.execute("SELECT id, name FROM products")  # miss → caches
    calls = {"n": 0}
    original_connect = source_engine.connect
    def counting_connect(*args, **kwargs):
        calls["n"] += 1
        return original_connect(*args, **kwargs)
    monkeypatch.setattr(source_engine, "connect", counting_connect)
    engine.execute("SELECT id, name FROM products")  # hit → no source access
    assert calls["n"] == 0
 # ---------------------------------------------------------------------------
 # SQL file creation — backup to disk
 # ---------------------------------------------------------------------------
@@ -215,16 +230,60 @@ def test_delete_raises_readonly(engine):
        engine.execute("DELETE FROM products WHERE id = '1'")
-def test_join_raises_unsupported(engine):
+def test_ambiguous_unqualified_join_column_raises(engine):
    with pytest.raises(UnsupportedQueryError):
        engine.execute(
-            "SELECT p.name, o.qty FROM products p JOIN orders o ON p.id = o.product_id"
+            "SELECT name FROM products p JOIN orders o ON p.id = o.product_id"
        )
-def test_select_star_raises_unsupported(engine):
+# ---------------------------------------------------------------------------
-    with pytest.raises(UnsupportedQueryError):
+# R1 — parametrized queries
-        engine.execute("SELECT * FROM products")
+# ---------------------------------------------------------------------------
 def test_positional_param(engine):
    rows = engine.execute("SELECT id, name FROM products WHERE id = ?", ("1",))
    assert rows == [{"id": "1", "name": "Widget"}]
 def test_named_param(engine):
    rows = engine.execute("SELECT name FROM products WHERE id = :id", {"id": "2"})
    assert rows == [{"name": "Gadget"}]
 # ---------------------------------------------------------------------------
 # R2 — JOIN support
 # ---------------------------------------------------------------------------
 def test_join_two_tables(engine):
    rows = engine.execute(
        "SELECT p.name, o.qty FROM products p "
        "JOIN orders o ON p.id = o.product_id WHERE p.id = ?",
        ("1",),
    )
    assert rows == [{"name": "Widget", "qty": "2"}]
 def test_join_caches_both_tables(engine):
    engine.execute(
        "SELECT p.name, o.qty FROM products p JOIN orders o ON p.id = o.product_id"
    )
    assert engine._cache.is_table_cached("products") is True
    assert engine._cache.is_table_cached("orders") is True
 # ---------------------------------------------------------------------------
 # R3 — SELECT *
 # ---------------------------------------------------------------------------
 def test_select_star_returns_all_columns(engine):
    rows = engine.execute("SELECT * FROM products WHERE id = '1'")
    assert rows == [{"id": "1", "name": "Widget", "price": "9.99"}]
 def test_select_star_marks_table_full(engine):
    engine.execute("SELECT * FROM products")
    assert engine._cache.is_table_full("products") is True
 # ---------------------------------------------------------------------------
@@ -246,3 +305,119 @@ def test_invalidate_then_refetch_works(engine):
 def test_invalidate_unknown_table_is_noop(engine):
    engine.invalidate("nonexistent_table")  # must not raise
 # ---------------------------------------------------------------------------
 # Disk-backed cache (in_memory=False)
 # ---------------------------------------------------------------------------
 def test_disk_mode_query_works(source_engine, cache_path, monkeypatch):
    monkeypatch.setattr(eng_mod, "CACHE_DB_PATH", cache_path)
    monkeypatch.setattr(eng_mod, "BACKUP_INTERVAL_SECONDS", 9999)
    ce = CachingEngine(source_engine, in_memory=False)
    rows = ce.execute("SELECT id, name FROM products")
    assert {r["name"] for r in rows} == {"Widget", "Gadget", "Doohickey"}
    assert ce._cache._in_memory is False
    ce.close()
 def test_disk_mode_persists_across_instances(source_engine, cache_path, monkeypatch):
    monkeypatch.setattr(eng_mod, "CACHE_DB_PATH", cache_path)
    monkeypatch.setattr(eng_mod, "BACKUP_INTERVAL_SECONDS", 9999)
    ce1 = CachingEngine(source_engine, in_memory=False)
    ce1.execute("SELECT id, name FROM products")
    ce1.close()
    # Second instance opens the same on-disk cache and finds the table already there.
    ce2 = CachingEngine(source_engine, in_memory=False)
    assert ce2._cache.is_table_cached("products") is True
    rows = ce2.execute("SELECT id, name FROM products")
    assert {r["name"] for r in rows} == {"Widget", "Gadget", "Doohickey"}
    ce2.close()
 def test_in_memory_override_respects_config(source_engine, cache_path, monkeypatch):
    """in_memory=None falls back to the IN_MEMORY config default."""
    monkeypatch.setattr(eng_mod, "CACHE_DB_PATH", cache_path)
    monkeypatch.setattr(eng_mod, "BACKUP_INTERVAL_SECONDS", 9999)
    monkeypatch.setattr(eng_mod, "IN_MEMORY", False)
    ce = CachingEngine(source_engine)  # no explicit in_memory
    assert ce._cache._in_memory is False
    ce.close()
 # ---------------------------------------------------------------------------
 # Per-engine configuration (constructor overrides env defaults)
 # ---------------------------------------------------------------------------
 def test_constructor_config_overrides(source_engine, tmp_path):
    p = tmp_path / "explicit_cache.db"
    ce = CachingEngine(
        source_engine,
        cache_db_path=p,
        fetch_batch=3,
        dialect="sqlite",
        backup_interval=12345,
        refresh_interval=42,
        in_memory=False,
    )
    ce.execute("SELECT id, name FROM products")
    assert p.exists()
    assert ce._cache._fetch_batch == 3
    assert ce._cache._dialect == "sqlite"
    assert ce._dialect == "sqlite"
    assert ce._cache._backup_interval == 12345
    assert ce._refresh_interval == 42
    ce.close()
 def test_two_engines_separate_cache_files(source_engine, tmp_path):
    """Two engines in one process can target different cache files."""
    a = CachingEngine(source_engine, cache_db_path=tmp_path / "a.db", in_memory=False)
    b = CachingEngine(source_engine, cache_db_path=tmp_path / "b.db", in_memory=False)
    a.execute("SELECT id FROM products")
    assert (tmp_path / "a.db").exists()
    assert a._cache.is_table_cached("products") is True
    assert b._cache.is_table_cached("products") is False  # independent cache
    a.close()
    b.close()
 # ---------------------------------------------------------------------------
 # Pragmas / hard_reset / vacuum (1.11.0)
 # ---------------------------------------------------------------------------
 def test_engine_passes_pragmas_to_cache(source_engine, tmp_path):
    ce = CachingEngine(
        source_engine,
        cache_db_path=tmp_path / "cache.db",
        in_memory=False,
        pragmas={"page_size": 8192, "auto_vacuum": "INCREMENTAL"},
    )
    assert ce._cache.connection.execute("PRAGMA page_size").fetchone()[0] == 8192
    assert ce._cache.connection.execute("PRAGMA auto_vacuum").fetchone()[0] == 2
    ce.close()
 def test_engine_hard_reset_reloads(source_engine, tmp_path):
    ce = CachingEngine(source_engine, cache_db_path=tmp_path / "cache.db", in_memory=False)
    ce.execute("SELECT id FROM products")
    assert ce._cache.is_table_cached("products") is True
    ce.hard_reset()
    assert ce._cache.is_table_cached("products") is False
    rows = ce.execute("SELECT id, name FROM products")  # reloads on next use
    assert len(rows) == 3
    ce.close()
 def test_engine_vacuum_runs(source_engine, tmp_path):
    ce = CachingEngine(source_engine, cache_db_path=tmp_path / "cache.db", in_memory=False)
    ce.execute("SELECT id FROM products")
    ce.vacuum(incremental=False)  # must not raise
    assert ce._cache.is_table_cached("products") is True
    ce.close()
@@ -0,0 +1,122 @@
 import sqlite3
 import pytest
 from sqlmem.cache import CacheManager
 from sqlmem.executor import QueryExecutor
 from sqlmem.parser import parse
 from sqlmem.registry import ColumnRegistry
 from sqlmem.stats import StatsCollector
@pytest.fixture
 def source_conn():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id TEXT, name TEXT, status TEXT);
        INSERT INTO users VALUES ('1', 'alice', 'active'), ('2', 'bob', 'inactive');
        CREATE TABLE orders (id TEXT, user_id TEXT, total TEXT, title TEXT);
        INSERT INTO orders VALUES ('10', '1', '99', 'first'), ('11', '2', '5', 'second');
        """
    )
    conn.commit()
    yield conn
    conn.close()
@pytest.fixture
 def executor(tmp_path, source_conn):
    cache = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    registry = ColumnRegistry(cache.connection)
    stats = StatsCollector()
    ex = QueryExecutor(cache, registry, source_conn, stats)
    yield ex
    cache.close()
 def run(executor, sql, params=None):
    return executor.execute(parse(sql, params))
 # --- R1: parameters ---------------------------------------------------------
 def test_param_filters_in_memory(executor):
    rows = run(executor, "SELECT id, name FROM users WHERE id = ?", ("1",))
    assert rows == [{"id": "1", "name": "alice"}]
 def test_param_no_match(executor):
    rows = run(executor, "SELECT name FROM users WHERE id = ?", ("999",))
    assert rows == []
 def test_named_params(executor):
    rows = run(executor, "SELECT name FROM users WHERE id = :id", {"id": "2"})
    assert rows == [{"name": "bob"}]
 # --- cache hit / miss / refetch --------------------------------------------
 def test_cache_hit_does_not_refetch(executor):
    run(executor, "SELECT name FROM users")
    run(executor, "SELECT name FROM users")
    assert executor._stats.hits == 1
    assert executor._stats.misses == 1
 def test_new_column_triggers_refetch(executor):
    run(executor, "SELECT name FROM users")
    run(executor, "SELECT name, status FROM users")
    assert executor._stats.misses == 1
    assert executor._stats.refetches == 1
 # --- R2: JOINs --------------------------------------------------------------
 def test_join_across_two_tables(executor):
    rows = run(
        executor,
        "SELECT u.name, o.title FROM users u "
        "JOIN orders o ON o.user_id = u.id WHERE u.id = ?",
        ("1",),
    )
    assert rows == [{"name": "alice", "title": "first"}]
 def test_join_caches_each_table_independently(executor):
    run(
        executor,
        "SELECT u.name, o.title FROM users u JOIN orders o ON o.user_id = u.id",
    )
    # two distinct tables loaded → two misses
    assert executor._stats.misses == 2
    assert executor._cache.is_table_cached("users")
    assert executor._cache.is_table_cached("orders")
 # --- R3: SELECT * -----------------------------------------------------------
 def test_select_star_returns_all_columns(executor):
    rows = run(executor, "SELECT * FROM users WHERE id = ?", ("1",))
    assert rows == [{"id": "1", "name": "alice", "status": "active"}]
 def test_select_star_marks_table_full_and_hits(executor):
    run(executor, "SELECT * FROM users")
    run(executor, "SELECT * FROM users")
    assert executor._cache.is_table_full("users")
    assert executor._stats.misses == 1
    assert executor._stats.hits == 1
 def test_column_query_after_star_is_a_hit(executor):
    run(executor, "SELECT * FROM users")
    run(executor, "SELECT name FROM users")
    # full table already cached → specific column is a hit, no refetch
    assert executor._stats.refetches == 0
    assert executor._stats.hits == 1
@@ -0,0 +1,116 @@
 import sqlite3
 import pytest
 from sqlalchemy import create_engine
 import sqlmem.engine as eng_mod
 from sqlmem import CachingEngine
 from sqlmem.cache import CacheManager
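 def index_names(conn, table=None):
    sql = "SELECT name FROM sqlite_master WHERE type = 'index'"
    if table is None:
        return {r[0] for r in conn.execute(sql).fetchall()}
    # Honor the optional filter: only indexes that belong to `table`.
    return {r[0] for r in conn.execute(sql + " AND tbl_name = ?", (table,)).fetchall()}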
 # --- cache level ------------------------------------------------------------
@pytest.fixture
 def source_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big (id TEXT, val TEXT)")
    conn.executemany(
        "INSERT INTO big VALUES (?, ?)", [(str(i), f"v{i}") for i in range(100)]
    )
    conn.commit()
    yield conn
    conn.close()
@pytest.fixture
 def cache(tmp_path):
    c = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    yield c
    c.close()
 def test_index_created_on_load(cache, source_conn):
    cache.add_index("big", ["val"])
    cache.load_table("big", ["id", "val"], source_conn)
    assert "sqlmem_idx_big_val" in index_names(cache.connection)
 def test_index_used_by_query_planner(cache, source_conn):
    cache.add_index("big", ["val"])
    cache.load_table("big", ["id", "val"], source_conn)
    plan = cache.connection.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM big WHERE val = 'v50'"
    ).fetchall()
    assert any("sqlmem_idx_big_val" in str(row) for row in plan)
 def test_index_skipped_when_columns_not_cached(cache, source_conn):
    cache.add_index("big", ["missing_col"])
    cache.load_table("big", ["id", "val"], source_conn)  # must not raise
    assert "sqlmem_idx_big_missing_col" not in index_names(cache.connection)
 def test_index_recreated_on_reload(cache, source_conn):
    cache.add_index("big", ["val"])
    cache.load_table("big", ["id", "val"], source_conn)
    cache.load_table("big", ["id", "val"], source_conn)  # reload (staging swap)
    assert "sqlmem_idx_big_val" in index_names(cache.connection)
 # --- engine level -----------------------------------------------------------
@pytest.fixture
 def source_engine(tmp_path):
    db_path = tmp_path / "source.db"
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE products (id TEXT, name TEXT, price TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(str(i), f"n{i}", f"{i}.00") for i in range(20)],
    )
    conn.commit()
    conn.close()
    engine = create_engine(f"sqlite:///{db_path}")
    yield engine
    engine.dispose()
@pytest.fixture
 def patched_cache(tmp_path, monkeypatch):
    monkeypatch.setattr(eng_mod, "CACHE_DB_PATH", tmp_path / "cache.db")
    monkeypatch.setattr(eng_mod, "BACKUP_INTERVAL_SECONDS", 9999)
 def test_index_column_auto_loaded_even_if_not_selected(source_engine, patched_cache):
    engine = CachingEngine(source_engine, indexes={"products": ["name"]})
    engine.execute("SELECT id FROM products")  # does not select 'name'
    cols = {
        r[1]
        for r in engine._cache.connection.execute("PRAGMA table_info(products)").fetchall()
    }
    assert "name" in cols  # pulled in so the index can be built
    assert "sqlmem_idx_products_name" in index_names(engine._cache.connection)
    engine.close()
 def test_composite_index(source_engine, patched_cache):
    engine = CachingEngine(source_engine, indexes={"products": [["name", "price"]]})
    engine.execute("SELECT id FROM products")
    assert "sqlmem_idx_products_name_price" in index_names(engine._cache.connection)
    engine.close()
 def test_index_survives_invalidate_and_reload(source_engine, patched_cache):
    engine = CachingEngine(source_engine, indexes={"products": ["name"]})
    engine.execute("SELECT id, name FROM products")
    engine.invalidate("products")
    engine.execute("SELECT id, name FROM products")
    assert "sqlmem_idx_products_name" in index_names(engine._cache.connection)
    engine.close()
@@ -0,0 +1,105 @@
 import sqlite3
 import pytest
 from sqlmem.cache import CacheManager
@pytest.fixture
 def source_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big (id TEXT, val TEXT)")
    conn.executemany(
        "INSERT INTO big VALUES (?, ?)", [(str(i), f"v{i}") for i in range(5)]
    )
    conn.commit()
    yield conn
    conn.close()
@pytest.fixture
 def cache(tmp_path):
    c = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    yield c
    c.close()
@pytest.fixture
 def small_batches(monkeypatch):
    # Force multiple fetch batches over the 5 source rows.
    monkeypatch.setattr("sqlmem.cache.FETCH_BATCH_SIZE", 2)
 def test_batched_load_loads_all_rows(cache, source_conn, small_batches):
    cache.load_table("big", ["id", "val"], source_conn)
    _, rows = cache.execute_in_memory(
        "SELECT id, val FROM big ORDER BY CAST(id AS INTEGER)"
    )
    assert len(rows) == 5
    assert rows[0] == ("0", "v0")
    assert rows[-1] == ("4", "v4")
 def test_no_staging_table_left_behind(cache, source_conn, small_batches):
    cache.load_table("big", ["id", "val"], source_conn)
    names = {
        r[0]
        for r in cache.connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    }
    assert "big" in names
    assert not any(n.endswith("__sqlmem_load") for n in names)
 def test_reload_replaces_data_atomically(cache, source_conn, small_batches):
    cache.load_table("big", ["id", "val"], source_conn)
    source_conn.execute("DELETE FROM big")
    source_conn.execute("INSERT INTO big VALUES ('99', 'new')")
    source_conn.commit()
    cache.load_table("big", ["id", "val"], source_conn)
    _, rows = cache.execute_in_memory("SELECT id, val FROM big")
    assert rows == [("99", "new")]
 def test_load_sets_ready_state(cache, source_conn):
    cache.load_table("big", ["id", "val"], source_conn)
    assert cache.get_states()["big"] == "ready"
 def test_orphan_staging_dropped_on_startup(tmp_path, source_conn):
    # Simulate a crash mid-load: a staging table persisted into cache.db.
    db_path = tmp_path / "cache.db"
    c1 = CacheManager(db_path=db_path, backup_interval=9999)
    c1.load_table("big", ["id", "val"], source_conn)
    c1.connection.execute("CREATE TABLE big__sqlmem_load (id TEXT, val TEXT)")
    c1.connection.commit()
    c1.close()  # backup writes the staging table to disk
    c2 = CacheManager(db_path=db_path, backup_interval=9999)
    names = {
        r[0]
        for r in c2.connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    }
    c2.close()
    assert "big" in names  # real table survives
    assert not any(n.endswith("__sqlmem_load") for n in names)  # orphan cleaned
 def test_failed_load_sets_error_state_and_cleans_staging(cache):
    empty_source = sqlite3.connect(":memory:")  # has no 'big' table
    try:
        with pytest.raises(sqlite3.OperationalError):
            cache.load_table("big", ["id"], empty_source)
        assert cache.get_states()["big"] == "error"
        names = {
            r[0]
            for r in cache.connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            ).fetchall()
        }
        assert not any(n.endswith("__sqlmem_load") for n in names)
    finally:
        empty_source.close()
@@ -0,0 +1,24 @@
 from loguru import logger
 import sqlmem
 def test_add_sink_idempotent_no_duplicate_lines():
    """Calling add_sink twice for the same sink must not duplicate log lines."""
    sqlmem._added_sinks.clear()
    msgs: list[str] = []
    def sink(message):
        msgs.append(str(message))
    try:
        sqlmem.add_sink(sink, level="DEBUG", colorize=False)
        sqlmem.add_sink(sink, level="DEBUG", colorize=False)  # second call: no-op
        assert len(sqlmem._added_sinks) == 1
        # Emit one record that passes the "sqlmem" name filter.
        logger.patch(lambda r: r.update(name="sqlmem")).info("hello sqlmem")
        assert sum("hello sqlmem" in m for m in msgs) == 1
    finally:
        for handler_id in sqlmem._added_sinks.values():
            logger.remove(handler_id)
        sqlmem._added_sinks.clear()
        logger.disable("sqlmem")  # restore the default-silent state for other tests
@@ -6,16 +6,22 @@ from sqlmem.parser import parse
 def test_simple_select():
    result = parse("SELECT name, email FROM users WHERE status = 'active'")
-    assert result.table == "users"
+    assert result.tables == ["users"]
    cols = result.columns_by_table["users"]
    # WHERE columns are also extracted — needed for in-memory SQLite filtering
-    assert {"name", "email"}.issubset(set(result.columns))
+    assert {"name", "email"}.issubset(set(cols))
-    assert "status" in result.columns
+    assert "status" in cols
 def test_multiple_columns():
    result = parse("SELECT a, b, c FROM orders")
-    assert result.table == "orders"
+    assert result.tables == ["orders"]
-    assert set(result.columns) == {"a", "b", "c"}
+    assert set(result.columns_by_table["orders"]) == {"a", "b", "c"}
 def test_columns_deduplicated_in_order():
    result = parse("SELECT a, a, b FROM t WHERE a > 1")
    assert result.columns_by_table["t"] == ["a", "b"]
 def test_insert_raises_readonly():
@@ -33,11 +39,77 @@ def test_delete_raises_readonly():
        parse("DELETE FROM users WHERE id = 1")
-def test_wildcard_raises_unsupported():
+def test_select_without_from_raises():
    with pytest.raises(UnsupportedQueryError):
-        parse("SELECT * FROM users")
+        parse("SELECT 1")
-def test_join_raises_unsupported():
+# --- R1: parameters ---------------------------------------------------------
 def test_params_stored():
    result = parse("SELECT name FROM users WHERE id = ?", ("7189790",))
    assert result.params == ("7189790",)
    assert "?" in result.sqlite_sql
 def test_named_params_preserved():
    result = parse("SELECT name FROM users WHERE id = :id", {"id": 1})
    assert ":id" in result.sqlite_sql
 # --- R2: JOINs --------------------------------------------------------------
 def test_join_extracts_all_tables():
    result = parse(
        "SELECT a.id, b.title FROM users a "
        "JOIN orders b ON a.id = b.user_id WHERE a.id = ?",
        (1,),
    )
    assert set(result.tables) == {"users", "orders"}
    assert "id" in result.columns_by_table["users"]
    assert "title" in result.columns_by_table["orders"]
    # join + where columns resolved to their tables via alias
    assert "user_id" in result.columns_by_table["orders"]
 def test_join_unqualified_column_is_ambiguous():
    with pytest.raises(UnsupportedQueryError):
-        parse("SELECT a.name, b.title FROM users a JOIN orders b ON a.id = b.user_id")
+        parse("SELECT name FROM users a JOIN orders b ON a.id = b.user_id")
 # --- R3: SELECT * -----------------------------------------------------------
 def test_wildcard_marks_table_full():
    result = parse("SELECT * FROM users")
    assert result.wildcard_tables == {"users"}
    assert result.columns_by_table == {}
 def test_qualified_wildcard_marks_only_that_table():
    result = parse(
        "SELECT u.*, o.total FROM users u JOIN orders o ON u.id = o.user_id"
    )
    assert "users" in result.wildcard_tables
    assert "orders" not in result.wildcard_tables
    assert "total" in result.columns_by_table["orders"]
 # --- R4: three-part names (MSSQL brackets) ----------------------------------
 def test_three_part_name_uses_base_table():
    result = parse(
        "SELECT [PRODUCT_PRODUCTNR], [PRAT_NAME] "
        "FROM [DP_PIM].[dbo].[VW_P_PRATVALUES] WHERE PRODUCT_PRODUCTNR = ?",
        ("7189790",),
    )
    assert result.tables == ["VW_P_PRATVALUES"]
    cols = result.columns_by_table["VW_P_PRATVALUES"]
    assert {"PRODUCT_PRODUCTNR", "PRAT_NAME"}.issubset(set(cols))
    # in-memory SQL must drop the catalog/schema prefix
    assert "DP_PIM" not in result.sqlite_sql
    assert "dbo" not in result.sqlite_sql
    assert "VW_P_PRATVALUES" in result.sqlite_sql
@@ -0,0 +1,158 @@
 import sqlite3
 import threading
 import pytest
 from sqlalchemy import create_engine
 import sqlmem.engine as eng_mod
 from sqlmem import CachingEngine, DeltaConfig
 from sqlmem.cache import CacheManager
 from sqlmem.stats import StatsCollector
@pytest.fixture
 def source_engine(tmp_path):
    db_path = tmp_path / "source.db"
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT, changed TEXT);
        INSERT INTO products VALUES ('1', 'Widget', '2026-06-01 10:00:00');
        """
    )
    conn.commit()
    conn.close()
    engine = create_engine(f"sqlite:///{db_path}")
    yield engine
    engine.dispose()
@pytest.fixture
 def patched_cache(tmp_path, monkeypatch):
    monkeypatch.setattr(eng_mod, "CACHE_DB_PATH", tmp_path / "cache.db")
    monkeypatch.setattr(eng_mod, "BACKUP_INTERVAL_SECONDS", 9999)
 def test_static_table_state_and_tracking(source_engine, patched_cache):
    engine = CachingEngine(source_engine)
    engine.execute("SELECT id, name FROM products")
    s = engine.stats.tables["products"]
    assert s.state == "ready"
    assert s.tracking == "static"
    assert s.rows == 1
    engine.close()
 def test_delta_table_tracking(source_engine, patched_cache):
    engine = CachingEngine(
        source_engine, delta={"products": DeltaConfig("changed", ["id"])}
    )
    engine.execute("SELECT id, name FROM products")
    s = engine.stats.tables["products"]
    assert s.tracking == "delta"
    assert s.state == "ready"
    engine.close()
 def test_ttl_table_reports_stale(source_engine, patched_cache):
    engine = CachingEngine(source_engine, ttl={"products": 0})
    engine.execute("SELECT id, name FROM products")
    s = engine.stats.tables["products"]
    assert s.tracking == "ttl"
    assert s.state == "stale"  # ttl=0 → already past its max age
    engine.close()
 def test_counters_still_reported(source_engine, patched_cache):
    engine = CachingEngine(source_engine)
    engine.execute("SELECT id, name FROM products")
    engine.execute("SELECT id, name FROM products")
    stats = engine.stats
    assert stats.misses == 1
    assert stats.hits == 1
    engine.close()
 def test_stats_exposes_table_error(source_engine, patched_cache):
    engine = CachingEngine(source_engine)
    engine.execute("SELECT id, name FROM products")
    engine._cache.record_error("products", "ValueError: boom")
    s = engine.stats
    assert s.errors == 1
    assert s.tables["products"].consecutive_failures == 1
    assert s.tables["products"].last_error == "ValueError: boom"
    assert s.tables["products"].last_error_at is not None
    engine.close()
 def test_stats_no_error_by_default(source_engine, patched_cache):
    engine = CachingEngine(source_engine)
    engine.execute("SELECT id, name FROM products")
    s = engine.stats
    assert s.errors == 0
    assert s.tables["products"].consecutive_failures == 0
    assert s.tables["products"].last_error is None
    engine.close()
 def test_stats_exposes_last_upsert_and_last_refresh(source_engine, patched_cache):
    engine = CachingEngine(source_engine)
    engine.execute("SELECT id, name FROM products")
    s = engine.stats.tables["products"]
    assert s.last_upsert is not None   # the load wrote rows (persisted)
    assert s.last_refresh is not None  # the load also counts as a refresh-cycle run
    engine.close()
 # --- a table being loaded for the first time shows up as "loading" ----------
 def test_snapshot_surfaces_a_loading_table(tmp_path):
    cache = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    snap = StatsCollector().snapshot(cache.connection, {"pending": "loading"})
    assert "pending" in snap.tables
    assert snap.tables["pending"].state == "loading"
    assert snap.tables["pending"].rows == 0
    cache.close()
 def test_loading_state_visible_from_another_thread_during_load(tmp_path):
    """A first load in progress is observable as 'loading' from another thread."""
    cache = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    started = threading.Event()
    release = threading.Event()
    class BlockingCursor:
        def __init__(self, rows):
            self._rows = list(rows)
            self._done = False
        def fetchmany(self, size):
            if self._done:
                return []
            started.set()
            release.wait(5)  # hold the load open until the test releases it
            self._done = True
            return self._rows
    class BlockingSource:
        def execute(self, sql):
            return BlockingCursor([("1", "alice")])
    loader = threading.Thread(
        target=cache.load_table, args=("users", ["id", "name"], BlockingSource())
    )
    loader.start()
    try:
        assert started.wait(5), "load did not start"
        # mid-load: not yet in _sqlmem_tables, but surfaced as loading
        assert cache.get_states()["users"] == "loading"
        snap = StatsCollector().snapshot(cache.connection, cache.get_states())
        assert snap.tables["users"].state == "loading"
    finally:
        release.set()
        loader.join(5)  # always reap the loader, even if an assertion above failed
    assert not loader.is_alive()
    assert cache.get_states()["users"] == "ready"
    cache.close()
@@ -0,0 +1,137 @@
 import sqlite3
 import pytest
 from sqlalchemy import create_engine
 import sqlmem.engine as eng_mod
 from sqlmem import CachingEngine, DeltaConfig
 from sqlmem.cache import CacheManager
 from sqlmem.executor import QueryExecutor
 from sqlmem.parser import parse
 from sqlmem.registry import ColumnRegistry
 from sqlmem.stats import StatsCollector
@pytest.fixture
 def source_conn():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (id TEXT, name TEXT, price TEXT);
        INSERT INTO products VALUES ('1', 'Widget', '9.99'), ('2', 'Gadget', '19.99');
        """
    )
    conn.commit()
    yield conn
    conn.close()
 def make_executor(tmp_path, source_conn, ttl):
    cache = CacheManager(db_path=tmp_path / "cache.db", backup_interval=9999)
    registry = ColumnRegistry(cache.connection)
    stats = StatsCollector()
    executor = QueryExecutor(cache, registry, source_conn, stats, None, ttl)
    return executor
 def run(executor, sql, params=None):
    return executor.execute(parse(sql, params))
 # --- lazy (read-time) guarantee --------------------------------------------
 def test_ttl_zero_reloads_every_access(tmp_path, source_conn):
    executor = make_executor(tmp_path, source_conn, ttl={"products": 0})
    run(executor, "SELECT id, price FROM products")  # miss → load
    source_conn.execute("UPDATE products SET price = '1.11' WHERE id = '1'")
    source_conn.commit()
    rows = {r["id"]: r for r in run(executor, "SELECT id, price FROM products")}
    assert rows["1"]["price"] == "1.11"  # stale → reloaded, sees new value
    assert executor._stats.refetches == 1
    assert executor._stats.misses == 1
 def test_ttl_fresh_is_cache_hit(tmp_path, source_conn):
    executor = make_executor(tmp_path, source_conn, ttl={"products": 9999})
    run(executor, "SELECT id, price FROM products")
    source_conn.execute("UPDATE products SET price = '1.11' WHERE id = '1'")
    source_conn.commit()
    rows = {r["id"]: r for r in run(executor, "SELECT id, price FROM products")}
    assert rows["1"]["price"] == "9.99"  # still fresh → old cached value served
    assert executor._stats.hits == 1
    assert executor._stats.refetches == 0
 def test_ttl_preserves_full_status(tmp_path, source_conn):
    executor = make_executor(tmp_path, source_conn, ttl={"products": 0})
    run(executor, "SELECT * FROM products")  # full load
    run(executor, "SELECT * FROM products")  # stale → full reload
    assert executor._cache.is_table_full("products") is True
 def test_untracked_table_never_expires(tmp_path, source_conn):
    executor = make_executor(tmp_path, source_conn, ttl={"other": 0})
    run(executor, "SELECT id, name FROM products")
    source_conn.execute("UPDATE products SET name = 'X' WHERE id = '1'")
    source_conn.commit()
    rows = {r["id"]: r for r in run(executor, "SELECT id, name FROM products")}
    assert rows["1"]["name"] == "Widget"  # no TTL on this table → cache hit
    assert executor._stats.hits == 1
 # --- engine-level: background refresh + config validation -------------------
@pytest.fixture
 def source_db(tmp_path):
    db_path = tmp_path / "source.db"
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT, changed TEXT);
        INSERT INTO products VALUES ('1', 'Widget', '2026-06-01 10:00:00');
        """
    )
    conn.commit()
    conn.close()
    return db_path
@pytest.fixture
 def source_engine(source_db):
    engine = create_engine(f"sqlite:///{source_db}")
    yield engine
    engine.dispose()
@pytest.fixture
 def patched_cache(tmp_path, monkeypatch):
    monkeypatch.setattr(eng_mod, "CACHE_DB_PATH", tmp_path / "cache.db")
    monkeypatch.setattr(eng_mod, "BACKUP_INTERVAL_SECONDS", 9999)
 def test_background_ttl_refresh(source_engine, source_db, patched_cache):
    engine = CachingEngine(source_engine, ttl={"products": 0})
    engine.execute("SELECT id, name FROM products")
    conn = sqlite3.connect(source_db)
    conn.execute("UPDATE products SET name = 'Widget2' WHERE id = '1'")
    conn.commit()
    conn.close()
    engine.refresh()  # background-style full reload of the expired table
    rows = engine.execute("SELECT id, name FROM products")
    assert rows[0]["name"] == "Widget2"
    engine.close()
 def test_delta_and_ttl_overlap_raises(source_engine, patched_cache):
    with pytest.raises(ValueError):
        CachingEngine(
            source_engine,
            delta={"products": DeltaConfig(change_column="changed", key_columns=["id"])},
            ttl={"products": 300},
        )
Author	SHA1	Message	Date
Jan Doubravský	8e46ee3547	Store named datetime columns as INTEGER microseconds (datetime_columns)	2026-06-09 18:18:38 +02:00
Jan Doubravský	a21b5a2a04	Add pragmas, hard_reset, and vacuum for tuning disk-backed caches	2026-06-09 17:58:41 +02:00
Jan Doubravský	8744f458cc	Split last_upsert (persisted write) and last_refresh (run liveness) in stats	2026-06-09 08:48:29 +02:00
Jan Doubravský	6dc85e4f3c	Fix frozen delta watermark and add error stats, lazy source, concurrent disk reads, and per-engine config	2026-06-08 19:35:33 +02:00
Jan Doubravský	209ae667ab	Add disk-backed SQLite cache mode as an alternative to in-memory	2026-06-08 11:39:04 +02:00
Jan Doubravský	757a8f4eba	Add secondary indexes to accelerate cache lookups	2026-06-05 18:17:55 +02:00
Jan Doubravský	286a5f207d	Batch large-table loads to bound memory and add per-table state to stats	2026-06-05 14:44:07 +02:00
Jan Doubravský	85bb84a1a6	Add per-table TTL refresh for tables without a change column	2026-06-05 12:12:57 +02:00
Jan Doubravský	33aa126ff6	Add incremental delta refresh and fix Decimal/datetime cache binding	2026-06-05 11:09:16 +02:00
Jan Doubravský	530c2618cf	Add support for query parameters, JOINs, SELECT * and three-part table names	2026-06-04 18:25:47 +02:00
Honza	b044ca43f8	Add runtime statistics via engine.stats	2026-06-03 09:48:33 +02:00