26 KiB
26 KiB
Changelog
All notable changes to this project will be documented in this file.
[Unreleased]
[1.16.0] - 2026-06-11
Added
- Declarative table specs —
CachingEngine(tables=[TableSpec(...)])— declare each cached table up front (columns, indexes, refresh strategy, datetime columns, preload) instead of letting the engine learn columns lazily from queries. New public typesTableSpec,TTL,Delta(a friendly alias ofDeltaConfig) and exceptionUndeclaredError.- Background preload —
preload=Truetables are loaded at startup (on the background thread by default, so startup isn't blocked;blocking_startup_refresh=Trueloads them synchronously). A copy already fresh in the persistent cache is skipped via the same double-checked locking added in 1.15.0, so a warm restart is instant. - Fail-fast on undeclared access — in declarative mode a query referencing a table that has no
TableSpec, or a column outside a spec's declaredcolumns(includingSELECT *on a column-restricted table), raisesUndeclaredErrorinstead of silently triggering an expensive lazy load / column-expansion. Declarecolumns=Noneto cache the whole table and allow any column. - Solves the lazy second-reload — because columns are declared, a first query for a previously unseen column no longer forces a full re-fetch.
- Background preload —
executor.ensure_loaded(table, columns)— preloads a table into the cache (reusing the full load path: delta/index augmentation, registry, watermark, double-checked locking) without materializing any rows.
Fixed
- Race on the shared cache connection — the metadata reads (
is_table_cached,is_table_full,seconds_since_refresh,get_table_columns,get_last_synced_at,max_value,count_rows) touched the single shared SQLite connection without the connection lock, so a query thread reading while the background refresh/preload thread wrote could raisesqlite3.InterfaceError. These reads now take the lock. More likely to surface now that startup preload adds background-thread activity.
Changed
pyproject.toml— bumped version to1.16.0.- Fully backward compatible — omit
tables=and the legacydelta=/ttl=/indexes=/datetime_columns=kwargs behave exactly as before (lazy mode, no fail-fast). Passing bothtables=and any of those kwargs raisesValueError;tables=is internally converted to the same config.
[1.15.0] - 2026-06-11
Fixed
- Cache stampede (thundering herd) on cold loads — the decision to load a table was made before the load lock was taken, and
load_tablenever re-checked after acquiring it. During a slow cold load of a large table (observed: 212M rows, ~2 h), a second query for the same table passed the pre-lock "not cached" check, queued on the load lock, and then ran a redundant second full reload instead of seeing the first had finished — doubling a multi-hour load.load_tablenow does double-checked locking: after acquiring the load lock it re-evaluates a caller-supplied predicate (table cached, all needed columns present, not TTL-expired) and skips the load when it is already satisfied. Invisible on small tables; on large ones it removes hours of redundant indexing under concurrent cold-start traffic.
Changed
pyproject.toml— bumped version to1.15.0.CacheManager.load_tablegained an optionalrecheckcallback (the double-check predicate);QueryExecutorsupplies it for both column andSELECT *loads.
[1.14.0] - 2026-06-10
Follow-up to 1.12.0 from running datetime_columns in production: the feature was only half-wired (writes were coerced, reads and query params were not).
Fixed
WHEREon an INTEGER-µsdatetime_columnscolumn silently returned 0 rows —execute_in_memory()coerced query params withto_sqlite(), which leaves an ISO string a string. Comparing the storedINTEGERagainst aTEXTparam is always false under SQLite affinity, soWHERE CHANGE_DATE > '2026-05-01T…'matched nothing. Params for a query that touches adatetime_columnstable are now coerced to epoch µs (datetime objects and ISO-datetime strings alike), so the comparison matches the stored integer. Scoped to the query's tables, so non-datetime queries are unaffected.
Added
- Read-time coercion —
datetime_columnscome back asdatetime—execute()now returns those columns as realdatetimeobjects (UTC) instead of the raw INTEGER µs, restoring the transparent-proxy contract (you get the same type a direct source query would give). Opt out withCachingEngine(..., return_datetime=False)to get the raw integers. Stats.db_size_bytes— on-disk size of the cache file (0 in memory mode), soengine.statsexposes cache growth for monitoring without an external file check.- Public
datetime_to_epoch_ushelper —from sqlmem import datetime_to_epoch_usexposes the same datetime→epoch-µs conversion used internally, so callers buildingWHERE change_col > ?params don't have to re-implement it.
Changed
pyproject.toml— bumped version to1.14.0.vacuum(incremental=True)now warns instead of silently doing nothing when the cache was not created withauto_vacuum=INCREMENTAL(the only mode in which incremental vacuum can reclaim pages); it logs how to fix it (hard_reset()with the pragma, or a fullvacuum(incremental=False)) and returns.CacheManager.execute_in_memory()gained an optionaltablesargument (the query's tables) used to scope datetime param/result coercion;CacheManager/CachingEnginegained areturn_datetimeflag.
[1.12.0] - 2026-06-09
⚠️ Breaking
SCHEMA_VERSIONbumped3→4— on upgrade the existing cache is wiped automatically (disk mode wipes the file in place, in-memory discards the backup) and reloaded from the source on next use. For a large cache (e.g. a multi-hundred-million-row table) the full reload can take a while; deploy in a maintenance window.datetime_columnschange the public output contract for the chosen columns — a column listed indatetime_columnsis stored and returned as an INTEGER (microseconds since the Unix epoch, UTC), not an ISOTEXTstring. This is opt-in per column, so no table is affected unless you name its columns; consumers that read or filter such a column must adapt (compare against integer µs, or convert on read).
Added
datetime_columns=parameter onCachingEngine/CacheManager—datetime_columns={"VW_X": ["CHANGE_DATE"]}stores the named datetime columns as INTEGER µs-since-epoch instead of ~28-byte ISOTEXT. Saves ~20 bytes per row and makes index comparisons on the column operate on native integers instead of string collation — worthwhile for a pure datetime column on a very large table (e.g. a delta change column that is also range-scanned)._coerce.to_sqlite_datetime()converts datetimes (and ISO/datevalues) to exact integer microseconds via integer arithmetic (no float rounding); a naive datetime is treated as UTC,Nonepasses through.load_tabledeclares those columnsINTEGERandupsert_rowscoerces them the same way, so full loads and delta upserts agree on the on-disk representation.- The delta high-watermark for such a column is the stored integer;
delta._bind_watermark(..., epoch_us=True)reconstructs a real UTCdatetimebefore binding, so the source still receives a typed timestamp (and the watermark fix from 1.8.0 keeps holding).
Changed
pyproject.toml— bumped version to1.12.0.CacheManager.max_value/set_last_synced_atnow accept/returnintwatermarks alongsidestr(the INTEGER-µs watermark round-trips through thelast_synced_atTEXT column as its digit string).
[1.11.0] - 2026-06-09
Added
pragmas=parameter onCachingEngine/CacheManager— pass a dict of SQLite PRAGMAs (e.g.mmap_size,cache_size,temp_store,page_size,auto_vacuum) applied to the cache connection at open time, so disk-backed caches can be tuned for the host's I/O profile without bypassingCacheManager. Unknown/inapplicable pragmas are silently ignored by SQLite (graceful degradation, no startup crash).page_sizeis a layout pragma: it is applied only on a fresh file (set before WAL / the first table). On an existing cache with a different page size the request is ignored and a one-time warning is logged — the new value takes effect only afterhard_reset()or a rebuild.auto_vacuumis set before the database header is materialized (before switching to WAL) on a fresh file, soINCREMENTAL/FULLactually stick instead of silently reverting toNONE.
CachingEngine.hard_reset()/CacheManager.hard_reset()— close every connection, delete the on-disk cache file (and its-wal/-shmsidecars) and reopen from scratch with all current pragmas applied. Unlikereset()(which drops tables but keeps the open file), this letspage_size/auto_vacuumchange, since those are baked into the file at creation. Disk mode only — falls back toreset()in memory mode. All tables reload on next use.CachingEngine.vacuum(incremental=True, pages=10_000)/CacheManager.vacuum(...)— run maintenance VACUUM on the on-disk cache to reclaim free pages left by deltaINSERT OR REPLACEchurn. Incremental (default) reclaims up topagespages without blocking readers or extra disk (requiresauto_vacuum=INCREMENTAL);incremental=Falseruns a full VACUUM (rewrites the file, ~2× disk, blocks readers — maintenance window only). No-op in memory mode.
Changed
pyproject.toml— bumped version to1.11.0.ColumnRegistrygainedrebind()so it follows the cache connection swap performed byhard_reset()(the registry previously captured the connection for the process lifetime).
[1.10.0] - 2026-06-09
Added
last_upsert(persisted write) vslast_refresh(run/liveness) instats—TableStats.last_refreshpreviously came from the persistedlast_refresh_atcolumn, which is only written when rows are actually written (a delta cycle withtotal == 0early-returns and leaves it unchanged). A healthy delta that keeps finding no new rows therefore looked frozen. The single value is now split:last_upsert— wall-clock (UTC) of the last actual data write (full load / delta with rows). Persisted, survives restarts (this is the existinglast_refresh_atcolumn, surfaced under a clearer name).last_refresh— wall-clock (UTC) of the last time a refresh cycle ran for the table, even when it wrote nothing. In-memory per process (Noneuntil the first cycle after start), tracked like_states/_errors— so no schema change and no cache wipe.CacheManagergainedmark_refresh_ran()/get_last_runs(); an empty delta cycle now records a run. TTL staleness still uses the last write (seconds_since_refreshreadslast_refresh_at), so behaviour is unchanged.
Changed
pyproject.toml— bumped version to1.10.0.TableStats.last_refreshis nowstr | None(wasstr) and a new requiredlast_upsert: str | Nonefield is added. Consumers readinglast_refreshfor "when did data change?" should switch tolast_upsert.
[1.8.0] - 2026-06-08
Fixed
- Frozen delta watermark on
datetimechange columns — the delta high-watermark is read back from the cache as an ISOTEXTstring (e.g.'2026-06-05T14:54:24.823000') and was bound straight back to the source. SQL Server then had to implicitly convert thatnvarchartodatetimeand failed (T-separated ISO with 6 fractional digits exceedsdatetime's 3 — error 241 / SQLSTATE 22007), so every delta refresh and the startup catch-up died before streaming and the watermark never advanced (the cache silently froze at the last full load). The watermark is now parsed back to a realdatetime(delta._bind_watermark) so the driver sends a typed timestamp and the comparison runs natively; non-datetime change columns (e.g. integer rowversions) pass through unchanged. Regression tests added.
Added
- Refresh/load failures are now visible in
stats—TableStatsgainedlast_error,last_error_atandconsecutive_failures, andStatsgained a totalerrorscounter. A delta that fails before streaming (e.g. the watermark bug above) previously leftstate = ready, hiding the problem; it now also marks the tableerrorand records the message.consecutive_failuresresets to 0 on the next success. - Per-engine configuration —
CachingEngineacceptscache_db_path,backup_interval,refresh_interval,fetch_batchanddialect(each defaults to its env var / config global when omitted), so two engines with independent cache files can run in one process and config is testable without env vars. blocking_startup_refreshflag (defaultFalse) — the startup catch-up (deltas/TTL reloads for tables restored from disk) now runs on the background thread by default, so it never blocks application startup. Passblocking_startup_refresh=Trueto catch up synchronously before serving.
Changed
- SQL identifiers are quoted — table/column names are now quoted everywhere they are interpolated into statements (SQLite double-quote for the cache, the configured dialect — e.g. T-SQL
[brackets]— for the source), so reserved words or names with spaces work and the f-string interpolation is hardened. - Source connection opened lazily —
execute()no longer opens a source connection on every call; a pure cache hit never touches the source (and never occupies a pool slot). The misleadingcast(sqlite3.Connection, …)on the source handle was removed (it is a pyodbc connection in production). - Concurrent reads in disk mode — disk-backed reads now use a per-thread read-only WAL connection instead of sharing the single write connection under a lock, so a slow
SELECTno longer blocks writers (loads/upserts) or other readers. In-memory mode is unchanged (a:memory:database can't be shared across connections). add_sinkis idempotent — calling it again for the same sink is a no-op, so a double import no longer duplicates every log line.pyproject.toml— bumped version to1.8.0; added a scoped pytestfilterwarningsfor the SQLite test source's legacy datetime-adapter deprecation.
Note
- Cache type fidelity (returning real
datetime/Decimal/numeric types fromexecute()instead ofTEXTstrings, and giving numeric columns proper affinity) was evaluated but deferred — it changes the public output contract that consumers currently rely on (and thattest_coerce.pypins). Decimal/datetime stay stored as exact, losslessTEXT.
[1.7.0] - 2026-06-08
Added
- Disk-backed cache mode —
CachingEngine(engine, in_memory=False)(or envSQLMEM_IN_MEMORY=false) queries the on-diskcache.dbdirectly instead of loading it into an in-memory SQLite. Every write persists immediately (no hourly backup thread, no load-on-startup copy, noatexit/SIGTERMflush needed), and the cache may exceed available RAM. The disk connection uses WAL +synchronous=NORMALfor write throughput. In-memory mode (backed up to disk periodically) remains the default.in_memorydefaults to theSQLMEM_IN_MEMORYconfig when omitted.- On open, a disk cache with a mismatched
schema_versionis wiped in place and rebuilt. engine.reset()in disk mode drops the cached tables andVACUUMs the file (it does not unlink the open file).
- On open, a disk cache with a mismatched
SQLMEM_IN_MEMORYenv var (defaulttrue).
Changed
pyproject.toml— bumped version to1.7.0cache.py—CacheManagergained anin_memoryflag; the cache connection (_mem_conn→_conn) is opened either on:memory:or directly on the on-disk file. Disk mode skips the load-on-startup copy, backup thread, and shutdown flush, andreset()VACUUMs in place instead of unlinking the open file..gitignore— ignorecache.dband its WAL sidecars (cache.db-wal,cache.db-shm).
[1.6.0] - 2026-06-05
Added
- Secondary indexes —
CachingEngine(engine, indexes={"VW_X": ["col", ["a", "b"]]})creates indexes on the in-memory cache to accelerateWHERE/JOINlookups. Index columns are auto-loaded so the index exists from the first load, and indexes are recreated after every (re)load and persist incache.db. Combines freely withdeltaandttl.
Changed
pyproject.toml— bumped version to1.6.0
[1.5.0] - 2026-06-05
Added
- Per-table processing state in
stats—TableStatsnow carriesstate(loading/refreshing/ready/stale/error) andtracking(delta/ttl/static), so callers can see whether each table is up to date or being processed. In-progress first loads and failed loads also surface instats.tables. SQLMEM_FETCH_BATCHenv var (default10000) — rows fetched per batch when loading a table.
Changed
pyproject.toml— bumped version to1.5.0- Large-table loads are streamed in batches —
load_tableno longerfetchall()s the whole table (which double-buffered every row in Python and could OOM/crash on tens of millions of rows). Rows are now fetchedSQLMEM_FETCH_BATCHat a time into a staging table and swapped in atomically, so peak memory stays bounded, the previous copy stays queryable during a reload, and the network fetch no longer holds the cache lock. Delta catch-ups are streamed the same way. - Orphan staging tables left by an interrupted load (crash/backup mid-load) are dropped on startup.
- Delta upserts compute
row_countonce per refresh instead of a fullCOUNT(*)after every batch (avoids O(rows×batches) work on large catch-ups).
[1.4.0] - 2026-06-05
Fixed
decimal.Decimal(anddatetime) binding error —NUMERIC/DECIMAL/MONEYcolumns from SQL Server (pyodbc) arrive asdecimal.Decimal, whichsqlite3cannot bind, crashing the cache load withtype 'decimal.Decimal' is not supported. Values are now coerced to sqlite-bindable types (Decimal→str,datetime/date/time→ISO,uuid.UUID→str,bytearray→bytes) at the cache boundary — on full load, on delta upsert, and for WHERE parameters. Coercion is local (no globalsqlite3.register_adapter), so the host application'ssqlite3behaviour is untouched. Cache columns areTEXT, so the conversion is lossless and exact (no rounding).
Added
- Incremental (delta) refresh —
CachingEngine(engine, delta={...})withDeltaConfig(change_column, key_columns). Delta-tracked tables are kept in sync by pulling only changed rows (WHERE change_column >= watermark) and upserting them by key, instead of full reloads.- Data-driven high-watermark =
max(change_column)cached, persisted incache.db;>=overlap + idempotent upsert so no row is missed and boundary rows are harmlessly re-read. - Catch-up on startup (since last shutdown) and a background thread refreshing every
SQLMEM_REFRESH_INTERVALseconds (default 300);engine.refresh()triggers a pull on demand. - Primary key is auto-discovered from the source DB (
inspect(engine).get_pk_constraint) whenkey_columnsis omitted; required explicitly for views (raisesValueError).
- Data-driven high-watermark =
- Per-table TTL (time-based refresh) —
CachingEngine(engine, ttl={"VW_X": 300})for tables with no change column that can't be delta-synced. The cached copy is guaranteed never older than the TTL: a query touching an expired table triggers a full reload before it is answered (read-time guarantee), and the background thread proactively reloads expired tables. TTL age uses the persistedlast_refresh_at, so the bound holds across restarts. A table in bothdeltaandttlraisesValueError. DeltaConfigexported from the public API.engine.reset()— wipes the whole cache (RAM +cache.db) for a clean rebuild after structural source changes.SQLMEM_REFRESH_INTERVALenv var (default300) — background refresh tick for delta pulls and proactive TTL reloads.
Changed
pyproject.toml— bumped version to1.4.0cache.py— schema version bumped to3;_sqlmem_tablesgained alast_synced_atwatermark column. New methods:execute_in_memory(lock-serialized read),get_table_columns,create_unique_index,get/set_last_synced_at,max_value,upsert_rows,seconds_since_refresh,reset. Existing on-disk caches are discarded and rebuilt on load.executor.py— delta-tracked tables augment their column set with key/change columns (unique key index + initial watermark); TTL-tracked tables full-reload at read time when expired; in-memory reads go through the cache lock.
[1.2.0] - 2026-06-04
Added
- Parametrized queries (R1) —
execute(sql, params)accepts positional (?tuple/list) and named (:namedict) parameters; passed straight to SQLite during in-memory filtering. Cache loads still fetch the full table (parameters are not applied to source fetches). - JOIN support (R2) — multi-table SELECTs are parsed into per-table column sets; each table is cached independently and the JOIN runs in the in-memory SQLite. Columns in a multi-table query must be qualified by table or alias.
SELECT *support (R3) — wildcard (andalias.*) queries discover all columns from the source DB, cache the whole table, and mark itis_fullso later column queries are guaranteed cache hits without re-fetch.- Three-part table names (R4) —
[catalog].[schema].[table]is parsed to its base name for caching; the in-memory query is rewritten to strip catalog/schema prefixes so it runs under SQLite. SQLMEM_SQL_DIALECTenv var (defaulttsql) — sqlglot dialect used to parse incoming SQL; T-SQL also accepts ANSI SQL and MSSQL bracket quoting.CacheManager.discover_columns()andCacheManager.is_table_full();load_table()gained afullflag.
Changed
pyproject.toml— bumped version to1.2.0parser.py—ParsedQuery.table: strreplaced bytables: list[str]pluscolumns_by_table,sqlite_sql,params, andwildcard_tables; SQL is parsed with the configured dialect and rendered to SQLite for execution.executor.py— loads each referenced table independently and applies query parameters during in-memory execution.cache.py— schema version bumped to2;_sqlmem_tablesgained anis_fullcolumn (existing on-disk caches are discarded and rebuilt on load).
[1.1.0] - 2026-06-03
Added
StatsandTableStatsfrozen dataclasses — snapshot of runtime cache statistics (hit/miss/refetch counts, per-table row count, columns, last refresh timestamp)StatsCollector— internal thread-safe counter; increments on every cache hit, miss, and re-fetchengine.statsproperty — returns aStatssnapshot at any point in timeStatsandTableStatsexported from the public API
Changed
pyproject.toml— bumped version to1.1.0
[1.0.0] - 2026-06-03
Changed
pyproject.toml— bumped version to1.0.0
[0.4.0] - 2026-06-03
Added
add_sink(sink, *, level, **kwargs)— public API for routing sqlmem log records to any loguru-compatible sink (stream, file, callable); supports all logurulogger.add()kwargs includingrotation,retention, etc.
Changed
pyproject.toml— bumped version to0.4.0config.py— replaced destructivelogger.remove()+ forced default sink withlogger.disable("sqlmem"); sqlmem is now silent by default and does not interfere with the host application's logging setup
[0.3.0] - 2026-06-03
Added
README.md— full project documentation: architecture overview, quick start, cache behaviour, persistence, configuration, exceptions, logging, and limitations
Changed
pyproject.toml— bumped version to0.3.0parser.py—_extract_columnsnow deduplicates column names while preserving order.gitignore— added.envand.env.*to prevent accidental commit of environment files
Security
- Removed
.envfrom git tracking (git rm --cached)
[0.2.0] - 2026-06-01
Added
- Project specification in
project.md— architecture, API design, cache backend, metadata schema, logging strategy, and TODO for future features (JOIN, SELECT * support) .gitignorefor Python/Poetry projectpyproject.tomldependencies:sqlglot,sqlalchemy,loguru,python-dotenv; dev dependencies:pytest,ruff,mypysrc/sqlmem/package structure with src layoutsrc/sqlmem/exceptions.py—ReadOnlyError(blocks INSERT/UPDATE/DELETE),UnsupportedQueryError(blocks JOIN and SELECT *)src/sqlmem/config.py— loads.env, configuresloguruwith DEBUG/INFO level based onSQLMEM_DEBUGsrc/sqlmem/_meta.py— package version constantsrc/sqlmem/parser.py— SQL Parser usingsqlglot; extracts table and columns from SELECT, raises on writes/JOIN/wildcardsrc/sqlmem/registry.py— Column Registry; accumulates requested columns per table, detects missing columns requiring re-fetchsrc/sqlmem/cache.py— Cache Manager; SQLite in-memory storage, load fromcache.dbon startup (with schema version check), hourly backup thread,atexit/SIGTERM flush, metadata tables (_sqlmem_meta,_sqlmem_tables,_sqlmem_columns)src/sqlmem/executor.py— Query Executor; cache hit/miss logic, re-fetch on new columns with WARNING logsrc/sqlmem/engine.py—CachingEnginewrapper; public API compatible with SQLAlchemy,invalidate(table)for manual cache clearingsrc/sqlmem/__init__.py— public exports:CachingEngine,ReadOnlyError,UnsupportedQueryErrortests/test_parser.py— parser tests: SELECT parsing, ReadOnlyError, UnsupportedQueryErrortests/test_cache.py— cache tests: load, data correctness, metadata, disk backup/reloadtests/test_registry.py— registry tests: accumulation, needs_refetch, table isolation