23 KiB
23 KiB
Changelog
All notable changes to this project will be documented in this file.
[Unreleased]
[1.15.0] - 2026-06-11
Fixed
- Cache stampede (thundering herd) on cold loads — the decision to load a table was made before the load lock was taken, and
load_tablenever re-checked after acquiring it. During a slow cold load of a large table (observed: 212M rows, ~2 h), a second query for the same table passed the pre-lock "not cached" check, queued on the load lock, and then ran a redundant second full reload instead of seeing the first had finished — doubling a multi-hour load.load_tablenow does double-checked locking: after acquiring the load lock it re-evaluates a caller-supplied predicate (table cached, all needed columns present, not TTL-expired) and skips the load when it is already satisfied. Invisible on small tables; on large ones it removes hours of redundant indexing under concurrent cold-start traffic.
Changed
pyproject.toml— bumped version to1.15.0.CacheManager.load_tablegained an optionalrecheckcallback (the double-check predicate);QueryExecutorsupplies it for both column andSELECT *loads.
[1.14.0] - 2026-06-10
Follow-up to 1.12.0 from running datetime_columns in production: the feature was only half-wired (writes were coerced, reads and query params were not).
Fixed
WHEREon an INTEGER-µsdatetime_columnscolumn silently returned 0 rows —execute_in_memory()coerced query params withto_sqlite(), which leaves an ISO string a string. Comparing the storedINTEGERagainst aTEXTparam is always false under SQLite affinity, soWHERE CHANGE_DATE > '2026-05-01T…'matched nothing. Params for a query that touches adatetime_columnstable are now coerced to epoch µs (datetime objects and ISO-datetime strings alike), so the comparison matches the stored integer. Scoped to the query's tables, so non-datetime queries are unaffected.
Added
- Read-time coercion —
datetime_columnscome back asdatetime—execute()now returns those columns as realdatetimeobjects (UTC) instead of the raw INTEGER µs, restoring the transparent-proxy contract (you get the same type a direct source query would give). Opt out withCachingEngine(..., return_datetime=False)to get the raw integers. Stats.db_size_bytes— on-disk size of the cache file (0 in memory mode), soengine.statsexposes cache growth for monitoring without an external file check.- Public
datetime_to_epoch_ushelper —from sqlmem import datetime_to_epoch_usexposes the same datetime→epoch-µs conversion used internally, so callers buildingWHERE change_col > ?params don't have to re-implement it.
Changed
pyproject.toml— bumped version to1.14.0.vacuum(incremental=True)now warns instead of silently doing nothing when the cache was not created withauto_vacuum=INCREMENTAL(the only mode in which incremental vacuum can reclaim pages); it logs how to fix it (hard_reset()with the pragma, or a fullvacuum(incremental=False)) and returns.CacheManager.execute_in_memory()gained an optionaltablesargument (the query's tables) used to scope datetime param/result coercion;CacheManager/CachingEnginegained areturn_datetimeflag.
[1.12.0] - 2026-06-09
⚠️ Breaking
SCHEMA_VERSIONbumped3→4— on upgrade the existing cache is wiped automatically (disk mode wipes the file in place, in-memory discards the backup) and reloaded from the source on next use. For a large cache (e.g. a multi-hundred-million-row table) the full reload can take a while; deploy in a maintenance window.datetime_columnschange the public output contract for the chosen columns — a column listed indatetime_columnsis stored and returned as an INTEGER (microseconds since the Unix epoch, UTC), not an ISOTEXTstring. This is opt-in per column, so no table is affected unless you name its columns; consumers that read or filter such a column must adapt (compare against integer µs, or convert on read).
Added
datetime_columns=parameter onCachingEngine/CacheManager—datetime_columns={"VW_X": ["CHANGE_DATE"]}stores the named datetime columns as INTEGER µs-since-epoch instead of ~28-byte ISOTEXT. Saves ~20 bytes per row and makes index comparisons on the column operate on native integers instead of string collation — worthwhile for a pure datetime column on a very large table (e.g. a delta change column that is also range-scanned)._coerce.to_sqlite_datetime()converts datetimes (and ISO/datevalues) to exact integer microseconds via integer arithmetic (no float rounding); a naive datetime is treated as UTC,Nonepasses through.load_tabledeclares those columnsINTEGERandupsert_rowscoerces them the same way, so full loads and delta upserts agree on the on-disk representation.- The delta high-watermark for such a column is the stored integer;
delta._bind_watermark(..., epoch_us=True)reconstructs a real UTCdatetimebefore binding, so the source still receives a typed timestamp (and the watermark fix from 1.8.0 keeps holding).
Changed
pyproject.toml— bumped version to1.12.0.CacheManager.max_value/set_last_synced_atnow accept/returnintwatermarks alongsidestr(the INTEGER-µs watermark round-trips through thelast_synced_atTEXT column as its digit string).
[1.11.0] - 2026-06-09
Added
pragmas=parameter onCachingEngine/CacheManager— pass a dict of SQLite PRAGMAs (e.g.mmap_size,cache_size,temp_store,page_size,auto_vacuum) applied to the cache connection at open time, so disk-backed caches can be tuned for the host's I/O profile without bypassingCacheManager. Unknown/inapplicable pragmas are silently ignored by SQLite (graceful degradation, no startup crash).page_sizeis a layout pragma: it is applied only on a fresh file (set before WAL / the first table). On an existing cache with a different page size the request is ignored and a one-time warning is logged — the new value takes effect only afterhard_reset()or a rebuild.auto_vacuumis set before the database header is materialized (before switching to WAL) on a fresh file, soINCREMENTAL/FULLactually stick instead of silently reverting toNONE.
CachingEngine.hard_reset()/CacheManager.hard_reset()— close every connection, delete the on-disk cache file (and its-wal/-shmsidecars) and reopen from scratch with all current pragmas applied. Unlikereset()(which drops tables but keeps the open file), this letspage_size/auto_vacuumchange, since those are baked into the file at creation. Disk mode only — falls back toreset()in memory mode. All tables reload on next use.CachingEngine.vacuum(incremental=True, pages=10_000)/CacheManager.vacuum(...)— run maintenance VACUUM on the on-disk cache to reclaim free pages left by deltaINSERT OR REPLACEchurn. Incremental (default) reclaims up topagespages without blocking readers or extra disk (requiresauto_vacuum=INCREMENTAL);incremental=Falseruns a full VACUUM (rewrites the file, ~2× disk, blocks readers — maintenance window only). No-op in memory mode.
Changed
pyproject.toml— bumped version to1.11.0.ColumnRegistrygainedrebind()so it follows the cache connection swap performed byhard_reset()(the registry previously captured the connection for the process lifetime).
[1.10.0] - 2026-06-09
Added
last_upsert(persisted write) vslast_refresh(run/liveness) instats—TableStats.last_refreshpreviously came from the persistedlast_refresh_atcolumn, which is only written when rows are actually written (a delta cycle withtotal == 0early-returns and leaves it unchanged). A healthy delta that keeps finding no new rows therefore looked frozen. The single value is now split:last_upsert— wall-clock (UTC) of the last actual data write (full load / delta with rows). Persisted, survives restarts (this is the existinglast_refresh_atcolumn, surfaced under a clearer name).last_refresh— wall-clock (UTC) of the last time a refresh cycle ran for the table, even when it wrote nothing. In-memory per process (Noneuntil the first cycle after start), tracked like_states/_errors— so no schema change and no cache wipe.CacheManagergainedmark_refresh_ran()/get_last_runs(); an empty delta cycle now records a run. TTL staleness still uses the last write (seconds_since_refreshreadslast_refresh_at), so behaviour is unchanged.
Changed
pyproject.toml— bumped version to1.10.0.TableStats.last_refreshis nowstr | None(wasstr) and a new requiredlast_upsert: str | Nonefield is added. Consumers readinglast_refreshfor "when did data change?" should switch tolast_upsert.
[1.8.0] - 2026-06-08
Fixed
- Frozen delta watermark on
datetimechange columns — the delta high-watermark is read back from the cache as an ISOTEXTstring (e.g.'2026-06-05T14:54:24.823000') and was bound straight back to the source. SQL Server then had to implicitly convert thatnvarchartodatetimeand failed (T-separated ISO with 6 fractional digits exceedsdatetime's 3 — error 241 / SQLSTATE 22007), so every delta refresh and the startup catch-up died before streaming and the watermark never advanced (the cache silently froze at the last full load). The watermark is now parsed back to a realdatetime(delta._bind_watermark) so the driver sends a typed timestamp and the comparison runs natively; non-datetime change columns (e.g. integer rowversions) pass through unchanged. Regression tests added.
Added
- Refresh/load failures are now visible in
stats—TableStatsgainedlast_error,last_error_atandconsecutive_failures, andStatsgained a totalerrorscounter. A delta that fails before streaming (e.g. the watermark bug above) previously leftstate = ready, hiding the problem; it now also marks the tableerrorand records the message.consecutive_failuresresets to 0 on the next success. - Per-engine configuration —
CachingEngineacceptscache_db_path,backup_interval,refresh_interval,fetch_batchanddialect(each defaults to its env var / config global when omitted), so two engines with independent cache files can run in one process and config is testable without env vars. blocking_startup_refreshflag (defaultFalse) — the startup catch-up (deltas/TTL reloads for tables restored from disk) now runs on the background thread by default, so it never blocks application startup. Passblocking_startup_refresh=Trueto catch up synchronously before serving.
Changed
- SQL identifiers are quoted — table/column names are now quoted everywhere they are interpolated into statements (SQLite double-quote for the cache, the configured dialect — e.g. T-SQL
[brackets]— for the source), so reserved words or names with spaces work and the f-string interpolation is hardened. - Source connection opened lazily —
execute()no longer opens a source connection on every call; a pure cache hit never touches the source (and never occupies a pool slot). The misleadingcast(sqlite3.Connection, …)on the source handle was removed (it is a pyodbc connection in production). - Concurrent reads in disk mode — disk-backed reads now use a per-thread read-only WAL connection instead of sharing the single write connection under a lock, so a slow
SELECTno longer blocks writers (loads/upserts) or other readers. In-memory mode is unchanged (a:memory:database can't be shared across connections). add_sinkis idempotent — calling it again for the same sink is a no-op, so a double import no longer duplicates every log line.pyproject.toml— bumped version to1.8.0; added a scoped pytestfilterwarningsfor the SQLite test source's legacy datetime-adapter deprecation.
Note
- Cache type fidelity (returning real
datetime/Decimal/numeric types fromexecute()instead ofTEXTstrings, and giving numeric columns proper affinity) was evaluated but deferred — it changes the public output contract that consumers currently rely on (and thattest_coerce.pypins). Decimal/datetime stay stored as exact, losslessTEXT.
[1.7.0] - 2026-06-08
Added
- Disk-backed cache mode —
CachingEngine(engine, in_memory=False)(or envSQLMEM_IN_MEMORY=false) queries the on-diskcache.dbdirectly instead of loading it into an in-memory SQLite. Every write persists immediately (no hourly backup thread, no load-on-startup copy, noatexit/SIGTERMflush needed), and the cache may exceed available RAM. The disk connection uses WAL +synchronous=NORMALfor write throughput. In-memory mode (backed up to disk periodically) remains the default.in_memorydefaults to theSQLMEM_IN_MEMORYconfig when omitted.- On open, a disk cache with a mismatched
schema_versionis wiped in place and rebuilt. engine.reset()in disk mode drops the cached tables andVACUUMs the file (it does not unlink the open file).
- On open, a disk cache with a mismatched
SQLMEM_IN_MEMORYenv var (defaulttrue).
Changed
pyproject.toml— bumped version to1.7.0cache.py—CacheManagergained anin_memoryflag; the cache connection (_mem_conn→_conn) is opened either on:memory:or directly on the on-disk file. Disk mode skips the load-on-startup copy, backup thread, and shutdown flush, andreset()VACUUMs in place instead of unlinking the open file..gitignore— ignorecache.dband its WAL sidecars (cache.db-wal,cache.db-shm).
[1.6.0] - 2026-06-05
Added
- Secondary indexes —
CachingEngine(engine, indexes={"VW_X": ["col", ["a", "b"]]})creates indexes on the in-memory cache to accelerateWHERE/JOINlookups. Index columns are auto-loaded so the index exists from the first load, and indexes are recreated after every (re)load and persist incache.db. Combines freely withdeltaandttl.
Changed
pyproject.toml— bumped version to1.6.0
[1.5.0] - 2026-06-05
Added
- Per-table processing state in
stats—TableStatsnow carriesstate(loading/refreshing/ready/stale/error) andtracking(delta/ttl/static), so callers can see whether each table is up to date or being processed. In-progress first loads and failed loads also surface instats.tables. SQLMEM_FETCH_BATCHenv var (default10000) — rows fetched per batch when loading a table.
Changed
pyproject.toml— bumped version to1.5.0- Large-table loads are streamed in batches —
load_tableno longerfetchall()s the whole table (which double-buffered every row in Python and could OOM/crash on tens of millions of rows). Rows are now fetchedSQLMEM_FETCH_BATCHat a time into a staging table and swapped in atomically, so peak memory stays bounded, the previous copy stays queryable during a reload, and the network fetch no longer holds the cache lock. Delta catch-ups are streamed the same way. - Orphan staging tables left by an interrupted load (crash/backup mid-load) are dropped on startup.
- Delta upserts compute
row_countonce per refresh instead of a fullCOUNT(*)after every batch (avoids O(rows×batches) work on large catch-ups).
[1.4.0] - 2026-06-05
Fixed
decimal.Decimal(anddatetime) binding error —NUMERIC/DECIMAL/MONEYcolumns from SQL Server (pyodbc) arrive asdecimal.Decimal, whichsqlite3cannot bind, crashing the cache load withtype 'decimal.Decimal' is not supported. Values are now coerced to sqlite-bindable types (Decimal→str,datetime/date/time→ISO,uuid.UUID→str,bytearray→bytes) at the cache boundary — on full load, on delta upsert, and for WHERE parameters. Coercion is local (no globalsqlite3.register_adapter), so the host application'ssqlite3behaviour is untouched. Cache columns areTEXT, so the conversion is lossless and exact (no rounding).
Added
- Incremental (delta) refresh —
CachingEngine(engine, delta={...})withDeltaConfig(change_column, key_columns). Delta-tracked tables are kept in sync by pulling only changed rows (WHERE change_column >= watermark) and upserting them by key, instead of full reloads.- Data-driven high-watermark =
max(change_column)cached, persisted incache.db;>=overlap + idempotent upsert so no row is missed and boundary rows are harmlessly re-read. - Catch-up on startup (since last shutdown) and a background thread refreshing every
SQLMEM_REFRESH_INTERVALseconds (default 300);engine.refresh()triggers a pull on demand. - Primary key is auto-discovered from the source DB (
inspect(engine).get_pk_constraint) whenkey_columnsis omitted; required explicitly for views (raisesValueError).
- Data-driven high-watermark =
- Per-table TTL (time-based refresh) —
CachingEngine(engine, ttl={"VW_X": 300})for tables with no change column that can't be delta-synced. The cached copy is guaranteed never older than the TTL: a query touching an expired table triggers a full reload before it is answered (read-time guarantee), and the background thread proactively reloads expired tables. TTL age uses the persistedlast_refresh_at, so the bound holds across restarts. A table in bothdeltaandttlraisesValueError. DeltaConfigexported from the public API.engine.reset()— wipes the whole cache (RAM +cache.db) for a clean rebuild after structural source changes.SQLMEM_REFRESH_INTERVALenv var (default300) — background refresh tick for delta pulls and proactive TTL reloads.
Changed
pyproject.toml— bumped version to1.4.0cache.py— schema version bumped to3;_sqlmem_tablesgained alast_synced_atwatermark column. New methods:execute_in_memory(lock-serialized read),get_table_columns,create_unique_index,get/set_last_synced_at,max_value,upsert_rows,seconds_since_refresh,reset. Existing on-disk caches are discarded and rebuilt on load.executor.py— delta-tracked tables augment their column set with key/change columns (unique key index + initial watermark); TTL-tracked tables full-reload at read time when expired; in-memory reads go through the cache lock.
[1.2.0] - 2026-06-04
Added
- Parametrized queries (R1) —
execute(sql, params)accepts positional (?tuple/list) and named (:namedict) parameters; passed straight to SQLite during in-memory filtering. Cache loads still fetch the full table (parameters are not applied to source fetches). - JOIN support (R2) — multi-table SELECTs are parsed into per-table column sets; each table is cached independently and the JOIN runs in the in-memory SQLite. Columns in a multi-table query must be qualified by table or alias.
SELECT *support (R3) — wildcard (andalias.*) queries discover all columns from the source DB, cache the whole table, and mark itis_fullso later column queries are guaranteed cache hits without re-fetch.- Three-part table names (R4) —
[catalog].[schema].[table]is parsed to its base name for caching; the in-memory query is rewritten to strip catalog/schema prefixes so it runs under SQLite. SQLMEM_SQL_DIALECTenv var (defaulttsql) — sqlglot dialect used to parse incoming SQL; T-SQL also accepts ANSI SQL and MSSQL bracket quoting.CacheManager.discover_columns()andCacheManager.is_table_full();load_table()gained afullflag.
Changed
pyproject.toml— bumped version to1.2.0parser.py—ParsedQuery.table: strreplaced bytables: list[str]pluscolumns_by_table,sqlite_sql,params, andwildcard_tables; SQL is parsed with the configured dialect and rendered to SQLite for execution.executor.py— loads each referenced table independently and applies query parameters during in-memory execution.cache.py— schema version bumped to2;_sqlmem_tablesgained anis_fullcolumn (existing on-disk caches are discarded and rebuilt on load).
[1.1.0] - 2026-06-03
Added
StatsandTableStatsfrozen dataclasses — snapshot of runtime cache statistics (hit/miss/refetch counts, per-table row count, columns, last refresh timestamp)StatsCollector— internal thread-safe counter; increments on every cache hit, miss, and re-fetchengine.statsproperty — returns aStatssnapshot at any point in timeStatsandTableStatsexported from the public API
Changed
pyproject.toml— bumped version to1.1.0
[1.0.0] - 2026-06-03
Changed
pyproject.toml— bumped version to1.0.0
[0.4.0] - 2026-06-03
Added
add_sink(sink, *, level, **kwargs)— public API for routing sqlmem log records to any loguru-compatible sink (stream, file, callable); supports all logurulogger.add()kwargs includingrotation,retention, etc.
Changed
pyproject.toml— bumped version to0.4.0config.py— replaced destructivelogger.remove()+ forced default sink withlogger.disable("sqlmem"); sqlmem is now silent by default and does not interfere with the host application's logging setup
[0.3.0] - 2026-06-03
Added
README.md— full project documentation: architecture overview, quick start, cache behaviour, persistence, configuration, exceptions, logging, and limitations
Changed
pyproject.toml— bumped version to0.3.0parser.py—_extract_columnsnow deduplicates column names while preserving order.gitignore— added.envand.env.*to prevent accidental commit of environment files
Security
- Removed
.envfrom git tracking (git rm --cached)
[0.2.0] - 2026-06-01
Added
- Project specification in
project.md— architecture, API design, cache backend, metadata schema, logging strategy, and TODO for future features (JOIN, SELECT * support) .gitignorefor Python/Poetry projectpyproject.tomldependencies:sqlglot,sqlalchemy,loguru,python-dotenv; dev dependencies:pytest,ruff,mypysrc/sqlmem/package structure with src layoutsrc/sqlmem/exceptions.py—ReadOnlyError(blocks INSERT/UPDATE/DELETE),UnsupportedQueryError(blocks JOIN and SELECT *)src/sqlmem/config.py— loads.env, configuresloguruwith DEBUG/INFO level based onSQLMEM_DEBUGsrc/sqlmem/_meta.py— package version constantsrc/sqlmem/parser.py— SQL Parser usingsqlglot; extracts table and columns from SELECT, raises on writes/JOIN/wildcardsrc/sqlmem/registry.py— Column Registry; accumulates requested columns per table, detects missing columns requiring re-fetchsrc/sqlmem/cache.py— Cache Manager; SQLite in-memory storage, load fromcache.dbon startup (with schema version check), hourly backup thread,atexit/SIGTERM flush, metadata tables (_sqlmem_meta,_sqlmem_tables,_sqlmem_columns)src/sqlmem/executor.py— Query Executor; cache hit/miss logic, re-fetch on new columns with WARNING logsrc/sqlmem/engine.py—CachingEnginewrapper; public API compatible with SQLAlchemy,invalidate(table)for manual cache clearingsrc/sqlmem/__init__.py— public exports:CachingEngine,ReadOnlyError,UnsupportedQueryErrortests/test_parser.py— parser tests: SELECT parsing, ReadOnlyError, UnsupportedQueryErrortests/test_cache.py— cache tests: load, data correctness, metadata, disk backup/reloadtests/test_registry.py— registry tests: accumulation, needs_refetch, table isolation