Changelog
All notable changes to this project will be documented in this file.
[Unreleased]
[1.8.0] - 2026-06-08
Fixed
- Frozen delta watermark on datetime change columns: the delta high-watermark is read back from the cache as an ISO TEXT string (e.g. '2026-06-05T14:54:24.823000') and was bound straight back to the source. SQL Server then had to implicitly convert that nvarchar to datetime and failed (a T-separated ISO string with 6 fractional digits exceeds datetime's 3; error 241 / SQLSTATE 22007), so every delta refresh and the startup catch-up died before streaming and the watermark never advanced (the cache silently froze at the last full load). The watermark is now parsed back to a real datetime (delta._bind_watermark) so the driver sends a typed timestamp and the comparison runs natively; non-datetime change columns (e.g. integer rowversions) pass through unchanged. Regression tests added.
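
The fix described above comes down to turning the cached ISO text back into a typed value before it is bound. A minimal sketch of that idea follows; the helper name `bind_watermark` and its fallback behaviour are assumptions made for illustration and are not taken from the actual `delta._bind_watermark` code.

```python
from datetime import datetime


def bind_watermark(value):
    """Return a value suitable for binding as the delta watermark (hypothetical helper)."""
    # Non-string watermarks (e.g. integer rowversions) pass through unchanged.
    if not isinstance(value, str):
        return value
    try:
        # ISO text as stored in the cache, e.g. '2026-06-05T14:54:24.823000'.
        return datetime.fromisoformat(value)
    except ValueError:
        # Not an ISO timestamp: leave it as text (assumed fallback).
        return value


watermark = bind_watermark("2026-06-05T14:54:24.823000")
```

With a real datetime object in hand, the driver can send a typed parameter, so the source never has to parse a string.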
Added
- Refresh/load failures are now visible in stats: TableStats gained last_error, last_error_at and consecutive_failures, and Stats gained a total errors counter. A delta that fails before streaming (e.g. the watermark bug above) previously left state = ready, hiding the problem; it now also marks the table error and records the message. consecutive_failures resets to 0 on the next success.
- Per-engine configuration: CachingEngine accepts cache_db_path, backup_interval, refresh_interval, fetch_batch and dialect (each defaults to its env var / config global when omitted), so two engines with independent cache files can run in one process and config is testable without env vars.
- blocking_startup_refresh flag (default False): the startup catch-up (deltas/TTL reloads for tables restored from disk) now runs on the background thread by default, so it never blocks application startup. Pass blocking_startup_refresh=True to catch up synchronously before serving.
Changed
- SQL identifiers are quoted: table/column names are now quoted everywhere they are interpolated into statements (SQLite double quotes for the cache; the configured dialect's style, e.g. T-SQL [brackets], for the source), so reserved words or names with spaces work and the f-string interpolation is hardened.
- Source connection opened lazily: execute() no longer opens a source connection on every call; a pure cache hit never touches the source (and never occupies a pool slot). The misleading cast(sqlite3.Connection, …) on the source handle was removed (it is a pyodbc connection in production).
- Concurrent reads in disk mode: disk-backed reads now use a per-thread read-only WAL connection instead of sharing the single write connection under a lock, so a slow SELECT no longer blocks writers (loads/upserts) or other readers. In-memory mode is unchanged (a :memory: database can't be shared across connections).
- add_sink is idempotent: calling it again for the same sink is a no-op, so a double import no longer duplicates every log line.
- pyproject.toml: bumped version to 1.8.0; added a scoped pytest filterwarnings for the SQLite test source's legacy datetime-adapter deprecation.
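
As an illustration of the identifier quoting mentioned above, the sketch below applies the standard escaping rules for the two styles. The function names `quote_sqlite` and `quote_tsql` are hypothetical and are not the project's API.

```python
def quote_sqlite(name: str) -> str:
    """Quote an identifier for SQLite: wrap in double quotes, double any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def quote_tsql(name: str) -> str:
    """Quote an identifier for T-SQL: wrap in brackets, double any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


# A table name with a space and a column named after a reserved word.
table = "Order Details"
cache_sql = f"SELECT {quote_sqlite('select')} FROM {quote_sqlite(table)}"
source_sql = f"SELECT {quote_tsql('select')} FROM {quote_tsql(table)}"
```

Quoting covers identifiers only; values should still travel as bound parameters.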
Note
- Cache type fidelity (returning real datetime/Decimal/numeric types from execute() instead of TEXT strings, and giving numeric columns proper affinity) was evaluated but deferred: it changes the public output contract that consumers currently rely on (and that test_coerce.py pins). Decimal/datetime stay stored as exact, lossless TEXT.
[1.7.0] - 2026-06-08
Added
- Disk-backed cache mode: CachingEngine(engine, in_memory=False) (or env SQLMEM_IN_MEMORY=false) queries the on-disk cache.db directly instead of loading it into an in-memory SQLite. Every write persists immediately (no hourly backup thread, no load-on-startup copy, no atexit/SIGTERM flush needed), and the cache may exceed available RAM. The disk connection uses WAL + synchronous=NORMAL for write throughput. In-memory mode (backed up to disk periodically) remains the default. in_memory defaults to the SQLMEM_IN_MEMORY config when omitted.
  - On open, a disk cache with a mismatched schema_version is wiped in place and rebuilt.
  - engine.reset() in disk mode drops the cached tables and VACUUMs the file (it does not unlink the open file).
- SQLMEM_IN_MEMORY env var (default true).
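
The WAL + synchronous=NORMAL setup mentioned above can be reproduced with the standard sqlite3 module. This is a sketch only: the temporary path and the table `t` are placeholders, and it does not show how CacheManager actually opens its connection.

```python
import os
import sqlite3
import tempfile

# Hypothetical cache location for the example; the real path is configurable.
cache_path = os.path.join(tempfile.mkdtemp(), "cache.db")

conn = sqlite3.connect(cache_path)
# WAL lets readers proceed while a writer is active; the pragma returns the mode in effect.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# NORMAL syncs less often than FULL, trading a little durability for write throughput.
conn.execute("PRAGMA synchronous=NORMAL")
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]

conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'persisted')")
conn.commit()  # the row is written to the database files at commit time
conn.close()

# A fresh connection sees the row without any load-on-startup copy.
reopened = sqlite3.connect(cache_path)
rows = reopened.execute("SELECT v FROM t").fetchall()
reopened.close()
```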
Changed
- pyproject.toml: bumped version to 1.7.0
- cache.py: CacheManager gained an in_memory flag; the cache connection (_mem_conn → _conn) is opened either on :memory: or directly on the on-disk file. Disk mode skips the load-on-startup copy, backup thread, and shutdown flush, and reset() VACUUMs in place instead of unlinking the open file.
- .gitignore: ignore cache.db and its WAL sidecars (cache.db-wal, cache.db-shm).
[1.6.0] - 2026-06-05
Added
- Secondary indexes: CachingEngine(engine, indexes={"VW_X": ["col", ["a", "b"]]}) creates indexes on the in-memory cache to accelerate WHERE/JOIN lookups. Index columns are auto-loaded so the index exists from the first load, and indexes are recreated after every (re)load and persist in cache.db. Combines freely with delta and ttl.
Changed
- pyproject.toml: bumped version to 1.6.0
[1.5.0] - 2026-06-05
Added
- Per-table processing state in stats: TableStats now carries state (loading/refreshing/ready/stale/error) and tracking (delta/ttl/static), so callers can see whether each table is up to date or being processed. In-progress first loads and failed loads also surface in stats.tables.
- SQLMEM_FETCH_BATCH env var (default 10000): rows fetched per batch when loading a table.
Changed
- pyproject.toml: bumped version to 1.5.0
- Large-table loads are streamed in batches: load_table no longer fetchall()s the whole table (which double-buffered every row in Python and could OOM/crash on tens of millions of rows). Rows are now fetched SQLMEM_FETCH_BATCH at a time into a staging table and swapped in atomically, so peak memory stays bounded, the previous copy stays queryable during a reload, and the network fetch no longer holds the cache lock. Delta catch-ups are streamed the same way.
- Orphan staging tables left by an interrupted load (crash/backup mid-load) are dropped on startup.
- Delta upserts compute row_count once per refresh instead of a full COUNT(*) after every batch (avoids O(rows×batches) work on large catch-ups).
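
The batch-and-swap load described above can be sketched with two SQLite connections standing in for the source and the cache. The staging-table name, the fixed two-column schema and the function `load_table_streamed` are assumptions for illustration; the real `load_table` is not shown in this changelog.

```python
import sqlite3

FETCH_BATCH = 2  # stands in for SQLMEM_FETCH_BATCH (default 10000)

# Hypothetical source and cache, both SQLite here so the sketch is runnable.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE items (id INTEGER, name TEXT)")
source.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"n{i}") for i in range(5)])
source.commit()

cache = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; transactions are explicit
cache.execute("CREATE TABLE items (id INTEGER, name TEXT)")
cache.execute("INSERT INTO items VALUES (99, 'old copy')")


def load_table_streamed(src, dst, table, batch_size):
    """Copy `table` from src to dst in batches, then swap it in atomically."""
    staging = f"{table}__staging"  # assumed naming scheme
    dst.execute(f'DROP TABLE IF EXISTS "{staging}"')  # clear any orphan from an interrupted load
    dst.execute(f'CREATE TABLE "{staging}" (id INTEGER, name TEXT)')
    cursor = src.execute(f'SELECT id, name FROM "{table}"')
    batches = 0
    while True:
        rows = cursor.fetchmany(batch_size)  # at most batch_size rows held in Python
        if not rows:
            break
        dst.executemany(f'INSERT INTO "{staging}" VALUES (?, ?)', rows)
        batches += 1
    # Swap inside one transaction so readers see either the old or the new copy.
    dst.execute("BEGIN")
    dst.execute(f'DROP TABLE IF EXISTS "{table}"')
    dst.execute(f'ALTER TABLE "{staging}" RENAME TO "{table}"')
    dst.execute("COMMIT")
    return batches


batches = load_table_streamed(source, cache, "items", FETCH_BATCH)
loaded = cache.execute("SELECT id, name FROM items ORDER BY id").fetchall()
```

Until the final transaction commits, queries against `items` keep seeing the old copy, which is what keeps the table queryable during a reload.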
[1.4.0] - 2026-06-05
Fixed
- decimal.Decimal (and datetime) binding error: NUMERIC/DECIMAL/MONEY columns from SQL Server (pyodbc) arrive as decimal.Decimal, which sqlite3 cannot bind, crashing the cache load with "type 'decimal.Decimal' is not supported". Values are now coerced to sqlite-bindable types (Decimal → str, datetime/date/time → ISO, uuid.UUID → str, bytearray → bytes) at the cache boundary: on full load, on delta upsert, and for WHERE parameters. Coercion is local (no global sqlite3.register_adapter), so the host application's sqlite3 behaviour is untouched. Cache columns are TEXT, so the conversion is lossless and exact (no rounding).
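
A minimal sketch of the local coercion listed above, assuming a per-value helper (the name `coerce` is hypothetical). It converts values before binding rather than registering a global sqlite3 adapter.

```python
import sqlite3
import uuid
from datetime import date, datetime, time
from decimal import Decimal


def coerce(value):
    """Convert a driver value into the form stored in the cache (hypothetical helper)."""
    if isinstance(value, Decimal):
        return str(value)  # exact text form, no float rounding
    if isinstance(value, (datetime, date, time)):
        return value.isoformat()
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, bytearray):
        return bytes(value)
    return value  # int, float, str, bytes and None are bound as-is


row = (Decimal("19.990"), datetime(2026, 6, 5, 14, 54, 24, 823000), bytearray(b"\x01"))
coerced = tuple(coerce(v) for v in row)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (price TEXT, changed TEXT, payload BLOB)")
conn.execute("INSERT INTO t VALUES (?, ?, ?)", coerced)
stored = conn.execute("SELECT price, changed, payload FROM t").fetchone()
```

Because the Decimal is stored as its exact string, trailing zeros such as the one in "19.990" survive the round trip.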
Added
- Incremental (delta) refresh: CachingEngine(engine, delta={...}) with DeltaConfig(change_column, key_columns). Delta-tracked tables are kept in sync by pulling only changed rows (WHERE change_column >= watermark) and upserting them by key, instead of full reloads.
  - Data-driven high-watermark = max(change_column) cached, persisted in cache.db; >= overlap + idempotent upsert so no row is missed and boundary rows are harmlessly re-read.
  - Catch-up on startup (since last shutdown) and a background thread refreshing every SQLMEM_REFRESH_INTERVAL seconds (default 300); engine.refresh() triggers a pull on demand.
  - Primary key is auto-discovered from the source DB (inspect(engine).get_pk_constraint) when key_columns is omitted; required explicitly for views (raises ValueError).
- Per-table TTL (time-based refresh): CachingEngine(engine, ttl={"VW_X": 300}) for tables with no change column that can't be delta-synced. The cached copy is guaranteed never older than the TTL: a query touching an expired table triggers a full reload before it is answered (read-time guarantee), and the background thread proactively reloads expired tables. TTL age uses the persisted last_refresh_at, so the bound holds across restarts. A table in both delta and ttl raises ValueError.
- DeltaConfig exported from the public API.
- engine.reset(): wipes the whole cache (RAM + cache.db) for a clean rebuild after structural source changes.
- SQLMEM_REFRESH_INTERVAL env var (default 300): background refresh tick for delta pulls and proactive TTL reloads.
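
The watermark-plus-upsert scheme for delta refresh can be sketched as follows. Both databases are SQLite here so the example runs on its own, the table and column names are invented, and the watermark is recomputed from the cache on each call rather than persisted as the entry describes.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, changed_at TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", "2026-06-05T10:00:00"), (2, "new", "2026-06-05T11:00:00")],
)

cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, changed_at TEXT)")


def delta_refresh(src, dst):
    """Pull rows changed at or after the cached watermark and upsert them by key."""
    # Data-driven watermark: the largest change value already in the cache.
    watermark = dst.execute("SELECT max(changed_at) FROM orders").fetchone()[0]
    if watermark is None:
        rows = src.execute("SELECT id, status, changed_at FROM orders").fetchall()
    else:
        # >= re-reads boundary rows; the upsert below makes that harmless.
        rows = src.execute(
            "SELECT id, status, changed_at FROM orders WHERE changed_at >= ?", (watermark,)
        ).fetchall()
    # INSERT OR REPLACE on the primary key acts as an idempotent upsert.
    dst.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    dst.commit()
    return len(rows)


first = delta_refresh(source, cache)  # initial pull: both rows
source.execute(
    "UPDATE orders SET status = 'shipped', changed_at = '2026-06-05T12:00:00' WHERE id = 1"
)
second = delta_refresh(source, cache)  # row 1 changed; boundary row 2 is re-read
cached = cache.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
```

The second refresh fetches two rows even though only one changed: the row sitting exactly on the watermark is read again and overwritten with identical values.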
Changed
- pyproject.toml: bumped version to 1.4.0
- cache.py: schema version bumped to 3; _sqlmem_tables gained a last_synced_at watermark column. New methods: execute_in_memory (lock-serialized read), get_table_columns, create_unique_index, get/set_last_synced_at, max_value, upsert_rows, seconds_since_refresh, reset. Existing on-disk caches are discarded and rebuilt on load.
- executor.py: delta-tracked tables augment their column set with key/change columns (unique key index + initial watermark); TTL-tracked tables full-reload at read time when expired; in-memory reads go through the cache lock.
[1.2.0] - 2026-06-04
Added
- Parametrized queries (R1): execute(sql, params) accepts positional (? with a tuple/list) and named (:name with a dict) parameters; they are passed straight to SQLite during in-memory filtering. Cache loads still fetch the full table (parameters are not applied to source fetches).
- JOIN support (R2): multi-table SELECTs are parsed into per-table column sets; each table is cached independently and the JOIN runs in the in-memory SQLite. Columns in a multi-table query must be qualified by table or alias.
- SELECT * support (R3): wildcard (and alias.*) queries discover all columns from the source DB, cache the whole table, and mark it is_full so later column queries are guaranteed cache hits without re-fetch.
- Three-part table names (R4): [catalog].[schema].[table] is parsed to its base name for caching; the in-memory query is rewritten to strip catalog/schema prefixes so it runs under SQLite.
- SQLMEM_SQL_DIALECT env var (default tsql): sqlglot dialect used to parse incoming SQL; T-SQL also accepts ANSI SQL and MSSQL bracket quoting.
- CacheManager.discover_columns() and CacheManager.is_table_full(); load_table() gained a full flag.
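
For reference, the two parameter styles named in R1 are the ones the standard sqlite3 module accepts. The sketch below runs them directly against an in-memory SQLite table with invented data; it does not go through CachingEngine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VW_X (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO VW_X VALUES (?, ?)", [(1, "EU"), (2, "US"), (3, "EU")])

# Positional style: ? placeholders filled from a tuple or list.
positional = conn.execute(
    "SELECT id FROM VW_X WHERE region = ? ORDER BY id", ("EU",)
).fetchall()

# Named style: :name placeholders filled from a dict.
named = conn.execute(
    "SELECT id FROM VW_X WHERE region = :region ORDER BY id", {"region": "US"}
).fetchall()
```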
Changed
- pyproject.toml: bumped version to 1.2.0
- parser.py: ParsedQuery.table: str replaced by tables: list[str] plus columns_by_table, sqlite_sql, params, and wildcard_tables; SQL is parsed with the configured dialect and rendered to SQLite for execution.
- executor.py: loads each referenced table independently and applies query parameters during in-memory execution.
- cache.py: schema version bumped to 2; _sqlmem_tables gained an is_full column (existing on-disk caches are discarded and rebuilt on load).
[1.1.0] - 2026-06-03
Added
- Stats and TableStats frozen dataclasses: snapshot of runtime cache statistics (hit/miss/refetch counts, per-table row count, columns, last refresh timestamp)
- StatsCollector: internal thread-safe counter; increments on every cache hit, miss, and re-fetch
- engine.stats property: returns a Stats snapshot at any point in time
- Stats and TableStats exported from the public API
Changed
- pyproject.toml: bumped version to 1.1.0
[1.0.0] - 2026-06-03
Changed
- pyproject.toml: bumped version to 1.0.0
[0.4.0] - 2026-06-03
Added
- add_sink(sink, *, level, **kwargs): public API for routing sqlmem log records to any loguru-compatible sink (stream, file, callable); supports all loguru logger.add() kwargs including rotation, retention, etc.
Changed
- pyproject.toml: bumped version to 0.4.0
- config.py: replaced destructive logger.remove() + forced default sink with logger.disable("sqlmem"); sqlmem is now silent by default and does not interfere with the host application's logging setup
[0.3.0] - 2026-06-03
Added
- README.md: full project documentation covering architecture overview, quick start, cache behaviour, persistence, configuration, exceptions, logging, and limitations
Changed
- pyproject.toml: bumped version to 0.3.0
- parser.py: _extract_columns now deduplicates column names while preserving order
- .gitignore: added .env and .env.* to prevent accidental commit of environment files
Security
- Removed .env from git tracking (git rm --cached)
[0.2.0] - 2026-06-01
Added
- Project specification in project.md: architecture, API design, cache backend, metadata schema, logging strategy, and TODO for future features (JOIN, SELECT * support)
- .gitignore for Python/Poetry project
- pyproject.toml dependencies: sqlglot, sqlalchemy, loguru, python-dotenv; dev dependencies: pytest, ruff, mypy
- src/sqlmem/ package structure with src layout
- src/sqlmem/exceptions.py: ReadOnlyError (blocks INSERT/UPDATE/DELETE), UnsupportedQueryError (blocks JOIN and SELECT *)
- src/sqlmem/config.py: loads .env, configures loguru with DEBUG/INFO level based on SQLMEM_DEBUG
- src/sqlmem/_meta.py: package version constant
- src/sqlmem/parser.py: SQL Parser using sqlglot; extracts table and columns from SELECT, raises on writes/JOIN/wildcard
- src/sqlmem/registry.py: Column Registry; accumulates requested columns per table, detects missing columns requiring re-fetch
- src/sqlmem/cache.py: Cache Manager; SQLite in-memory storage, load from cache.db on startup (with schema version check), hourly backup thread, atexit/SIGTERM flush, metadata tables (_sqlmem_meta, _sqlmem_tables, _sqlmem_columns)
- src/sqlmem/executor.py: Query Executor; cache hit/miss logic, re-fetch on new columns with WARNING log
- src/sqlmem/engine.py: CachingEngine wrapper; public API compatible with SQLAlchemy, invalidate(table) for manual cache clearing
- src/sqlmem/__init__.py: public exports are CachingEngine, ReadOnlyError, UnsupportedQueryError
- tests/test_parser.py: parser tests for SELECT parsing, ReadOnlyError, UnsupportedQueryError
- tests/test_cache.py: cache tests for load, data correctness, metadata, disk backup/reload
- tests/test_registry.py: registry tests for accumulation, needs_refetch, table isolation