Build a reusable wrapper that combines SQLite's shared-cache in-memory database with D-MemFS-backed persistence. Multiple connections share a single live database; when the last connection closes, the database is automatically serialized to D-MemFS for later restoration.
SQLite's shared-cache in-memory database (`file:name?mode=memory&cache=shared`) allows multiple connections within the same process to share a single database. This is useful for ETL pipelines, multi-component applications, and test harnesses that need concurrent read/write access to a common dataset.
However, there is a catch: when the last connection closes, the database is destroyed. If your application opens and closes connections dynamically (e.g., per-request, per-task, per-test), you must manually serialize/deserialize the database to preserve its state between "sessions."
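Both behaviors are easy to see with nothing but the standard library. A minimal sketch (the `demo` name is arbitrary):

```python
import sqlite3

uri = "file:demo?mode=memory&cache=shared"

# Two overlapping connections in the same process share one database.
a = sqlite3.connect(uri, uri=True)
b = sqlite3.connect(uri, uri=True)
a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t VALUES (1)")
a.commit()
print(b.execute("SELECT x FROM t").fetchone())  # (1,)

# Closing the last connection destroys the database...
a.close()
b.close()

# ...so a later connection with the same URI gets a fresh, empty DB.
c = sqlite3.connect(uri, uri=True)
print(c.execute("SELECT name FROM sqlite_master").fetchall())  # []
c.close()
```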
This tutorial builds an `MFSSQLiteStore` class that solves this problem by using D-MemFS as an automatic persistence backend:
- Connect — If a live shared DB exists, join it. If not, restore from D-MemFS (or create a fresh DB).
- Close — If you are the last connection, serialize the DB back to D-MemFS before it disappears.
- Thread-safe — Reference counting and locking ensure correct behavior under concurrent access.
Good fit:
- ETL pipelines where multiple stages read/write a shared intermediate database
- Test harnesses where a shared DB is populated once and queried by multiple test workers
- Applications that dynamically open/close DB connections and need state to survive between sessions
Not the right tool:
- If you only need one connection at a time, the simpler serialize/deserialize pattern is sufficient (see the sketch after this list)
- If you need cross-process database sharing (use a file-based SQLite or a proper RDBMS)
- If you need the database to survive process restarts (D-MemFS itself is volatile — use `export_tree()` to persist to disk)
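For that single-connection case, here is a sketch of the simpler pattern, using `serialize()`/`deserialize()` from the standard `sqlite3` module (Python 3.11+) and only the D-MemFS calls (`exists`, `open`) that this tutorial relies on elsewhere:

```python
import sqlite3
from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem(max_quota=64 * 1024 * 1024)

def load(path: str) -> sqlite3.Connection:
    """Open a private in-memory DB, restoring prior state from D-MemFS if any."""
    conn = sqlite3.connect(":memory:")
    if mfs.exists(path):
        with mfs.open(path, "rb") as f:
            conn.deserialize(f.read())
    return conn

def save(conn: sqlite3.Connection, path: str) -> None:
    """Snapshot the DB's current state back into D-MemFS.

    (Assumes the parent directory already exists in MFS.)
    """
    with mfs.open(path, "wb") as f:
        f.write(conn.serialize())

conn = load("/db/simple.db")
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.commit()
save(conn, "/db/simple.db")
conn.close()
```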
"""
MFSSQLiteStore — SQLite shared-cache memory DB with D-MemFS auto-persistence.
Usage:
from dmemfs import MemoryFileSystem
mfs = MemoryFileSystem(max_quota=64 * 1024 * 1024)
store = MFSSQLiteStore()
with store.connect(mfs, "/db/app.db") as conn:
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT OR REPLACE INTO kv VALUES ('hello', 'world')")
conn.commit()
# Later (even after all connections have closed):
with store.connect(mfs, "/db/app.db") as conn:
row = conn.execute("SELECT v FROM kv WHERE k = 'hello'").fetchone()
print(row[0]) # "world"
"""
from __future__ import annotations
import sqlite3
import threading
from typing import Self
from dmemfs import MemoryFileSystem
class MFSSQLiteStore:
"""Manages SQLite shared-cache memory databases backed by D-MemFS.
Each unique (mfs_instance, mfs_path) pair tracks a single shared-cache
memory database. The database is lazily restored from D-MemFS on first
connect and automatically serialized back when the last connection closes.
Thread-safe: all internal state is protected by a reentrant lock.
"""
def __init__(self) -> None:
self._lock = threading.RLock()
# Key: (id(mfs), mfs_path) -> _DBSlot
self._slots: dict[tuple[int, str], _DBSlot] = {}
def connect(
self,
mfs: MemoryFileSystem,
path: str,
) -> _ManagedConnection:
"""Open a connection to the shared-cache memory DB at `path`.
Args:
mfs: The MemoryFileSystem instance that stores the serialized DB.
path: Full path within MFS (e.g. "/db/app.db").
Also used to derive the SQLite shared-cache DB name.
Returns:
A _ManagedConnection context manager wrapping sqlite3.Connection.
Behavior:
- If a live shared DB already exists for this (mfs, path),
a new connection joins it.
- If no live DB exists but MFS contains serialized data at `path`,
the DB is restored via the SQLite backup API.
- If neither exists, a fresh empty DB is created.
"""
with self._lock:
key = (id(mfs), path)
slot = self._slots.get(key)
if slot is None:
slot = _DBSlot(mfs=mfs, path=path)
self._slots[key] = slot
conn = slot.acquire(self._lock)
return _ManagedConnection(conn=conn, slot=slot, store=self)
def _release(self, slot: _DBSlot) -> None:
"""Called by _ManagedConnection.close(). Serializes if refcount → 0."""
with self._lock:
slot.release(self._lock)
if slot.refcount == 0:
key = (id(slot.mfs), slot.path)
self._slots.pop(key, None)
class _DBSlot:
"""Internal: tracks one shared-cache memory DB and its reference count."""
def __init__(self, mfs: MemoryFileSystem, path: str) -> None:
self.mfs = mfs
self.path = path
self.refcount: int = 0
# The "keeper" connection holds the shared-cache DB alive.
# Without it, SQLite destroys the DB when user connections close.
self._keeper: sqlite3.Connection | None = None
@property
def _uri(self) -> str:
"""SQLite URI for the shared-cache memory DB.
Uses the MFS path (with slashes replaced) as the DB name to ensure
uniqueness per path.
"""
safe_name = self.path.replace("/", "_").strip("_")
return f"file:{safe_name}?mode=memory&cache=shared"
def acquire(self, lock: threading.RLock) -> sqlite3.Connection:
"""Increment refcount and return a new connection. Must hold lock."""
if self._keeper is None:
# First connection — create or restore the DB.
self._keeper = sqlite3.connect(self._uri, uri=True)
self._restore_from_mfs()
conn = sqlite3.connect(self._uri, uri=True)
self.refcount += 1
return conn
def release(self, lock: threading.RLock) -> None:
"""Decrement refcount. If zero, serialize to MFS and tear down."""
self.refcount -= 1
if self.refcount == 0:
self._save_to_mfs()
if self._keeper is not None:
self._keeper.close()
self._keeper = None
def _restore_from_mfs(self) -> None:
"""Deserialize DB data from MFS into the keeper connection."""
if self._keeper is None:
return
if not self.mfs.exists(self.path) or self.mfs.is_dir(self.path):
return
with self.mfs.open(self.path, "rb") as f:
data = f.read()
if data:
# We cannot use self._keeper.deserialize(data) directly because
# deserialize() replaces the underlying pager and disconnects the
# database from the shared-cache.
# Instead, we deserialize into a temporary private memory DB,
# and use the SQLite backup API to safely copy data into the shared cache.
temp_conn = sqlite3.connect(":memory:")
temp_conn.deserialize(data)
temp_conn.backup(self._keeper)
temp_conn.close()
def _save_to_mfs(self) -> None:
"""Serialize the keeper connection's DB into MFS."""
if self._keeper is None:
return
data: bytes = self._keeper.serialize()
parent = self.path.rsplit("/", 1)[0] or "/"
if not self.mfs.exists(parent):
self.mfs.mkdir(parent)
with self.mfs.open(self.path, "wb") as f:
f.write(data)
class _ManagedConnection:
"""Context manager wrapper around sqlite3.Connection.
Delegates attribute access to the underlying connection so callers
can use .execute(), .commit(), etc. directly.
"""
def __init__(
self,
conn: sqlite3.Connection,
slot: _DBSlot,
store: MFSSQLiteStore,
) -> None:
self._conn = conn
self._slot = slot
self._store = store
self._closed = False
# --- Context manager ---
def __enter__(self) -> Self:
return self
def __exit__(self, *exc: object) -> None:
self.close()
# --- Delegation ---
def __getattr__(self, name: str):
"""Proxy attribute access to the underlying sqlite3.Connection."""
if self._closed:
raise ValueError("Connection is closed")
return getattr(self._conn, name)
# --- Lifecycle ---
def close(self) -> None:
"""Close this connection and release the slot."""
if self._closed:
return
self._closed = True
self._conn.close()
self._store._release(self._slot)from dmemfs import MemoryFileSystem
mfs = MemoryFileSystem(max_quota=64 * 1024 * 1024)
store = MFSSQLiteStore()

# Session 1: Create a table and insert data
with store.connect(mfs, "/db/myapp.db") as conn:
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'Alice')")
    conn.execute("INSERT INTO users VALUES (2, 'Bob')")
    conn.commit()
# Connection closed → DB serialized to MFS automatically

# Session 2: Data survives because MFS holds the serialized state
with store.connect(mfs, "/db/myapp.db") as conn:
    rows = conn.execute("SELECT name FROM users ORDER BY id").fetchall()
    print(rows)  # [('Alice',), ('Bob',)]
```

Overlapping connections share the same live database; serialization happens only when the last one closes.

```python
store = MFSSQLiteStore()
# First connection opens (restores from MFS if data exists)
conn1 = store.connect(mfs, "/db/shared.db")
conn1.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
conn1.execute("INSERT INTO log VALUES ('from conn1')")
conn1.commit()

# Second connection joins the same live DB — no deserialization cost
with store.connect(mfs, "/db/shared.db") as conn2:
    conn2.execute("INSERT INTO log VALUES ('from conn2')")
    conn2.commit()
# conn2 closes, but conn1 is still open → DB stays alive

# conn1 is the last one — closing it triggers serialization
conn1.close()

# Verify: all data was persisted
with store.connect(mfs, "/db/shared.db") as conn3:
    rows = conn3.execute("SELECT msg FROM log").fetchall()
    print(rows)  # [('from conn1',), ('from conn2',)]
```

The serialized DB is a regular MFS file, so all the usual MFS features work:

```python
# Check DB size
print(mfs.get_size("/db/myapp.db"))

# Snapshot the DB (along with other files)
snapshot = mfs.export_tree("/db")

# Copy the DB to create an independent fork
mfs.copy("/db/myapp.db", "/db/myapp_backup.db")

# Quota enforcement applies: a DB that grows too large is rejected
# during serialization (MFSQuotaExceededError)
```

A note on the restore path in `_restore_from_mfs()`: the intuitive approach would be to call `self._keeper.deserialize(data)` directly on the shared-cache connection. However, this does not work: `deserialize()` replaces the connection's underlying pager, which silently disconnects it from the shared cache. After `deserialize()`, the keeper holds the restored data in a private memory database, but any new connection opened with the same URI sees an empty shared-cache database.
This is not documented in SQLite's Python bindings or in most tutorials. The workaround is a two-step process: deserialize into a temporary private `:memory:` connection, then use `sqlite3.Connection.backup()` to copy the data into the keeper (which remains attached to the shared cache). The backup API operates at the page level and preserves the keeper's shared-cache membership.

This was discovered through testing — the original implementation using direct `deserialize()` failed with `OperationalError: no such table` across Python 3.11–3.14. The backup API approach passes on all versions.
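Reduced to its essentials, the two-step restore looks like this (standard `sqlite3` only; `keeper` stands in for the slot's keeper connection):

```python
import sqlite3

uri = "file:restore_demo?mode=memory&cache=shared"
keeper = sqlite3.connect(uri, uri=True)

def restore_into_shared_cache(data: bytes) -> None:
    # Step 1: deserialize into a throwaway PRIVATE memory DB; calling
    # keeper.deserialize(data) would detach the keeper from the shared cache.
    temp = sqlite3.connect(":memory:")
    temp.deserialize(data)
    # Step 2: page-level copy into the keeper via the backup API. The keeper
    # stays attached to the shared cache, so other connections opened with
    # the same URI see the restored data.
    temp.backup(keeper)
    temp.close()

# Build some serialized bytes, restore them, verify via a second connection.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()
restore_into_shared_cache(src.serialize())
src.close()

other = sqlite3.connect(uri, uri=True)
print(other.execute("SELECT x FROM t").fetchone())  # (42,)
other.close()
keeper.close()
```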
SQLite destroys a shared-cache memory database the moment its last connection closes. Note the order of operations in `_ManagedConnection.close()`: the user's `sqlite3.Connection` is closed *before* the store serializes, so without another connection holding the DB open, there would be nothing left to serialize by the time `_save_to_mfs()` runs.

The `_keeper` connection solves this: it stays open as long as at least one user connection exists, so the shared-cache DB survives user connections opening and closing in any order. When the refcount drops to zero, the slot serializes through the keeper and then closes it, allowing the DB to be cleaned up.
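The trick in isolation, with plain `sqlite3`:

```python
import sqlite3

uri = "file:keeper_demo?mode=memory&cache=shared"

# A keeper connection pins the shared-cache DB in existence.
keeper = sqlite3.connect(uri, uri=True)

first = sqlite3.connect(uri, uri=True)
first.execute("CREATE TABLE t (x INTEGER)")
first.close()  # keeper is still open, so the DB survives

second = sqlite3.connect(uri, uri=True)
print(second.execute("SELECT name FROM sqlite_master").fetchone())  # ('t',)
second.close()

keeper.close()  # last connection: now SQLite destroys the DB
```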
SQLite does track connection counts internally, but this information is not exposed through the Python `sqlite3` API. Maintaining an explicit refcount in `_DBSlot` gives us a reliable trigger for the serialize-on-last-close behavior without relying on undocumented internals.
While the shared-cache DB is live (refcount > 0), its data resides in SQLite's own heap allocation — outside MFS quota accounting. Only when the DB is serialized back to MFS (on last close) does the data come under quota management. This is analogous to the documented behavior of `export_as_bytesio()`, where the returned `BytesIO` is outside quota management.

If the serialized DB exceeds the MFS quota at close time, `MFSQuotaExceededError` is raised. The caller should handle this — for example, by increasing the quota or cleaning up other MFS files before closing the last connection.
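One way to anticipate the failure is to estimate the serialized size before the last close. A sketch (the `MFSQuotaExceededError` import path is an assumption; adjust it to wherever D-MemFS actually exports the exception):

```python
from dmemfs import MFSQuotaExceededError  # assumed export; adjust import path

conn = store.connect(mfs, "/db/big.db")
# ... heavy writes ...

# serialize() produces roughly page_count * page_size bytes, so the cost
# of the upcoming close can be estimated up front.
pages = conn.execute("PRAGMA page_count").fetchone()[0]
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
print(f"~{pages * page_size} bytes will land in MFS on close")

try:
    conn.close()  # last connection: serialize into MFS
except MFSQuotaExceededError:
    # Recovery is application-specific: delete other MFS files to free
    # space, or recreate the MemoryFileSystem with a larger max_quota,
    # then repopulate the database.
    raise
```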
The `MFSSQLiteStore._lock` (an `RLock`) protects all slot management: creation, refcount changes, and serialization. Individual `sqlite3.Connection` objects follow SQLite's own threading rules (`check_same_thread=False` is not set by default, so each connection should be used from the thread that created it). If cross-thread connection sharing is needed, the caller must manage synchronization.
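A workable per-thread pattern under these rules: each worker opens its own connection, and an application-level lock serializes writes to sidestep shared-cache table-level locking (a sketch, reusing the `store` and `mfs` objects from the examples above):

```python
import threading

write_lock = threading.Lock()

def worker(n: int) -> None:
    # Each thread gets its own connection, created and used on that thread.
    with store.connect(mfs, "/db/app.db") as conn:
        with write_lock:  # shared-cache locks are table-level; avoid contention
            conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                         (f"k{n}", f"v{n}"))
            conn.commit()

# Hold one connection open so the DB stays live while workers come and go.
with store.connect(mfs, "/db/app.db") as anchor:
    anchor.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    anchor.commit()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```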
Finally, some limitations to keep in mind:

- Not a connection pool. Each `connect()` call creates a new `sqlite3.Connection`. For high-frequency open/close patterns, consider holding connections longer (see the sketch after this list).
- Serialize/deserialize cost. Every "cold start" (restoring from MFS) and "last close" (saving to MFS) involves a full copy of the database. For very large databases, this can be expensive.
- Memory spikes on load/save. The memory footprint spikes significantly during persistence operations. A "cold start" (restoring from MFS) can temporarily use up to 4× the database size in memory (D-MemFS holding the file + Python `bytes` object + SQLite temporary DB + SQLite shared-cache DB). Saving to MFS on the last close uses about 3× the database size (live shared-cache DB + Python `bytes` object from `.serialize()` + new D-MemFS file).
- Volatile. D-MemFS itself is in-process and volatile. If the process crashes, unserialized DB state is lost. Use `export_tree()` to persist critical data to disk.
- `deserialize()` requires Python 3.11+. This aligns with D-MemFS's own Python version requirement.
- Shared-cache mode caveats. SQLite's shared-cache mode has known quirks around locking granularity (table-level, not row-level). Be aware of these when designing concurrent access patterns.
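As a sketch of the first point, batch work under one managed connection instead of paying an acquire/release (and, if calls never overlap, a full serialize/restore) per call:

```python
# Per-call connections: if calls never overlap, each one pays a full
# restore-from-MFS on open and a serialize-to-MFS on close.
def log_event(msg: str) -> None:
    with store.connect(mfs, "/db/app.db") as conn:
        conn.execute("INSERT INTO log VALUES (?)", (msg,))
        conn.commit()

# Better for bursts: hold one connection for the whole batch.
def log_events(msgs: list[str]) -> None:
    with store.connect(mfs, "/db/app.db") as conn:
        conn.executemany("INSERT INTO log VALUES (?)", [(m,) for m in msgs])
        conn.commit()
```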