[Feature Request] Add configuration toggle to disable prompt disk caching to prevent excessive SSD wear #341

@houzy

Description:
Currently, mlx-engine enforces a prompt disk cache (managed in prompt_cache/disk_budget.py) that sizes its disk budget dynamically from the system's free SSD space. This helps on machines with limited RAM, but the behavior cannot be turned off, and it causes heavy SSD wear (it eats into the drive's TBW endurance) for power users running large context windows with agentic workflows.
I would like to request an official configuration toggle to disable this disk caching, both at the mlx-engine level and exposed in the LM Studio UI.
The Problem:
Massive SSD Writes (Cache Thrashing): When running large context windows (e.g., 64k to 128k) via agentic workflows (such as Claude Code or MCP servers), the prompt cache thrashes rapidly. Because disk_budget.py hard-caps the budget (e.g., min(30 GiB, free disk / 4)), the cache fills quickly and every new entry after that forces older entries out. In a single session, I observed the LM Studio backend (a node process) write more than 40 GB to the SSD, accompanied by logs showing large lifetime_evicted_mib values as entries cycled through the cache.
Hardware Anxiety on Apple Silicon: On most Apple Silicon Macs the SSD is soldered to the logic board and cannot be replaced by the user. Writing tens of gigabytes daily solely for prompt caching raises real concern about the SSD's lifespan.
Abundant Unified Memory is Ignored: Users with 64 GB or 128 GB of unified memory have enough RAM to hold the KV cache entirely in memory. However, there is currently no way to bypass the disk_budget.py logic short of manually unpacking the .app and modifying the Python source code to return 0.
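To illustrate why a capped cache can still produce writes far beyond its budget, here is a minimal sketch of a size-bounded store that evicts its oldest entries. It is not the mlx-engine implementation: the function names, the oldest-first eviction policy, and the entry sizes are assumptions made for illustration, and only the min(30 GiB, free disk / 4) example cap comes from the description above.

```python
from collections import OrderedDict

GIB = 1024 ** 3


def example_budget_bytes(free_disk_bytes: int) -> int:
    """Example cap quoted in this issue: min(30 GiB, free disk / 4)."""
    return min(30 * GIB, free_disk_bytes // 4)


def simulate_writes(budget_bytes: int, entry_sizes: list[int]) -> tuple[int, int]:
    """Return (bytes written, bytes evicted) for an oldest-first bounded store.

    Hypothetical model: every new cache entry is written to disk, and the
    oldest entries are dropped until the new one fits inside the budget.
    """
    store: OrderedDict[int, int] = OrderedDict()
    used = written = evicted = 0
    for key, size in enumerate(entry_sizes):
        if size > budget_bytes:
            continue  # an entry larger than the whole budget is never stored
        while used + size > budget_bytes:
            _, old_size = store.popitem(last=False)  # evict the oldest entry
            used -= old_size
            evicted += old_size
        store[key] = size
        used += size
        written += size
    return written, evicted


if __name__ == "__main__":
    budget = example_budget_bytes(200 * GIB)          # capped at 30 GiB
    written, evicted = simulate_writes(budget, [2 * GIB] * 40)
    print(f"budget={budget // GIB} GiB, written={written // GIB} GiB, "
          f"evicted={evicted // GIB} GiB")
```

In this made-up run the budget never exceeds 30 GiB, yet 80 GiB are written in total, because the cap limits how much is stored at once, not how much is written over time.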
Proposed Solution:
At the mlx-engine level: Introduce an environment variable (e.g., MLX_DISABLE_PROMPT_DISK_CACHE=true) or a configuration flag that lets users cleanly bypass the allocation logic and set the disk budget to 0.
At the LM Studio GUI level: Expose a toggle in the Advanced Configuration panel (e.g., a checkbox labeled "Disable Prompt Disk Cache (Memory Only)") so that users can easily opt out of SSD caching to protect their hardware.
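A rough sketch of what the engine-level toggle could look like is below. It is only an illustration of the proposal: MLX_DISABLE_PROMPT_DISK_CACHE is the variable name suggested above and does not exist today, and the helper names and the set of accepted truthy values are hypothetical, not part of mlx-engine.

```python
import os
from typing import Mapping, Optional

# Proposed variable name from this issue; not an existing mlx-engine setting.
ENV_VAR = "MLX_DISABLE_PROMPT_DISK_CACHE"
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def disk_cache_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the (hypothetical) opt-out variable is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get(ENV_VAR, "").strip().lower() in _TRUTHY


def resolve_disk_budget(default_budget_bytes: int,
                        env: Optional[Mapping[str, str]] = None) -> int:
    """Return 0 when the user opted out, else the budget computed as before."""
    return 0 if disk_cache_disabled(env) else default_budget_bytes


if __name__ == "__main__":
    computed = 30 * 1024 ** 3  # stand-in for whatever the existing logic computes
    print(resolve_disk_budget(computed))
```

Wrapping the existing budget computation this way would leave the default behavior unchanged for everyone who does not set the variable, and the GUI checkbox could map onto the same switch.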
Additional Context:
Hardware: Mac with 64 GB of unified memory.
Use Case: Running local dense/MoE models (e.g., Gemma-4-26B) with 128k context via local agents processing thousands of documents. When I manually hardcoded provisional_cache_store_budget_bytes to return 0, the massive SSD writes stopped completely, with no negative impact on generation quality that I could observe.
