Skip to content

fix: use safeWriteJson for all JSON file writes #3772

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

KJ7LNW
Copy link
Collaborator

@KJ7LNW KJ7LNW commented May 21, 2025

Context

This PR introduces safeWriteJson utility and refactors all direct JSON file writes across the codebase to use it, preventing potential data corruption during write operations.

Implementation

The new safeWriteJson utility provides:

  • Atomic write operations using a temporary file approach
  • Backup creation before overwriting existing files
  • Proper error handling and cleanup of temporary files
  • Protection against concurrent writes to the same file
  • Support for custom JSON formatting with optional replacer and space arguments

All direct JSON file writes have been refactored to use this utility, including:

  • Configuration files
  • Settings export/import
  • Task persistence
  • File context tracking

This change ensures all JSON writes throughout the application are safe from corruption due to:

  • Application crashes during writes
  • Power failures
  • Concurrent modification attempts

Fixes: #722

How to Test

  • Run the test suites to verify all tests pass
  • Verify settings can be exported and imported correctly
  • Check that configuration files are properly saved

Get in Touch

Discord: KJ7LNW


Important

Introduces safeWriteJson utility for safe JSON file writes, replacing direct writes across the codebase to prevent data corruption.

  • Behavior:
    • Introduces safeWriteJson utility for atomic JSON file writes with backup and error handling.
    • Replaces direct JSON writes with safeWriteJson in importExport.ts, FileContextTracker.ts, apiMessages.ts, taskMessages.ts, webviewMessageHandler.ts, and McpHub.ts.
    • Handles application crashes, power failures, and concurrent writes.
  • Testing:
    • Adds tests for safeWriteJson in safeWriteJson.test.ts to cover success and failure scenarios.
  • Misc:
    • Updates mocks in test files to accommodate safeWriteJson usage.

This description was created by Ellipsis for 75fbed2. You can customize this summary. It will automatically update as commits are pushed.

Eric Wheeler added 2 commits May 20, 2025 18:55
Implements a robust JSON file writing utility that:
- Prevents concurrent writes to the same file using in-memory locks
- Ensures atomic operations with temporary file and backup strategies
- Handles error cases with proper rollback mechanisms
- Cleans up temporary files even when operations fail
- Provides comprehensive test coverage for success and failure scenarios

Signed-off-by: Eric Wheeler <[email protected]>
This change refactors all direct JSON file writes to use the safeWriteJson
utility, which implements atomic file writes to prevent data corruption
during write operations.

- Modified safeWriteJson to accept optional replacer and space arguments
- Updated tests to verify correct behavior with the new implementation

Fixes: cline#722
Signed-off-by: Eric Wheeler <[email protected]>
@KJ7LNW KJ7LNW requested review from mrubens and cte as code owners May 21, 2025 03:27
@hannesrudolph hannesrudolph moved this from New to PR [Pre Approval Review] in Roo Code Roadmap May 22, 2025
@hannesrudolph hannesrudolph moved this from PR [Needs Review] to TEMP in Roo Code Roadmap May 26, 2025
@daniel-lxs daniel-lxs moved this from TEMP to PR [Needs Review] in Roo Code Roadmap May 27, 2025
@KJ7LNW KJ7LNW self-assigned this May 27, 2025
@KJ7LNW KJ7LNW added the bug Something isn't working label May 27, 2025
@KJ7LNW
Copy link
Collaborator Author

KJ7LNW commented May 27, 2025

Note to reviewer:

src/utils/safeWriteJson.ts is the primary implementation. everything else is just testing and simple 1-line replacements like this to hook it into place:

			if (!exists) {
-				await fs.writeFile(mcpPath, JSON.stringify({ mcpServers: {} }, null, 2))
+				await safeWriteJson(mcpPath, { mcpServers: {} })
			}

Note that ..., null, 2 is the default so it is not passed, however those options are still available for pass-through.

Copy link
Collaborator

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @KJ7LNW, thank you for the contribution!. Do you think we should remove the console logs? Could this implementation have high impact in performance?

throw new Error(`File operation already in progress for this path: ${absoluteFilePath}`)
}

activeLocks.add(absoluteFilePath)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, would it make sense to add a timeout mechanism to release the lock after a certain period? This could help prevent permanent locks if the process crashes or hangs while writing. Something like storing a timestamp with the lock and checking if it's expired before respecting it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can contrive a call path that fails to release the lock in a catch case, then that is a bug that that that should be fixed.

However, we need to trust try/catches to do their work, or it may introduce file corruption, especially when large conversation histories are updated---which is the target fix for this pull request.

We cannot make assumptions about user workstation disk speeds, and enterprise environments might even use NFS for home directories and you definitely cannot trust the behavior of remote file system performance either: NFS can lock for hours during maintenance or outages that will recover automatically eventually.

The lock set is persistent per Roo instance, so a crash would reset it anyway.

There is a future improvement that I have considered: use a library to do real file system locking across operating systems, then th operating system will release the lock when the file handle closes --- but the Set will clear on crash anyway, so I think that is moot at least for now.

Copy link
Collaborator Author

@KJ7LNW KJ7LNW May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also of note: I intentionally designed this to throw upon lock contention.

Presently there are no code paths that contend on single files for parallel operations: if this happens it indicates a problem somewhere else that may point at the "blank screen" bug.

the only race I can think of that you could create that may trigger this is if you open the same task in two different Roo webviews and have them proceed: they may reach a point where they line up modifying the file task history simultaneously, but that would be a seriously messed up task because they do not synchronize between each other anyway.

after your feedback and thinking this through, I am going to change this to use operating system locks that block until completion and warn w/ backtrace in the logs if they detect contention.

try {
// Step 1: Write data to a new temporary file.
actualTempNewFilePath = path.join(
path.dirname(absoluteFilePath),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the process crashes and leaves .new_ or .bak_ temporary files scattered in the directories? Could these accumulate over time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the process crashes entirely, stale files sit around and do nothing. They will never interfere they will just sit there forever. I do not think we should try to plan for the case where someone pulls the power out of their system and expects the file system to be consistent (and this extends to VS Code crashes as well).

However, I think this is extremely unlikely: if there is a real problem it will be caught and cleaned up.

Now that you mention it, I wonder if there is a streaming JSON write instead of buffering the entire file output so that this is light on memory pressure.

@KJ7LNW
Copy link
Collaborator Author

KJ7LNW commented May 28, 2025

Do you think we should remove the console logs? Could this implementation have high impact in performance?

  • logs: yes
  • errors: no - should errors throw, or just log? The problem with throwing is I do not know if these are handled further up the stack and I want this to be as resilient as possible (especially in the possible case that blank screen crashes are caused by on handled throws, but that is speculation).

My preference is always to throw errors, but not at the cost of crashing the application especially if the error is transient because the next write will successfully complete because this implementation is idempotent .

In any event the errors should also include back traces, so we can figure out the cause if it happens

I am switching to draft mode until logs are cleaned up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working PR - Draft / In Progress
Projects
Status: PR [Draft / In Progress]
Development

Successfully merging this pull request may close these issues.

Task can't be opened
2 participants