fix: use safeWriteJson for all JSON file writes #3772
Conversation
Implements a robust JSON file writing utility that:
- Prevents concurrent writes to the same file using in-memory locks
- Ensures atomic operations with temporary file and backup strategies
- Handles error cases with proper rollback mechanisms
- Cleans up temporary files even when operations fail
- Provides comprehensive test coverage for success and failure scenarios

Signed-off-by: Eric Wheeler <[email protected]>
This change refactors all direct JSON file writes to use the safeWriteJson utility, which implements atomic file writes to prevent data corruption during write operations.
- Modified safeWriteJson to accept optional replacer and space arguments
- Updated tests to verify correct behavior with the new implementation

Fixes: cline#722
Signed-off-by: Eric Wheeler <[email protected]>
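The temp-file, backup, and rollback strategy described in the commit messages can be sketched roughly as follows. This is a simplified reconstruction for illustration, not the PR's actual `safeWriteJson` implementation; the temp/backup naming is assumed:

```typescript
import * as fs from "fs/promises"
import * as path from "path"

// Sketch of the atomic-write idea: serialize to a temp file in the same
// directory, rename the old file to a backup, then rename the temp file into
// place. Rename within one filesystem is atomic on POSIX, so readers never
// observe a half-written file. Names are illustrative, not the PR's exact code.
async function safeWriteJson(
	filePath: string,
	data: unknown,
	replacer?: (key: string, value: unknown) => unknown,
	space?: string | number,
): Promise<void> {
	const dir = path.dirname(filePath)
	const base = path.basename(filePath)
	const tmpPath = path.join(dir, `.new_${base}_${Date.now()}`)
	const bakPath = path.join(dir, `.bak_${base}_${Date.now()}`)

	await fs.writeFile(tmpPath, JSON.stringify(data, replacer, space), "utf8")
	let hadOriginal = true
	try {
		await fs.rename(filePath, bakPath) // keep the old file as a backup
	} catch {
		hadOriginal = false // first write: nothing to back up
	}
	try {
		await fs.rename(tmpPath, filePath) // atomic replacement
		if (hadOriginal) await fs.unlink(bakPath)
	} catch (err) {
		if (hadOriginal) await fs.rename(bakPath, filePath).catch(() => {}) // roll back
		throw err
	}
}
```

Renaming the original aside before the final rename also sidesteps platforms where rename refuses to overwrite an existing destination.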
Note to reviewer: src/utils/safeWriteJson.ts is the primary implementation. Everything else is just testing and simple one-line replacements like this to hook it into place:

```diff
 if (!exists) {
-	await fs.writeFile(mcpPath, JSON.stringify({ mcpServers: {} }, null, 2))
+	await safeWriteJson(mcpPath, { mcpServers: {} })
 }
```
Hey, @KJ7LNW, thank you for the contribution! Do you think we should remove the console logs? Could this implementation have a high impact on performance?
```typescript
	throw new Error(`File operation already in progress for this path: ${absoluteFilePath}`)
}

activeLocks.add(absoluteFilePath)
```
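The snippet above amounts to a module-level `Set` used as an advisory per-path lock. A minimal sketch of how the guard and its release could fit together (the `withPathLock` wrapper is hypothetical, named here only for illustration):

```typescript
// Module-level registry of paths currently being written (illustrative sketch).
const activeLocks = new Set<string>()

// Acquire the in-memory lock, run the operation, and always release the lock
// in `finally` so a throwing operation cannot leak a held lock.
async function withPathLock<T>(absoluteFilePath: string, fn: () => Promise<T>): Promise<T> {
	if (activeLocks.has(absoluteFilePath)) {
		throw new Error(`File operation already in progress for this path: ${absoluteFilePath}`)
	}
	activeLocks.add(absoluteFilePath)
	try {
		return await fn()
	} finally {
		activeLocks.delete(absoluteFilePath) // released even if fn throws
	}
}
```

The `finally` clause is what makes the "trust try/catch to do its work" argument below hold: any exit path from `fn` releases the lock.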
Hey, would it make sense to add a timeout mechanism to release the lock after a certain period? This could help prevent permanent locks if the process crashes or hangs while writing. Something like storing a timestamp with the lock and checking if it's expired before respecting it?
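The timestamp idea suggested here could look roughly like this. A sketch only, with illustrative names and TTL, not code from the PR:

```typescript
// Sketch of an expiring lock: store an acquisition timestamp and treat locks
// older than LOCK_TTL_MS as stale. The TTL value is an arbitrary example.
const LOCK_TTL_MS = 30_000
const lockTimestamps = new Map<string, number>()

// Returns true if the lock was acquired (or an expired lock was stolen).
function tryAcquire(filePath: string, now: number = Date.now()): boolean {
	const acquiredAt = lockTimestamps.get(filePath)
	if (acquiredAt !== undefined && now - acquiredAt < LOCK_TTL_MS) {
		return false // lock held and not yet expired
	}
	lockTimestamps.set(filePath, now) // take (or steal) the lock
	return true
}

function release(filePath: string): void {
	lockTimestamps.delete(filePath)
}
```

The trade-off, as the reply below argues, is that any fixed TTL encodes an assumption about how long a legitimate write can take, which is hard to bound on slow or remote filesystems.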
If you can contrive a call path that fails to release the lock in a catch case, then that is a bug that should be fixed.
However, we need to trust try/catch blocks to do their work, or we may introduce file corruption, especially when large conversation histories are updated, which is the target fix for this pull request.
We cannot make assumptions about user workstation disk speeds, and enterprise environments might even use NFS for home directories; you definitely cannot trust the behavior of remote file system performance either: NFS can lock for hours during maintenance or outages and will eventually recover automatically.
The lock set is in-memory and per Roo instance, so a crash would reset it anyway.
There is a future improvement that I have considered: use a library to do real file system locking across operating systems, so the operating system releases the lock when the file handle closes. But the Set will clear on crash anyway, so I think that is moot, at least for now.
Also of note: I intentionally designed this to throw upon lock contention.
Presently there are no code paths that contend on single files for parallel operations: if this happens it indicates a problem somewhere else that may point at the "blank screen" bug.
The only race I can think of that could trigger this is opening the same task in two different Roo webviews and letting both proceed: they may reach a point where they modify the task history file simultaneously, but that would be a seriously messed-up task because they do not synchronize with each other anyway.
After your feedback and thinking this through, I am going to change this to use operating system locks that block until completion and warn with a backtrace in the logs if they detect contention.
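An in-process version of that "block instead of throw, warn with a backtrace" behavior could be sketched by chaining writers per path through a promise queue. The names here are hypothetical, not the PR's eventual implementation:

```typescript
// Sketch: serialize writers per path by chaining promises. A contending caller
// waits for the previous writer instead of throwing, and contention is logged
// with a stack trace so unexpected races are visible. Illustrative only.
const pending = new Map<string, Promise<void>>()

async function withBlockingLock(filePath: string, fn: () => Promise<void>): Promise<void> {
	const previous = pending.get(filePath)
	if (previous) {
		// Contention is unexpected; warn with a backtrace rather than throwing.
		console.warn(`Lock contention on ${filePath}\n${new Error().stack}`)
	}
	const run = (previous ?? Promise.resolve()).then(fn)
	const tail = run.catch(() => {}) // keep the chain usable even if fn rejects
	pending.set(filePath, tail)
	try {
		await run
	} finally {
		if (pending.get(filePath) === tail) pending.delete(filePath)
	}
}
```

Unlike a true OS-level lock, this only serializes writers within one process, which matches the "per Roo instance" scope discussed above.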
```typescript
try {
	// Step 1: Write data to a new temporary file.
	actualTempNewFilePath = path.join(
		path.dirname(absoluteFilePath),
```
What happens if the process crashes and leaves .new_ or .bak_ temporary files scattered in the directories? Could these accumulate over time?
If the process crashes entirely, stale files sit around and do nothing. They will never interfere; they will just sit there. I do not think we should try to plan for the case where someone pulls the power on their system and expects the file system to be consistent (and this extends to VS Code crashes as well).
However, I think this is extremely unlikely: if there is a real problem it will be caught and cleaned up.
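If cleanup of leftover files were ever wanted, a startup sweep for stale temp files would be straightforward. A sketch, assuming the `.new_`/`.bak_` prefixes mentioned in the question and an arbitrary age threshold:

```typescript
import * as fs from "fs/promises"
import * as path from "path"

// Sketch: delete .new_/.bak_ temp files older than maxAgeMs from a directory.
// The prefixes and the idea of sweeping at startup are assumptions for
// illustration, not behavior the PR implements.
async function sweepStaleTempFiles(dir: string, maxAgeMs: number): Promise<string[]> {
	const removed: string[] = []
	const now = Date.now()
	for (const name of await fs.readdir(dir)) {
		if (!name.startsWith(".new_") && !name.startsWith(".bak_")) continue
		const full = path.join(dir, name)
		const { mtimeMs } = await fs.stat(full)
		if (now - mtimeMs > maxAgeMs) {
			await fs.unlink(full) // stale leftover from a crashed write
			removed.push(name)
		}
	}
	return removed
}
```

An age threshold matters here: sweeping unconditionally could delete a temp file belonging to a write that is still in flight.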
Now that you mention it, I wonder if there is a way to stream the JSON write instead of buffering the entire file output, so that this is light on memory pressure.
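For the common case of a large array (such as a conversation history), streaming the serialization could look roughly like this. A sketch of the idea only, with an assumed helper name, not anything the PR implements:

```typescript
import { createWriteStream } from "fs"
import { once } from "events"

// Sketch: stream a large array to disk as JSON element-by-element instead of
// buffering one giant JSON.stringify result. Only each element is stringified
// in memory at a time, and write-stream backpressure is respected.
async function streamJsonArray(filePath: string, items: Iterable<unknown>): Promise<void> {
	const out = createWriteStream(filePath, { encoding: "utf8" })
	out.write("[")
	let first = true
	for (const item of items) {
		if (!out.write((first ? "" : ",") + JSON.stringify(item))) {
			await once(out, "drain") // wait for the kernel buffer to empty
		}
		first = false
	}
	out.end("]")
	await once(out, "finish")
}
```

This only helps for top-level arrays; streaming an arbitrary object graph would need a dedicated streaming serializer.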
My preference is always to throw errors, but not at the cost of crashing the application, especially if the error is transient, because the next write will complete successfully since this implementation is idempotent. In any event, the errors should also include backtraces so we can figure out the cause if it happens. I am switching to draft mode until the logs are cleaned up.
Context

This PR introduces the safeWriteJson utility and refactors all direct JSON file writes across the codebase to use it, preventing potential data corruption during write operations.

Implementation

The new safeWriteJson utility provides:
- Atomic writes using a temporary file and backup strategy
- In-memory locks that prevent concurrent writes to the same file
- Rollback and cleanup when operations fail
- Optional replacer and space arguments

All direct JSON file writes have been refactored to use this utility, including those in importExport.ts, FileContextTracker.ts, apiMessages.ts, taskMessages.ts, webviewMessageHandler.ts, and McpHub.ts.

This change ensures all JSON writes throughout the application are safe from corruption due to interrupted or concurrent write operations.

Fixes: #722

How to Test

Get in Touch

Discord: KJ7LNW
Important

Introduces the safeWriteJson utility for safe JSON file writes, replacing direct writes across the codebase to prevent data corruption.
- Adds the safeWriteJson utility for atomic JSON file writes with backup and error handling.
- Replaces direct JSON writes with safeWriteJson in importExport.ts, FileContextTracker.ts, apiMessages.ts, taskMessages.ts, webviewMessageHandler.ts, and McpHub.ts.
- Adds tests for safeWriteJson in safeWriteJson.test.ts to cover success and failure scenarios.
- Updates existing tests to reflect safeWriteJson usage.

This description was created for 75fbed2.