
Task can't be opened #722


Open
Hponky opened this issue Feb 2, 2025 · 22 comments · May be fixed by #3772
Labels
bug Something isn't working Issue - In Progress Someone is actively working on this. Should link to a PR soon.

Comments

@Hponky

Hponky commented Feb 2, 2025

Which version of the app are you using?

3.3.9

Which API Provider are you using?

OpenRouter

Which Model are you using?

Gemini flash 2.0

What happened?

Sometimes, specific tasks can't be opened. I created a task that generated a bug in my app, and to correct it, I had to reopen this task. The curious thing is that with this specific task, I couldn't open it anymore.

Steps to reproduce

I haven't identified a pattern to explain how to replicate the bug; this issue occurs suddenly.

Relevant API REQUEST output

Additional context

No response

@Hponky Hponky added the bug Something isn't working label Feb 2, 2025
@mrubens
Collaborator

mrubens commented Feb 2, 2025

I think this happens when the JSON representing the task gets corrupted (for me at least, often when I try to terminate a task that's using up too many resources or something). I would love to dig in deeper but haven't had time yet...

@nissa-seru nissa-seru added Issue - Needs Scoping Valid, but needs effort estimate or design input before work can start. and removed bug Something isn't working labels Feb 7, 2025
@hannesrudolph hannesrudolph moved this to To triage in Roo Code Roadmap Mar 4, 2025
@hannesrudolph hannesrudolph moved this from To triage to Backlog in Roo Code Roadmap Mar 4, 2025
@hannesrudolph
Collaborator

Can't repro with this info. Should be handled with #1383

@github-project-automation github-project-automation bot moved this from Backlog to Done in Roo Code Roadmap Mar 5, 2025
@mmpayi

mmpayi commented Mar 26, 2025

Hit the same issue when my machine ran out of memory and had to be cold rebooted. I couldn't reopen the task I was working on in my last session. I assumed the JSON was corrupt. Using ProcMon I found where the caches are (%AppData%\Windsurf\User\globalStorage\rooveterinaryinc.roo-cline\tasks).

I think the issue is that api_conversation_history.json gets zeroed out. It never seems to be zero-length in the other tasks I have in the cache.


So I used Windows to restore a previous version of the file, and the task loaded fine. Problem solved.


TL;DR: writes to api_conversation_history.json aren't append-only, so the file may get wiped out if the process crashes at the wrong time. It needs to be append-only for BCDR.

@hannesrudolph hannesrudolph reopened this Apr 1, 2025
@hannesrudolph
Collaborator

@KJ7LNW thoughts?

@KJ7LNW
Collaborator

KJ7LNW commented Apr 1, 2025

I have seen it before, but I do not know anything about that section of the code.

@KJ7LNW
Collaborator

KJ7LNW commented Apr 1, 2025

The following pseudocode is probably what's needed; these instructions could be handed to an AI to do the actual work.

Important: create tests that validate correct behavior at every failure point in the try block (all 8 steps, or however many it ends up being).

In simple terms:

  1. write new api_conversation_history to a temp file and then atomic-swap into place if supported, else rename:
  2. rename the old file aside
  3. rename new file in place of old file
  4. delete temp files
  5. cleanup on error
try {
  1. flock the file
  2. write new content to temporary file `api_conversation_history.json.tmp1<random>`
  3. atomic swap the files (not supported on all operating systems)
  4. if atomic swap is not available, then:
     5. rename the existing api_conversation_history.json to api_conversation_history.json.tmp2<different random>
     6. rename the new file to api_conversation_history.json
     7. delete api_conversation_history.json.tmp2<different random>
  8. release the lock
}
catch {
  1. delete temporary file `api_conversation_history.json.tmp1<random>`
  2. if api_conversation_history.json.tmp2<different random> exists, rename it back to api_conversation_history.json
  3. release the lock
}
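The atomic-swap path of the pseudocode above (steps 2-3 plus catch step 1) can be sketched in TypeScript with Node's fs/promises. This is a minimal sketch, not Roo's code: `atomicWriteFile` is a hypothetical helper name, and the flock step is left out because Node core has no file-locking API. It relies on fs.rename() replacing an existing target, which is atomic on POSIX filesystems; on Windows Node also replaces the target, but atomicity is not guaranteed there, which is where the rename-aside fallback (steps 4-7) would come in.

```typescript
import { open, rename, rm, type FileHandle } from "node:fs/promises";
import * as path from "node:path";
import { randomBytes } from "node:crypto";

/**
 * Hypothetical helper: replace `filePath` with `data` so a reader sees either
 * the complete old contents or the complete new contents, never a truncated file.
 */
export async function atomicWriteFile(filePath: string, data: string): Promise<void> {
    // Keep the temp file in the same directory: rename() cannot move a file
    // across filesystems (it fails with EXDEV).
    const tmpPath = path.join(
        path.dirname(filePath),
        `${path.basename(filePath)}.tmp1${randomBytes(6).toString("hex")}`,
    );
    let handle: FileHandle | undefined;
    try {
        handle = await open(tmpPath, "wx"); // "wx": fail instead of reusing an existing name
        await handle.writeFile(data, "utf8");
        await handle.sync(); // flush to disk before the rename makes the file visible
        await handle.close();
        handle = undefined;
        await rename(tmpPath, filePath); // the swap: replaces any existing target
    } catch (err) {
        // Cleanup on error: the original file was never touched.
        if (handle) await handle.close().catch(() => {});
        await rm(tmpPath, { force: true }).catch(() => {});
        throw err;
    }
}
```

Because the temp file is only renamed after it has been fully written and synced, a crash at any earlier point leaves the original api_conversation_history.json intact, plus at worst a stray `.tmp1…` file.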

@KJ7LNW
Collaborator

KJ7LNW commented Apr 1, 2025

If someone feeds that into an AI, it can probably do the work.

@mmpayi

mmpayi commented Apr 1, 2025

I found another repro. This time the file wasn't zeroed out; it just wasn't fully written. The end of the file had:

{"ts":1743186528900,"type":"say","say":"reasoning","text":"It looks like we're experiencing errors because we're trying to use an ErrorDetails property on the PayiIngestResponse class, but that property doesn't exist in the class definition. We need to update the PayiIngestResponse class to include this property.\n\nLet's first check the current definition of the PayiIngestResponse class:","partial":true},{"

Given the recent greying out of the UI, I've frequently been using Ctrl+Shift+P > Developer: Restart Extension Host, as that's the only known mitigation. It has resumed without issues 100 times, so this must have been the unlucky 101st.

The fix was to replace

},{"

with

}]

TL;DR: when opening a task, if the JSON wasn't fully serialized, trim the end until it becomes valid.
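The manual fix above (cut back to the last complete entry and close the array with `]`) can be automated. A minimal sketch, assuming the file holds a single top-level JSON array; `recoverTruncatedJsonArray` is a hypothetical name. Whatever followed the last complete entry is still lost.

```typescript
/**
 * Hypothetical recovery helper: if `text` is a JSON array that was cut off
 * mid-write, drop the trailing partial element and close the array.
 * Returns undefined when nothing parseable can be salvaged.
 */
export function recoverTruncatedJsonArray(text: string): unknown[] | undefined {
    try {
        const parsed: unknown = JSON.parse(text);
        return Array.isArray(parsed) ? parsed : undefined;
    } catch {
        // Not valid JSON as-is: fall through to trimming.
    }
    // Walk backwards over every '}' and try closing the array right after it.
    // The first candidate that parses is the longest salvageable prefix.
    // (i > 0 because a cut at index 0 can never be valid, and it avoids
    // lastIndexOf(-1) returning 0 again.)
    for (let i = text.lastIndexOf("}"); i > 0; i = text.lastIndexOf("}", i - 1)) {
        try {
            const parsed: unknown = JSON.parse(text.slice(0, i + 1) + "]");
            if (Array.isArray(parsed)) return parsed;
        } catch {
            // Not a valid cut point (e.g. inside a string or a nested object); keep going.
        }
    }
    return undefined;
}
```

A loader could call this only after a normal JSON.parse fails, and log clearly that recovery was attempted.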

@KJ7LNW
Collaborator

KJ7LNW commented Apr 1, 2025

I am glad that you found a workaround. Ultimately we need to handle that file a little bit more safely...

@mmpayi

mmpayi commented Apr 2, 2025

The workaround is partial: all of the recent conversation/context gets lost. It does get me back into the task, but I then need to manually get Roo back on track. This implies the history isn't logged linearly, and dropping the last appends affects more than just the last prompts.

@KJ7LNW
Collaborator

KJ7LNW commented Apr 2, 2025

My guess is that this is not an append operation; it completely rewrites the JSON. Was your context particularly large? I wonder if it was timing out on a promise and aborted the write.

@mmpayi

mmpayi commented Apr 3, 2025

Yes, most of my contexts are multiple days of continuous work, which makes losing them very painful. Switching to appendonly is the way.

@KJ7LNW
Collaborator

KJ7LNW commented Apr 4, 2025

Switching to appendonly

Good idea (i.e., NDJSON); however, we plan to modify the content at some point to reduce context size intelligently (like removing multiple identical reads) when the context is large enough that it would need to be truncated anyway.

I think we need to have a transactional process like the one outlined above.

Devs: See this link for the action item on this ticket:

@KJ7LNW KJ7LNW added bug Something isn't working Issue - Unassigned / Actionable Clear and approved. Available for contributors to pick up. and removed Issue - Needs Scoping Valid, but needs effort estimate or design input before work can start. labels Apr 4, 2025
@hannesrudolph hannesrudolph closed this as not planned Won't fix, can't repro, duplicate, stale Apr 7, 2025
@hannesrudolph hannesrudolph reopened this Apr 7, 2025
@hannesrudolph hannesrudolph moved this from New to Issue [Unassigned] in Roo Code Roadmap Apr 7, 2025
@gauthierhavet

gauthierhavet commented May 5, 2025

same errors of corrupted api history... lost days of work too...
For me it happens most of the time when I delete one of my comments and the comments below it, but sometimes it happens when the files get big.

@KJ7LNW
Collaborator

KJ7LNW commented May 5, 2025

same errors of corrupted api history... lost days of work too...

There may be a workaround here to get your task working again:

#722 (comment)

@KJ7LNW
Collaborator

KJ7LNW commented May 5, 2025

@gauthierhavet - if you are interested in contributing, the following comment explains how this could be fixed; so far none of the developers have taken up the task:

@KJ7LNW
Collaborator

KJ7LNW commented May 5, 2025

@hannesrudolph if you know anyone interested in contributing or getting their feet wet in Roo development, then I think this is a relatively easy fix with low risk of side effects for someone who wishes to take it on.

@samhvw8
Collaborator

samhvw8 commented May 6, 2025

@KJ7LNW should we migrate to NDJSON or JSONL ?

@dlab-anton

Here is a potential fix. If anyone who is reproducing the issue wants to try it, you can input this into Roo Code mode with 2.5 Pro.

Developer Handoff: Bug #722 - Task Can't Be Opened Due to Corrupted api_conversation_history.json

Date: May 8, 2025
Prepared by: Roo Issue Solutions (Technical Assistant)

1. Problem Description

Users are reporting that specific tasks in the Roo-Code application sometimes become unopenable. The root cause appears to be the corruption of the api_conversation_history.json file associated with the task. This corruption can manifest as the file being zero bytes or incompletely written, leading to errors when the application attempts to load the task.

2. Summary of Observations from GitHub Issue #722

The following observations have been reported by users and collaborators in GitHub Issue #722:

  • O1: Specific tasks intermittently become unopenable.
  • O2: The JSON file representing the task (api_conversation_history.json) gets corrupted.
  • O3: Corruption often occurs when users try to terminate a task that is using up too many resources.
  • O4: Corruption has occurred after a machine ran out of memory and required a cold reboot.
  • O5: api_conversation_history.json is sometimes found to be zero bytes.
  • O6: api_conversation_history.json is sometimes found to be incompletely written (e.g., ending abruptly with {"ts":..., "partial":true},{").
  • O7: The issue can be triggered by restarting the VS Code Extension Host (Ctrl+Shift+P > Developer:Restart Extension Host).
  • O8: Some users suspect deleting comments in the task history might trigger the issue, especially with large task histories.
  • O9: The problem seems more prevalent when the api_conversation_history.json file becomes very large (due to extensive task context/history).

3. Code Analysis & Hypothesis

Hypothesis: The corruption of api_conversation_history.json is caused by a non-atomic file write operation. The application rewrites the entire file every time the conversation history is updated. If this write operation is interrupted, the file can be left in an inconsistent (corrupted) state.

Responsible Code:
The primary location for this issue is the saveApiConversationHistory() method within the file Roo-Code/src/core/Cline.ts.

// File: Roo-Code/src/core/Cline.ts
// Lines: 319-327
private async saveApiConversationHistory() {
    try {
        const filePath = path.join(await this.ensureTaskDirectoryExists(), GlobalFileNames.apiConversationHistory);
        // CRITICAL LINE: Non-atomic overwrite
        await fs.writeFile(filePath, JSON.stringify(this.apiConversationHistory));
    } catch (error) {
        // in the off chance this fails, we don't want to stop the task
        console.error("Failed to save API conversation history:", error);
    }
}

Explanation of How Hypothesis Addresses Observations:

  • O1, O2 (Unopenable tasks, Corrupted JSON): These are direct consequences of the file being unreadable due to being zero-byte or containing invalid/incomplete JSON.
  • O3, O4, O7 (Terminating tasks, OOM/Reboot, Restarting Extension Host): These are all scenarios that can interrupt the application abruptly. If an interruption occurs during the fs.writeFile() operation on api_conversation_history.json (Roo-Code/src/core/Cline.ts:322), the file system may not have completed writing the new data. Standard fs.writeFile typically truncates the file upon opening it for writing. If the interruption happens after truncation but before the new content is fully written, the file will be corrupted.
  • O5 (Zero-byte file): This occurs if the interruption happens immediately after the file is truncated by fs.writeFile() but before any significant data (or any data at all) is written.
  • O6 (Incompletely written file): This occurs if the interruption happens while the JSON.stringify(this.apiConversationHistory) output is being written to disk. JSON.stringify() itself runs synchronously and finishes before fs.writeFile() opens the file, so it cannot hand a partial string to fs.writeFile(); a very large history only makes the subsequent write, and therefore the window for interruption, longer.
  • O8, O9 (Deleting comments, Large files):
    • Deleting comments or any action that modifies the task history (which is frequent, as per addToApiConversationHistory() (Roo-Code/src/core/Cline.ts:308) and overwriteApiConversationHistory() (Roo-Code/src/core/Cline.ts:314)) triggers a call to saveApiConversationHistory(), thus initiating a full rewrite.
    • When api_conversation_history.json is large, JSON.stringify() and the subsequent fs.writeFile() both take longer. The longer write in particular widens the window between truncation and completion during which an interruption leaves a corrupted file, making corruption more probable.
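The truncation window behind O5 is easy to observe directly. In this sketch (the helper name `sizeAfterTruncatingOpen` is hypothetical) the file is opened with the "w" flag, which is fs.writeFile's default, and its size is checked before anything new is written. It is already zero bytes, which is exactly what a crash at that moment would leave on disk.

```typescript
import { open, writeFile, stat } from "node:fs/promises";

// Demonstrates the window described above: opening with the "w" flag (the
// default for fs.writeFile) truncates the file before any new bytes land.
export async function sizeAfterTruncatingOpen(filePath: string): Promise<number> {
    // Start from a non-empty "existing history" file.
    await writeFile(filePath, JSON.stringify([{ ts: 1, text: "existing history" }]));
    const handle = await open(filePath, "w"); // same flag writeFile uses by default
    try {
        // A crash at this point would leave the file exactly this size.
        return (await stat(filePath)).size;
    } finally {
        await handle.close();
    }
}
```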

Supporting Code Analysis:

4. Proposed Solution

Implement a transactional (atomic) write mechanism for api_conversation_history.json. This involves writing the new content to a temporary file first, and then, upon successful write to the temporary file, atomically renaming the temporary file to the actual target file name (api_conversation_history.json). This ensures that the original file is only replaced if the new version is complete and valid.

The pseudocode and strategy outlined by KJ7LNW in GitHub Issue #722, Comment should be followed:

In simple terms:

  1. Write new api_conversation_history.json content to a temporary file (e.g., api_conversation_history.json.tmp<random>).
  2. If an atomic swap operation is available on the OS, use it.
  3. If not, rename the old api_conversation_history.json to a backup name (e.g., api_conversation_history.json.bak<random>).
  4. Rename the new temporary file to api_conversation_history.json.
  5. Delete the backup file.
  6. Implement robust error handling and cleanup for temporary/backup files in case of failures at any step. File locking (flock) should also be considered as part of the implementation if concurrent access is a concern, though the primary issue here is interruption rather than concurrency.
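Steps 3-5 (the fallback when no atomic swap is available) might look like the following. This is a sketch only; `replaceWithBackup` and the `.tmp`/`.bak` suffixes are illustrative. Note the gap between moving the original aside and moving the new file in: a crash there leaves only the `.bak` file, so a loader using this scheme would also need to look for it on startup.

```typescript
import { writeFile, rename, rm } from "node:fs/promises";
import { randomBytes } from "node:crypto";

const suffix = (): string => randomBytes(6).toString("hex");

// Hypothetical fallback: move the old file aside, move the new one in, then
// delete the backup. If a step fails, put the backup back.
export async function replaceWithBackup(filePath: string, data: string): Promise<void> {
    const tmpPath = `${filePath}.tmp${suffix()}`;
    const bakPath = `${filePath}.bak${suffix()}`;
    let backedUp = false;
    try {
        await writeFile(tmpPath, data, { encoding: "utf8", flag: "wx" });
        try {
            await rename(filePath, bakPath);
            backedUp = true;
        } catch (err) {
            // A missing original (first save) is fine; anything else is a real error.
            if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
        }
        await rename(tmpPath, filePath);
    } catch (err) {
        await rm(tmpPath, { force: true }).catch(() => {});
        if (backedUp) await rename(bakPath, filePath).catch(() => {});
        throw err;
    }
    // The new file is in place; deleting the backup is best-effort cleanup.
    if (backedUp) await rm(bakPath, { force: true }).catch(() => {});
}
```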

5. TODO for Developer

  1. Modify saveApiConversationHistory():

    • File: Roo-Code/src/core/Cline.ts
    • Method: private async saveApiConversationHistory() (Roo-Code/src/core/Cline.ts:319)
    • Action: Replace the current await fs.writeFile(...) (Roo-Code/src/core/Cline.ts:322) with a transactional write implementation as described in Section 4.
      • Generate a unique temporary filename within the same task directory.
      • Write JSON.stringify(this.apiConversationHistory) to this temporary file.
      • Use fs.rename() to atomically replace the original api_conversation_history.json with the temporary file. Ensure proper error handling for fs.rename().
      • Implement cleanup logic:
        • If the temporary file write fails, delete the temporary file.
        • If renaming fails, attempt to restore any backed-up original file and delete the temporary file.
  2. Error Handling: Ensure that all file operations (write to temp, rename, delete backup) within the new transactional save are wrapped in try/catch blocks with appropriate error logging and recovery attempts (e.g., trying to restore a backup if a rename fails).

  3. Testing:

    • Create unit tests for the new transactional save logic, mocking fs operations to simulate failures at different stages (e.g., failure to write temp file, failure to rename).
    • If possible, create integration tests that simulate application interruption during the save process to verify that api_conversation_history.json remains intact or is correctly rolled back.
  4. (Optional but Recommended) Implement Fallback for Corrupted Files:

    • File: Roo-Code/src/core/Cline.ts
    • Method: private async getSavedApiConversationHistory() (Roo-Code/src/core/Cline.ts:297)
    • Action: As a secondary measure, enhance this method. If JSON.parse() fails (indicating a corrupt file), attempt to trim trailing characters from the file content until it becomes valid JSON. This could help salvage partially written histories, as suggested by mmpayi in the GitHub issue. This should be logged clearly if recovery is attempted.
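For the unit tests in item 3, one option is to pass the file operations in, so a fake can fail any single step. A minimal sketch: `FsOps`, `saveWith`, and `fakeFs` are hypothetical names, and an in-memory Map stands in for the task directory rather than mocking Node's fs module.

```typescript
// Hypothetical: the file operations the save routine needs, passed in so a
// test can make any one of them fail.
export interface FsOps {
    writeFile(p: string, data: string): Promise<void>;
    rename(from: string, to: string): Promise<void>;
    rm(p: string): Promise<void>;
}

// Same shape as a temp-file-then-rename save, written against FsOps.
export async function saveWith(ops: FsOps, filePath: string, data: string): Promise<void> {
    const tmpPath = `${filePath}.tmp`;
    try {
        await ops.writeFile(tmpPath, data);
        await ops.rename(tmpPath, filePath);
    } catch (err) {
        await ops.rm(tmpPath).catch(() => {}); // cleanup is best-effort
        throw err;
    }
}

// In-memory fake: `files` stands in for the task directory and `failOn`
// names the single step that should throw.
export function fakeFs(files: Map<string, string>, failOn?: keyof FsOps): FsOps {
    const check = (step: keyof FsOps): void => {
        if (step === failOn) throw new Error(`injected ${step} failure`);
    };
    return {
        async writeFile(p, data) {
            check("writeFile");
            files.set(p, data);
        },
        async rename(from, to) {
            check("rename");
            const data = files.get(from);
            if (data === undefined) throw new Error(`ENOENT: ${from}`);
            files.set(to, data);
            files.delete(from);
        },
        async rm(p) {
            check("rm");
            files.delete(p);
        },
    };
}
```

Extending FsOps with the extra steps of a rename-aside variant lets the same loop cover every failure point.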

6. Expected Outcome

Implementing the transactional write will significantly reduce or eliminate instances of api_conversation_history.json corruption, making tasks more reliably openable even if the application is interrupted. The optional fallback will provide a mechanism to recover data from already corrupted files where possible.

@dlab-anton

@KJ7LNW should we migrate to NDJSON or JSONL ?

Atomic rewrites (write to a temporary file, then rename) are still needed for full history modifications even if the file format is JSONL as I understand it.

JSONL would still be good though here is how it may apply:

Using JSONL with an append-only strategy for new entries means:

  • Each conversation entry is a separate JSON object on its own line.
  • New entries are appended to the file. If an interruption occurs, only the last appended line might be corrupted; previous entries remain intact.
  • The application can read the file line by line, skipping any single corrupted line, thus preserving most of the history and keeping the task openable.
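A minimal sketch of the JSONL approach described above, assuming one conversation entry per line; `appendEntry` and `readEntries` are hypothetical names. One caveat: if an interrupted append leaves a torn line without a trailing newline, the next append would be glued onto it, so a real implementation would need to repair or newline-terminate the file before appending again.

```typescript
import { appendFile, readFile } from "node:fs/promises";

// Hypothetical JSONL helpers: one JSON object per line.
export async function appendEntry(filePath: string, entry: object): Promise<void> {
    // JSON.stringify escapes newlines inside strings as "\n", so each entry
    // always occupies exactly one physical line.
    await appendFile(filePath, JSON.stringify(entry) + "\n", "utf8");
}

export async function readEntries(filePath: string): Promise<unknown[]> {
    const text = await readFile(filePath, "utf8");
    const entries: unknown[] = [];
    for (const line of text.split("\n")) {
        if (line.trim() === "") continue; // e.g. the empty string after the final "\n"
        try {
            entries.push(JSON.parse(line));
        } catch {
            // A torn line from an interrupted append: skip it and keep the rest.
            console.warn(`Skipping unparseable history line: ${line.slice(0, 80)}`);
        }
    }
    return entries;
}
```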

@samhvw8
Collaborator

samhvw8 commented May 8, 2025

@dlab-anton Yes, I will make it easier to fix, but I think we should implement something using p-queue to queue write jobs for the same path that will be written to.
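The suggestion is p-queue; the same effect (one write at a time per path, in submission order) can be sketched without a dependency by chaining promises per path. This swaps in a plain promise chain for p-queue, and `enqueueWrite` is a hypothetical name. Serializing writes stops two saves from interleaving, but on its own it does not protect against a crash mid-write; it complements an atomic write rather than replacing it.

```typescript
// Hypothetical per-path write queue: jobs for the same path run one at a
// time in submission order; jobs for different paths do not wait on each other.
const tails = new Map<string, Promise<void>>();

export function enqueueWrite<T>(filePath: string, job: () => Promise<T>): Promise<T> {
    const previous = tails.get(filePath) ?? Promise.resolve();
    // Start `job` only after the previous job for this path has settled.
    const result = previous.then(job);
    // The stored tail swallows errors so one failed write does not block later ones.
    const tail = result.then(
        () => undefined,
        () => undefined,
    );
    tails.set(filePath, tail);
    // Forget the path once its last queued job is done, so the map stays small.
    void tail.then(() => {
        if (tails.get(filePath) === tail) tails.delete(filePath);
    });
    return result;
}
```

A save would then be queued as `enqueueWrite(filePath, () => /* whichever transactional write is chosen */)`.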

@KJ7LNW
Collaborator

KJ7LNW commented May 8, 2025

JSONL would still be good though here is how it may apply

I am fine with converting the file format which will be nice and fast for the general case of append, but let's do this in manageable steps:

What do you think about the following:

  1. start by implementing rename because it is trivial, which immediately closes this issue.

Then, in a separate PR:

  1. convert to JSONL
  2. keep the current rename function because future pull requests need to replace the entire file from time to time (eg, Roo environment details compression - WIP #1661)
  3. refactor the current implementation to append when appropriate

@hannesrudolph hannesrudolph moved this from New to Issue [Unassigned] in Roo Code Roadmap May 20, 2025
@KJ7LNW KJ7LNW linked a pull request May 21, 2025 that will close this issue
@hannesrudolph hannesrudolph moved this from Issue [Unassigned] to Issue [In Progress] in Roo Code Roadmap May 21, 2025
@hannesrudolph hannesrudolph added Issue - In Progress Someone is actively working on this. Should link to a PR soon. and removed Issue - Unassigned / Actionable Clear and approved. Available for contributors to pick up. labels May 21, 2025
Projects
Status: Issue [In Progress]

9 participants