fix: create doc_status even when LightRAG lacks multimodal insert args #255

Merged
LarFii merged 4 commits into HKUDS:main from DeepaliPaspule:fix/doc-status-244
May 6, 2026

@DeepaliPaspule
Contributor

Summary

  • bootstrap doc_status records before document insertion so process_document_complete and insert_content_list always persist a deletable document entry
  • make multimodal text insertion compatible with LightRAG versions that do not accept multimodal_content or scheme_name in ainsert()
  • add a regression test covering both the compatibility fallback and the missing doc_status bug from #244 ([Bug]: No doc_status created after process_document_complete)
  • merge the latest upstream main into this branch so the PR is current with recent repository changes
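
The compatibility handling described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `filter_supported_kwargs`, `insert_text`, and `OldStyleRAG` are hypothetical names, and the old-style `ainsert` parameters shown are assumed for the example. The idea is to inspect the callable's signature and pass only the optional keyword arguments it accepts.

```python
import asyncio
import inspect
from typing import Any, Callable, Dict


def filter_supported_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that `func` can accept by name."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    by_name = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    return {k: v for k, v in kwargs.items() if k in params and params[k].kind in by_name}


class OldStyleRAG:
    """Hypothetical stand-in for an older LightRAG: ainsert has no
    multimodal_content or scheme_name parameter (assumed signature)."""

    async def ainsert(self, input, file_paths=None, ids=None):
        return {"input": input, "file_paths": file_paths, "ids": ids}


async def insert_text(rag, text, **optional):
    # Drop optional kwargs the installed ainsert does not know about,
    # instead of catching a TypeError after the call has already failed.
    supported = filter_supported_kwargs(rag.ainsert, optional)
    return await rag.ainsert(text, **supported)
```

With this shape, a call such as `insert_text(rag, "hello", file_paths="a.pdf", scheme_name="default")` reaches the old-style `ainsert` with only `file_paths`, so the insert proceeds rather than failing on an unexpected keyword.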

Testing

  • python3 -m pytest tests/testdoc_status_creation.py
  • python3 -m pytest tests/testdoc_status_creation.py tests/test_raganything_example.py # fails locally because lightrag is not installed in this environment

Closes #244

@Yukari-Tryhard

Hi @DeepaliPaspule, I just tried your solution. It works with text content, but for image content (and, I believe, other multimodal content) it doesn't work. Is that the case for you too, or is it just me?

@DeepaliPaspule
Contributor Author

Thanks for checking. You were right: there was still a compatibility gap for image-only and multimodal-only documents on older LightRAG versions.

I’ve pushed a follow-up fix to the same branch/PR. The update falls back to a schema-compatible doc_status update when the storage rejects the multimodal_processed field, so image-only and multimodal documents can still complete properly.

I also added a regression test covering that image-only case.
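
The fallback described above could look something like the sketch below. Everything here is assumed for illustration: `StrictStatus`, `StrictDocStatusStorage`, and `mark_complete` are hypothetical names, and the sketch assumes the storage signals rejection of an unknown field with a `TypeError`. The real storage in older LightRAG versions may reject the field differently.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StrictStatus:
    """Hypothetical old-schema record with no multimodal_processed field."""
    status: str
    updated_at: str


class StrictDocStatusStorage:
    """Hypothetical storage that validates records against StrictStatus."""

    def __init__(self):
        self.data = {}

    async def upsert(self, records):
        # Validate everything first so a rejected record leaves the store untouched.
        validated = {doc_id: StrictStatus(**fields) for doc_id, fields in records.items()}
        self.data.update(validated)


async def mark_complete(storage, doc_id, updated_at):
    full = {"status": "processed", "updated_at": updated_at, "multimodal_processed": True}
    try:
        await storage.upsert({doc_id: full})
        return "full"
    except TypeError:
        # Assumed failure mode: the schema does not know multimodal_processed.
        # Retry with only the fields the old schema accepts.
        compatible = {k: v for k, v in full.items() if k != "multimodal_processed"}
        await storage.upsert({doc_id: compatible})
        return "compatible"
```

In this sketch an image-only document still ends up with a `processed` status on the strict storage, although the completion flag itself is not recorded anywhere.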

@LarFii
Collaborator

LarFii commented Apr 25, 2026

Thanks for the follow-up fixes and tests here. Compared with #251, this is the more complete direction because it filters optional ainsert kwargs such as multimodal_content and scheme_name based on the actual LightRAG signature, and it also adds regression coverage for doc_status creation.

There are still a couple of things I would like clarified before merge:

  1. In _mark_multimodal_processing_complete, when the storage rejects multimodal_processed, the fallback writes only a schema-compatible status update. That is useful for older LightRAG versions, but then later reads that rely on multimodal_processed default to false/missing. Can you confirm this won't cause repeated multimodal processing or inconsistent get_document_processing_status() / is_document_fully_processed() results on those older storage implementations?

  2. Please include the exact validation run against the old-LightRAG-style ainsert signature where neither multimodal_content nor scheme_name is accepted. This is the compatibility case that #251 (fix: detect ainsert signature instead of silently swallowing TypeError) still misses, and it is important for #244 ([Bug]: No doc_status created after process_document_complete).

  3. Minor: if you touch the tests again, please consider renaming tests/testdoc_status_creation.py to tests/test_doc_status_creation.py for readability. Not a blocker by itself.

I would prefer to resolve this PR's semantics before merging #253, since #253 changes where multimodal completion state lives and overlaps with this fix area.

@DeepaliPaspule
Contributor Author

Thanks, I’ve pushed a follow-up update to the same PR.

I addressed the older-storage semantics by persisting multimodal completion state in a separate compatibility KV namespace when doc_status rejects the multimodal_processed field. Reads now check that fallback state as well, so it should not cause repeated multimodal processing or inconsistent get_document_processing_status() / is_document_fully_processed() behavior on older LightRAG-style storage.
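
As a rough sketch of that read path, assuming a simple async key-value interface: `InMemoryKV`, `record_completion`, and `is_multimodal_processed` below are hypothetical names used only for illustration, and the actual namespace layout in the PR may differ. The point is that a read consults doc_status first and then the compatibility store, so a document whose doc_status could not hold the flag is still reported as processed.

```python
import asyncio


class InMemoryKV:
    """Hypothetical minimal async KV store, standing in for both doc_status
    and the separate compatibility namespace."""

    def __init__(self):
        self._data = {}

    async def upsert(self, records):
        self._data.update(records)

    async def get_by_id(self, key):
        return self._data.get(key)


async def record_completion(compat_kv, doc_id):
    # Written only when doc_status rejected the multimodal_processed field.
    await compat_kv.upsert({doc_id: {"multimodal_processed": True}})


async def is_multimodal_processed(doc_status, compat_kv, doc_id):
    record = await doc_status.get_by_id(doc_id) or {}
    if record.get("multimodal_processed"):
        return True
    # Older storage: the flag is missing from doc_status, so check the fallback.
    fallback = await compat_kv.get_by_id(doc_id) or {}
    return bool(fallback.get("multimodal_processed"))
```

Because the fallback is consulted on every read, a missing field in doc_status is no longer interpreted as "not yet processed", which is what would otherwise trigger repeated multimodal processing.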

I also renamed the test file for readability and re-ran:
python3 -m pytest tests/test_doc_status_creation.py

That suite includes the explicit compatibility case where ainsert accepts neither multimodal_content nor scheme_name.

@Yukari-Tryhard

@LarFii Do we have any further action on this issue?
