fix: create doc_status even when LightRAG lacks multimodal insert args by DeepaliPaspule · Pull Request #255 · HKUDS/RAG-Anything

DeepaliPaspule · 2026-04-21T17:26:13Z

Summary

bootstrap doc_status records before document insertion so process_document_complete and insert_content_list always persist a deletable document entry
make multimodal text insertion compatible with LightRAG versions that do not accept multimodal_content or scheme_name in ainsert()
add a regression test covering both the compatibility fallback and the missing doc_status bug from [Bug]: No doc_status created after process_document_complete #244
merge the latest upstream main into this branch so the PR is current with recent repository changes
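
For readers who want to see the first bullet in isolation, here is a minimal sketch of bootstrapping a status record before insertion. It is not the code from this PR: `InMemoryDocStatus`, `insert_with_bootstrap`, and the `"pending"` / `"processed"` / `"failed"` status strings are hypothetical names chosen for illustration, and the in-memory dict only stands in for a real doc_status backend.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


class InMemoryDocStatus:
    """Hypothetical stand-in for a doc_status storage backend."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}

    async def get_by_id(self, doc_id: str) -> Optional[Dict[str, Any]]:
        return self.records.get(doc_id)

    async def upsert(self, data: Dict[str, Dict[str, Any]]) -> None:
        for doc_id, fields in data.items():
            self.records.setdefault(doc_id, {}).update(fields)


async def insert_with_bootstrap(
    storage: InMemoryDocStatus,
    doc_id: str,
    insert: Callable[[], Awaitable[None]],
) -> None:
    # Create the record first, so a deletable entry exists even when the
    # insert step fails or never writes doc_status on its own.
    if await storage.get_by_id(doc_id) is None:
        await storage.upsert({doc_id: {"status": "pending"}})
    try:
        await insert()
    except Exception:
        await storage.upsert({doc_id: {"status": "failed"}})
        raise
    await storage.upsert({doc_id: {"status": "processed"}})
```

The point of the ordering is that the record is written before the insert runs, so a failed or partial insert still leaves something that a later delete call can find.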

Testing

python3 -m pytest tests/testdoc_status_creation.py
python3 -m pytest tests/testdoc_status_creation.py tests/test_raganything_example.py # fails locally because lightrag is not installed in this environment

Closes #244

Yukari-Tryhard · 2026-04-22T07:20:12Z

Hi @DeepaliPaspule, I just tried your solution. It works with text content, but it doesn't work for image content (and, I believe, other multimodal content). Is that the case for you too, or is it just me?

DeepaliPaspule · 2026-04-22T10:28:16Z

Thanks for checking — you were right, there was still a compatibility gap for image/multimodal-only documents on older LightRAG versions.

I’ve pushed a follow-up fix to the same branch/PR. The update falls back to a schema-compatible doc_status update when the storage rejects the multimodal_processed field, so image-only and multimodal documents can still complete properly.

I also added a regression test covering that image-only case.
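
As an illustration of the fallback described in this comment, the sketch below retries the status update without the `multimodal_processed` field when the storage rejects it. The details are assumptions rather than the PR's implementation: `StrictDocStatus` is a hypothetical storage, the allowed field set is made up, and using `TypeError` as the rejection signal is a guess about how an older schema might fail.

```python
import asyncio
from typing import Any, Dict


class StrictDocStatus:
    """Hypothetical storage that rejects fields outside its schema."""

    ALLOWED = {"status", "content_summary", "updated_at"}  # assumed schema

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}

    async def upsert(self, data: Dict[str, Dict[str, Any]]) -> None:
        # Validate everything before writing, so a rejected update
        # leaves no partial record behind.
        for fields in data.values():
            unknown = set(fields) - self.ALLOWED
            if unknown:
                raise TypeError(f"unexpected fields: {sorted(unknown)}")
        for doc_id, fields in data.items():
            self.records.setdefault(doc_id, {}).update(fields)


async def mark_multimodal_complete(storage: StrictDocStatus, doc_id: str) -> bool:
    """Return True if the flag was stored, False if the fallback was used."""
    full = {"status": "processed", "multimodal_processed": True}
    try:
        await storage.upsert({doc_id: full})
        return True
    except TypeError:
        # Older schema: drop the unsupported flag and write the rest.
        compatible = {k: v for k, v in full.items() if k != "multimodal_processed"}
        await storage.upsert({doc_id: compatible})
        return False
```

With this shape an image-only document still reaches a completed status on the older storage. The completion flag itself is not stored anywhere, though.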

LarFii · 2026-04-25T09:23:45Z

Thanks for the follow-up fixes and tests here. Compared with #251, this is the more complete direction because it filters optional ainsert kwargs such as multimodal_content and scheme_name based on the actual LightRAG signature, and it also adds regression coverage for doc_status creation.
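
A minimal sketch of the signature-based filtering mentioned above, using the standard library's `inspect.signature`. The helper name `filter_supported_kwargs` and the two stub classes are hypothetical; the stubs only mimic an `ainsert()` with and without the optional parameters and are not LightRAG's real signatures.

```python
import asyncio
import inspect
from typing import Any, Callable, Dict


def filter_supported_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that func declares."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all accepts anything, so pass everything through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class OldStyleRag:
    """Stub: ainsert() without the optional multimodal parameters."""

    async def ainsert(self, input, ids=None, file_paths=None):
        return {"input": input, "ids": ids, "file_paths": file_paths}


class NewStyleRag:
    """Stub: ainsert() that accepts both optional parameters."""

    async def ainsert(self, input, ids=None, file_paths=None,
                      multimodal_content=None, scheme_name=None):
        return {"input": input, "ids": ids, "scheme_name": scheme_name}


async def insert_text(rag, text: str, doc_id: str, **optional: Any):
    # Filter against the actual signature instead of catching TypeError.
    supported = filter_supported_kwargs(rag.ainsert, optional)
    return await rag.ainsert(text, ids=doc_id, **supported)
```

Checking the signature up front means a `TypeError` raised inside `ainsert()` for an unrelated reason is not mistaken for a compatibility problem.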

There are still a couple of things I would like clarified before merge:

In _mark_multimodal_processing_complete, when the storage rejects multimodal_processed, the fallback writes only a schema-compatible status update. That is useful for older LightRAG versions, but then later reads that rely on multimodal_processed default to false/missing. Can you confirm this won't cause repeated multimodal processing or inconsistent get_document_processing_status() / is_document_fully_processed() results on those older storage implementations?
Please include the exact validation run against the old-LightRAG-style ainsert signature where neither multimodal_content nor scheme_name is accepted. This is the compatibility case that #251 (fix: detect ainsert signature instead of silently swallowing TypeError) still misses, and it is important for #244 ([Bug]: No doc_status created after process_document_complete).
Minor: if you touch the tests again, please consider renaming tests/testdoc_status_creation.py to tests/test_doc_status_creation.py for readability. Not a blocker by itself.

I would prefer to resolve this PR's semantics before merging #253, since #253 changes where multimodal completion state lives and overlaps with this fix area.

DeepaliPaspule · 2026-04-28T18:12:33Z

Thanks, I’ve pushed a follow-up update to the same PR.

I addressed the older-storage semantics by persisting multimodal completion state in a separate compatibility KV namespace when doc_status rejects the multimodal_processed field. Reads now check that fallback state as well, so it should not cause repeated multimodal processing or inconsistent get_document_processing_status() / is_document_fully_processed() behavior on older LightRAG-style storage.
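
The behavior described here can be sketched as follows. This is an illustrative model, not the merged code: `CompatTracker`, its `compat_kv` dict, and the `doc_status_accepts_flag` switch are hypothetical stand-ins for the real storages, and `TypeError` is again only an assumed rejection signal.

```python
import asyncio
from typing import Any, Dict


class CompatTracker:
    """Toy model: doc_status plus a separate compatibility KV namespace."""

    def __init__(self, doc_status_accepts_flag: bool) -> None:
        self.doc_status_accepts_flag = doc_status_accepts_flag
        self.doc_status: Dict[str, Dict[str, Any]] = {}
        self.compat_kv: Dict[str, Dict[str, Any]] = {}  # fallback namespace

    async def _upsert_doc_status(self, doc_id: str, fields: Dict[str, Any]) -> None:
        if "multimodal_processed" in fields and not self.doc_status_accepts_flag:
            raise TypeError("doc_status schema has no multimodal_processed field")
        self.doc_status.setdefault(doc_id, {}).update(fields)

    async def mark_multimodal_complete(self, doc_id: str) -> None:
        try:
            await self._upsert_doc_status(
                doc_id, {"status": "processed", "multimodal_processed": True}
            )
        except TypeError:
            # Older schema: write the compatible fields, then persist the
            # completion flag in the separate namespace.
            await self._upsert_doc_status(doc_id, {"status": "processed"})
            self.compat_kv[doc_id] = {"multimodal_processed": True}

    async def is_fully_processed(self, doc_id: str) -> bool:
        record = self.doc_status.get(doc_id, {})
        # Reads consult both locations, so the fallback state counts.
        flag = record.get("multimodal_processed") or self.compat_kv.get(
            doc_id, {}
        ).get("multimodal_processed", False)
        return record.get("status") == "processed" and bool(flag)
```

Because the read path checks the fallback namespace as well, a document completed on the older schema reports as fully processed and is not queued for multimodal processing again.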

I also renamed the test file for readability and re-ran:
python3 -m pytest tests/test_doc_status_creation.py

That suite includes the explicit compatibility case where ainsert accepts neither multimodal_content nor scheme_name.

Yukari-Tryhard · 2026-05-04T08:58:58Z

@LarFii Do we have any further action on this issue?

DeepaliPaspule added 2 commits April 21, 2026 12:50

fix: always create doc_status records

feddfd4

Merge remote-tracking branch 'upstream/main' into fix/doc-status-244

00253cc

DeepaliPaspule mentioned this pull request Apr 21, 2026

fix: create doc_status even when LightRAG lacks multimodal insert args #246

Closed

fix: support multimodal doc_status on older lightrag

d88a30d

This was referenced Apr 25, 2026

fix: detect ainsert signature instead of silently swallowing TypeError #251

Closed

fix: store multimodal_processed in separate KV namespace to prevent DocProcessingStatus errors #253

Closed

fix: persist multimodal completion compat state

56f0f2a

LarFii merged commit dc12354 into HKUDS:main May 6, 2026

LarFii mentioned this pull request May 6, 2026

feat: incremental folder scan — skip unchanged files via MD5 manifest #239

Closed

7 tasks

LarFii merged 4 commits into HKUDS:main from DeepaliPaspule:fix/doc-status-244
