Open
Labels
bug (Something isn't working)
Description
Before You Report a Bug, Please Confirm You Have Done The Following...
- I have updated to the latest version of the packages.
- I have searched for both existing issues and closed issues and found none that matched my issue.
neo4j-graphrag-python's version
1.10.1
Python version
3.12
Operating System
Debian 13
Dependencies
"datasets==3.6.0",
"flask>=3.1.2",
"neo4j>=5.28.2",
"neo4j-graphrag[nlp,ollama,sentence-transformers]>=1.10.1",
"streamlit>=1.52.1",
Reproducible example
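from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

# llm, neo4j_driver, embedder, and text_splitter are assumed to be configured beforehand.
PDF_FILE = './some-file.pdf'  # a large PDF whose text exceeds the model's context window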
kg_builder = SimpleKGPipeline(
llm=llm,
driver=neo4j_driver,
embedder=embedder,
from_pdf=True,
text_splitter=text_splitter,
)
await kg_builder.run_async(file_path=PDF_FILE)
Relevant Log Output
A JSON decoding error stating that the model output is not valid JSON.
Expected Result
I expect the pipeline to run to completion.
What happened instead?
In the schema extraction phase, the pipeline does not split the document(s) into chunks. Instead, it sends the whole document, regardless of its size, to the Ollama endpoint in a single prompt.
Additional Info
The pipeline extracts the schema with a single request to Ollama that contains the whole document. Instead, it should perform this step chunk by chunk.
With the whole document in the prompt, the model loses track of the instructions, which would explain why its output is not valid JSON.
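To make the suggestion concrete, here is a minimal sketch of chunk-by-chunk schema extraction. It is not the library's implementation. `split_into_chunks`, `extract_schema_by_chunk`, the `extract` callable, and the `node_types` / `relationship_types` dictionary shape are all hypothetical names chosen for illustration. In the real pipeline, `extract` would stand in for one LLM request over a single chunk, and the fixed-size splitter would be replaced by the configured `text_splitter`.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical schema shape used only in this sketch.
Schema = Dict[str, Set[str]]


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive fixed-size character chunks.

    A stand-in for a real text splitter; it ignores sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def extract_schema_by_chunk(
    text: str,
    extract: Callable[[str], Dict[str, Iterable[str]]],
    chunk_size: int,
) -> Schema:
    """Run a per-chunk extractor and merge the partial schemas by set union.

    `extract` is a hypothetical callable (for example, one LLM request per
    chunk) returning node and relationship type names found in that chunk.
    """
    merged: Schema = {"node_types": set(), "relationship_types": set()}
    for chunk in split_into_chunks(text, chunk_size):
        partial = extract(chunk)  # each prompt now fits the context window
        for key in merged:
            merged[key] |= set(partial.get(key, ()))
    return merged
```

With this shape, each request carries only one chunk, so the prompt size is bounded by `chunk_size` rather than by the document size. Merging by set union is the simplest possible strategy; a real fix would probably also need to reconcile near-duplicate type names across chunks.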