High Inode Usage on MinIO Backend (~8.6M files/day) Leading to Rapid Filesystem Exhaustion #8713
Hello Langfuse Team, I am experiencing an issue where my Langfuse instance is generating an extremely high number of objects in its MinIO backend, leading to rapid inode exhaustion on the filesystem. I'm hoping you can help me understand the source of this behavior.

Background & Context
The Current Problem: Unsustainable Inode Growth
After starting fresh, the new XFS volume is filling up with inodes at an alarming and unsustainable rate. Here is the inode usage from two snapshots:
On August 13, 2025 (2 days after clean start):
On August 25, 2025 (Today):
Analysis:
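For reference, inode usage snapshots like the ones above can be captured with df in inode mode; a minimal sketch, assuming the MinIO data volume is mounted at /data (the mount point here is an assumption, not taken from the post):

# Show inode totals, used, free, and use% for the filesystem backing MinIO (path is illustrative)
df -i /data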
My Expectation vs. Reality
My understanding is that MinIO is primarily used for storing multimedia uploads (images, etc.) associated with traces. These are typically larger files, so I was very surprised to see the inode count (the number of files) become the bottleneck again, especially at this extreme rate. This behavior seems inconsistent with storing only multimedia content.

What I've Tried
My Questions for the Team
Thank you for your time and for this fantastic tool. Any insight you can provide would be greatly appreciated.

System Details
Replies: 2 comments 5 replies
-
Hi @weicheng59, thanks for raising this. Our current ingestion pipeline relies heavily on S3 under the hood. I would recommend that you:
-
Thank you for the response. I have migrated my application to the new Python SDK, but my S3 object creation rate is still very high.

Questions
Current Implementation
Currently, I am creating a trace in the following way:

import os
import json

from langfuse import Langfuse

langfuse = Langfuse(
public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
secret_key=os.environ["LANGFUSE_SECRET_KEY"],
host="http://192.168.20.236:3000",
environment="production",
tracing_enabled=True
)
# This first part seems to be separate from the main logic below
trace_id = langfuse.create_trace_id(seed=request_id)
span = langfuse.start_span(name="final", trace_context={"trace_id": trace_id})
span.update_trace(name=request_id, input=langfuse_input, output=response_text)
span.end()
# Main tracing logic
trace_id = langfuse.create_trace_id(seed=request_id)
with langfuse.start_as_current_span(name="before_stage1", input=langfuse_input, trace_context={"trace_id": trace_id}) as before_stage1_span:
before_stage1_span.update_trace(
name=str(request_id),
user_id=json_data["tdxid"],
session_id=str(request_id),
metadata={
"input": langfuse_input,
"knowledge_type": json_data["knowledge_type"],
"screenType": json_data["screenType"],
"ip": json_data["ip"]
}
)
before_stage1_span.update(output=stage1_prompt["prompt"])
with langfuse.start_as_current_generation(name="stage1", model=stage1_prompt.get('model'), input=[{"role": "user", "content": stage1_prompt['prompt']}], trace_context={"trace_id": trace_id}) as stage1_span:
res_str = await stage1_llm_call()
stage1_span.update(
output=json.loads(res_str)["choices"][0]['text'],
usage=json.loads(res_str).get("usage", {})
)
with langfuse.start_as_current_span(name="after_stage1", input=json.loads(res_str)["choices"][0]['text'], trace_context={"trace_id": trace_id}) as after_stage1_span:
outputs1 = await after_stage1_wrapper()
after_stage1_span.update(output=outputs1)
trace_id = langfuse.create_trace_id(seed=request_id)
This is what worked for us.
Create a file lifecycle.json:
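A minimal sketch of what such a file can look like, in the JSON lifecycle format that mc ilm import accepts; the rule ID and the 7-day expiration window below are illustrative assumptions rather than the values that were actually used:

{
  "Rules": [
    {
      "ID": "example-expire-after-7-days",
      "Status": "Enabled",
      "Expiration": {
        "Days": 7
      }
    }
  ]
}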
Apply the policy:
~/minio-binaries/mc ilm import local/langfuse < lifecycle.json
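Depending on the mc version, the applied rules can be listed to confirm the import took effect, for example:

~/minio-binaries/mc ilm ls local/langfuse

(Newer mc releases expose the same listing via mc ilm rule ls.) The expiration window should be chosen conservatively: objects that Langfuse still needs to read should not expire before they have been processed.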