User Story
As a software developer using the eknowledge package,
I want the execute_graph_generation retry logic to handle LLM API rate limits
so that the service avoids temporary bans during periods of heavy usage.
Background
The current retry logic in eknowledge/main.py (lines 84-145) blindly retries failed LLM API calls without accounting for rate limits. This creates significant risk during high-volume graph generation, as repeated rapid retries could trigger API bans from providers like LLM7. The sleep_time parameter (line 144) uses a fixed delay that doesn't adapt to API feedback.
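Because the fix hinges on reacting to API feedback, it helps to recall that the Retry-After header (RFC 9110) can carry either a number of delay-seconds or an HTTP-date. A minimal, hypothetical parsing helper (not part of eknowledge today) might look like:

```python
# Hypothetical helper: parse a Retry-After header into a delay in seconds.
# Retry-After may be delay-seconds ("120") or an HTTP-date (RFC 9110).
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value: str) -> float:
    try:
        # Most providers send plain delay-seconds.
        return float(value)
    except ValueError:
        # Fall back to HTTP-date; clamp past dates to zero delay.
        when = parsedate_to_datetime(value)
        return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
```

Either form resolves to "how many seconds to wait", which is what the retry loop ultimately needs.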
Key risks:
- Service disruptions from API bans
- Wasted compute resources during rate-limited states
- No integration with standard rate-limit headers (e.g., X-RateLimit-Remaining)
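The shape of the desired behavior can be sketched as a retry loop that prefers a server-suggested delay and otherwise falls back to capped exponential backoff. This is an illustrative sketch only, not the eknowledge implementation; RateLimitError, backoff_delay, and call_with_retry are hypothetical names:

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for a provider's 429 Too Many Requests error."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # delay in seconds suggested by the server


def backoff_delay(sleep_time, retry_count, max_backoff=60.0):
    """Exponential backoff, capped at max_backoff, with light jitter."""
    delay = min(sleep_time * (2 ** retry_count), max_backoff)
    return delay + random.uniform(0, 0.1 * delay)


def call_with_retry(call, sleep_time=1.0, max_retries=5, max_backoff=60.0):
    """Retry `call` on rate-limit errors, preferring server-suggested delays."""
    for retry_count in range(max_retries):
        try:
            return call()
        except RateLimitError as exc:
            # Honor Retry-After when present; otherwise back off exponentially.
            delay = (exc.retry_after if exc.retry_after is not None
                     else backoff_delay(sleep_time, retry_count, max_backoff))
            time.sleep(delay)
    raise RuntimeError("exhausted retries while rate limited")
```

The jitter is a common refinement to avoid many clients retrying in lockstep; it is optional relative to the criteria below.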
Acceptance Criteria
Modify execute_graph_generation in eknowledge/main.py to:
- Add explicit 429 Too Many Requests error handling in the exception block (line 129)
- Respect Retry-After headers returned by the API
- Replace the fixed delay with exponential backoff: min(sleep_time * 2^retry_count, max_backoff)
Update the USER_PROMPT documentation in prompts.py to mention rate limit awareness.
Add tests in tests/test_eknowledge.py verifying:
- 429 responses trigger backoff rather than an immediate retry
- Retry-After values are honored when present
- The backoff delay never exceeds max_backoff
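A test for the backoff criterion could look like the following pytest-style sketch. The compute_backoff helper is a hypothetical stand-in for the backoff logic the criteria describe, included here so the example is self-contained:

```python
# Hypothetical stand-in for the backoff logic from the acceptance criteria.
def compute_backoff(sleep_time, retry_count, max_backoff=60.0):
    return min(sleep_time * (2 ** retry_count), max_backoff)


def test_backoff_doubles_and_is_capped():
    delays = [compute_backoff(1.0, n) for n in range(10)]
    assert delays[:4] == [1.0, 2.0, 4.0, 8.0]  # doubles on each retry
    assert max(delays) == 60.0                 # never exceeds max_backoff
    assert delays == sorted(delays)            # non-decreasing


test_backoff_doubles_and_is_capped()
```

The tests for 429 handling and Retry-After would follow the same pattern, stubbing the LLM client to raise rate-limit errors and asserting on the chosen delay.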