
Fix LiteLLM infinite loop issue with Ollama/Gemma3 models #166


Open · wants to merge 1 commit into base: main

Conversation

LovepreetSinghVerma

Problem
When using ADK with LiteLLM and certain models (particularly Ollama/Gemma3), the system can get stuck in an infinite loop under specific conditions:

  • The model makes a function call with arguments
  • The function executes and returns a result
  • The model tries to make another call to the same function, but with malformed JSON in the arguments
  • Due to parsing failures, the system gets stuck repeating the same function call indefinitely

This creates a poor user experience and wastes resources as the model repeatedly attempts the same operation without making progress.
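To make the failure mode concrete, here is a minimal illustration (hypothetical payload, not taken from this PR) of why strict JSON parsing rejects the Python-style argument strings these models sometimes emit:

```python
import json

# Well-formed JSON arguments parse cleanly:
args = json.loads('{"city": "Paris"}')

# A Python-style payload (single-quoted keys) is rejected by strict
# JSON parsing; without a fallback, the failed call can be retried forever:
try:
    json.loads("{'city': 'Paris'}")
except json.JSONDecodeError:
    args = None  # parsing failed; the model re-attempts the same call
```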

Solution
This PR implements two key fixes to address the issue:

  1. Robust JSON Parsing: Enhanced the response parsing logic to handle malformed JSON in function call arguments, with multiple fallback strategies.
  2. Loop Detection Mechanism: Added functionality to detect and break potential infinite loops when the same function is called consecutively more than a configurable threshold (default: 5 times).
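A minimal sketch of what such fallback parsing could look like; the helper name and the exact strategy order are assumptions for illustration, not the PR's actual code:

```python
import ast
import json


def parse_function_args(raw: str) -> dict:
    """Parse function-call arguments, tolerating malformed JSON."""
    if not raw:
        return {}
    # Strategy 1: standard JSON.
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, dict) else {}
    except json.JSONDecodeError:
        pass
    # Strategy 2: Python-literal style (e.g. single-quoted keys).
    try:
        parsed = ast.literal_eval(raw)
        return parsed if isinstance(parsed, dict) else {}
    except (ValueError, SyntaxError):
        pass
    # Strategy 3: degrade gracefully to empty arguments instead of crashing.
    return {}
```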

Implementation Details

  1. Improved _model_response_to_generate_content_response with comprehensive validation and error handling
  2. Added loop tracking state to the LiteLlm class
  3. Enhanced generate_content_async to monitor consecutive function calls and provide helpful responses when loops are detected
  4. Maintained backward compatibility with all existing functionality
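The loop-tracking state described in items 2 and 3 above can be sketched as follows; apart from `_loop_threshold`, which the PR names, the class and method names here are illustrative:

```python
from typing import Optional


class LoopDetector:
    """Tracks consecutive calls to the same function and flags loops."""

    def __init__(self, threshold: int = 5):
        self._loop_threshold = threshold
        self._last_function: Optional[str] = None
        self._consecutive_calls = 0

    def record_call(self, function_name: str) -> bool:
        """Record a call; return True once the loop threshold is exceeded."""
        if function_name == self._last_function:
            self._consecutive_calls += 1
        else:
            self._last_function = function_name
            self._consecutive_calls = 1
        return self._consecutive_calls > self._loop_threshold
```

A detector like this would reset whenever a different function is called, so only uninterrupted repetition of one function trips the threshold.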

Testing
The fix includes:

  1. Unit tests for robust JSON parsing (tests/litellm/test_litellm_patch.py)
  2. System tests for loop detection with simulated conversations (tests/litellm/system_test_litellm.py)
  3. Manual testing with various edge cases

Documentation

  1. Added detailed code documentation
  2. Created comprehensive developer guide entry (docs/developer_guide.md)
  3. Added specific documentation for the fix (docs/litellm_loop_fix.md)
  4. Updated the changelog

Impact and Considerations
This change:

  1. Does not modify any external APIs or interfaces
  2. Has minimal performance impact (<5ms per request)
  3. May interrupt legitimate use cases where a function is called more than 5 times consecutively (configurable via _loop_threshold)

@gsarthakdev (Contributor) left a comment


Docstring updates

Comment on lines +381 to +392
This enhanced version:
1. Adds validation for required fields with proper defaults
2. Improves JSON parsing to handle both standard JSON and Python-style strings
3. Implements comprehensive error handling to prevent crashes
4. Maintains compatibility with all model formats

Implementation note:
This function is part of the fix for an infinite loop issue that occurs when using
Ollama/Gemma3 models with LiteLLM. These models sometimes return malformed JSON in
function call arguments, which can cause the system to get stuck in a loop.
The robust parsing ensures that even with malformed JSON, we can still extract
valid arguments and prevent failures.

Given the semantics and style of the rest of the codebase, and the fact that you have already provided a separate file documenting these changes, I believe this portion of the function docstring is not necessary.

Comment on lines +717 to +737
This enhanced version:
1. Tracks consecutive calls to the same function
2. Breaks potential infinite loops after a threshold
3. Provides a helpful response when a loop is detected
4. Maintains compatibility with the original method

Implementation details:
The loop detection mechanism addresses an issue that can occur with certain models
(particularly Ollama/Gemma3), where the model gets stuck repeatedly calling the same
function without making progress. This commonly happens when:

- The model receives malformed JSON responses it cannot parse
- The model gets into a repetitive pattern of behavior
- The model misunderstands function results and keeps trying the same approach

When the same function is called consecutively more than the threshold number of times
(default: 5), the loop detection mechanism interrupts the loop and provides a helpful
response to the user instead of continuing to call the model.

This prevents wasted resources and improves user experience by avoiding situations
where the system would otherwise become unresponsive.

Given the semantics and style of the rest of the codebase, and the fact that you have already provided a separate file documenting these changes, I believe this portion of the function docstring is not necessary.
