fix: Image Pydantic schema to properly handle Union[str, Image] deserialization #7195
Open
Sean-Kenneth-Doherty wants to merge 1 commit into microsoft:main from
…ialization (microsoft#7170)

## Problem

When `UserMessage.content` contains both a string and an Image in a list (e.g., `[image, "describe this"]`), JSON deserialization fails with: "Expected dict or Image instance, got <class 'str'>"

## Root Cause

The Image class's `__get_pydantic_core_schema__` used `core_schema.any_schema()`, which accepts any input type. When Pydantic validated `Union[str, Image]`, it tried the Image validator on strings before trying the str type, causing validation to fail.

## Solution

Changed the Image schema to use `core_schema.union_schema` with explicit types:

1. `core_schema.is_instance_schema(cls)` - for Image instances (pass through)
2. `core_schema.dict_schema()` with validator - for JSON dicts with 'data' key

This ensures the Image validator only processes dict/Image inputs, allowing strings to be handled by the str type in `Union[str, Image]`.

## Tests

Added comprehensive test file (`test_image_mixed_content.py`) with 5 test cases covering:

- String-only content
- Image-only content
- Mixed content (Image + string) - the exact bug scenario from microsoft#7170
- String before Image
- Multiple strings and images interleaved

All existing serialization tests continue to pass.
Author:
@microsoft-github-policy-service agree
## Why are these changes needed?

Fixes #7170 - `UserMessage` deserialization fails when `content` contains both string and `Image` data.

## Problem

When `UserMessage.content` contains both a string and an Image in a list (e.g., `[image, "describe this"]`), JSON deserialization fails with: "Expected dict or Image instance, got <class 'str'>"

## Root Cause

The `Image` class's `__get_pydantic_core_schema__` used `core_schema.any_schema()`, which accepts any input type. When Pydantic validated `Union[str, Image]`, it tried the Image validator on strings before trying the str type, causing validation to fail.

## Solution

Changed the Image schema to use `core_schema.union_schema` with explicit types:

1. `core_schema.is_instance_schema(cls)` - for Image instances (pass through)
2. `core_schema.dict_schema()` with validator - for JSON dicts with 'data' key

This ensures the Image validator only processes dict/Image inputs, allowing strings to be handled by the str type in `Union[str, Image]`.

## Related issue number

Closes #7170

## Checks

## Testing

Added comprehensive test file (`test_image_mixed_content.py`) with 5 test cases covering:

- String-only content
- Image-only content
- Mixed content (Image + string) - the exact bug scenario from #7170
- String before Image
- Multiple strings and images interleaved

All existing serialization tests continue to pass.