Fix/3159 pydantic model incorrect serialization #3421
base: devel
…elded model has all fields contained in the validated model
not yielded by resource and without serialization if pydantic model yielded by resource.
Deploying with
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ❌ Deployment failed (View logs) | docs | fd2875a | Dec 12 2025, 02:55 PM |
rudolfix
left a comment
trying to automatically keep the output type the same as the input type is pretty hard. we have many weird edge cases here. even if a model instance is validated with the same model, full revalidation is possible and some items may get filtered out of the list. and no matter what we do, we will not be backward compatible (today the result is always a dict after validation).
my take is to simplify it: extend DltConfig with a flag like return_validated_models. if it is set to True, we just return the validator output, without converting it to dict.
this config is applied to a model, see the other options there. this also needs to be documented
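A minimal sketch of the flag-based approach suggested above, offered as an illustration rather than dlt's actual implementation. It uses only the standard library: `SketchConfig` is a hypothetical stand-in for `DltConfig`, the `User` dataclass stands in for a pydantic model, and `validate_items` is an invented helper. Only the flag name `return_validated_models` comes from the comment.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, TypedDict, Union


class SketchConfig(TypedDict, total=False):
    # Hypothetical stand-in for DltConfig; the flag name follows the review comment.
    return_validated_models: bool


@dataclass
class User:
    # Stand-in for a pydantic model (assumption: real code would use BaseModel).
    id: int
    name: str


def validate_items(
    items: List[Union[User, Dict[str, Any]]], config: SketchConfig
) -> List[Union[User, Dict[str, Any]]]:
    """Validate every item, then return models or dicts depending on the flag."""
    validated = []
    for item in items:
        data = asdict(item) if isinstance(item, User) else item
        # "Validation" here is only coercion through the constructor.
        validated.append(User(id=int(data["id"]), name=str(data["name"])))
    if config.get("return_validated_models", False):
        # Flag set: hand back the validator output untouched.
        return validated
    # Default: keep the backward compatible behavior and always return dicts.
    return [asdict(model) for model in validated]
```

With this shape the output type depends only on the config, not on what the resource happened to yield, so a mixed list of models and dicts is handled the same way as a uniform one.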
dlt/extract/validation.py
Outdated
    self.table_name, self.list_model, item, self.column_mode, self.data_mode
)
if input_is_model:
    input_fields = set(item[0].__class__.model_fields.keys())
this is a good catch. the model used for validation may be different from the model instances passed to the pipeline. even if the model is the same, there are some weird cases, or revalidation may have been requested. so we cannot just skip validation here.
still, doing this check and going back to dict is IMO not intuitive and is expensive. we also probe only the first element and assume that all items in the list have the same type.
overall I think we need something simpler here
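To illustrate the uniformity concern, here is a small standard-library sketch. `Row` is a stand-in for a pydantic model and both helper functions are hypothetical names, not code from the PR. It shows how probing only the first element gives a misleading answer for a mixed list.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Row:
    # Stand-in for a pydantic model instance.
    id: int


def first_item_is_model(items: List[Any]) -> bool:
    # Mirrors the "probe item[0]" idea: cheap, but assumes a uniform list.
    return bool(items) and isinstance(items[0], Row)


def all_items_are_models(items: List[Any]) -> bool:
    # Checks every element: correct for mixed lists, but costs O(n) per batch.
    return bool(items) and all(isinstance(item, Row) for item in items)


mixed = [Row(1), {"id": 2}]
print(first_item_is_model(mixed))   # True, although the second item is a dict
print(all_items_are_models(mixed))  # False
```

The full check is what makes the approach expensive, which is one reason a config flag that fixes the output type up front may be the simpler option.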
…as objects vs dicts with dltconfig dict
Description
This PR addresses an unexpected behavior when using a resource with a pydantic validator:
Right now, dlt:
This PR changes only the serialization behavior of the Pydantic validator:
(i.e., some original fields were dropped by schema contract),
then dlt returns a dict, to avoid returning partially invalid or broken model instances, so:
Related Issues