@tetelio (Contributor) commented Dec 2, 2025

Description

This PR addresses an unexpected behavior when using a resource with a pydantic validator:

@dlt.resource(columns=Model)
def rows():
    yield Model(...)  # a Model instance, not the class; field values omitted

Right now:

  1. dlt validates the data
  2. dlt always converts the validated object to a dict
  3. transformers receive dicts instead of model instances
  4. the user has to reconstruct Pydantic objects manually in every transformer

This PR changes only the serialization behavior of the Pydantic validator:

  • If the user yields a Pydantic model, dlt now returns a validated Pydantic model instead of a dict
  • If the user yields dicts, dlt continues to return dicts (unchanged)
  • If the validated model is not structurally compatible with the original model
    (i.e., some original fields were dropped by the schema contract),
    dlt returns a dict to avoid returning partially invalid or broken model instances.
    A model instance is returned only when:
fields(original_model) ⊆ fields(validated_model)
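
The compatibility rule above can be sketched as a plain set comparison. This is a minimal illustration and not the code in this PR: `is_structurally_compatible` and the stand-in classes are hypothetical names, and the only assumption is that each model class exposes a `model_fields` mapping, as Pydantic v2 model classes do.

```python
from typing import Any


def is_structurally_compatible(original_model: Any, validated_model: Any) -> bool:
    """Return True when every field of the original model survives validation.

    Both arguments are expected to expose a `model_fields` mapping
    (Pydantic v2 model classes do); nothing else about them is assumed.
    """
    original_fields = set(original_model.model_fields.keys())
    validated_fields = set(validated_model.model_fields.keys())
    # fields(original_model) ⊆ fields(validated_model)
    return original_fields.issubset(validated_fields)


# Hypothetical stand-ins for model classes, so the sketch runs without Pydantic.
class OriginalModel:
    model_fields = {"id": None, "name": None}


class SameShape:
    model_fields = {"id": None, "name": None}


class NameDropped:
    # e.g. "name" was dropped by a schema contract
    model_fields = {"id": None}


keeps_model = is_structurally_compatible(OriginalModel, SameShape)
falls_back_to_dict = not is_structurally_compatible(OriginalModel, NameDropped)
```

Under this rule a model instance would be returned in the first case, while the second case would fall back to a dict.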

Related Issues

…elded model has all fields contained in the validated model
not yielded by resource and without serialization if pydantic model yielded by
resource.

@rudolfix (Collaborator) left a comment

trying to keep the same type of input and output automatically is pretty hard. we have many weird edge cases here. even if a model instance is validated with the same model, full revalidation is possible and some items may get filtered out of the list. and no matter what we do, we will not be backward compatible (the output has always been a dict after validation).

my take is to simplify it and extend DltConfig with a flag like return_validated_models. if set to True, we just return the validator output, without converting it to a dict.

this config is applied to a model, see other options. this also needs to be documented
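
A rough sketch of how such a flag could behave, under stated assumptions: `SketchConfig` is a hypothetical stand-in for DltConfig, `return_validated_models` is only the name suggested above and not an existing option, and `finalize` and `FakeModel` are illustrative helpers. The items only need to expose `model_dump()`, as Pydantic v2 instances do.

```python
from typing import Any, Dict, List, TypedDict


class SketchConfig(TypedDict, total=False):
    # Hypothetical stand-in for DltConfig; `return_validated_models` is the
    # flag name suggested in the review, not an existing option.
    return_validated_models: bool


def finalize(validated: List[Any], config: SketchConfig) -> List[Any]:
    """Return validator output as-is when the flag is set, else dicts."""
    if config.get("return_validated_models", False):
        return validated
    # The default keeps the current behavior: always dicts after validation.
    return [item.model_dump() for item in validated]


class FakeModel:
    """Stand-in exposing `model_dump()` so the sketch runs without Pydantic."""

    def __init__(self, **fields: Any) -> None:
        self._fields = fields

    def model_dump(self) -> Dict[str, Any]:
        return dict(self._fields)


items = [FakeModel(id=1), FakeModel(id=2)]
as_dicts = finalize(items, {})
as_models = finalize(items, {"return_validated_models": True})
```

With the flag off by default, existing pipelines would keep receiving dicts, and the output type would no longer depend on what the resource happens to yield.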

    self.table_name, self.list_model, item, self.column_mode, self.data_mode
)
if input_is_model:
    input_fields = set(item[0].__class__.model_fields.keys())

this is a good catch. the model used for validation may differ from the model instances passed to the pipeline. even if the model is the same, there are some weird cases, or revalidation may have been requested. so we cannot just skip validation here.

still - doing this check and going back to dict is IMO not intuitive and expensive. we also probe only the first element and assume that the list has uniform item types.

overall I think we need something simpler here
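
To illustrate the uniformity concern, here is a small sketch comparing a first-element probe with a full scan on a mixed list. All names (`looks_like_model`, `probe_first`, `probe_all`, `FakeModel`) are hypothetical, and the duck-typed `model_dump` check is an assumption standing in for whatever type check the validator actually performs.

```python
from typing import Any, List


def looks_like_model(item: Any) -> bool:
    # Duck-typed check: Pydantic v2 instances expose `model_dump`.
    return hasattr(item, "model_dump")


def probe_first(items: List[Any]) -> bool:
    """Decide from the first element only (assumes a uniform list)."""
    return bool(items) and looks_like_model(items[0])


def probe_all(items: List[Any]) -> bool:
    """Decide from every element; linear cost but safe for mixed lists."""
    return bool(items) and all(looks_like_model(i) for i in items)


class FakeModel:
    """Stand-in for a model instance so the sketch runs without Pydantic."""

    def model_dump(self) -> dict:
        return {}


mixed = [FakeModel(), {"id": 2}]
first_says_model = probe_first(mixed)  # the dict in position 1 goes unnoticed
all_say_model = probe_all(mixed)
```

The two probes disagree on the mixed list, which is the case the first-element check silently gets wrong; scanning every item avoids that at the price of extra work per batch.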

@tetelio tetelio self-assigned this Dec 12, 2025
Development

Successfully merging this pull request may close these issues.

Interaction between values returned by a resource and configuring a pydantic schema