Fix/3159 pydantic model incorrect serialization #3421

tetelio · 2025-12-02T20:59:41Z

Description

This PR addresses unexpected behavior when using a resource with a Pydantic validator:

@dlt.resource(columns=Model)
def rows():
    yield Model(...)

Right now, dlt:

validates the data
always converts the validated object to a dict

As a result, transformers receive dicts instead of model instances, and the user has to reconstruct Pydantic objects manually in every transformer.

This PR changes only the serialization behavior of the Pydantic validator:

If the user yields a Pydantic model, dlt now returns a validated Pydantic model instead of a dict
If the user yields dicts, dlt continues to return dicts (unchanged)
If the validated model is not structurally compatible with the original model (i.e., some original fields were dropped by the schema contract), dlt returns a dict to avoid returning partially invalid or broken model instances. In other words, a model instance is returned only when:

fields(original_model) ⊆ fields(validated_model)
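
The rule above can be sketched roughly as follows. This is only an illustration, not the code in this PR: standard-library dataclasses stand in for Pydantic models, and `Original`, `Validated`, `field_names`, and `serialize` are hypothetical names.

```python
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import Any, Dict, Set, Union


@dataclass
class Original:
    """Stand-in for the model class the resource yields (hypothetical)."""
    id: int
    name: str


@dataclass
class Validated:
    """Stand-in for a validated model after the contract dropped `name`."""
    id: int


def field_names(model_cls: type) -> Set[str]:
    # Collect the declared field names of a dataclass-based stand-in model.
    return {f.name for f in fields(model_cls)}


def serialize(original_item: Any, validated_item: Any) -> Union[Dict[str, Any], Any]:
    # Dict in, dict out: behavior for dict inputs stays unchanged.
    if not is_dataclass(original_item):
        return asdict(validated_item)
    # Model in: keep the instance only if no original field was dropped,
    # i.e. fields(original_model) is a subset of fields(validated_model).
    if field_names(type(original_item)) <= field_names(type(validated_item)):
        return validated_item
    # Otherwise fall back to a dict to avoid handing out a broken instance.
    return asdict(validated_item)
```

With these stand-ins, a dict input yields a dict, a model whose fields all survive validation comes back as an instance, and a model that lost a field comes back as a dict.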

Related Issues

Fixes #3159 (Interaction between values returned by a resource and configuring a pydantic schema)

cloudflare-workers-and-pages · 2025-12-02T21:00:45Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ❌ Deployment failed | docs | `fd2875a` | Dec 12 2025, 02:55 PM |

rudolfix

trying to keep the same type of input and output automatically is pretty hard. we have many weird edge cases here. even if a model instance is validated with the same model, full revalidation is possible and some items may get filtered out of the list. and no matter what we do, we will not be backward compatible (currently the output is always a dict after validation).

my take is to simplify it and extend DltConfig with a flag like return_validated_models: if set to True, we just return the validator output without converting it to a dict.

this config is applied to a model, see other options. this also needs to be documented
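
A minimal sketch of the flag-based approach suggested above, under stated assumptions: the flag name `return_validated_models` comes from this comment, while `SketchConfig`, the `dlt_config` class attribute as used here, `validate`, and `run_validator` are hypothetical stand-ins built on dataclasses rather than dlt's or Pydantic's actual API.

```python
from dataclasses import asdict, dataclass
from typing import Any, ClassVar, Dict, TypedDict, Union


class SketchConfig(TypedDict, total=False):
    """Hypothetical per-model config; only the flag name is from the review."""
    return_validated_models: bool


@dataclass
class User:
    # Opt in: the validator output is passed on as a model instance.
    dlt_config: ClassVar[SketchConfig] = {"return_validated_models": True}
    id: int
    name: str


@dataclass
class LegacyUser:
    # No config: keeps the backward-compatible "always dict" behavior.
    id: int
    name: str


def validate(model_cls: type, item: Any) -> Any:
    # Stand-in for validation: always rebuild an instance from plain data.
    data = item if isinstance(item, dict) else asdict(item)
    return model_cls(**data)


def run_validator(model_cls: type, item: Any) -> Union[Dict[str, Any], Any]:
    validated = validate(model_cls, item)
    config = getattr(model_cls, "dlt_config", {})
    # The output type depends only on the model's config, not on the input type.
    if config.get("return_validated_models", False):
        return validated
    return asdict(validated)
```

The point of this design is that the output type is decided once per model, so no per-item type probing or field comparison is needed, and models without the flag behave as before.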

rudolfix · 2025-12-08T23:53:17Z

dlt/extract/validation.py

+                self.table_name, self.list_model, item, self.column_mode, self.data_mode
+            )
+            if input_is_model:
+                input_fields = set(item[0].__class__.model_fields.keys())


this is a good catch. the model that is used to validate may be different from model instances passed to pipeline. even if the model is the same there are some weird cases or revalidation was requested. so we cannot just skip validation here.

still - doing this check and going back to dict is IMO not intuitive and expensive. we also probe just the first element and assume that the list is of uniform item types.

overall I think we need something simpler here
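
To illustrate the concern about probing only the first element, here is a small sketch. It assumes dataclasses as stand-ins for Pydantic models; `Row`, `probe_first`, and `check_all` are hypothetical helpers, not functions from dlt.

```python
from dataclasses import dataclass, is_dataclass
from typing import Any, List


@dataclass
class Row:
    """Stand-in for a Pydantic model instance (hypothetical)."""
    id: int


def probe_first(items: List[Any]) -> bool:
    # Looks only at items[0] and assumes the rest of the batch matches.
    return bool(items) and is_dataclass(items[0])


def check_all(items: List[Any]) -> bool:
    # Correct for mixed batches, but costs a full pass over every batch.
    return bool(items) and all(is_dataclass(item) for item in items)


mixed = [Row(1), {"id": 2}]
print(probe_first(mixed))  # the probe treats the whole batch as models
print(check_all(mixed))    # the full check notices the dict
```

For a batch that mixes model instances and dicts, the probe and the full check disagree, which is the kind of edge case a single config flag would sidestep.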

tetelio added 4 commits December 2, 2025 20:58

Stop serializing to dict if pydantic model is being yielded and if yi…

f540e7f

…elded model has all fields contained in the validated model

Add test for pydantic validation with serialization if pydantic model

f5bbbbc

not yielded by resource and without serialization if pydantic model yielded by resource.

Add docstring back

bf346b0

Add back __str__ method removed by accident

9d6571e

Lint correctly

35dd144

rudolfix requested changes Dec 9, 2025

tetelio self-assigned this Dec 12, 2025

tetelio added 3 commits December 12, 2025 11:33

Add simple choice for pydantic objects to be passed after extraction …

49681d4

…as objects vs dicts with dltconfig dict

Add unit test for extract validator and pipeline integration test

c7b07e3

Fix lint

fd2875a
