Skip to content

Conversation

@sh-rp
Copy link
Collaborator

@sh-rp sh-rp commented Jul 29, 2024

Description

A couple of times our users used a destination adapter on a source which creates unexpected results. The small change in this PR changes this.

@sh-rp sh-rp marked this pull request as ready for review July 29, 2024 14:42
@netlify
Copy link

netlify bot commented Jul 29, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit bccbbba
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66a8c0bdb7376a0007b1dda9

# prevent accidentally wrapping sources with adapters
if isinstance(data, DltSource):
raise Exception(
"You are trying to use an adapter on a dlt source. You can only use adapters on pure"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function only used with adapters? The error message seems to be out of context for ensure_resource function. Also, TypeError maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it only is used in adapters, I checked that.

"You are trying to use an adapter on a dlt source. You can only use adapters on pure"
" data or dlt resources."
)
resource_name = None if hasattr(data, "__name__") else "content"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this else close should throw a warning. Otherwise, you get some unrecognizable table in the destination with the name Content out of nowhere 😱

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! but let's adapt sources with single resource as well

if isinstance(data, DltResource):
return data
# prevent accidentally wrapping sources with adapters
if isinstance(data, DltSource):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you check if source selects just one resource and then adapt it? look here: #1644 I think the user's thinking is OK

@rudolfix
Copy link
Collaborator

@VioletM we should document how to adapt sources. the example below IMO is not the best pattern:

from sql_database import sql_database
from dlt.destinations.adapters import weaviate_adapter
import dlt

products_table = sql_database().with_resources("products")

pipeline = dlt.pipeline(
        pipeline_name="postgres_to_weaviate_pipeline",
        destination="weaviate",
    )

products_table.table_name = "Products"

info = pipeline.run(
    weaviate_adapter(products_table.products, vectorize="description"),
)

print(info)

beacuse it takes the resource out of source and adapts it and loads it. what user should do:

products_tables = sql_database().with_resources("products", "customers")

pipeline = dlt.pipeline(
        pipeline_name="postgres_to_weaviate_pipeline",
        destination="weaviate",
    )

# adapt the resource within the source
weaviate_adapter(products_tables.products, vectorize="description")
weaviate_adapter(products_tables.customers, vectorize="bio")

info = pipeline.run(products_tables)

add support for source with single resource
add tests
@sh-rp sh-rp requested a review from rudolfix July 30, 2024 09:18
@sh-rp
Copy link
Collaborator Author

sh-rp commented Jul 30, 2024

I have added support for sources with one resource as well as some tests. I'm not sure we should support setting of this default "content" resource name. I have a feeling we should raise similar to when loading data without table name..

resource_name = None if hasattr(data, "__name__") else "content"
# prevent accidentally wrapping sources with adapters
if isinstance(data, DltSource):
if len(data.resources.keys()) == 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use "selected_resources". only those are evaluated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point!


resource_name = None
if not hasattr(data, "__name__"):
logger.warning("Setting default resource name to `content` for adapted resource.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not warning. IMO passing "dumb" data like a list of dicts is pretty normal. so maybe info level?

VioletM
VioletM previously approved these changes Jul 30, 2024
@sh-rp sh-rp merged commit 7676e4c into devel Jul 30, 2024
@sh-rp sh-rp deleted the feat/1644-prevent-adapter-on-source branch July 30, 2024 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants