-
Notifications
You must be signed in to change notification settings - Fork 414
Fix cluster hint overriding partition hint on bigquery #3497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -292,6 +292,25 @@ def partitioned_table(): | |
| assert expected_clause in sql_partitioned | ||
|
|
||
|
|
||
| def test_adapter_hints_comprehensive_single_column() -> None: | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A unit test is good. But this particular scenario likely already worked before the fix because |
||
| @dlt.resource | ||
| def partitioned_table(): | ||
| yield { | ||
| "user_id": 10000, | ||
| "name": "user 1", | ||
| "created_at": "2021-01-01T00:00:00Z", | ||
| "category": "category 1", | ||
| "score": 100.0, | ||
| } | ||
|
|
||
| # hints should merge even across multiple adapter calls | ||
| bigquery_adapter(partitioned_table, partition="user_id") | ||
| bigquery_adapter(partitioned_table, cluster="user_id") | ||
| assert partitioned_table.columns == { | ||
| "user_id": {"name": "user_id", PARTITION_HINT: True, CLUSTER_HINT: True}, | ||
| } | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( | ||
| "destination_config", | ||
| destinations_configs(default_sql_configs=True, subset=["bigquery"]), | ||
|
|
@@ -1001,6 +1020,36 @@ def sources() -> List[DltResource]: | |
| assert ["col1"] == hints_cluster_fields, "`hints` table IS NOT clustered by `col1`." | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( | ||
| "destination_config", | ||
| destinations_configs(default_sql_configs=True, subset=["bigquery"]), | ||
| ids=lambda x: x.name, | ||
| ) | ||
| def test_adapter_hints_clustering_and_partitioning( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good! |
||
| destination_config: DestinationTestConfiguration, | ||
| ) -> None: | ||
| @dlt.resource(columns=[{"name": "col1", "data_type": "bigint"}]) | ||
| def data() -> Iterator[Dict[str, str]]: | ||
| yield from [{"col1": str(i)} for i in range(10)] | ||
|
|
||
| bigquery_adapter(data, partition="col1", cluster="col1") | ||
|
|
||
| pipeline = destination_config.setup_pipeline(f"bigquery_{uniq_id()}", dev_mode=True) | ||
| pipeline.run(data) | ||
|
|
||
| with pipeline.sql_client() as c: | ||
| nc: google.cloud.bigquery.client.Client = c.native_connection | ||
| fqtn = c.make_qualified_table_name("data", quote=False) | ||
| table = nc.get_table(fqtn) | ||
|
|
||
| assert table.clustering_fields == [ | ||
| "col1" | ||
| ], f"Expected clustering fields ['col1'], got {table.clustering_fields}" | ||
| assert ( | ||
| table.range_partitioning is not None and table.range_partitioning.field == "col1" | ||
| ), f"Expected partition field 'col1', got {table.range_partitioning}" | ||
|
|
||
|
|
||
| def test_adapter_hints_empty() -> None: | ||
| @dlt.resource(columns=[{"name": "int_col", "data_type": "bigint"}]) | ||
| def some_data() -> Iterator[Dict[str, str]]: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix is correct.
Here's also a chance to improve code quality: this pattern of setting column hints repeats 4 times. Let's extract this pattern into a helper function (e.g.
_set_column_hint()) for maintainability. Most likely it could be used in other adapters, but this is a for a follow-up PR. For now I'd keep_set_column_hintin this module.