Enable merge write disposition for athena Iceberg
#1315

@@ -87,7 +87,7 @@ athena_work_group="my_workgroup"

The `athena` destination handles the write dispositions as follows:
- `append` - files belonging to such tables are added to the dataset folder.
- `replace` - all files that belong to such tables are deleted from the dataset folder, and then the current set of files is added.
- `merge` - falls back to `append`.
- `merge` - falls back to `append` (unless you're using [iceberg](#iceberg-data-tables) tables).
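
The dispositions listed above can be modeled in a few lines of plain Python. This is a toy sketch, not `dlt` code: tables are in-memory lists of rows, `apply_write_disposition` is a hypothetical helper, and iceberg `merge` is simplified to "drop existing rows whose primary key is loaded again, then add the new rows", with an assumed `id` key column.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def apply_write_disposition(
    existing: List[Row],
    incoming: List[Row],
    disposition: str,
    is_iceberg: bool,
    primary_key: Sequence[str] = ("id",),
) -> List[Row]:
    """Hypothetical toy model of the athena write dispositions (not dlt code)."""
    if disposition == "append":
        # New data is added next to what is already there.
        return existing + incoming
    if disposition == "replace":
        # Old data is dropped and only the current load remains.
        return list(incoming)
    if disposition == "merge":
        if not is_iceberg:
            # Regular Athena tables: merge falls back to append.
            return existing + incoming

        # Iceberg tables (simplified): drop existing rows whose key is being
        # loaded again, then add the incoming rows.
        def key(row: Row) -> Tuple[object, ...]:
            return tuple(row[k] for k in primary_key)

        incoming_keys = {key(row) for row in incoming}
        kept = [row for row in existing if key(row) not in incoming_keys]
        return kept + incoming
    raise ValueError(f"unknown write disposition: {disposition!r}")


existing = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(apply_write_disposition(existing, incoming, "merge", is_iceberg=False))
print(apply_write_disposition(existing, incoming, "merge", is_iceberg=True))
```

With `is_iceberg=False` the row with key `2` appears twice (old and new value); with `is_iceberg=True` it appears once, carrying the new value `"B"`.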

## Data loading

@@ -137,6 +137,13 @@ force_iceberg = "True"

For every table created as an iceberg table, the Athena destination creates a regular Athena table in the staging dataset, on both the filesystem and in the Athena Glue catalog, and then copies all data into the final iceberg table. The final iceberg table lives alongside the non-iceberg tables in the same dataset, again on both the filesystem and in the Glue catalog. Switching a table from iceberg to regular (or vice versa) is not supported.
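
The two-step flow described above can be sketched as follows. This is a simplified in-memory model under stated assumptions: `Table`, `Dataset` and `load_into_iceberg` are hypothetical names, the filesystem and Glue catalog are collapsed into one dictionary per dataset, and the "copy" step simply appends the staged rows.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    name: str
    is_iceberg: bool
    rows: List[dict] = field(default_factory=list)


@dataclass
class Dataset:
    tables: Dict[str, Table] = field(default_factory=dict)


def load_into_iceberg(final: Dataset, staging: Dataset, name: str, rows: List[dict]) -> Table:
    """Hypothetical two-step load: regular staging table, then copy into iceberg."""
    current = final.tables.get(name)
    if current is not None and not current.is_iceberg:
        # Switching an existing regular table to iceberg is not supported.
        raise ValueError(f"{name!r} already exists as a regular table")
    # Step 1: land the new data in a regular (non-iceberg) table in the staging dataset.
    staging.tables[name] = Table(name, is_iceberg=False, rows=list(rows))
    # Step 2: copy all staged rows into the final iceberg table, which lives in
    # the same dataset as the non-iceberg tables.
    target = final.tables.setdefault(name, Table(name, is_iceberg=True))
    target.rows.extend(staging.tables[name].rows)
    return target


final, staging = Dataset(), Dataset()
final.tables["legacy"] = Table("legacy", is_iceberg=False)
events = load_into_iceberg(final, staging, "events", [{"id": 1}, {"id": 2}])
print(events.is_iceberg, len(events.rows))
```

Calling `load_into_iceberg(final, staging, "legacy", ...)` raises a `ValueError`, mirroring the rule that a regular table cannot be turned into an iceberg table.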

#### `merge` support
The `merge` write disposition is supported for Athena when using iceberg tables.

> Note that:
> 1. there is a risk of tables ending up in an inconsistent state if a pipeline run fails mid-flight, because Athena doesn't support transactions and `dlt` uses multiple DELETE/UPDATE/INSERT statements to implement `merge`;

Why not use the

Collaborator (Author): Full discussion is here: #1294. TLDR: we're planning to add a new merge strategy based on the MERGE statement, but this isn't properly specced yet. We want consistent behavior across different destinations, hence the implementation of

> 2. `dlt` creates additional helper tables called `insert_<table name>` and `delete_<table name>` in the staging schema to work around Athena's lack of temporary tables.
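
Why a failed run can leave a table inconsistent is easiest to see with a toy delete-then-insert merge that runs its steps one at a time with nothing to roll them back. This is a sketch under assumptions, not the actual `dlt` implementation: it models only a DELETE and an INSERT step (no UPDATE), `delete_insert_merge` and `MidFlightFailure` are hypothetical names, and what each helper table holds (keys in `delete_<table>`, full rows in `insert_<table>`) is assumed.

```python
from typing import Dict, List

Row = Dict[str, object]


class MidFlightFailure(Exception):
    """Stands in for a pipeline run that dies between two statements."""


def delete_insert_merge(
    final: Dict[str, List[Row]],
    staging: Dict[str, List[Row]],
    table: str,
    new_rows: List[Row],
    key: str = "id",
    crash_after_delete: bool = False,
) -> None:
    """Toy delete-then-insert merge; each block below models one SQL statement.

    Nothing is wrapped in a transaction, so a crash leaves earlier steps applied.
    """
    # Helper tables in the staging schema (their contents are an assumption).
    staging[f"delete_{table}"] = [{key: row[key]} for row in new_rows]
    staging[f"insert_{table}"] = list(new_rows)

    # "DELETE": remove target rows whose key is listed in delete_<table>.
    doomed = {row[key] for row in staging[f"delete_{table}"]}
    final[table] = [row for row in final[table] if row[key] not in doomed]

    if crash_after_delete:
        raise MidFlightFailure("run stopped before the INSERT statement")

    # "INSERT": append the rows held in insert_<table>.
    final[table].extend(staging[f"insert_{table}"])


final = {"users": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}
staging: Dict[str, List[Row]] = {}
try:
    delete_insert_merge(final, staging, "users", [{"id": 2, "name": "B"}], crash_after_delete=True)
except MidFlightFailure:
    pass
print(final["users"])  # [{'id': 1, 'name': 'a'}]: row 2 was deleted but never re-inserted
```

Without the simulated crash, the same call leaves `users` with row `1` unchanged and row `2` carrying the new name `"B"`.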

### dbt support

Athena is supported via `dbt-athena-community`. Credentials are passed into the `aws_access_key_id` and `aws_secret_access_key` fields of the generated dbt profile. Iceberg tables are supported, but if your source table is an iceberg table, you need to make sure that you also materialize your models as iceberg tables. We encountered problems materializing datetime columns because iceberg tables (nanosecond) and regular Athena tables (millisecond) store them with different precision.
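
A small arithmetic sketch of the precision mismatch, using the precisions stated above (nanoseconds for iceberg, milliseconds for regular Athena tables). It only models integer epoch timestamps; how Athena or dbt actually convert the values is not shown, and the helper names are hypothetical.

```python
NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def ns_to_ms(ts_ns: int) -> int:
    """Round an epoch timestamp in nanoseconds down to whole milliseconds."""
    return ts_ns // NS_PER_MS


def ms_to_ns(ts_ms: int) -> int:
    """Widen a millisecond timestamp back to nanoseconds (lost digits become 0)."""
    return ts_ms * NS_PER_MS


original_ns = 1_700_000_000_123_456_789
stored_ms = ns_to_ms(original_ns)
restored_ns = ms_to_ns(stored_ms)
print(stored_ms)                   # 1700000000123
print(restored_ns)                 # 1700000000123000000
print(restored_ns == original_ns)  # False
```

Once a value passes through the coarser type, the sub-millisecond digits are gone, so a round trip no longer compares equal to the original.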

Yes, and there's a PR (#998) that allows escaping for DDL and DML separately.