`docs/website/docs/general-usage/pipeline.md`
You can reset parts or all of your sources by using the `refresh` argument to `dlt.pipeline`. That means when you run the pipeline, the sources/resources being processed will have their state reset and their tables either dropped or truncated, depending on which refresh mode is used.
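As a minimal sketch of where the argument goes (the pipeline name, destination, and `my_source` source are placeholders, not from the original text):

```py
import dlt

# `my_source` stands in for any dlt source; all names here are placeholders
pipeline = dlt.pipeline("refresh_demo", destination="duckdb", refresh="drop_sources")

# the mode can also be passed to an individual run, as in the per-mode examples below
pipeline.run(my_source(), refresh="drop_sources")
```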
The `refresh` option works with all relational/SQL destinations and file buckets (`filesystem`). It does not work with vector databases (we are working on that) or with custom destinations.
The `refresh` argument should have one of the following string values to decide the refresh mode:
### Drop tables and pipeline state for a source with `drop_sources`
All sources being processed in `pipeline.run` or `pipeline.extract` are refreshed. That means all tables listed in their schemas are dropped and the state belonging to those sources and all their resources is completely wiped. The tables are deleted both from the pipeline's schema and from the destination database.
If you only have one source, or you run all your sources together, this is practically like running the pipeline for the first time.
:::caution
This erases schema history for the selected sources and only the latest version is stored.
:::
```py
import dlt

# a minimal sketch of this example, assuming an `airtable_emojis` source
# like the one described below; the pipeline name is a placeholder
pipeline = dlt.pipeline("airtable_demo", destination="duckdb")
pipeline.run(airtable_emojis(), refresh="drop_sources")
```
In the example above we instruct `dlt` to wipe the pipeline state belonging to the `airtable_emojis` source and to drop all the database tables in `duckdb` to which data was loaded. The `airtable_emojis` source had two resources named "📆 Schedule" and "💰 Budget" loading to tables "_schedule" and "_budget". Here's what `dlt` does step by step:
1. collects a list of tables to drop by looking for all the tables in the schema that are created in the destination.
2. removes the existing pipeline state associated with the `airtable_emojis` source.
3. resets the schema associated with the `airtable_emojis` source.
4. executes the `extract` and `normalize` steps; these create a fresh pipeline state and schema.
5. before it executes the `load` step, drops the collected tables from the staging and regular datasets.
6. removes the `airtable_emojis` schema (associated with the source) from the `_dlt_version` table.
7. executes the `load` step as usual, so tables are re-created and the fresh schema and pipeline state are stored.
### Selectively drop tables and resource state with `drop_resources`
Limits the refresh to the resources being processed in `pipeline.run` or `pipeline.extract` (e.g. by using `source.with_resources(...)`). Tables belonging to those resources are dropped and their resource state is wiped (that includes incremental state). The tables are deleted both from the pipeline's schema and from the destination database.
Source-level state keys are not deleted in this mode (i.e. `dlt.state()['<my_key>'] = '<my_value>'`).
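For illustration, a minimal sketch of writing such a source-level key from inside a resource; the resource, key, and value names are hypothetical:

```py
import dlt

@dlt.resource
def budget():
    # hypothetical source-level key written via `dlt.state()`: it survives
    # `drop_resources` but is wiped by `drop_sources`
    dlt.state()["my_key"] = "my_value"
    yield {"amount": 100}
```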
:::caution
This erases schema history for all affected sources and only the latest schema version is stored.
:::
```py
import dlt

# a minimal sketch of this example, assuming the same `airtable_emojis`
# source and placeholder pipeline name as above
pipeline = dlt.pipeline("airtable_demo", destination="duckdb")
pipeline.run(airtable_emojis().with_resources("📆 Schedule"), refresh="drop_resources")
```
Above we request that the state associated with the "📆 Schedule" resource is reset and the table generated by it ("_schedule") is dropped. Other resources, tables, and state are not affected. Please check `drop_sources` for a step-by-step description of what `dlt` does internally.
### Selectively truncate tables and reset resource state with `drop_data`
Same as `drop_resources`, but instead of dropping tables from the schema, only the data is deleted from them (i.e. by `TRUNCATE <table_name>` in SQL destinations). Resource state for the selected resources is also wiped. In the case of [incremental resources](incremental-loading.md#incremental-loading-with-a-cursor-field), this will reset the cursor state and fully reload the data from the `initial_value`.
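By analogy with the examples above, a minimal sketch (reusing the assumed `airtable_emojis` source and placeholder pipeline name):

```py
import dlt

# a sketch reusing the `airtable_emojis` source from the examples above;
# "_schedule" is truncated and the "📆 Schedule" incremental cursor is reset
pipeline = dlt.pipeline("airtable_demo", destination="duckdb")
pipeline.run(airtable_emojis().with_resources("📆 Schedule"), refresh="drop_data")
```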