@@ -193,16 +193,54 @@ Now go ahead and try to add a new record where `id` is a float number; you shoul

 `json` data type tells `dlt` to load that element as JSON or string and not attempt to flatten or create a nested table out of it. Note that structured types like arrays or maps are not supported by `dlt` at this point.

-`time` data type is saved in the destination without timezone info; if timezone is included, it is stripped. E.g., `'14:01:02+02:00'` -> `'14:01:02'`.
+`time` data type is saved in the destination **without timezone info**; if a timezone is included, the time is first converted to UTC and then made naive. E.g., `'14:01:02+02:00'` -> `'12:01:02'`.
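The `time` conversion described above can be sketched with Python's standard library. This is a minimal model, not `dlt`'s implementation. `time_to_naive_utc` is a hypothetical helper. Python cannot shift a bare `time` between zones, so the sketch attaches an arbitrary reference date and is only well defined for fixed-offset timezones.

```python
from datetime import date, datetime, time, timedelta, timezone


def time_to_naive_utc(value: time) -> time:
    """Hypothetical helper: convert a tz-aware `time` to UTC and drop the tzinfo.

    Naive values are returned unchanged, since they are treated as UTC already.
    """
    if value.tzinfo is None:
        return value
    # Arbitrary reference date (assumption of this sketch); combine() keeps value.tzinfo.
    aware = datetime.combine(date(2000, 1, 1), value)
    # datetime.time() returns a naive time holding the UTC wall-clock value.
    return aware.astimezone(timezone.utc).time()


if __name__ == "__main__":
    t = time(14, 1, 2, tzinfo=timezone(timedelta(hours=2)))
    print(time_to_naive_utc(t))  # 12:01:02
```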

-:::tip
-The precision and scale are interpreted by the particular destination and are validated when a column is created. Destinations that do not support precision for a given data type will ignore it.

-The precision for **timestamp** is useful when creating **parquet** files. Use 3 for milliseconds, 6 for microseconds, and 9 for nanoseconds.
+### Handling of timestamp and time zones
+By default, `dlt` normalizes timestamps (both tz-aware and naive) into a time zone aware type in the UTC timezone. Since `1.16.0`, it fully honors the `timezone` boolean hint if it is set
+explicitly on a column or by a source/resource. Normalizers do not infer this hint from data. The same rules apply to tabular data (arrow/pandas) and to Python objects:

-The precision for **bigint** is mapped to available integer types, i.e., TINYINT, INT, BIGINT. The default is 64 bits (8 bytes) precision (BIGINT).
+| input timestamp | `timezone` hint | normalized timestamp |
+Naive timestamps are **always considered to be in UTC**; system timezone settings are ignored by `dlt`.
 :::
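The rules stated in the timestamp paragraph above can be modeled as follows. This is a sketch under stated assumptions. `normalize_timestamp` is a hypothetical helper and not part of `dlt`. The `timezone` hint is modeled as `None` (not set), `True` or `False`. The `False` behavior (convert to UTC, then drop the timezone) is inferred from the surrounding text, because the table rows are not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def normalize_timestamp(value: datetime, tz_hint: Optional[bool] = None) -> datetime:
    """Hypothetical model of the timestamp normalization rules described above."""
    if value.tzinfo is None:
        # Naive input is always interpreted as UTC; the system timezone is ignored.
        value = value.replace(tzinfo=timezone.utc)
    else:
        # tz-aware input is converted to UTC.
        value = value.astimezone(timezone.utc)
    if tz_hint is False:
        # Assumed behavior for timezone=False: keep the UTC wall-clock time, drop tzinfo.
        return value.replace(tzinfo=None)
    # Hint not set (None) or True: return a tz-aware UTC timestamp.
    return value


if __name__ == "__main__":
    aware = datetime(2024, 5, 1, 14, 0, tzinfo=timezone(timedelta(hours=2)))
    print(normalize_timestamp(aware))         # 2024-05-01 12:00:00+00:00
    print(normalize_timestamp(aware, False))  # 2024-05-01 12:00:00
```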

+Ultimately, the destination interprets the timestamp values. Some destinations:
+- do not support naive timestamps (e.g., BigQuery) and interpret them as UTC by attaching the UTC timezone
+- do not support tz-aware timestamps (e.g., Dremio, Athena) and strip the timezone from the timestamps being loaded
+- do not store the timezone at all and convert all timestamps to UTC
+- store the timezone as a column-level property and internally convert timestamps to UTC (e.g., Postgres)
+- store both the timezone and the offset (e.g., MSSQL); however, we could not find any destination that can read back the original timezones
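`dlt` normalizes timestamps to UTC before loading. Under that assumption, the first two destination behaviors listed above come down to dropping or re-attaching a UTC offset. The functions below are hypothetical models of those behaviors. They are not code from any destination or from `dlt`.

```python
from datetime import datetime, timezone


def strip_timezone(value: datetime) -> datetime:
    """Model of a destination without tz-aware timestamps (e.g. Dremio, Athena).

    Assumes `value` is already normalized to UTC, so dropping tzinfo keeps
    the UTC wall-clock time.
    """
    return value.replace(tzinfo=None)


def attach_utc(value: datetime) -> datetime:
    """Model of a destination without naive timestamps (e.g. BigQuery)."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value


if __name__ == "__main__":
    ts = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    stored = strip_timezone(ts)
    print(stored)              # 2024-05-01 12:00:00
    print(attach_utc(stored))  # 2024-05-01 12:00:00+00:00
```

Because the values are already in UTC, a round trip through either model preserves the instant. Only the original offset is lost, and the default normalization has already discarded it.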
+
+`dlt` sets the session timezone to UTC to minimize the chance of erroneous conversions.
+
+### Handling precision
+The precision and scale are interpreted by the particular destination and are validated when a column is created. Destinations that do not support precision for a given data type will ignore it.
+
+The precision for **bigint** is mapped to available integer types, i.e., TINYINT, INT, BIGINT. The default is 64 bits (8 bytes) precision (BIGINT).
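The following sketch maps a bit precision onto the three integer types named above. It assumes a hypothetical destination that offers only TINYINT, INT and BIGINT and picks the smallest type that fits. Real destinations may offer more types (e.g. SMALLINT) and apply their own rules. `integer_type_for_precision` is an illustrative name.

```python
def integer_type_for_precision(bits: int = 64) -> str:
    """Hypothetical mapping: smallest of TINYINT, INT, BIGINT that holds `bits` bits.

    The default of 64 bits selects BIGINT, matching the documented default.
    """
    if not 1 <= bits <= 64:
        raise ValueError(f"unsupported integer precision: {bits}")
    if bits <= 8:
        return "TINYINT"
    if bits <= 32:
        return "INT"
    return "BIGINT"


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(bits, integer_type_for_precision(bits))
```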
+
+The precision for **timestamp** is useful when creating **parquet** files. Use 3 for milliseconds, 6 for microseconds, and 9 for nanoseconds. Note that `dlt`
+normalizes
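Parquet stores a timestamp as an integer count of milliseconds, microseconds or nanoseconds, so the timestamp precision effectively selects that unit. The sketch below is illustrative only. `PRECISION_TO_UNIT` and `truncate_epoch_ns` are hypothetical names, and the input is assumed to be nanoseconds since the Unix epoch.

```python
# Hypothetical mapping of timestamp precision (fractional-second digits) to a parquet time unit.
PRECISION_TO_UNIT = {3: "ms", 6: "us", 9: "ns"}


def truncate_epoch_ns(epoch_ns: int, precision: int) -> int:
    """Express a nanosecond epoch value in the unit selected by `precision`.

    Digits beyond the requested precision are dropped with floor division,
    i.e. the value is rounded toward negative infinity.
    """
    if precision not in PRECISION_TO_UNIT:
        raise ValueError(f"unsupported timestamp precision: {precision}")
    return epoch_ns // 10 ** (9 - precision)


if __name__ == "__main__":
    ns = 1_700_000_000_123_456_789
    for p, unit in PRECISION_TO_UNIT.items():
        print(p, unit, truncate_epoch_ns(ns, p))
```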
+
+### Handling nulls
+In general, destinations are responsible for NULL enforcement. `dlt` does not verify the nullability of data in arrow tables and Python objects. Note that:
+* there's one exception to that rule: `dlt` checks nullability when a Python object (`dict`) contains an explicit `None` for a non-nullable key. This check will be eliminated. Note that if the value
+for a key is not present at all, the nullability check is not done.
+* nullability is checked by Arrow when saving parquet files. This is a new behavior, and `dlt` normalizes it for older Arrow versions.
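The `dict` exception described above can be modeled with a small check. `check_explicit_nulls` is a hypothetical helper, and raising `ValueError` is an assumption of this sketch. The error `dlt` actually raises is not shown here.

```python
from typing import Any, Mapping, Set


def check_explicit_nulls(row: Mapping[str, Any], non_nullable: Set[str]) -> None:
    """Raise if a non-nullable key is present with an explicit None value.

    Keys missing from `row` are skipped, mirroring the rule that absent
    values are not checked.
    """
    for key in sorted(non_nullable):
        if key in row and row[key] is None:
            raise ValueError(f"column {key!r} is not nullable but got an explicit None")


if __name__ == "__main__":
    check_explicit_nulls({"id": 1}, {"id", "name"})  # "name" is absent: not checked
    try:
        check_explicit_nulls({"id": 1, "name": None}, {"id", "name"})
    except ValueError as exc:
        print(exc)
```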
+
+
+### Structured types
+`dlt` has experimental support for structured types, which currently piggyback on the `json` data type and can be set only by yielding arrow tables. It does not
+evolve nested types and will not migrate destination schemas to match. Nested types are enabled for the `filesystem`, `iceberg`, `delta`, and `lancedb` destinations.
+
+
 ## Table references
 `dlt` tables refer to other tables. It supports two types of such references:
 1. **Nested reference** created automatically when nested data (i.e., a `json` document containing a nested list) is converted into relational form. These references use specialized column and table hints and are used, for example, when [merging data](merge-loading.md).