Commit 74f220b

timestamp docs WIP
1 parent 6a12ba4 commit 74f220b

2 files changed

+45
-6
lines changed

docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md

Lines changed: 1 addition & 0 deletions
@@ -167,6 +167,7 @@ result in data loss ie. if naive datetime has different local timezone on the ma
167167

168168
* If your datetime column is naive, use a naive Python datetime. Note that a `pendulum` datetime is tz-aware by default and a standard `datetime` is naive.
169169
* Use `full` reflection level or above to reflect `timezone` (awareness hint) on the datetime columns.
170+
* Read about [timestamp handling](../../../general-usage/schema.md#handling-of-timestamp-and-time-zones) in `dlt`.
170171

171172

172173
### Examples

docs/website/docs/general-usage/schema.md

Lines changed: 44 additions & 6 deletions
@@ -173,7 +173,7 @@ On the other hand, if the `id` field was already a string, then introducing new
173173

174174
Now go ahead and try to add a new record where `id` is a float number; you should see a new field `id__v_double` in the schema.
175175

176-
### Data types
176+
## Data types
177177

178178
| dlt Data Type | Source Value Example | Precision and Scale |
179179
| ------------- | --------------------------------------------------- | ------------------------------------------------------- |
@@ -193,16 +193,54 @@ Now go ahead and try to add a new record where `id` is a float number; you shoul
193193

194194
`json` data type tells `dlt` to load that element as JSON or string and not attempt to flatten or create a nested table out of it. Note that structured types like arrays or maps are not supported by `dlt` at this point.
195195

196-
`time` data type is saved in the destination without timezone info; if timezone is included, it is stripped. E.g., `'14:01:02+02:00` -> `'14:01:02'`.
196+
`time` data type is saved in the destination **without timezone info**; if a timezone is included, the time is converted to UTC and then made naive.
197197

198-
:::tip
199-
The precision and scale are interpreted by the particular destination and are validated when a column is created. Destinations that do not support precision for a given data type will ignore it.
200198

201-
The precision for **timestamp** is useful when creating **parquet** files. Use 3 for milliseconds, 6 for microseconds, and 9 for nanoseconds.
199+
### Handling of timestamp and time zones
200+
By default, `dlt` normalizes timestamps (tz-aware and naive) into a time zone aware type in the UTC timezone. Since `1.16.0`, it fully honors the `timezone` boolean hint if set
201+
explicitly on a column or by a source/resource. Normalizers do not infer this hint from data. The same rules apply to tabular data (arrow/pandas) and Python objects:
202202

203-
The precision for **bigint** is mapped to available integer types, i.e., TINYINT, INT, BIGINT. The default is 64 bits (8 bytes) precision (BIGINT).
203+
| input timestamp | `timezone` hint | normalized timestamp |
204+
| --------------- | --------------- | --------------------- |
205+
| naive | `None`, `True` | tz-aware in UTC |
206+
| naive | `False` | naive (pass-through) |
207+
| tz-aware | `None`, `True` | tz-aware in UTC |
208+
| tz-aware | `False` | to UTC and then naive |
210+
211+
:::caution
212+
Naive timestamps are **always considered to be UTC**; system timezone settings are ignored by `dlt`.
204213
:::
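
The rules in the table can be sketched with the standard library alone. This is a minimal illustration, not `dlt`'s actual implementation; the function name `normalize_timestamp` and its `timezone_hint` parameter are hypothetical and exist only for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def normalize_timestamp(value: datetime, timezone_hint: Optional[bool] = None) -> datetime:
    """Hypothetical helper that mirrors the normalization table (not dlt code)."""
    if value.tzinfo is None:
        # Naive input is always treated as UTC; the system timezone is ignored.
        as_utc = value.replace(tzinfo=timezone.utc)
    else:
        # Tz-aware input is converted to UTC.
        as_utc = value.astimezone(timezone.utc)
    if timezone_hint is False:
        # `timezone=False`: return a naive value (already expressed in UTC).
        return as_utc.replace(tzinfo=None)
    # `timezone=None` or `True`: return a tz-aware value in UTC.
    return as_utc


naive = datetime(2024, 1, 1, 12, 0, 0)
aware = datetime(2024, 1, 1, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))

print(normalize_timestamp(naive))         # tz-aware in UTC
print(normalize_timestamp(naive, False))  # naive, passed through
print(normalize_timestamp(aware, True))   # tz-aware in UTC
print(normalize_timestamp(aware, False))  # converted to UTC, then naive
```

Both sample inputs describe the same instant (12:00 UTC), so all four results share the same wall-clock time and differ only in whether a timezone is attached.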
205214

215+
Ultimately, the destination interprets the timestamp values. Some destinations:
216+
- do not support naive timestamps (e.g., BigQuery) and will interpret them as UTC by attaching the UTC timezone
217+
- do not support tz-aware timestamps (e.g., Dremio, Athena) and will strip timezones from timestamps being loaded
218+
- do not store the timezone at all and convert all timestamps to UTC
219+
- store the timezone as a column-level property and internally convert timestamps to UTC (e.g., Postgres)
220+
- store the timezone and offset (e.g., MSSQL); however, we could not find any destination that can read back the original timezones
221+
222+
`dlt` sets sessions to the UTC timezone to minimize the chance of erroneous conversions.
223+
224+
### Handling precision
225+
The precision and scale are interpreted by the particular destination and are validated when a column is created. Destinations that do not support precision for a given data type will ignore it.
226+
227+
The precision for **bigint** is mapped to available integer types, i.e., TINYINT, INT, BIGINT. The default is 64 bits (8 bytes) precision (BIGINT).
228+
229+
The precision for **timestamp** is useful when creating **parquet** files. Use 3 for milliseconds, 6 for microseconds, and 9 for nanoseconds. Note that `dlt`
230+
normalizes
231+
232+
### Handling nulls
233+
In general, destinations are responsible for NULL enforcement. `dlt` does not verify nullability of data in arrow tables and Python objects. Note that:
234+
* there's an exception to that rule if a Python object (`dict`) contains an explicit `None` for a non-nullable key. This check will be eliminated. Note that if the value
235+
for a key is not present at all, the nullability check is not done
236+
* nullability is checked by Arrow when saving parquet files. This is a new behavior and `dlt` normalizes it for older arrow versions.
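
The difference between an explicit `None` and a missing key can be sketched as follows. This is only an illustration of the rule described in the first bullet, under the assumption that the check looks at keys present in the object; the helper `explicit_none_violations` is hypothetical and is not part of `dlt`.

```python
from typing import Any, Dict, List


def explicit_none_violations(row: Dict[str, Any], non_nullable: List[str]) -> List[str]:
    """Hypothetical check: report non-nullable keys that carry an explicit None."""
    # A key that is absent from the row is skipped; only present keys are inspected.
    return [key for key in non_nullable if key in row and row[key] is None]


# "id" is present with an explicit None, "name" is missing entirely.
print(explicit_none_violations({"id": None}, ["id", "name"]))
```

Only `id` is reported here, because `name` is not present in the object at all.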
237+
238+
239+
### Structured types
240+
`dlt` has experimental support for structured types that currently piggyback on the `json` data type and may be set only by yielding arrow tables. It does not
241+
evolve nested types and will not migrate destination schemas to match. Nested types are enabled for `filesystem`, `iceberg`, `delta` and `lancedb` destinations.
242+
243+
206244
## Table references
207245
`dlt` tables refer to other tables. It supports two types of such references:
208246
1. **Nested reference** created automatically when nested data (i.e., a `json` document containing a nested list) is converted into relational form. These references use specialized column and table hints and are used, for example, when [merging data](merge-loading.md).
