timestamp docs WIP

rudolfix · rudolfix · commit 74f220b50b50 · 2025-08-17T23:04:31.000+02:00
diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md
@@ -167,6 +167,7 @@ result in data loss ie. if naive datetime has different local timezone on the ma
 
 * If your datetime columns is naive, use naive Python datetime. Note that `pendulum` datetime is tz-aware by default and standard `datetime` is naive.
 * Use `full` reflection level or above to reflect `timezone` (awareness hint) on the datetime columns.
+* Read about [timestamp handling](../../../general-usage/schema.md#handling-of-timestamp-and-time-zones) in `dlt`.
 
 
 ### Examples
diff --git a/docs/website/docs/general-usage/schema.md b/docs/website/docs/general-usage/schema.md
@@ -173,7 +173,7 @@ On the other hand, if the `id` field was already a string, then introducing new
 
 Now go ahead and try to add a new record where `id` is a float number; you should see a new field `id__v_double` in the schema.
 
-### Data types
+## Data types
 
 | dlt Data Type | Source Value Example                                | Precision and Scale                                     |
 | ------------- | --------------------------------------------------- | ------------------------------------------------------- |
@@ -193,16 +193,54 @@ Now go ahead and try to add a new record where `id` is a float number; you shoul
 
 `json` data type tells `dlt` to load that element as JSON or string and not attempt to flatten or create a nested table out of it. Note that structured types like arrays or maps are not supported by `dlt` at this point.
 
-`time` data type is saved in the destination without timezone info; if timezone is included, it is stripped. E.g., `'14:01:02+02:00` -> `'14:01:02'`.
+The `time` data type is saved in the destination **without timezone info**; if a timezone is included, the time is converted to UTC and then made naive.
 
-:::tip
-The precision and scale are interpreted by the particular destination and are validated when a column is created. Destinations that do not support precision for a given data type will ignore it.
 
-The precision for **timestamp** is useful when creating **parquet** files. Use 3 for milliseconds, 6 for microseconds, and 9 for nanoseconds.
+### Handling of timestamp and time zones
+By default, `dlt` normalizes timestamps (tz-aware and naive) into a time zone aware type in the UTC timezone. Since `1.16.0`, it fully honors the `timezone` boolean hint if it is set
+explicitly on a column or by a source/resource. Normalizers do not infer this hint from data. The same rules apply to tabular data (arrow/pandas) and Python objects:
 
-The precision for **bigint** is mapped to available integer types, i.e., TINYINT, INT, BIGINT. The default is 64 bits (8 bytes) precision (BIGINT).
+| input timestamp | `timezone` hint | normalized timestamp  |
+| --------------- | --------------- | --------------------- |
+| naive           | `None`, `True`  | tz-aware in UTC       |
+| naive           | `False`         | naive (pass-through)  |
+| tz-aware        | `None`, `True`  | tz-aware in UTC       |
+| tz-aware        | `False`         | to UTC and then naive |
+
+
+:::caution
+Naive timestamps are **always treated as UTC**; `dlt` ignores system timezone settings.
 :::
 
+Ultimately, the destination interprets the timestamp values. Some destinations:
+- do not support naive timestamps (e.g., BigQuery) and will interpret them as UTC by attaching the UTC timezone
+- do not support tz-aware timestamps (e.g., Dremio, Athena) and will strip timezones from timestamps being loaded
+- do not store the timezone at all and convert all timestamps to UTC
+- store the timezone as a column-level property and internally convert timestamps to UTC (e.g., Postgres)
+- store the timezone and offset (e.g., MSSQL); however, we could not find any destination that can read back the original timezones
+
+`dlt` sets sessions to the UTC timezone to minimize the chance of erroneous conversion.
+
+### Handling precision
+The precision and scale are interpreted by the particular destination and are validated when a column is created. Destinations that do not support precision for a given data type will ignore it.
+
+The precision for **bigint** is mapped to available integer types, i.e., TINYINT, INT, BIGINT. The default is 64 bits (8 bytes) precision (BIGINT).
+
+The precision for **timestamp** is useful when creating **parquet** files. Use 3 for milliseconds, 6 for microseconds, and 9 for nanoseconds. Note that `dlt`
+normalizes 
+
+### Handling nulls
+In general, destinations are responsible for NULL enforcement. `dlt` does not verify the nullability of data in arrow tables and Python objects. Note that:
+* the exception to this rule is a Python object (`dict`) that contains an explicit `None` for a non-nullable key. This check will be eliminated. Note that if the value
+for a key is not present at all, the nullability check is not performed.
+* nullability is checked by Arrow when saving parquet files. This is a new behavior, and `dlt` normalizes it for older Arrow versions.
+
+
+### Structured types
+`dlt` has experimental support for structured types, which currently piggyback on the `json` data type and may be set only by yielding arrow tables. It does not
+evolve nested types and will not migrate destination schemas to match. Nested types are enabled for the `filesystem`, `iceberg`, `delta`, and `lancedb` destinations.
+
+
 ## Table references
 `dlt` tables refer to other tables. It supports two types of such references:
 1. **Nested reference** created automatically when nested data (i.e., a `json` document containing a nested list) is converted into relational form. These references use specialized column and table hints and are used, for example, when [merging data](merge-loading.md).