Timestamp columns incorrectly converted to Date64 with arrow_stream return_type #866
Description
dlt version
1.20.0
Describe the problem
When using ConnectorX with return_type="arrow_stream", timestamp columns are returned as date64[ms] instead of timestamp[us].
The cast_date64_columns_to_timestamp function in pyarrow.py uses .view() to reinterpret the underlying bits, but a view does not rescale the unit, so millisecond values are read as microseconds (a 1000x difference).
As a result, a date like 2025-12-14 is stored as 1970-01-21.
Expected behavior
No response
Steps to reproduce
```python
import dlt
from dlt.sources.sql_database import sql_database

# connection_string and pipeline (filesystem destination) are defined elsewhere
source = sql_database(
    connection_string,
    table_names=["my_table"],
    backend="connectorx",
    backend_kwargs={"return_type": "arrow_stream"},  # ← triggers bug
)
source.my_table.apply_hints(incremental=dlt.sources.incremental("updated_at"))
pipeline.run(source)
# Watermark: 1970-01-21 <- (should be 2025-12-14)
```
Root Cause:
ConnectorX arrow_stream returns date64[ms], but dlt's cast_date64_columns_to_timestamp uses .view(timestamp[us]), which reinterprets the bits without converting the unit:
```python
new_col = col.view(pyarrow.timestamp("us"))  # treats ms as μs → 1000x
```
Example:
- Raw value: 1765747379262 (milliseconds) → .view(timestamp[us]) interprets it as 1765747379262 μs → 1970-01-21
- Correct: 1765747379262 ms = 1765747379262000 μs → 2025-12-14
Fix:
Probably use .view() to reinterpret the type as timestamp[ms], then .cast() to convert the unit to microseconds?
```python
import pyarrow
import pyarrow.compute
# chunk is one date64[ms] pyarrow.Array: view as timestamp[ms], then cast to timestamp[us]
chunk_ts_us = pyarrow.compute.cast(chunk.view(pyarrow.timestamp("ms")), pyarrow.timestamp("us"))
```
Operating system
macOS
Runtime environment
Local
Python version
3.12
dlt data source
SQL
dlt destination
Filesystem & buckets
Other deployment details
No response
Additional information
No response