
Connectorx arrow_stream timestamp conversion issue #3491

@louiewhw

Description


dlt version

1.20.0

Describe the problem

When using ConnectorX with return_type="arrow_stream", timestamp columns are returned as date64[ms] instead of timestamp[us].

Related issue: Timestamp columns incorrectly converted to Date64 with arrow_stream return_type #866

The cast_date64_columns_to_timestamp function in dlt's pyarrow.py uses .view() to reinterpret the underlying int64 values. .view() does not rescale units, so milliseconds are interpreted as microseconds (a 1000x difference).

See also #3218.

This results in dates like 2025-12-14 being stored as 1970-01-21.

Expected behavior

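Timestamp columns should arrive as timestamp[us] with their original values preserved, so the incremental watermark reads 2025-12-14 rather than 1970-01-21.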

Steps to reproduce

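import dlt
from dlt.sources.sql_database import sql_database

connection_string = "postgresql://user:password@localhost:5432/mydb"  # placeholder
# filesystem destination; bucket_url is read from dlt config / env vars
pipeline = dlt.pipeline(pipeline_name="connectorx_repro", destination="filesystem")

source = sql_database(
    connection_string,
    table_names=["my_table"],
    backend="connectorx",
    backend_kwargs={"return_type": "arrow_stream"},  # ← triggers bug
)
source.my_table.apply_hints(incremental=dlt.sources.incremental("updated_at"))
pipeline.run(source)
# Watermark: 1970-01-21 <- (should be 2025-12-14)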

Root Cause:
ConnectorX's arrow_stream returns date64[ms], but dlt's cast_date64_columns_to_timestamp uses .view(timestamp[us]), which reinterprets the stored integers without any unit conversion:

new_col = col.view(pyarrow.timestamp("us"))  # treats ms as μs → 1000x

Example:

  • Raw value: 1765747379262 (milliseconds)
  • .view(timestamp[us]) → interprets it as 1765747379262 μs → 1970-01-21
  • Correct: 1765747379262 ms = 1765747379262000 μs → 2025-12-14
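The arithmetic in the example above can be checked with the standard library alone. This sketch reads the same raw integer once as microseconds (what the buggy view does) and once as milliseconds (what the value actually is):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
raw = 1765747379262  # value stored in the date64[ms] column

# Wrong: the raw integer read as microseconds.
as_us = EPOCH + timedelta(microseconds=raw)

# Right: the raw integer read as milliseconds, i.e. raw * 1000 microseconds.
as_ms = EPOCH + timedelta(microseconds=raw * 1000)

print(as_us.isoformat())  # 1970-01-21T10:29:07.379262+00:00
print(as_ms.isoformat())  # 2025-12-14T21:22:59.262000+00:00
```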

Fix:

A possible fix: use .view() to reinterpret the type as timestamp[ms] (same unit, no data change), then .cast() to convert the units to microseconds:

import pyarrow
import pyarrow.compute

# date64[ms] → view as timestamp[ms] → cast to timestamp[us]
chunk_ts_us = pyarrow.compute.cast(chunk.view(pyarrow.timestamp("ms")), pyarrow.timestamp("us"))

Operating system

macOS

Runtime environment

Local

Python version

3.12

dlt data source

SQL

dlt destination

Filesystem & buckets

Other deployment details

No response

Additional information

No response

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working), question (Further information is requested)

    Type

    No type

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
