-
Notifications
You must be signed in to change notification settings - Fork 701
Description
What happened?
This is three related issues -- one main issue and two encountered trying to workaround the first.
Issue 1
Suppose we have some con = ibis.mssql.connect(...) connection to a SQL Server instance, and a pandas DataFrame:
import pandas as pd
import pytz
# Create a datetime object for 12:15 EST
est = pytz.timezone('US/Eastern')
dt = pd.Timestamp('2024-01-01 12:15', tz=est)
# Create DataFrame
df = pd.DataFrame({'datetime': [dt]})We then want to create a table in our SQL Server database with this data
con.create_table('dt_test', obj=df, overwrite=True)This indeed returns
DatabaseTable: dt_test1
datetime timestamp('US/Eastern')as expected. However, upon inspecting the table, we get
print(con.table('dt_test1').execute())
datetime
0 2024-01-01 12:15:00+00:00The timezone was simply dropped and replaced with UTC. This is just a completely different timestamp. This error reproduces if we use ibis.memtable(df) instead of the DataFrame directly. To make sure this isn't a read error, we can check using sqlcmd
1> select * from dt_test1
2> go
datetime
---------------------------------------------
2024-01-01 12:15:00.0000000 +00:00
(1 rows affected)We can see that ibis simply uploaded the wrong value
Issue 2
At this point we asked if the issue wasn't from the pandas-backed table -- after all, there is a lot of writing on your website about how pandas and ibis are philosophically different in approach. So instead, now suppose we took that DataFrame and wrote it to parquet:
df.to_parquet("/tmp/test.parquet")and double-checked that the timestamps were represented in the file:
print(pd.read_parquet("/tmp/test.parquet"))
datetime
0 2024-01-01 12:15:00-05:00So far, so good. However, now we write the contents of that file to a new table in our database:
t = ibis.read_parquet("/tmp/test.parquet")
con.create_table('dt_test2', obj=ibis.memtable(t), overwrite=True)
DatabaseTable: dt_test2
datetime timestamp('UTC', 6)Looks like the timezone is wrong again. And indeed
print(con.table('dt_test2').execute())
datetime
0 2024-01-01 17:15:00+00:00it is at least the right instant in time, but the wrong representation of it. Having experienced similar issues with duckdb before, I believe that this has to do with how duckdb handles timezones. In duckdb, the timezone of the representation of timestamps has to do with a timezone that you set for a given session with duckdb. However, I couldn't find any way to influence this via the ibis api.
Issue 3
And what if we use the mssql backend's read_parquet as written in the docs?
con.read_parquet('/tmp/test.parquet', table_name='dt_test3')
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[27], [line 1](vscode-notebook-cell:?execution_count=27&line=1)
----> [1](vscode-notebook-cell:?execution_count=27&line=1) con.read_parquet('/tmp/test.parquet', table_name='dt_test3')
File /.../.venv/lib/python3.13/site-packages/ibis/backends/__init__.py:402, in _FileIOHandler.read_parquet(self, path, table_name, **kwargs)
382 def read_parquet(
383 self, path: str | Path, /, *, table_name: str | None = None, **kwargs: Any
384 ) -> ir.Table:
385 """Register a parquet file as a table in the current backend.
386
387 Parameters
(...)
400 The just-registered table
401 """
--> [402](https://vscode-remote+dev-002dcontainer-.../.venv/lib/python3.13/site-packages/ibis/backends/__init__.py:402) raise NotImplementedError(
403 f"{self.name} does not support direct registration of parquet data."
404 )
NotImplementedError: mssql does not support direct registration of parquet data.So it seems we have to use some raw sql to make sure we insert timestamps correctly into the database. Issue 2 is not technically incorrect, Issue 3 is only frustrating because a documented API isn't implemented in actuality, but Issue 1 is the big one since Ibis is inserting incorrect data into the database.
What version of ibis are you using?
11.0.0
What backend(s) are you using, if any?
mssql, duckdb
Relevant log output
Code of Conduct
- I agree to follow this project's Code of Conduct
Metadata
Metadata
Assignees
Labels
Type
Projects
Status