Skip to content

Sqlachemy raises when writing write timedelta64 columns to sqlite. #6921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
danielballan opened this issue Apr 22, 2014 · 16 comments · Fixed by #7076
Closed

Sqlachemy raises when writing write timedelta64 columns to sqlite. #6921

danielballan opened this issue Apr 22, 2014 · 16 comments · Fixed by #7076
Labels
IO SQL to_sql, read_sql, read_sql_query Timedelta Timedelta data type
Milestone

Comments

@danielballan
Copy link
Contributor

Maybe we can work around this. At least it should raise informatively.

In [24]: df = pd.to_timedelta(Series(['00:00:01', '00:00:03'], name='foo')).to_frame()

In [25]: df
Out[25]: 
       foo
0 00:00:01 
1 00:00:03

[2 rows x 1 columns]

In [26]: df.dtypes
Out[26]: 
foo    timedelta64[ns]
dtype: object

In [27]: df.to_sql('bar', engine)
(...)
StatementError: unsupported operand type(s) for +: 'datetime.datetime' and 'long' (original cause: TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'long') 'INSERT INTO bar (foo) VALUES (?)' [{'foo': 1000000000L}, {'foo': 3000000000L}]

The full traceback is in this gist.

Using a sqlite3.Connection (legacy-style) allows you to write and read.

In [32]: conn = sqlite3.connect(':memory:')

In [33]: df.to_sql('bar', conn)

In [37]: pd.read_sql('SELECT * FROM bar', conn, flavor='sqlite')
Out[37]: 
   index         foo
0      0  1000000000
1      1  3000000000

The data comes back as int64 type. There is no clean way around this, no obvious way to tell that these sqlite3 integers are actually timedeltas. We could store timedeltas as strings, like datetimes. But I'm not necessarily in favor of that approach.

@jorisvandenbossche jorisvandenbossche added this to the 0.14.0 milestone Apr 22, 2014
@jtratner
Copy link
Contributor

it looks like there isn't a timedelta/interval type in sqlite, so we can't really support serialization/deserialization. (given that there's no metadata to be found)

I'd support raising here.

@jorisvandenbossche
Copy link
Member

The reason you get this error is the following I think: we say to sqlalchemy that it is a timedelta (wich maps to Interval sqlalchemy type, https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L663). And for databases that don't support this, sqlalchemy stores it as a datetime (relative to 1970-01-01, http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.Interval). For sqlite, a datetime is stored as a string.
But when feeding the data to insert, I suppose that sqlachemy does not recognizes it as a timedelta but just sees an int64.

But I agree this is a bit difficult to support.

And we have also to test this for other flavors, as I actually think this will not work for any (timedelta's are totally untested at the moment). Eg postgresql has a timedelta type, but at the moment this will give the same error I think.

Maybe we should just disallow timedelta's for now, or explicitely save them to sql as ints and raise a warning saying this?

@jtratner
Copy link
Contributor

Can we at least add a warning?

@jorisvandenbossche
Copy link
Member

Convert to integer and add warning? Or drop column and warning?

@danielballan
Copy link
Contributor Author

I vote for convert to integer and warn -- if we can't solve it properly.

On Friday, April 25, 2014, Joris Van den Bossche [email protected]
wrote:

Convert to integer and add warning? Or drop column and warning?


Reply to this email directly or view it on GitHubhttps://github.com//issues/6921#issuecomment-41412157
.

@jreback
Copy link
Contributor

jreback commented Apr 25, 2014

Here's some options (and pd.timedelta can interpret any of these); note that string conversion is the slowest

In [1]: s = pd.to_timedelta(np.arange(5),unit='s')

In [2]: s
Out[2]: 
0   00:00:00
1   00:00:01
2   00:00:02
3   00:00:03
4   00:00:04
dtype: timedelta64[ns]

In [3]: s.astype('timedelta64[s]')
Out[3]: 
0    0
1    1
2    2
3    3
4    4
dtype: float64

In [4]: s.astype('timedelta64[ms]')
Out[4]: 
0       0
1    1000
2    2000
3    3000
4    4000
dtype: float64

string conversion

In [10]: from pandas.core.format import _get_format_timedelta64

In [11]: [formatter(x) for x in s.values]

In [13]: formatter = _get_format_timedelta64(s.values)

In [14]: [formatter(x) for x in s.values]
Out[14]: ['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04']

@jorisvandenbossche
Copy link
Member

@jreback The string conversion, is this always non-ambiguous? And not subject to change? (eg how the string output of timedelta looks like has recently changed? #5701) Because if we convert it to a string, it should be some kind of 'standard' way of formatting a timedelta I think.

sqlalchemy does support datetime.timedelta objects, so a possibility is also to convert it to that.

But easiest seems to do a conversion to int. I suppose we should always use the same frequency base, so always using timedelta64[ns] and not s or ms?

@jreback
Copy link
Contributor

jreback commented May 5, 2014

it is always non ambiguous
the change in 5701 to shorten up output when possible (so their is a long and short format)

however the format I picked is not necessarily standard per se

if I can use a timedelata object maybe do that
I think string is better than ints
not really sure what to do here

biggest issue is reverse inference
it's not trivial to figure out what a string or int column actually should be a timedela (or a date for that matter)

so if sqlalchemy does have a timedelta type I would just do that (and defer to 0.14.1 if needed)

@jorisvandenbossche
Copy link
Member

From sqlalchmey docs: 'the value is stored as a date which is relative to the “epoch” (Jan. 1, 1970).' (http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.Interval). So also not ideal I think, and difficult to reverse infer the data type.

Eg for mysql (which has no native timedelta type), for a timedelta of one day, you will get the datetime 1970-01-02 back, and I think there is no way to distuinguish between datetime and timedelta in this case.

@jreback
Copy link
Contributor

jreback commented May 5, 2014

If you know its a timedelta, then this is very easy to reverse

In [3]: v = pd.to_timedelta(1,unit='d')

In [4]: v
Out[4]: numpy.timedelta64(86400000000000,'ns')

In [5]: Timestamp(0)
Out[5]: Timestamp('1970-01-01 00:00:00')

In [6]: Timestamp(0+v)
Out[6]: Timestamp('1970-01-02 00:00:00')

In [7]: pd.to_timedelta(Timestamp(0+v).value)
Out[7]: numpy.timedelta64(86400000000000,'ns')

just to_timedelta rather than to_datetime, they are both in ns

In [8]: pd.to_timedelta(v)
Out[8]: numpy.timedelta64(86400000000000,'ns')

@jorisvandenbossche
Copy link
Member

yes, but the problem is: we don't know it is a timedelta, as it is just stored as a datetime.

@jreback
Copy link
Contributor

jreback commented May 5, 2014

isn't their a column type? (isn't that that what sqlalchemy returns to you?) if not that IS a big problem; storing as int/string have the same problem. you could add a parse_timedelta, like parse_dates and make the user do it

@jorisvandenbossche
Copy link
Member

@jreback yes there will be a column type, but this will be of type DATETIME, so there is no way to know if is has to be interpreted as timedelta from epoch or as real datetime.

parse_timedelta could be an option, and maybe good as a feature request, because apart from that we have to decide how we handle timedelta's when writing to sql: int or string and/or warn.

I would for now certainly raise a UserWarning. For conversion to int/string, I would also tend to convert to int (although @jreback you prefered string?), as this seems less surprising than a datetime somwhere in the '70s. But then what timebase: ns? (as this is how it is stored in pandas)

@danielballan
Copy link
Contributor Author

+1 for storing as ints. Another reason: you can do numerical operations on them in the database, also true for the flavors that support a full-fledged timedelta column type. Nanosecond timebase makes sense to me -- it would be hard to justify a different choice.

@jreback
Copy link
Contributor

jreback commented May 8, 2014

all sounds good to me
ns it is then
the reverse converse would have to wait for next version for parse_timedelta

@jorisvandenbossche
Copy link
Member

ok, I will put up a fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants