DEPR: pandas.io.sql.execute #50185

cdcadman · 2022-12-11T16:59:48Z

There is currently a bug in pandas.io.sql.execute on the main branch (not in any released versions). I created the bug in #49531 while working on #48576. In #49531, I changed SQLDatabase to only accept a Connection and not an Engine, because sqlalchemy 2.0 restricts the methods that are available to Engine. #49531 created bugs in both read_sql and pandas.io.sql.execute, and I have an open PR to fix read_sql in #49967.

This reproduces the bug in pandas.io.sql.execute:

from pathlib import Path
from sqlalchemy import create_engine
from pandas import DataFrame
from pandas.io import sql

db_file = Path(r"C:\Temp\test.db")
db_file.unlink(missing_ok=True)
db_uri = "sqlite:///" + str(db_file)
engine = create_engine(db_uri)
DataFrame({'a': [2, 4], 'b': [3, 6]}).to_sql('test_table', engine)
sql.execute('select * from test_table', engine).fetchall()

Traceback when running on the main branch in pandas:

Error closing cursor
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 991, in fetchall
    rows = dbapi_cursor.fetchall()
sqlite3.ProgrammingError: Cannot operate on a closed database.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 1995, in _safe_close_cursor
    cursor.close()
sqlite3.ProgrammingError: Cannot operate on a closed database.
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 991, in fetchall
    rows = dbapi_cursor.fetchall()
sqlite3.ProgrammingError: Cannot operate on a closed database.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Github\pandas\execute_error.py", line 11, in <module>
    sql.execute('select * from test_table', engine).fetchall()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\result.py", line 1072, in fetchall
    return self._allrows()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\result.py", line 401, in _allrows
    rows = self._fetchall_impl()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 1819, in _fetchall_impl
    return self.cursor_strategy.fetchall(self, self.cursor)
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 995, in fetchall
    self.handle_exception(result, dbapi_cursor, e)
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 955, in handle_exception
    result.connection._handle_dbapi_exception(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise_
    raise exception
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 991, in fetchall
    rows = dbapi_cursor.fetchall()
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) Cannot operate on a closed database.
(Background on this error at: https://sqlalche.me/e/14/f405)

When running on pandas 1.5.2, there is no error.

After #49967, SQLDatabase accepts str, Engine, and Connection, but in SQLDatabase.__init__, a Connection is created if it was not passed in, and an ExitStack is created to clean up the Connection and Engine if necessary. Cleanup happens either in SQLDatabase.__exit__ or at the end of the generator which is returned by read_sql if chunksize is not None. What I'm not able to see is how to do the cleanup after the caller fetches the results of pandas.io.sql.execute. I have come up with two options so far:

Restrict pandas.io.sql.execute to require a Connection, or else do not allow queries that return rows if a str or Engine is used.
Use Result.freeze to fetch the results before closing the SQLDatabase context manager. The problem here is that it requires sqlalchemy 1.4.45, which was only released on 12/10/2022, due to a bug in prior versions: freeze does not work for textual SQL sqlalchemy/sqlalchemy#8963

The text was updated successfully, but these errors were encountered:

phofl · 2022-12-11T17:22:26Z

So pandas.io.sql.execute is not public. How does this impact the public API?

When putting your query into read_sql it works as expected

cdcadman · 2022-12-11T17:40:23Z

No impacts outside of pandas.io.sql.execute. This method is mentioned above this line, so I should at least update the documentation: https://pandas.pydata.org/docs/user_guide/io.html?highlight=sql%20execute#engine-connection-examples. I'm not worried about the behavior of pandas.io.sql.execute changing if it's ok with you.

phofl · 2022-12-11T17:46:53Z

Thx for pointing this out. Was not aware of that. Do we have public docstrings for this function?
We should raise a proper error message at least. Do you know if there is an advantage compared to calling engine.execute?

@fangchenli could you weight in here? Any reason that this is public?

cdcadman · 2022-12-11T18:36:46Z

I don't think there are any public docstrings. I could change the PR to raise an error if the caller passes an Engine or str and the result returns rows. Engine.execute will be removed in sqlalchemy 2.0: https://docs.sqlalchemy.org/en/14/core/connections.html#sqlalchemy.engine.Engine.execute.

fangchenli · 2022-12-11T18:49:34Z

This method was introduced in the initial commit of the SQL module in 2010 as a helper function. I don't think it's still relevant today. Removing it completely would be an even better option.

phofl · 2022-12-11T18:51:14Z

Thx, then let’s raise an error if anything other than a connection is passed and let’s deprecate it to get rid of it in 3.0

luke396 · 2023-01-07T13:52:46Z

take

cdcadman mentioned this issue Dec 11, 2022

BUG: Make io.sql.execute raise TypeError on Engine or URI string. #50177

Merged

5 tasks

phofl added IO SQL to_sql, read_sql, read_sql_query Deprecate Functionality to remove in pandas labels Dec 12, 2022

phofl added this to the 2.0 milestone Dec 12, 2022

phofl added the Error Reporting Incorrect or improved errors from pandas label Dec 12, 2022

mroeschke changed the title ~~It is difficult for pandas.io.sql.execute to work with sqlalchemy 2.0.~~ DEPR: pandas.io.sql.execute Dec 15, 2022

datapythonista added the good first issue label Jan 4, 2023

github-actions bot assigned luke396 Jan 7, 2023

This was referenced Jan 9, 2023

DEPR: Add FutureWarning for pandas.io.sql.execute #50638

Closed

DEPR: ADD FutureWarning for pandas.io.execute #50676

Merged

phofl closed this as completed in #50676 Jan 12, 2023

phofl mentioned this issue Jan 12, 2023

DEPR: List of deprecations to be removed in 3.0 #50578

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEPR: pandas.io.sql.execute #50185

DEPR: pandas.io.sql.execute #50185

cdcadman commented Dec 11, 2022

phofl commented Dec 11, 2022

cdcadman commented Dec 11, 2022

phofl commented Dec 11, 2022

cdcadman commented Dec 11, 2022

fangchenli commented Dec 11, 2022

phofl commented Dec 11, 2022

luke396 commented Jan 7, 2023

DEPR: pandas.io.sql.execute #50185

DEPR: pandas.io.sql.execute #50185

Comments

cdcadman commented Dec 11, 2022

phofl commented Dec 11, 2022

cdcadman commented Dec 11, 2022

phofl commented Dec 11, 2022

cdcadman commented Dec 11, 2022

fangchenli commented Dec 11, 2022

phofl commented Dec 11, 2022

luke396 commented Jan 7, 2023