Merge branch 'mangecoeur-master'

jreback · jreback · commit c7e3058d0ae2 · 2014-02-06T16:31:44.000-05:00
diff --git a/README.md b/README.md
@@ -106,6 +106,7 @@ pip install pandas
 - [Cython](http://www.cython.org): Only necessary to build development version. Version 0.17.1 or higher.
 - [SciPy](http://www.scipy.org): miscellaneous statistical functions
 - [PyTables](http://www.pytables.org): necessary for HDF5-based storage
+- [SQLAlchemy](http://www.sqlalchemy.org): for SQL database support. Version 0.8.1 or higher recommended.
 - [matplotlib](http://matplotlib.sourceforge.net/): for plotting
 - [statsmodels](http://statsmodels.sourceforge.net/)
    - Needed for parts of `pandas.stats`
diff --git a/ci/requirements-2.6.txt b/ci/requirements-2.6.txt
@@ -6,3 +6,4 @@ http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2
 html5lib==1.0b2
 bigquery==2.0.17
 numexpr==1.4.2
+sqlalchemy==0.8.1
diff --git a/ci/requirements-2.7.txt b/ci/requirements-2.7.txt
@@ -19,3 +19,4 @@ scipy==0.10.0
 beautifulsoup4==4.2.1
 statsmodels==0.5.0
 bigquery==2.0.17
+sqlalchemy==0.8.1
diff --git a/ci/requirements-2.7_LOCALE.txt b/ci/requirements-2.7_LOCALE.txt
@@ -15,3 +15,4 @@ scipy==0.10.0
 beautifulsoup4==4.2.1
 statsmodels==0.5.0
 bigquery==2.0.17
+sqlalchemy==0.8.1
diff --git a/ci/requirements-3.3.txt b/ci/requirements-3.3.txt
@@ -14,3 +14,4 @@ lxml==3.2.1
 scipy==0.12.0
 beautifulsoup4==4.2.1
 statsmodels==0.4.3
+sqlalchemy==0.9.1
diff --git a/doc/source/install.rst b/doc/source/install.rst
@@ -95,6 +95,7 @@ Optional Dependencies
     version. Version 0.17.1 or higher.
   * `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
   * `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage
+  * `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended.
   * `matplotlib <http://matplotlib.sourceforge.net/>`__: for plotting
   * `statsmodels <http://statsmodels.sourceforge.net/>`__
      * Needed for parts of :mod:`pandas.stats`
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -1823,7 +1823,7 @@ class. The following two command are equivalent:
     read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
 
 The class based approach can be used to read multiple sheets or to introspect
-the sheet names using the ``sheet_names`` attribute. 
+the sheet names using the ``sheet_names`` attribute.
 
 .. note::
 
@@ -3068,13 +3068,52 @@ SQL Queries
 -----------
 
 The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
-facilitate data retrieval and to reduce dependency on DB-specific API. These
-wrappers only support the Python database adapters which respect the `Python
-DB-API <http://www.python.org/dev/peps/pep-0249/>`__. See some
-:ref:`cookbook examples <cookbook.sql>` for some advanced strategies
+facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction
+is provided by SQLAlchemy if installed. In addition, you will need a driver library for
+your database.
 
-For example, suppose you want to query some data with different types from a
-table such as:
+.. versionadded:: 0.14.0
+
+
+If SQLAlchemy is not installed, a legacy fallback is provided for sqlite and mysql.
+These legacy modes require Python database adapters which respect the `Python
+DB-API <http://www.python.org/dev/peps/pep-0249/>`__.
+
+See also the :ref:`cookbook examples <cookbook.sql>` for some advanced strategies.
+
+The key functions are:
+
+- :func:`~pandas.io.sql.to_sql`
+- :func:`~pandas.io.sql.read_sql`
+- :func:`~pandas.io.sql.read_table`
+
+
+In the following example, we use the `SQLite <http://www.sqlite.org/>`__ SQL database
+engine. You can use a temporary SQLite database where data are stored in
+"memory".
+
+To connect with SQLAlchemy, use the :func:`create_engine` function to create an engine
+object from a database URI. You only need to create the engine once per database you are
+connecting to.
+
+For more information on :func:`create_engine` and the URI formatting, see the examples
+below and the SQLAlchemy `documentation <http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html>`__.
+
+.. code-block:: python
+
+   from sqlalchemy import create_engine
+   from pandas.io import sql
+   # Create your connection.
+   engine = create_engine('sqlite:///:memory:')
+
+Writing DataFrames
+~~~~~~~~~~~~~~~~~~
+
+Assuming the following data is in a DataFrame ``data``, we can insert it into
+the database using :func:`~pandas.io.sql.to_sql`.
 
 
 +-----+------------+-------+-------+-------+
@@ -3088,81 +3127,153 @@ table such as:
 +-----+------------+-------+-------+-------+
 
 
-Functions from :mod:`pandas.io.sql` can extract some data into a DataFrame. In
-the following example, we use the `SQlite <http://www.sqlite.org/>`__ SQL database
-engine. You can use a temporary SQLite database where data are stored in
-"memory". Just do:
-
-.. code-block:: python
+.. ipython:: python
+   :suppress:
 
-   import sqlite3
+   from sqlalchemy import create_engine
    from pandas.io import sql
-   # Create your connection.
-   cnx = sqlite3.connect(':memory:')
+   engine = create_engine('sqlite:///:memory:')
 
 .. ipython:: python
    :suppress:
 
-   import sqlite3
-   from pandas.io import sql
-   cnx = sqlite3.connect(':memory:')
+   c = ['id', 'Date', 'Col_1', 'Col_2', 'Col_3']
+   d = [(26, datetime.datetime(2010,10,18), 'X', 27.5, True),
+   (42, datetime.datetime(2010,10,19), 'Y', -12.5, False),
+   (63, datetime.datetime(2010,10,20), 'Z', 5.73, True)]
+
+   data = DataFrame(d, columns=c)
+
+Reading Tables
+~~~~~~~~~~~~~~
+
+:func:`~pandas.io.sql.read_table` will read a database table given the
+table name and optionally a subset of columns to read.
+
+.. note::
+
+    In order to use :func:`~pandas.io.sql.read_table`, you **must** have the
+    SQLAlchemy optional dependency installed.
 
 .. ipython:: python
-   :suppress:
 
-   cu = cnx.cursor()
-   # Create a table named 'data'.
-   cu.execute("""CREATE TABLE data(id integer,
-                                   date date,
-                                   Col_1 string,
-                                   Col_2 float,
-                                   Col_3 bool);""")
-   cu.executemany('INSERT INTO data VALUES (?,?,?,?,?)',
-                  [(26, datetime.datetime(2010,10,18), 'X', 27.5, True),
-                   (42, datetime.datetime(2010,10,19), 'Y', -12.5, False),
-                   (63, datetime.datetime(2010,10,20), 'Z', 5.73, True)])
+   sql.read_table('data', engine)
 
+You can also specify the name of the column as the DataFrame index,
+and specify a subset of columns to be read.
 
-Let ``data`` be the name of your SQL table. With a query and your database
-connection, just use the :func:`~pandas.io.sql.read_sql` function to get the
-query results into a DataFrame:
+.. ipython:: python
+
+   sql.read_table('data', engine, index_col='id')
+   sql.read_table('data', engine, columns=['Col_1', 'Col_2'])
+
+And you can explicitly force columns to be parsed as dates:
 
 .. ipython:: python
 
-   sql.read_sql("SELECT * FROM data;", cnx)
+   sql.read_table('data', engine, parse_dates=['Date'])
 
-You can also specify the name of the column as the DataFrame index:
+If needed, you can explicitly specify a format string, or a dict of arguments
+to pass to :func:`pandas.tseries.tools.to_datetime`.
+
+.. code-block:: python
+
+   sql.read_table('data', engine, parse_dates={'Date': '%Y-%m-%d'})
+   sql.read_table('data', engine, parse_dates={'Date': {'format': '%Y-%m-%d %H:%M:%S'}})
+
+
+You can check if a table exists using :func:`~pandas.io.sql.has_table`.
+
+In addition, the class :class:`~pandas.io.sql.PandasSQLWithEngine` can be
+instantiated directly for more manual control over the SQL interaction.
+
+Querying
+~~~~~~~~
+
+You can query using raw SQL in the :func:`~pandas.io.sql.read_sql` function.
+In this case you must use the SQL variant appropriate for your database.
+When using SQLAlchemy, you can also pass SQLAlchemy Expression language constructs,
+which are database-agnostic.
 
 .. ipython:: python
 
-   sql.read_sql("SELECT * FROM data;", cnx, index_col='id')
-   sql.read_sql("SELECT * FROM data;", cnx, index_col='date')
+   sql.read_sql('SELECT * FROM data', engine)
 
 Of course, you can specify a more "complex" query.
 
 .. ipython:: python
 
-   sql.read_sql("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", cnx)
+   sql.read_sql("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)
 
-.. ipython:: python
-   :suppress:
 
-   cu.close()
-   cnx.close()
+You can also run a plain query without creating a DataFrame with
+:func:`~pandas.io.sql.execute`. This is useful for queries that don't return values,
+such as INSERT. This is functionally equivalent to calling ``execute`` on the
+SQLAlchemy engine or db connection object. Again, you must use the SQL syntax
+variant appropriate for your database.
 
+.. code-block:: python
 
-There are a few other available functions:
+   sql.execute('SELECT * FROM table_name', engine)
 
-  - ``tquery`` returns a list of tuples corresponding to each row.
-  - ``uquery`` does the same thing as tquery, but instead of returning results
-    it returns the number of related rows.
-  - ``write_frame`` writes records stored in a DataFrame into the SQL table.
-  - ``has_table`` checks if a given SQLite table exists.
+   sql.execute('INSERT INTO table_name VALUES(?, ?, ?, ?)', engine, params=[('id', 1, 12.2, True)])
+
+
+Engine connection examples
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+  from sqlalchemy import create_engine
+
+  engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
+
+  engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
+
+  engine = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
+
+  engine = create_engine('mssql+pyodbc://mydsn')
+
+  # sqlite://<nohostname>/<path>
+  # where <path> is relative:
+  engine = create_engine('sqlite:///foo.db')
+
+  # or absolute, starting with a slash:
+  engine = create_engine('sqlite:////absolute/path/to/foo.db')
+
+
+Legacy
+~~~~~~
+To use the sqlite support without SQLAlchemy, you can create connections like so:
+
+.. code-block:: python
+
+   import sqlite3
+   from pandas.io import sql
+   cnx = sqlite3.connect(':memory:')
+
+And then issue the following queries, remembering to also specify the flavor of SQL
+you are using.
+
+.. code-block:: python
+
+   sql.to_sql(data, 'data', cnx, flavor='sqlite')
+
+   sql.read_sql("SELECT * FROM data", cnx, flavor='sqlite')
 
-   For now, writing your DataFrame into a database works only with
-   **SQLite**. Moreover, the **index** will currently be **dropped**.
 
 .. _io.bigquery:
 
diff --git a/pandas/io/sql.py b/pandas/io/sql.py
diff --git a/pandas/io/tests/data/iris.csv b/pandas/io/tests/data/iris.csv
diff --git a/pandas/io/tests/test_sql.py b/pandas/io/tests/test_sql.py