Skip to content

Nested labeled expressions are omitted #39

Closed
@JacobHayes

Description

@JacobHayes

The pybigquery driver replaces nested labeled expression with the label instead of the actual expression. The result is an "Unrecognized name" error. Otherwise, when the expression label matches the inner column name, the expression will be silently omitted for a plain column reference.

When the query is compiled and printed with literal_binds, the query looks correct (ie: the full expression is included), but for plain print, compile, and the actual execution, the expression is missing.

It seems like labeled expressions are only compiled one step past the first inner label instead of recursively.

Test case:

from contextlib import contextmanager

from sqlalchemy import MetaData, Table, case, create_engine, func, literal, select

engine = create_engine("bigquery://<your connstr here>")
# engine = create_engine("postgresql:/<your connstr here>")


@contextmanager
def test_query():
    engine.execute(
        """
        create table test as (
            select 1 as val
            union all select 2
            union all select 3
        )
    """
    )
    val = Table("test", MetaData(bind=engine), autoload=True).c.val
    sum = func.sum
    try:
        yield select(
            [
                # References `inner` instead of `val`.
                sum(sum(val.label("inner")).label("outer")).over(),
                # Also references `inner` instead of `val`.
                sum(case([[literal(True), val.label("inner")]]).label("outer")),
                # Completely omits the case expression and references `middle`.
                sum(
                    sum(
                        case([[literal(True), val.label("inner")]]).label("middle")
                    ).label("outer")
                ).over(),
            ]
        )
    finally:
        engine.execute("drop table test")


with test_query() as q:
    print("Print:")
    print(q)

    print("\nCompile:")
    print(q.compile(engine))

    print("\nLiteral compile:")
    print(q.compile(engine, compile_kwargs={"literal_binds": True}))

    print("\nResult:")
    print(tuple(engine.execute(q)))

The output from the Bigquery Engine is below - note that the query in the exception and all prints except the literal compile show the labels instead of inner expressions.

Print:
SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN %(param_1)s THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
FROM `test`

Compile:
SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN %(param_1)s THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
FROM `test`

Literal compile:
SELECT sum(sum(`test`.`val`)) OVER () AS `anon_1`, sum(CASE WHEN true THEN `test`.`val` END) AS `sum_1`, sum(sum(CASE WHEN true THEN `test`.`val` END)) OVER () AS `anon_2`
FROM `test`

Result:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 172, in execute
    self._query_job.result()
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 2939, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 734, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py", line 127, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Unrecognized name: `inner` at [1:16]

(job ID: <omitted>)

                                                         -----Query Job SQL Follows-----                                                       

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN @`param_1` THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
   2:FROM `test`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 174, in execute
    raise exceptions.DatabaseError(exc)
google.cloud.bigquery.dbapi.exceptions.DatabaseError: 400 Unrecognized name: `inner` at [1:16]

(job ID: <omitted>)

                                                         -----Query Job SQL Follows-----                                                       

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN @`param_1` THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
   2:FROM `test`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "src/test.py", line 54, in <module>
    print(tuple(engine.execute(q)))
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2179, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1253, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1473, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 174, in execute
    raise exceptions.DatabaseError(exc)
sqlalchemy.exc.DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 Unrecognized name: `inner` at [1:16]

(job ID: <omitted>)

                                                         -----Query Job SQL Follows-----                                                       

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN @`param_1` THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
   2:FROM `test`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
[SQL: SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN %(param_1)s THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
FROM `test`]
[parameters: {'param_1': True}]
(Background on this error at: http://sqlalche.me/e/4xp6)

For context (and correct behavior I think), when I change the engine to a Postgres database, the query looks correct and runs fine:

SELECT sum(sum(test.val)) OVER () AS anon_1, sum(CASE WHEN %(param_1)s THEN test.val END) AS sum_1, sum(sum(CASE WHEN %(param_2)s THEN test.val END)) OVER () AS anon_2
FROM test

Compile:
SELECT sum(sum(test.val)) OVER () AS anon_1, sum(CASE WHEN %(param_1)s THEN test.val END) AS sum_1, sum(sum(CASE WHEN %(param_2)s THEN test.val END)) OVER () AS anon_2
FROM test

Literal compile:
SELECT sum(sum(test.val)) OVER () AS anon_1, sum(CASE WHEN true THEN test.val END) AS sum_1, sum(sum(CASE WHEN true THEN test.val END)) OVER () AS anon_2
FROM test

Result:
((Decimal('6'), 6, Decimal('6')),)

Metadata

Metadata

Assignees

No one assigned

    Labels

    type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions