Description
Environment details
- OS type and version: Ubuntu 20.04.2
- Python version: 3.9.4
- pip version: 21.1.2
- google-cloud-bigquery version: 2.20.0
Steps to reproduce
- Run the code example below
Code example
import pandas
from google.cloud import bigquery
df = pandas.DataFrame({
    "series_a": [1, 2, pandas.NA]
})
json_iter = bigquery._pandas_helpers.dataframe_to_json_generator(df)
for row in json_iter:
    print(row)
Stack trace
{'series_a': 1}
{'series_a': 2}
Traceback (most recent call last):
  File "/home/christian/code/bug_example.py", line 11, in <module>
    for row in json_iter:
  File "/home/christian/miniconda3/envs/data-services-prod/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 783, in dataframe_to_json_generator
    if value != value:
  File "pandas/_libs/missing.pyx", line 360, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
Suggested fix
Starting with pandas 1.0, an experimental pandas.NA value (a singleton) is available to represent scalar missing values, as an alternative to numpy.nan. Comparing the value with itself (value != value) evaluates to pandas.NA rather than to a boolean, and using that result as an if condition raises a TypeError because pandas.NA cannot be converted to a boolean.
I am planning to open a PR that replaces the value != value check at _pandas_helpers.py#L783 with a call to the pandas.isna function, but I wanted to check whether there is a better solution before I submit a patch.
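
To make the proposal concrete, here is a minimal sketch of the idea. It is not the library's code: rows_without_missing is a hypothetical, simplified stand-in for dataframe_to_json_generator, and it assumes the existing check is there to skip missing values.

```python
import pandas


def rows_without_missing(dataframe):
    """Hypothetical stand-in for the generator: yield one dict per row,
    leaving out columns whose value is missing."""
    for row in dataframe.itertuples(index=False, name=None):
        record = {}
        for column, value in zip(dataframe.columns, row):
            # pandas.isna returns a plain boolean for scalar NaN, None,
            # and pandas.NA, so it is safe to use as an if condition.
            if pandas.isna(value):
                continue
            record[column] = value
        yield record


df = pandas.DataFrame({"series_a": [1, 2, pandas.NA]})
rows = list(rows_without_missing(df))
for row in rows:
    print(row)
```

One caveat with this approach: pandas.isna returns an array when given a list-like value, so a column that holds lists would likely need separate handling.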