Description
I am trying to add a list of strings stored in a pandas DataFrame to a BigQuery table with a REPEATED
field. When running this code:
import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account

df = pd.DataFrame([{"repeated": ["hi", "hello"], "not_repeated": "a_string"}])

table = bigquery.Table(
    "project.dataset_name.table_name",
    schema=[
        bigquery.SchemaField("repeated", "string", "REPEATED"),
        bigquery.SchemaField("not_repeated", "string", "NULLABLE"),
    ],
)

bigquery_client = bigquery.Client(
    credentials=service_account.Credentials.from_service_account_file(
        "service-account-credentials.json"
    )
)
bigquery_client.insert_rows_from_dataframe(table, df)
I get this error:
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    bigquery_client.insert_rows_from_dataframe(table, df)
  File "/Users/emmacombes/.local/share/virtualenvs/bq-stats-sAw4GWcD/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 3433, in insert_rows_from_dataframe
    result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
  File "/Users/emmacombes/.local/share/virtualenvs/bq-stats-sAw4GWcD/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 3381, in insert_rows
    json_rows = [_record_field_to_json(schema, row) for row in rows]
  File "/Users/emmacombes/.local/share/virtualenvs/bq-stats-sAw4GWcD/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 3381, in <listcomp>
    json_rows = [_record_field_to_json(schema, row) for row in rows]
  File "/Users/emmacombes/.local/share/virtualenvs/bq-stats-sAw4GWcD/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 800, in dataframe_to_json_generator
    if pandas.isna(value):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This stops execution, so nothing is uploaded to BigQuery. If I run the same code without the list column (i.e. df = pd.DataFrame([{"not_repeated": "a_string"}])), the error does not occur.
I think this can be traced back to the recently changed line if pandas.isna(value): from this previous PR (use pandas function to check for NaN #750), which was meant to solve this previous issue (dataframe_to_json_generator doesn't support pandas.NA type #729). Evaluating pandas.isna(value) on a list returns an array of bools, one per element, and the if statement cannot interpret that array as a single truth value.
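The behaviour can be reproduced with pandas alone, without BigQuery. The snippet below is a minimal sketch of just that check; truth_value_is_ambiguous is a hypothetical helper name used for illustration, not something from the library.

```python
import pandas as pd

# For a list, pandas.isna checks each element and returns an array of bools.
mask = pd.isna(["hi", "hello"])
print(mask)


def truth_value_is_ambiguous(value):
    """Return True if `if pandas.isna(value):` raises ValueError for this value."""
    try:
        if pd.isna(value):
            pass
    except ValueError:
        return True
    return False


# The list from the REPEATED column versus the plain string column.
print(truth_value_is_ambiguous(["hi", "hello"]))
print(truth_value_is_ambiguous("a_string"))
```

Only the list value triggers the error, which matches what I see when the list column is removed from the DataFrame.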
I can confirm that the code works with an older version of this library, from before this change was made.
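One possible direction for a fix, sketched here as an assumption rather than as the library's actual code, is to treat a value as missing only when pandas.isna returns a single boolean. The helper name is_missing_scalar and the example row are hypothetical, and the sketch assumes that missing values should simply be left out of the row.

```python
import numpy as np
import pandas as pd


def is_missing_scalar(value):
    """Hypothetical helper: True only when value is a single missing value."""
    result = pd.isna(value)
    # Lists and arrays give an array back; treat those as "not missing" here.
    return isinstance(result, (bool, np.bool_)) and bool(result)


row = {"repeated": ["hi", "hello"], "not_repeated": "a_string", "empty": None}

# Keep only the fields that hold a value, without raising on the list.
kept = {key: value for key, value in row.items() if not is_missing_scalar(value)}
print(kept)
```

With a check like this, the list in the REPEATED column passes through untouched, while None and NaN scalars are still detected.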
Environment details
- OS type and version: macOS Big Sur 11.5.2
- Python version: Python 3.7.5
- pip version: pip 19.2.3
- google-cloud-bigquery version: 2.24.0