
Commit 5663d2a

shuoweil, jialuoo, and tswast authored
docs: update notebook for JSON subfields support in to_pandas_batches() (#2138)
* change to ai.generate
* perf: Default to interactive display for SQL in anywidget mode
  Previously, SQL queries in anywidget mode would fall back to deferred execution, showing a dry run instead of an interactive table. This change modifies the display logic to directly use the anywidget interactive display for SQL queries, providing a more consistent and responsive user experience. A test case has been added to verify this behavior.
* fix: resolve double printing issue in anywidget mode
* feat: Add test case for STRUCT column in anywidget
  Adds a test case to verify that a DataFrame with a STRUCT column is correctly displayed in anywidget mode. This test confirms that displaying a STRUCT column does not raise an exception that would trigger the fallback to the deferred representation. It mocks `IPython.display.display` to capture the `TableWidget` instance and asserts that the rendered HTML contains the expected string representation of the STRUCT data.
* fix presubmit
* Revert accidental changes to test_function.py
* revert accidental change to blob.py
* change return type
* add todo and revert change
* Revert "add todo and revert change"
  This reverts commit 153e1d2.
* Add todo
* Fix: Handle JSON dtype in anywidget display
  This commit fixes an AttributeError that occurred when displaying a DataFrame with a JSON column in anywidget mode. The dtype check was incorrect and has been updated. Additionally, the SQL compilation for casting JSON to string has been corrected to use TO_JSON_STRING.
* revert a change
* revert a change
* Revert: Restore bigframes/dataframe.py to state from 42da847
* remove anywidget from early return, allow execution to proceed to _repr_html_()
* remove unnecessary changes
* remove redundant code change
* code style change
* testcase update
* revert a change
* final touch of notebook
* fix presubmit error
* remove invalid test with anywidget bug fix
* fix presubmit
* fix polars compiler
* Revert an unnecessary change
* apply the workaround to I/O layer
* Revert scalar_op_registry.py change
* remove unnecessary import
* Remove duplicate conversation
* revert changes to test_dataframe.py
* notebook update
* call API on local data for compiler.py
* add more test cases
* modify polars import
* fix failed tests
* chore: Migrate minimum_op operator to SQLGlot (#2205)
* chore: Migrate round_op operator to SQLGlot (#2204)
  This commit migrates the `round_op` operator from the Ibis compiler to the SQLGlot compiler.
* fix: Improve error handling in blob operations (#2194)
* add error handling for audio_transcribe
* add error handling for pdf functions
* add error handling for image functions
* final touch
* restore rename
* update notebook to better reflect our new code change
* return None on error with verbose=False for image functions
* define typing module in udf
* only use local variable
* Refactor code
* refactor: update geo "spec" and split geo ops in ibis compiler (#2208)
* feat: support INFORMATION_SCHEMA views in `read_gbq` (#1895)
* feat: support INFORMATION_SCHEMA tables in read_gbq
* avoid storage semi executor
* use faster tables for peek tests
* more tests
* fix mypy
* Update bigframes/session/_io/bigquery/read_gbq_table.py
* immediately query for information_schema tables
* Fix mypy errors and temporarily update python version
* snapshot
* snapshot again
* Revert: Unwanted code changes
* Revert "Revert: Unwanted code changes"
  This reverts commit db5d8ea.
* revert 1 file to match main branch
* Correctly display DataFrames with JSON columns in anywidget
* add mis-deleted comment back
* revert unnecessary change
* move helper function to dtypes.py
* revert unnecessary testcase change
* Improve JSON type handling for to_gbq and to_pandas_batches
* Remove unnecessary comment
* Revert bigframes/dtypes.py and mypy.ini to main branch version

---------

Co-authored-by: jialuoo <[email protected]>
Co-authored-by: Tim Sweña (Swast) <[email protected]>
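The JSON fix described in the commit message turns JSON values into strings before they reach the anywidget display path, using `TO_JSON_STRING` on the SQL side. Below is a rough, stdlib-only sketch of that idea. It assumes rows arrive as Python dicts and that the caller already knows which columns are JSON-typed. `stringify_json_columns` is a hypothetical helper for illustration, not the BigQuery DataFrames implementation, and the sample values are made up.

```python
import json
from typing import Any, Iterable, Iterator


def stringify_json_columns(
    rows: Iterable[dict[str, Any]], json_columns: set[str]
) -> Iterator[dict[str, Any]]:
    """Yield copies of ``rows`` with the named JSON columns serialized to text.

    Hypothetical helper: a pure-Python analogue of casting a JSON column to
    STRING so that a display layer without JSON support only sees strings.
    """
    for row in rows:
        out = dict(row)  # shallow copy so the caller's rows are not mutated
        for name in json_columns:
            value = out.get(name)
            # Keep NULLs as None instead of turning them into the text "null".
            if value is not None:
                # Compact separators produce output like {"a":1} with no spaces.
                out[name] = json.dumps(value, separators=(",", ":"))
        yield out


# Made-up sample rows shaped loosely like an AI.GENERATE result column.
rows = [
    {"id": 1, "result": {"filing_date": "2015-01-01", "application_number": "123"}},
    {"id": 2, "result": None},
]

for row in stringify_json_columns(rows, {"result"}):
    print(row)
```

Converting per value keeps the sketch independent of any dataframe library. The real fix operates on Arrow and pandas data and in SQL, where a whole column is cast at once.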
1 parent 196f6df commit 5663d2a


1 file changed (+133, -13 lines)


notebooks/dataframes/anywidget_mode.ipynb

Lines changed: 133 additions & 13 deletions
@@ -35,7 +35,16 @@
 "execution_count": 2,
 "id": "ca22f059",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"/usr/local/google/home/shuowei/src/python-bigquery-dataframes/venv/lib/python3.10/site-packages/google/api_core/_python_version_support.py:266: FutureWarning: You are using a Python version (3.10.15) which Google will stop supporting in new releases of google.api_core once it reaches its end of life (2026-10-04). Please upgrade to the latest Python version, or at least Python 3.11, to continue receiving updates for google.api_core past that date.\n",
+" warnings.warn(message, FutureWarning)\n"
+]
+}
+],
 "source": [
 "import bigframes.pandas as bpd"
 ]
@@ -142,9 +151,9 @@
 {
 "data": {
 "application/vnd.jupyter.widget-view+json": {
-"model_id": "aafd4f912b5f42e0896aa5f0c2c62620",
+"model_id": "47795eaa10f149aeb99574232c0936eb",
 "version_major": 2,
-"version_minor": 0
+"version_minor": 1
 },
 "text/plain": [
 "TableWidget(page_size=10, row_count=5552452, table_html='<table border=\"1\" class=\"dataframe table table-stripe…"
@@ -205,16 +214,17 @@
 {
 "data": {
 "application/vnd.jupyter.widget-view+json": {
-"model_id": "5ec0ad9f11874d4f9d8edbc903ee7b5d",
+"model_id": "8354ce0f82d3495a9b630dfc362f73ee",
 "version_major": 2,
-"version_minor": 0
+"version_minor": 1
 },
 "text/plain": [
 "TableWidget(page_size=10, row_count=5552452, table_html='<table border=\"1\" class=\"dataframe table table-stripe…"
 ]
 },
+"execution_count": 7,
 "metadata": {},
-"output_type": "display_data"
+"output_type": "execute_result"
 }
 ],
 "source": [
@@ -283,8 +293,27 @@
 {
 "data": {
 "text/html": [
-"✅ Completed. \n",
-" Query processed 171.4 MB in a moment of slot time.\n",
+"\n",
+" Query started with request ID bigframes-dev:US.c45952fb-01b4-409c-9da4-f7c5bfc0d47d.<details><summary>SQL</summary><pre>SELECT\n",
+"`state` AS `state`,\n",
+"`gender` AS `gender`,\n",
+"`year` AS `year`,\n",
+"`name` AS `name`,\n",
+"`number` AS `number`\n",
+"FROM\n",
+"(SELECT\n",
+" *\n",
+"FROM (\n",
+" SELECT\n",
+" `state`,\n",
+" `gender`,\n",
+" `year`,\n",
+" `name`,\n",
+" `number`\n",
+" FROM `bigquery-public-data.usa_names.usa_1910_2013` FOR SYSTEM_TIME AS OF TIMESTAMP(&#x27;2025-10-30T21:48:48.979701+00:00&#x27;)\n",
+") AS `t0`)\n",
+"ORDER BY `name` ASC NULLS LAST ,`year` ASC NULLS LAST ,`state` ASC NULLS LAST\n",
+"LIMIT 5</pre></details>\n",
 " "
 ],
 "text/plain": [
@@ -304,16 +333,17 @@
 {
 "data": {
 "application/vnd.jupyter.widget-view+json": {
-"model_id": "651b5aac958c408183775152c2573a03",
+"model_id": "59461286a17d4a42b6be6d9d9c7bf7e3",
 "version_major": 2,
-"version_minor": 0
+"version_minor": 1
 },
 "text/plain": [
 "TableWidget(page_size=10, row_count=5, table_html='<table border=\"1\" class=\"dataframe table table-striped tabl…"
 ]
 },
+"execution_count": 9,
 "metadata": {},
-"output_type": "display_data"
+"output_type": "execute_result"
 }
 ],
 "source": [
@@ -323,11 +353,101 @@
 "print(f\"Small dataset pages: {math.ceil(small_widget.row_count / small_widget.page_size)}\")\n",
 "small_widget"
 ]
+},
+{
+"cell_type": "markdown",
+"id": "added-cell-2",
+"metadata": {},
+"source": [
+"### Displaying Generative AI results containing JSON\n",
+"The `AI.GENERATE` function in BigQuery returns results in a JSON column. While BigQuery's JSON type is not natively supported by the underlying Arrow `to_pandas_batches()` method used in anywidget mode ([Apache Arrow issue #45262](https://github.com/apache/arrow/issues/45262)), BigQuery Dataframes automatically converts JSON columns to strings for display. This allows you to view the results of generative AI functions seamlessly."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 10,
+"id": "added-cell-1",
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/html": [
+"✅ Completed. \n",
+" Query processed 85.9 kB in 14 seconds of slot time.\n",
+" "
+],
+"text/plain": [
+"<IPython.core.display.HTML object>"
+]
+},
+"metadata": {},
+"output_type": "display_data"
+},
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"/usr/local/google/home/shuowei/src/python-bigquery-dataframes/bigframes/dtypes.py:969: JSONDtypeWarning: JSON columns will be represented as pandas.ArrowDtype(pyarrow.json_())\n",
+"instead of using `db_dtypes` in the future when available in pandas\n",
+"(https://github.com/pandas-dev/pandas/issues/60958) and pyarrow.\n",
+" warnings.warn(msg, bigframes.exceptions.JSONDtypeWarning)\n"
+]
+},
+{
+"data": {
+"text/html": [
+"✅ Completed. "
+],
+"text/plain": [
+"<IPython.core.display.HTML object>"
+]
+},
+"metadata": {},
+"output_type": "display_data"
+},
+{
+"data": {
+"application/vnd.jupyter.widget-view+json": {
+"model_id": "d1794b42579542a8980bd158e521bd3e",
+"version_major": 2,
+"version_minor": 1
+},
+"text/plain": [
+"TableWidget(page_size=10, row_count=5, table_html='<table border=\"1\" class=\"dataframe table table-striped tabl…"
+]
+},
+"metadata": {},
+"output_type": "display_data"
+},
+{
+"data": {
+"text/html": [],
+"text/plain": [
+"Computation deferred. Computation will process 0 Bytes"
+]
+},
+"execution_count": 10,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"bpd._read_gbq_colab(\"\"\"\n",
+" SELECT\n",
+" AI.GENERATE(\n",
+" prompt=>(\\\"Extract the values.\\\", OBJ.GET_ACCESS_URL(OBJ.FETCH_METADATA(OBJ.MAKE_REF(gcs_path, \\\"us.conn\\\")), \\\"r\\\")),\n",
+" connection_id=>\\\"bigframes-dev.us.bigframes-default-connection\\\",\n",
+" output_schema=>\\\"publication_date string, class_international string, application_number string, filing_date string\\\") AS result,\n",
+" *\n",
+" FROM `bigquery-public-data.labeled_patents.extracted_data`\n",
+" LIMIT 5;\n",
+"\"\"\")"
+]
 }
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "3.10.18",
+"display_name": "venv",
 "language": "python",
 "name": "python3"
 },
@@ -341,7 +461,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.18"
+"version": "3.10.15"
 }
 },
 "nbformat": 4,
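The notebook cell in the diff prints the page count as `math.ceil(small_widget.row_count / small_widget.page_size)`, using the `row_count` and `page_size` values that `TableWidget` reports (for example `page_size=10, row_count=5552452`). Below is a minimal sketch of that paging arithmetic, plus the row range each page would cover. It assumes zero-based page indices. `page_count` and `page_bounds` are hypothetical helpers for illustration and are not part of the `TableWidget` API.

```python
def page_count(row_count: int, page_size: int) -> int:
    """Pages needed to show ``row_count`` rows at ``page_size`` rows per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer ceiling division: equals math.ceil(row_count / page_size)
    # but avoids float rounding for very large row counts.
    return -(-row_count // page_size)


def page_bounds(page: int, row_count: int, page_size: int) -> tuple[int, int]:
    """Half-open [start, stop) row range for the zero-based ``page``."""
    # Allow page 0 even for an empty result so an empty table still renders.
    if not 0 <= page < max(page_count(row_count, page_size), 1):
        raise IndexError(f"page {page} out of range")
    start = page * page_size
    return start, min(start + page_size, row_count)


print(page_count(5552452, 10))  # usa_names table shown in the notebook
print(page_count(5, 10))        # the LIMIT 5 result
print(page_bounds(0, 5, 10))    # the single page covers rows 0..4
```

Integer ceiling division is used instead of `math.ceil` on a float quotient because it stays exact for arbitrarily large row counts. For counts of the size shown here, both give the same answer.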

0 commit comments
