fix: Improve error handling in blob operations #2194

shuoweil · 2025-10-24T02:40:30Z

This pull request refactors the blob processing functionalities to introduce comprehensive, two-layer error handling, enhancing robustness and providing clearer feedback on failures.

Key Changes:

Remote Function Error Handling (bigframes/blob/_functions.py):
- Wrapped the core logic of all image and PDF processing remote functions (exif_func, image_blur_func, image_resize_func, image_normalize_func, and their _to_bytes variants) in try...except blocks.
- These functions now catch all exceptions and return a structured JSON response containing a status field with the error message, preventing remote function crashes.
- Added validation for image decoding and encoding steps to gracefully handle corrupted or unsupported file formats.
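
As a rough illustration of the remote-function layer described above, the following minimal sketch wraps a UDF-style body in try...except and returns either a JSON string with a status field or None. The function name `image_op_sketch`, the placeholder transform, and the empty `status` on success are assumptions made for this example; this is not the actual code in bigframes/blob/_functions.py.

```python
import json


def image_op_sketch(src: bytes, verbose: bool):
    """Hypothetical stand-in for a blob UDF body (illustration only)."""
    try:
        # Validation step: empty input stands in for a corrupted or
        # unsupported file that fails to decode.
        if not src:
            raise ValueError("could not decode image")

        # Placeholder for the real image transform (assumption).
        content = src[::-1].hex()

        if verbose:
            # Assumed success shape: empty status plus the content.
            return json.dumps({"status": "", "content": content})
        return content
    except Exception as e:
        # Catch everything so the remote function itself never crashes.
        if verbose:
            error_result = {
                "status": f"Error: {type(e).__name__}: {str(e)}",
                "content": "",
            }
            return json.dumps(error_result)
        # With verbose=False there is no status field, so None marks failure.
        return None
```

With `verbose=True` a failure is reported through the status field; with `verbose=False` the caller has to treat None as the failure signal.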
Caller-Side Error Handling (bigframes/operations/blob.py):
- Updated the public-facing blob methods (exif, image_blur, image_resize, image_normalize, pdf_extract, pdf_chunk) to handle potential failures from the remote UDFs.
- Added try...except blocks around the UDF execution calls (_df_apply_udf) to catch and re-raise exceptions with more context.
- Implemented None checks to ensure that UDFs do not return empty results without raising an error.
- Corrected JSON parsing logic to properly handle both verbose=True (struct-like JSON) and verbose=False (raw content) responses, fixing TypeError issues when accessing fields.
- Updated all relevant docstrings to include a Raises section, documenting the RuntimeError that can now be expected on processing failures.
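
The verbose/non-verbose distinction in the list above can be sketched as follows. The helper name `parse_udf_value` and the exact field names are assumptions for illustration; the real parsing in bigframes/operations/blob.py operates on BigQuery DataFrames columns rather than single Python values.

```python
import json


def parse_udf_value(raw, verbose: bool):
    """Hypothetical per-value parser for a UDF response (illustration only)."""
    if not verbose:
        # verbose=False: the value is raw content (or None on failure).
        # Indexing into it as if it were a struct is what raises TypeError.
        return raw

    # verbose=True: the value is a JSON string with status and content fields.
    parsed = json.loads(raw)
    return {
        "status": parsed.get("status", ""),
        "content": parsed.get("content", ""),
    }
```

Keeping the two branches separate avoids accessing fields on a plain string, which is the kind of TypeError the description mentions.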

This two-layer approach ensures that errors are handled gracefully at both the remote execution layer and the user-facing API layer, making the blob operations more resilient and easier to debug.

Fixes #<454752361> 🦕

review-notebook-app · 2025-10-24T18:44:58Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

tswast · 2025-10-27T18:25:48Z

This is not a refactor, because it changes user-visible behavior. A fix: prefix would be more appropriate, IMO.

tswast · 2025-10-27T18:31:26Z

bigframes/blob/_functions.py

+            # The calling function expects a json string that can be parsed as a blob ref
+            # Return a valid blob ref json string with empty values.


Can we update the calling function to be more robust so you don't have to fake this?

shuoweil · 2025-10-27

I will return None on error with verbose=False, update the type hints, and adjust _output_bq_type to reflect this change.

tswast · 2025-10-27T18:32:00Z

bigframes/blob/_functions.py

+            # The calling function expects a json string that can be parsed as a blob ref
+            # Return a valid blob ref json string with empty values.


Same here. This seems to indicate a problem in the calling function.

sycai · 2025-10-28T00:48:10Z

bigframes/blob/_functions.py

+                try:
+                    json.dumps(value)
+                    exif_dict[tag_name] = value
+                except (TypeError, ValueError):
+                    exif_dict[tag_name] = str(value)
+


I would refactor this code block to a separate function. Maybe:

def _serialize(value):
    try:
        return json.dumps(value)
    except (...):
        return str(value)

Then here you can just write:

exif_dict[tag_name] = _serialize(value)

Pros: less code indentation, better readability.

shuoweil · 2025-10-29

I appreciate the suggestion for the _serialize helper function to improve readability and reduce indentation. However, this specific refactoring cannot be applied to the code within bigframes/blob/_functions.py.

The functions in _functions.py are deployed as Python UDFs by extracting their source code using inspect.getsource(). This process captures only the function itself, not any external helper functions. If we extracted _serialize into a separate function, the remote UDF execution environment would not have access to it, leading to runtime failures.

Therefore, while the intent to improve readability is valid, this particular approach is not feasible for these UDF function bodies.
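
The constraint described in this reply can be demonstrated with a small simulation. This is only a sketch: instead of calling inspect.getsource() and deploying to BigQuery, it executes literal source strings in an empty namespace, which stands in for a remote environment that receives nothing but the function's own source. The names `exif_like` and `run_remotely` are hypothetical.

```python
import textwrap

# Source of a function that relies on a helper defined elsewhere.
WITH_HELPER = textwrap.dedent('''
    def exif_like(value):
        return _serialize(value)
''')

# Same idea with the fallback logic inlined into the function body.
INLINED = textwrap.dedent('''
    def exif_like(value):
        import json
        try:
            json.dumps(value)
            return value
        except (TypeError, ValueError):
            return str(value)
''')


def run_remotely(source: str, arg):
    """Simulate a remote runtime: only the shipped source is defined."""
    namespace = {}
    exec(source, namespace)
    return namespace["exif_like"](arg)


try:
    run_remotely(WITH_HELPER, b"\x00")
    helper_outcome = "ok"
except NameError as e:
    # The helper was never shipped, so the lookup fails at call time.
    helper_outcome = f"NameError: {e}"

# bytes are not JSON serializable, so the inlined version falls back to str().
inlined_outcome = run_remotely(INLINED, b"\x00")
```

In this simulation the helper-based version fails with a NameError only when the function is called, while the inlined version runs, which mirrors the runtime failure the reply warns about.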

sycai · 2025-10-28T00:50:16Z

bigframes/blob/_functions.py

+        if verbose:
+            error_result = {
+                "status": f"Error: {type(e).__name__}: {str(e)}",
+                "content": "",
+            }
+            return json.dumps(error_result)
+        else:
+            return None


Looks like this code is repeated several times in this change. Maybe define a helper function instead?

sycai · 2025-10-28T00:53:41Z

bigframes/operations/blob.py

+        try:
+            res = self._df_apply_udf(df, image_normalize_udf)
+        except Exception as e:
+            raise RuntimeError(f"Image normalize UDF execution failed: {e}") from e
+
+        if res is None:
+            raise RuntimeError("Image normalize returned None result")


It feels like we can refactor this repeated logic into a separate helper function too:

def _apply_udf_or_raise_error(self, ...):
    try:
        res = self._df_apply_udf(...)
        ...

then here you can write:

res = self._apply_udf_or_raise_error(...)
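
One way the suggested helper could be fleshed out is sketched below. The class `BlobAccessorSketch`, the list-based `_df_apply_udf` placeholder, and the `operation_name` parameter are all assumptions for illustration; the error messages follow the pattern in the diff above. Whether such a helper fits the actual blob setup is a separate question.

```python
class BlobAccessorSketch:
    """Hypothetical stand-in for the accessor in bigframes/operations/blob.py."""

    def _df_apply_udf(self, df, udf):
        # Placeholder for the real method: apply the UDF to each item.
        return [udf(x) for x in df]

    def _apply_udf_or_raise_error(self, df, udf, operation_name: str):
        """Run the UDF and convert any failure into a RuntimeError."""
        try:
            res = self._df_apply_udf(df, udf)
        except Exception as e:
            # Chain the original exception so the root cause stays visible.
            raise RuntimeError(
                f"{operation_name} UDF execution failed: {e}"
            ) from e

        if res is None:
            raise RuntimeError(f"{operation_name} returned None result")
        return res
```

A call site would then shrink to a single line such as `res = self._apply_udf_or_raise_error(df, image_normalize_udf, "Image normalize")`.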

sycai

The blob function setup does not support nested function calls at the moment, so this refactor is not realistic.

* add error handling for audio_transcribe
* add error handling for pdf functions
* add eror handling for image functions
* final touch
* restore rename
* update notebook to better reflect our new code change
* return None on error with verbose=False for image functions
* define typing module in udf
* only use local variable
* Refactor code

…() (#2138)

* change to ai.generate
* perf: Default to interactive display for SQL in anywidget mode
  Previously, SQL queries in anywidget mode would fall back to deferred execution, showing a dry run instead of an interactive table. This change modifies the display logic to directly use the anywidget interactive display for SQL queries, providing a more consistent and responsive user experience. A test case has been added to verify this behavior.
* fix: resolve double printing issue in anywidget mode
* feat: Add test case for STRUCT column in anywidget
  Adds a test case to verify that a DataFrame with a STRUCT column is correctly displayed in anywidget mode. This test confirms that displaying a STRUCT column does not raise an exception that would trigger the fallback to the deferred representation. It mocks `IPython.display.display` to capture the `TableWidget` instance and asserts that the rendered HTML contains the expected string representation of the STRUCT data.
* fix presubmit
* Revert accidental changes to test_function.py
* revert accidental change to blob.py
* change return type
* add todo and revert change
* Revert "add todo and revert change"
  This reverts commit 153e1d2.
* Add todo
* Fix: Handle JSON dtype in anywidget display
  This commit fixes an AttributeError that occurred when displaying a DataFrame with a JSON column in anywidget mode. The dtype check was incorrect and has been updated. Additionally, the SQL compilation for casting JSON to string has been corrected to use TO_JSON_STRING.
* revert a change
* revert a change
* Revert: Restore bigframes/dataframe.py to state from 42da847
* remove anywidget from early return, allow execution proceeds to _repr_html_()
* remove unnecessary changes
* remove redundant code change
* code style change
* tescase update
* revert a change
* final touch of notebook
* fix presumbit error
* remove invlaid test with anywidget bug fix
* fix presubmit
* fix polar complier
* Revert an unnecessary change
* apply the workaround to i/O layer
* Revert scalar_op_registry.py chnage
* remove unnecessary import
* Remove duplicate conversation
* revert changes to test_dataframe.py
* notebook update
* call API on local data for complier.py
* add more testcase
* modfiy polars import
* fix failed tests
* chore: Migrate minimum_op operator to SQLGlot (#2205)
* chore: Migrate round_op operator to SQLGlot (#2204)
  This commit migrates the `round_op` operator from the Ibis compiler to the SQLGlot compiler.
* fix: Improve error handling in blob operations (#2194)
* add error handling for audio_transcribe
* add error handling for pdf functions
* add eror handling for image functions
* final touch
* restore rename
* update notebook to better reflect our new code change
* return None on error with verbose=False for image functions
* define typing module in udf
* only use local variable
* Refactor code
* refactor: update geo "spec" and split geo ops in ibis compiler (#2208)
* feat: support INFORMATION_SCHEMA views in `read_gbq` (#1895)
* feat: support INFORMATION_SCHEMA tables in read_gbq
* avoid storage semi executor
* use faster tables for peek tests
* more tests
* fix mypy
* Update bigframes/session/_io/bigquery/read_gbq_table.py
* immediately query for information_schema tables
* Fix mypy errors and temporarily update python version
* snapshot
* snapshot again
* Revert: Unwanted code changes
* Revert "Revert: Unwanted code changes"
  This reverts commit db5d8ea.
* revert 1 files to match main branch
* Correctly display DataFrames with JSON columns in anywidget
* add mis-deleted comment back
* revert unnecessary change
* move helper function to dtypes.py
* revert unnecessary testcase change
* Improve JSON type handling for to_gbq and to_pandas_batches
* Remove unnecessary comment
* Revert bigframes/dtypes.py and mypy.ini to main branch version

---------

Co-authored-by: jialuoo <[email protected]>
Co-authored-by: Tim Sweña (Swast) <[email protected]>

shuoweil requested review from a team as code owners October 24, 2025 02:40

shuoweil requested a review from chelsea-lin October 24, 2025 02:40

blunderbuss-gcf bot assigned drylks-work Oct 24, 2025

product-auto-label bot added the size: l and api: bigquery labels Oct 24, 2025

shuoweil requested review from sycai and tswast and removed request for chelsea-lin October 24, 2025 02:40

shuoweil assigned shuoweil and unassigned drylks-work Oct 24, 2025

shuoweil removed request for sycai and tswast October 24, 2025 17:10

shuoweil marked this pull request as draft October 24, 2025 17:10

shuoweil force-pushed the shuowei-blob-null branch from 59bd020 to ff8c6b7 Compare October 24, 2025 18:44

shuoweil requested review from sycai and tswast October 24, 2025 19:00

shuoweil marked this pull request as ready for review October 24, 2025 19:00

tswast added the owlbot:run label Oct 27, 2025

gcf-owl-bot bot removed the owlbot:run label Oct 27, 2025

tswast changed the title ~~refactor: Improve error handling in blob operations~~ fix: Improve error handling in blob operations Oct 27, 2025

tswast reviewed Oct 27, 2025

shuoweil added 6 commits October 27, 2025 21:17

add error handling for audio_transcribe

fe774c2

add error handling for pdf functions

8b31c8a

add eror handling for image functions

6a37c18

final touch

2d66362

restore rename

052bf81

update notebook to better reflect our new code change

313f04a

shuoweil added 2 commits October 27, 2025 21:17

return None on error with verbose=False for image functions

639c0a5

define typing module in udf

526ab6e

shuoweil force-pushed the shuowei-blob-null branch from 446440d to 526ab6e Compare October 27, 2025 21:17

only use local variable

ebc9aee

shuoweil requested a review from tswast October 27, 2025 21:28

sycai reviewed Oct 28, 2025

shuoweil added 2 commits October 29, 2025 05:49

Merge branch 'main' into shuowei-blob-null

810122a

Refactor code

f14cf55

shuoweil requested a review from sycai October 29, 2025 06:48

sycai approved these changes Oct 29, 2025

shuoweil merged commit d410046 into main Oct 29, 2025
24 of 25 checks passed

shuoweil deleted the shuowei-blob-null branch October 29, 2025 19:03

release-please bot mentioned this pull request Oct 29, 2025

chore(main): release 2.28.0 #2199

Merged

5 participants
