@shuoweil
Contributor

This pull request refactors the blob processing functions to add two-layer error handling, so that failures are caught both inside the remote functions and at the user-facing API, with clearer feedback on what went wrong.

Key Changes:

  1. Remote Function Error Handling (bigframes/blob/_functions.py):

    • Wrapped the core logic of all image and PDF processing remote functions (exif_func, image_blur_func, image_resize_func, image_normalize_func, and their _to_bytes variants) in try...except blocks.
    • These functions now catch all exceptions and return a structured JSON response containing a status field with the error message, preventing remote function crashes.
    • Added validation for image decoding and encoding steps to gracefully handle corrupted or unsupported file formats.
  2. Caller-Side Error Handling (bigframes/operations/blob.py):

    • Updated the public-facing blob methods (exif, image_blur, image_resize, image_normalize, pdf_extract, pdf_chunk) to handle potential failures from the remote UDFs.
    • Added try...except blocks around the UDF execution calls (_df_apply_udf) to catch and re-raise exceptions with more context.
    • Implemented None checks to ensure that UDFs do not return empty results without raising an error.
    • Corrected JSON parsing logic to properly handle both verbose=True (struct-like JSON) and verbose=False (raw content) responses, fixing TypeError issues when accessing fields.
    • Updated all relevant docstrings to include a Raises section, documenting the RuntimeError that can now be expected on processing failures.

This two-layer approach ensures that errors are handled gracefully at both the remote execution layer and the user-facing API layer, making the blob operations more resilient and easier to debug.
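A minimal sketch of the two layers described above, under stated assumptions: `process_blob` and `call_blob_op` are hypothetical stand-ins (not the bigframes functions), the hex encoding is a placeholder for real image or PDF processing, and an empty `status` string on success is assumed for illustration. The error shape follows the snippet reviewed later in this conversation: a JSON object with `status` and `content` when verbose, and None otherwise.

```python
import json
from typing import Any, Optional


def process_blob(data: bytes, verbose: bool) -> Optional[str]:
    """Hypothetical remote-function layer: never lets an exception escape."""
    try:
        if not data:
            raise ValueError("empty input")
        content = data.hex()  # placeholder for the real processing step
        if verbose:
            # Assumption: success is reported with an empty status string.
            return json.dumps({"status": "", "content": content})
        return content
    except Exception as e:
        if verbose:
            return json.dumps(
                {"status": f"Error: {type(e).__name__}: {e}", "content": ""}
            )
        return None


def call_blob_op(data: bytes, verbose: bool) -> Any:
    """Hypothetical caller-side layer: adds context and parses the response."""
    try:
        raw = process_blob(data, verbose)
    except Exception as e:
        raise RuntimeError(f"Blob UDF execution failed: {e}") from e
    if verbose:
        # Verbose responses are struct-like JSON; parse before field access.
        return json.loads(raw)
    # Non-verbose responses are raw content, or None for a failed input.
    return raw
```

Parsing only in the verbose branch mirrors the description's point about the two response shapes: indexing into a raw, non-JSON value is the kind of mistake that produces a TypeError.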

Fixes #<454752361> 🦕

@shuoweil shuoweil requested review from a team as code owners October 24, 2025 02:40
@shuoweil shuoweil requested a review from chelsea-lin October 24, 2025 02:40
@product-auto-label product-auto-label bot added the size: l (Pull request size is large.) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.) labels Oct 24, 2025
@shuoweil shuoweil requested review from sycai and tswast and removed request for chelsea-lin October 24, 2025 02:40
@shuoweil shuoweil assigned shuoweil and unassigned drylks-work Oct 24, 2025
@shuoweil shuoweil removed request for sycai and tswast October 24, 2025 17:10
@shuoweil shuoweil marked this pull request as draft October 24, 2025 17:10
@shuoweil shuoweil requested review from sycai and tswast October 24, 2025 19:00
@shuoweil shuoweil marked this pull request as ready for review October 24, 2025 19:00
@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 27, 2025
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 27, 2025
@tswast tswast changed the title from "refactor: Improve error handling in blob operations" to "fix: Improve error handling in blob operations" Oct 27, 2025
@tswast
Collaborator

tswast commented Oct 27, 2025

This is not a `refactor:` because it changes user-visible behavior. `fix:` would be more appropriate, IMO.

Comment on lines 382 to 383
# The calling function expects a json string that can be parsed as a blob ref
# Return a valid blob ref json string with empty values.
Collaborator

Can we update the calling function to be more robust so you don't have to fake this?

Contributor Author

I will return None on error with verbose=False, update type hints and adjust _output_bq_type to reflect this change.

Comment on lines 536 to 537
# The calling function expects a json string that can be parsed as a blob ref
# Return a valid blob ref json string with empty values.
Collaborator

Same here. This seems to indicate a problem in the calling function.

@shuoweil shuoweil requested a review from tswast October 27, 2025 21:28
Comment on lines +156 to +161
try:
    json.dumps(value)
    exif_dict[tag_name] = value
except (TypeError, ValueError):
    exif_dict[tag_name] = str(value)

Contributor

I would refactor this code block to a separate function. Maybe:

def _serialize(value):
    try:
        return json.dumps(value)
    except (...):
        return str(value)

Then here you can just write:

exif_dict[tag_name] = _serialize(value)

Pros: less code indentation, better readability.
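For reference, the inline fallback under review can be sketched as a standalone loop. This is an illustration only: `to_json_safe_dict` and the sample tags are hypothetical, and the check-then-store behavior follows the quoted diff (the original value is kept when it serializes, and `str(value)` is stored otherwise).

```python
import json


def to_json_safe_dict(tags: dict) -> dict:
    """Hypothetical wrapper around the inline try/except from the diff."""
    exif_dict = {}
    for tag_name, value in tags.items():
        try:
            json.dumps(value)  # probe only; the result is discarded
            exif_dict[tag_name] = value
        except (TypeError, ValueError):
            # Values such as bytes are not JSON serializable.
            exif_dict[tag_name] = str(value)
    return exif_dict


sample = {"Make": "Canon", "ISO": 100, "MakerNote": b"\x00\x01"}
safe = to_json_safe_dict(sample)
```

Note that the diff stores the original value on success, whereas a helper that returns `json.dumps(value)` would store the encoded string instead, so the two are not interchangeable without adjusting the return value.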

Contributor Author

I appreciate the suggestion for the _serialize helper function to improve readability and reduce indentation. However, this specific refactoring cannot be applied to the code within bigframes/blob/_functions.py.

The functions in _functions.py are deployed as Python UDFs by extracting their source code using inspect.getsource(). This captures only the source of the function itself, not any helper functions defined outside it. If we were to extract _serialize into a separate function, the remote UDF execution environment would not have access to it, leading to runtime failures.

Therefore, while the intent to improve readability is valid, this particular approach is not feasible for these UDF function bodies.
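The limitation can be reproduced in isolation. The sketch below is a simplified model, not the bigframes deployment path: it writes a hypothetical module (`udf_demo`, with `_serialize` and `exif_like_func`) to a temporary file, extracts one function with `inspect.getsource()`, and runs the extracted text in a fresh namespace that stands in for the remote environment.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path

# Hypothetical module: a UDF-style function that relies on a module-level helper.
MODULE_SOURCE = '''\
import json


def _serialize(value):
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        return str(value)


def exif_like_func(value):
    return _serialize(value)
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "udf_demo.py"
    path.write_text(MODULE_SOURCE)

    # Import the module from the file so that inspect can locate its source.
    spec = importlib.util.spec_from_file_location("udf_demo", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Only the text of exif_like_func is returned, not the helper it calls.
    extracted = inspect.getsource(module.exif_like_func)

helper_included = "def _serialize" in extracted

# Simulate the remote environment: a namespace holding only the extracted code.
remote_namespace = {}
exec(extracted, remote_namespace)
try:
    remote_namespace["exif_like_func"](b"raw")
    remote_call_error = None
except NameError as exc:
    remote_call_error = exc
```

Under these assumptions the extracted function defines cleanly but fails on its first call, because `_serialize` was never shipped with it. Inlining the logic in the function body avoids that.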

Comment on lines +246 to +253
if verbose:
    error_result = {
        "status": f"Error: {type(e).__name__}: {str(e)}",
        "content": "",
    }
    return json.dumps(error_result)
else:
    return None
Contributor

Looks like this code is repeated several times in this change. Maybe define a helper function instead?

Comment on lines 819 to 825
try:
    res = self._df_apply_udf(df, image_normalize_udf)
except Exception as e:
    raise RuntimeError(f"Image normalize UDF execution failed: {e}") from e

if res is None:
    raise RuntimeError("Image normalize returned None result")
Contributor

It feels like we can refactor this repeated logic into a separate helper function too:

def _apply_udf_or_raise_error(self, ...):
    try:
        res = self._df_apply_udf(...)
    ...

then here you can write:

res = self._apply_udf_or_raise_error(...)
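One possible shape for such a caller-side helper, written as a free function so it runs on its own. The name `apply_udf_or_raise` and its signature are assumptions for illustration; the reviewer's suggestion leaves the arguments elided, and the two messages follow the pattern in the quoted diff.

```python
from typing import Any, Callable


def apply_udf_or_raise(
    apply_udf: Callable[..., Any], operation_name: str, *args: Any, **kwargs: Any
) -> Any:
    """Hypothetical helper: run a UDF call and normalize its failure modes."""
    try:
        res = apply_udf(*args, **kwargs)
    except Exception as e:
        # Re-raise with the operation name while keeping the original cause.
        raise RuntimeError(f"{operation_name} UDF execution failed: {e}") from e

    if res is None:
        raise RuntimeError(f"{operation_name} returned None result")
    return res


def _failing_udf(*_args: Any) -> Any:
    # Stand-in for a UDF call that fails remotely.
    raise ValueError("boom")


try:
    apply_udf_or_raise(_failing_udf, "Image normalize", "df", "udf")
    failure_message = None
except RuntimeError as err:
    failure_message = str(err)
```

Because this helper would live in the calling code rather than inside a UDF body, it does not depend on what the remote environment can see.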

@shuoweil shuoweil requested a review from sycai October 29, 2025 06:48
Contributor

@sycai sycai left a comment

The blob function setup does not support nested function calls at the moment, so this refactor is not realistic.

@shuoweil shuoweil merged commit d410046 into main Oct 29, 2025
24 of 25 checks passed
@shuoweil shuoweil deleted the shuowei-blob-null branch October 29, 2025 19:03
shuoweil added a commit that referenced this pull request Oct 30, 2025
* add error handling for audio_transcribe

* add error handling for pdf functions

* add error handling for image functions

* final touch

* restore rename

* update notebook to better reflect our new code change

* return None on error with verbose=False for image functions

* define typing module in udf

* only use local variable

* Refactor code
tswast added a commit that referenced this pull request Nov 4, 2025
…() (#2138)

* change to ai.generate

* perf: Default to interactive display for SQL in anywidget mode

Previously, SQL queries in anywidget mode would fall back to deferred execution, showing a dry run instead of an interactive table.

This change modifies the display logic to directly use the anywidget interactive display for SQL queries, providing a more consistent and responsive user experience. A test case has been added to verify this behavior.

* fix: resolve double printing issue in anywidget mode

* feat: Add test case for STRUCT column in anywidget

Adds a test case to verify that a DataFrame with a STRUCT column is
correctly displayed in anywidget mode.

This test confirms that displaying a STRUCT column does not raise an
exception that would trigger the fallback to the deferred representation.
It mocks `IPython.display.display` to capture the `TableWidget` instance
and asserts that the rendered HTML contains the expected string
representation of the STRUCT data.

* fix presubmit

* Revert accidental changes to test_function.py

* revert accidental change to blob.py

* change return type

* add todo and revert change

* Revert "add todo and revert change"

This reverts commit 153e1d2.

* Add todo

* Fix: Handle JSON dtype in anywidget display

This commit fixes an AttributeError that occurred when displaying a
DataFrame with a JSON column in anywidget mode. The dtype check
was incorrect and has been updated. Additionally, the SQL compilation
for casting JSON to string has been corrected to use TO_JSON_STRING.

* revert a change

* revert a change

* Revert: Restore bigframes/dataframe.py to state from 42da847

* remove anywidget from early return, allow execution proceeds to _repr_html_()

* remove unnecessary changes

* remove redundant code change

* code style change

* test case update

* revert a change

* final touch of notebook

* fix presubmit error

* remove invalid test with anywidget bug fix

* fix presubmit

* fix polars compiler

* Revert an unnecessary change

* apply the workaround to I/O layer

* Revert scalar_op_registry.py change

* remove unnecessary import

* Remove duplicate conversation

* revert changes to test_dataframe.py

* notebook update

* call API on local data for compiler.py

* add more test cases

* modify polars import

* fix failed tests

* chore: Migrate minimum_op operator to SQLGlot (#2205)

* chore: Migrate round_op operator to SQLGlot (#2204)

This commit migrates the `round_op` operator from the Ibis compiler to the SQLGlot compiler.

* fix: Improve error handling in blob operations (#2194)

* add error handling for audio_transcribe

* add error handling for pdf functions

* add error handling for image functions

* final touch

* restore rename

* update notebook to better reflect our new code change

* return None on error with verbose=False for image functions

* define typing module in udf

* only use local variable

* Refactor code

* refactor: update geo "spec" and split geo ops in ibis compiler (#2208)

* feat: support INFORMATION_SCHEMA views in `read_gbq` (#1895)

* feat: support INFORMATION_SCHEMA tables in read_gbq

* avoid storage semi executor

* use faster tables for peek tests

* more tests

* fix mypy

* Update bigframes/session/_io/bigquery/read_gbq_table.py

* immediately query for information_schema tables

* Fix mypy errors and temporarily update python version

* snapshot

* snapshot again

* Revert: Unwanted code changes

* Revert "Revert: Unwanted code changes"

This reverts commit db5d8ea.

* revert 1 files to match main branch

* Correctly display DataFrames with JSON columns in anywidget

* add mis-deleted comment back

* revert unnecessary change

* move helper function to dtypes.py

* revert unnecessary testcase change

* Improve JSON type handling for to_gbq and to_pandas_batches

* Remove unnecessary comment

* Revert bigframes/dtypes.py and mypy.ini to main branch version

---------

Co-authored-by: jialuoo <[email protected]>
Co-authored-by: Tim Sweña (Swast) <[email protected]>
Labels

api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.

5 participants