-
Notifications
You must be signed in to change notification settings - Fork 62
fix: Improve error handling in blob operations #2194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
59bd020 to
ff8c6b7
Compare
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
This is not a |
bigframes/blob/_functions.py
Outdated
| # The calling function expects a json string that can be parsed as a blob ref | ||
| # Return a valid blob ref json string with empty values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the calling function to be more robust so you don't have to fake this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will return None on error with verbose=False, update type hints and adjust _output_bq_type to reflect this change.
bigframes/blob/_functions.py
Outdated
| # The calling function expects a json string that can be parsed as a blob ref | ||
| # Return a valid blob ref json string with empty values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. This seems to indicate a problem in the calling function.
446440d to
526ab6e
Compare
| try: | ||
| json.dumps(value) | ||
| exif_dict[tag_name] = value | ||
| except (TypeError, ValueError): | ||
| exif_dict[tag_name] = str(value) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would refactor this code block to a separate function. Maybe:
def _serialize(value):
try:
return json.dumps(value)
except (...):
return str(value)
Then here you can just write:
exif_dict[tag_name] = serialize(value)
Pros: less code indentation, better readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the suggestion for the _serialize helper function to improve readability and reduce indentation. However, this specific refactoring cannot be applied to the code within bigframes/blob/_functions.py.
The functions in _functions.py are deployed as python UDFs by extracting their source code using inspect.getsource(). This process only captures the function body itself, not any external helper functions. If we were to extract _serialize into a separate function, the remote UDF execution environment would not have access to it, leading to runtime failures.
Therefore, while the intent to improve readability is valid, this particular approach is not feasible for these UDF function bodies.
| if verbose: | ||
| error_result = { | ||
| "status": f"Error: {type(e).__name__}: {str(e)}", | ||
| "content": "", | ||
| } | ||
| return json.dumps(error_result) | ||
| else: | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this code is repeated several times in this change. Maybe define a helper function instead?
bigframes/operations/blob.py
Outdated
| try: | ||
| res = self._df_apply_udf(df, image_normalize_udf) | ||
| except Exception as e: | ||
| raise RuntimeError(f"Image normalize UDF execution failed: {e}") from e | ||
|
|
||
| if res is None: | ||
| raise RuntimeError("Image normalize returned None result") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we can refactor this repeated logic into a separate helper function too:
def _apply_udf_or_raise_error(self, ...):
try:
res = self._df_apply_udf(...)
...
then here you can write:
res = self._apply_udf_or_raise_error(...)
sycai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The blob function setup does not support nested function call at the moment. Refactor is not realistic.
* add error handling for audio_transcribe * add error handling for pdf functions * add eror handling for image functions * final touch * restore rename * update notebook to better reflect our new code change * return None on error with verbose=False for image functions * define typing module in udf * only use local variable * Refactor code
…() (#2138) * change to ai.generate * perf: Default to interactive display for SQL in anywidget mode Previously, SQL queries in anywidget mode would fall back to deferred execution, showing a dry run instead of an interactive table. This change modifies the display logic to directly use the anywidget interactive display for SQL queries, providing a more consistent and responsive user experience. A test case has been added to verify this behavior. * fix: resolve double printing issue in anywidget mode * feat: Add test case for STRUCT column in anywidget Adds a test case to verify that a DataFrame with a STRUCT column is correctly displayed in anywidget mode. This test confirms that displaying a STRUCT column does not raise an exception that would trigger the fallback to the deferred representation. It mocks `IPython.display.display` to capture the `TableWidget` instance and asserts that the rendered HTML contains the expected string representation of the STRUCT data. * fix presubmit * Revert accidental changes to test_function.py * revert accidental change to blob.py * change return type * add todo and revert change * Revert "add todo and revert change" This reverts commit 153e1d2. * Add todo * Fix: Handle JSON dtype in anywidget display This commit fixes an AttributeError that occurred when displaying a DataFrame with a JSON column in anywidget mode. The dtype check was incorrect and has been updated. Additionally, the SQL compilation for casting JSON to string has been corrected to use TO_JSON_STRING. * revert a change * revert a change * Revert: Restore bigframes/dataframe.py to state from 42da847 * remove anywidget from early return, allow execution proceeds to _repr_html_() * remove unnecessary changes * remove redundant code change * code style change * tescase update * revert a change * final touch of notebook * fix presumbit error * remove invlaid test with anywidget bug fix * fix presubmit * fix polar complier * Revert an unnecessary change * apply the workaround to i/O layer * Revert scalar_op_registry.py chnage * remove unnecessary import * Remove duplicate conversation * revert changes to test_dataframe.py * notebook update * call API on local data for complier.py * add more testcase * modfiy polars import * fix failed tests * chore: Migrate minimum_op operator to SQLGlot (#2205) * chore: Migrate round_op operator to SQLGlot (#2204) This commit migrates the `round_op` operator from the Ibis compiler to the SQLGlot compiler. * fix: Improve error handling in blob operations (#2194) * add error handling for audio_transcribe * add error handling for pdf functions * add eror handling for image functions * final touch * restore rename * update notebook to better reflect our new code change * return None on error with verbose=False for image functions * define typing module in udf * only use local variable * Refactor code * refactor: update geo "spec" and split geo ops in ibis compiler (#2208) * feat: support INFORMATION_SCHEMA views in `read_gbq` (#1895) * feat: support INFORMATION_SCHEMA tables in read_gbq * avoid storage semi executor * use faster tables for peek tests * more tests * fix mypy * Update bigframes/session/_io/bigquery/read_gbq_table.py * immediately query for information_schema tables * Fix mypy errors and temporarily update python version * snapshot * snapshot again * Revert: Unwanted code changes * Revert "Revert: Unwanted code changes" This reverts commit db5d8ea. * revert 1 files to match main branch * Correctly display DataFrames with JSON columns in anywidget * add mis-deleted comment back * revert unnecessary change * move helper function to dtypes.py * revert unnecessary testcase change * Improve JSON type handling for to_gbq and to_pandas_batches * Remove unnecessary comment * Revert bigframes/dtypes.py and mypy.ini to main branch version --------- Co-authored-by: jialuoo <[email protected]> Co-authored-by: Tim Sweña (Swast) <[email protected]>
This pull request refactors the blob processing functionalities to introduce comprehensive, two-layer error handling, enhancing robustness and providing clearer feedback on failures.
Key Changes:
Remote Function Error Handling (
bigframes/blob/_functions.py):Caller-Side Error Handling (
bigframes/operations/blob.py):This two-layer approach ensures that errors are handled gracefully at both the remote execution layer and the user-facing API layer, making the blob operations more resilient and easier to debug.
Fixes #<454752361> 🦕