Skip to content

Conversation

holtskinner
Copy link
Collaborator

No description provided.

@holtskinner holtskinner requested a review from a team as a code owner May 30, 2025 18:40
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @holtskinner, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team,

Gemini here, providing a summary of this pull request authored by @holtskinner. The primary goal of this PR, as indicated by the title, is to enhance the notebook testing script by enabling parallel execution of notebooks. Previously, the script processed notebooks sequentially, launching one, waiting for it to complete, and then moving to the next. This change refactors the script to launch all notebook executions concurrently and then monitor their status until all are finished. This approach should significantly reduce the overall execution time for the notebook test suite.

The changes involve a substantial rewrite of the main execution logic in the shell script, introducing functions to manage the launching, monitoring, and reporting phases of the test run.

Highlights

  • Parallel Execution: The core change is the transition from sequential to parallel execution of notebooks. The script now launches all notebook tests using gcloud colab executions create without waiting for each to finish.
  • Refactored Script Structure: The script logic has been broken down into three distinct functions: launch_notebook (to start executions), monitor_executions (to track their status), and publish_results (to report the final outcome via Pub/Sub).
  • Concurrent Monitoring: A new monitoring loop (monitor_executions) is introduced that periodically checks the status of all pending notebook executions until they have all completed (either successfully or failed).
  • Simplified Data Handling: The script now uses readarray to read the list of notebooks and < for reading file contents into variables, which are generally more robust methods in Bash.
  • Improved Error Handling: The set -e command has been added at the beginning of the script to ensure that the script exits immediately if any command fails.

Changelog

  • .cloud-build/executor/test_notebooks.sh
    • Removed the sequential loop for executing notebooks.
    • Introduced launch_notebook, monitor_executions, and publish_results functions.
    • Implemented parallel launching of notebook executions using gcloud colab executions create.
    • Added a monitoring loop to check the status of all running executions concurrently.
    • Used readarray -t to read notebooks into an array.
    • Used < for reading file contents into variables.
    • Added set -e for stricter error handling.
    • Removed --verbosity=debug flag from gcloud colab executions create.
    • Simplified the logic for collecting results and publishing the Pub/Sub message.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Scripts run side-by-side,
No longer must they hide,
Faster tests now fly,
Beneath the cloud-build sky,
Efficiency is our guide.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces parallel processing to the notebook testing script, a significant improvement for efficiency. The script is now more modular with the use of functions. While the core logic for parallelization is well-implemented, there are a few areas related to error handling and JSON report formatting that should be addressed to ensure robustness and correctness.

Summary of Findings

  • JSON Formatting for Notebook Lists: The failed_notebooks and successful_notebooks fields in the Pub/Sub JSON message are formatted as single comma-separated strings instead of proper JSON arrays of strings. This should be corrected to ensure the consumer can parse these fields as lists (e.g., ["item1", "item2"] or []).
  • Error Handling for gcloud describe: The script currently suppresses errors from gcloud colab executions describe and treats any failure of this command as if the notebook job itself failed. This can mask underlying infrastructure or gcloud command issues. Error handling should be improved to distinguish these cases.
  • Robustness of Operation ID Usage: Accessing operation IDs from the operation_map should include a check to ensure an ID is actually found before attempting to use it with gcloud commands, to prevent errors if a path is unexpectedly missing from the map.
  • Verbosity of gcloud create command: The --verbosity=debug flag was removed from the gcloud colab executions create command. Re-adding it could aid in debugging notebook launch issues.

Merge Readiness

The pull request introduces valuable parallelization to the notebook testing script. However, there are a few high and medium severity issues related to error handling and JSON reporting that should be addressed before merging to ensure the script's robustness and the correctness of its output. Specifically, the JSON formatting for lists of notebooks and the error handling for gcloud commands need attention. I am unable to approve pull requests directly, but I recommend making these changes for a more reliable script. Other reviewers should then perform a final check.

for path in "${pending[@]}"; do
local id="${operation_map["$path"]}"

local status=$(gcloud colab executions describe "$id" --region="$REGION" --format="value(jobState)" 2>/dev/null || echo "JOB_STATE_FAILED")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current error handling for gcloud colab executions describe might mask the true cause of failures and lead to misreporting.
Specifically, 2>/dev/null || echo "JOB_STATE_FAILED" means:

  1. Any error output from gcloud describe (e.g., network issues, permission problems, invalid ID format before API query) is suppressed (2>/dev/null).
  2. If the gcloud describe command exits with a non-zero status for any reason, it's unconditionally treated as if the notebook job itself JOB_STATE_FAILED.

This prevents distinguishing between an actual failed notebook execution and an issue with the gcloud command or environment (e.g., transient network error, quota issue for describe calls).

Consider explicitly checking the gcloud command's exit status and logging its output/error for better diagnostics.

Suggested change
local status=$(gcloud colab executions describe "$id" --region="$REGION" --format="value(jobState)" 2>/dev/null || echo "JOB_STATE_FAILED")
local gcloud_status_cmd_output
# Try to get the job state. If gcloud command fails, log its output and set a failure status.
if ! gcloud_status_cmd_output=$(gcloud colab executions describe "$id" --region="$REGION" --format="value(jobState)" 2>&1); then
echo "Error: Failed to retrieve status for notebook '$path' (ID: $id). Marking as failed. gcloud output: $gcloud_status_cmd_output"
status="JOB_STATE_FAILED_OR_ERROR" # Use a distinct status or JOB_STATE_FAILED
else
status="$gcloud_status_cmd_output"
fi

local fail_list=$(IFS=, ; echo "${completed_failure[*]}")
local pass_list=$(IFS=, ; echo "${completed_success[*]}")

local message="{\"total_count\":$total,\"failed_count\":$failed,\"failed_notebooks\":\"$fail_list\",\"successful_notebooks\":\"$pass_list\",\"successful_count\":$passed,\"execution_date\":\"$TIMESTAMP\"}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current construction of the failed_notebooks and successful_notebooks fields in the JSON message (line 85) does not produce a valid JSON array of strings.

Variables $fail_list and $pass_list (from lines 82-83) are simple comma-separated strings. For example, if completed_failure contains (nb1.ipynb nb2.ipynb), then $fail_list becomes nb1.ipynb,nb2.ipynb. The resulting JSON will be "failed_notebooks":"nb1.ipynb,nb2.ipynb", which is a single JSON string, not an array.

A correct JSON array of strings should look like "failed_notebooks":["nb1.ipynb","nb2.ipynb"], or "failed_notebooks":[] if the list is empty. This discrepancy can lead to parsing errors or incorrect data interpretation by the Pub/Sub message consumer.

To fix this, you'll need to modify how the list of failed and successful notebooks are formatted into JSON strings. This typically involves iterating through the completed_failure and completed_success arrays, quoting each item, separating them with commas, and enclosing the result in square brackets (e.g., ["item1","item2"]). Special care is needed for empty arrays (resulting in []) and proper escaping of characters within notebook names if they could conflict with JSON syntax (though unlikely for typical paths).

Once you have variables, say fail_list_json_array and pass_list_json_array, containing these correctly formatted JSON array strings, this line should be updated to use them without the surrounding quotes.

Suggested change
local message="{\"total_count\":$total,\"failed_count\":$failed,\"failed_notebooks\":\"$fail_list\",\"successful_notebooks\":\"$pass_list\",\"successful_count\":$passed,\"execution_date\":\"$TIMESTAMP\"}"
local message="{\"total_count\":$total,\"failed_count\":$failed,\"failed_notebooks\":$fail_list_json_array,\"successful_notebooks\":$pass_list_json_array,\"successful_count\":$passed,\"execution_date\":\"$TIMESTAMP\"}"

Comment on lines 35 to 48
--execution-timeout="1h30m" \
--format="value(name)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The --verbosity=debug flag was previously used with the gcloud colab executions create command (line 40 in the old code) but has been removed in this version. Was this removal intentional?

Retaining this flag can be very helpful for diagnosing issues that might occur during the notebook launch phase, providing more detailed logs from gcloud.

If its removal wasn't a specific requirement, consider re-adding it.

Suggested change
--execution-timeout="1h30m" \
--format="value(name)")
--execution-timeout="1h30m" \
--verbosity=debug \
--format="value(name)")

local still_pending=()

for path in "${pending[@]}"; do
local id="${operation_map["$path"]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a potential issue if operation_map["$path"] does not yield a valid operation ID. If $id is empty or unset (e.g., if launch_notebook failed to populate operation_map for some reason before set -e could act, or if $path is unexpectedly not in the map), the subsequent gcloud colab executions describe "$id" ... command on line 52 would effectively become gcloud colab executions describe "" ..., which will fail.

It would be more robust to check if $id is non-empty before attempting to use it.

Suggested change
local id="${operation_map["$path"]}"
local id="${operation_map["$path"]}"
if [[ -z "$id" ]]; then
echo "Error: Could not find operation ID for path '$path'. Marking as failed and skipping status check."
completed_failure+=("$path (Error: Missing Operation ID)")
continue # Skip to the next path in the pending loop
fi

@holtskinner holtskinner force-pushed the nb-testing-improvements branch from 2ef8241 to 40cdc86 Compare May 30, 2025 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant