Skip to content

fix(FR-1261): Show fallback UI if there is storage proxy crash issue on faulty node #3978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

agatha197
Copy link
Contributor

@agatha197 agatha197 commented Jul 18, 2025

Resolves #3977 (FR-1261)

Improve error handling in storage and resource group components

This PR adds error boundaries and graceful error handling to storage and resource group related components to prevent UI crashes when backend API calls fail.

Key changes:

  • Replaced useSuspenseTanQuery with useTanQuery in ResourceGroupSelect and StorageSelect components
  • Added proper error handling with fallbacks in API calls
  • Added type definitions for better code maintainability
  • Modified useCurrentProject hook to handle vhost info fetch failures gracefully
  • Added proper fallback for alerts in MainLayout
  • Added error boundaries to storage-related cards in VFolderNodeListPage

These changes ensure that storage-related UI components continue to function even when backend API calls fail, providing a more resilient user experience.

Checklist:

  • Documentation
  • Minium required manager version
  • Specific setting for review (eg., KB link, endpoint or how to setup): http://10.122.12.219:8090
  • Minimum requirements to check during review: see whole pages without error boundary.
  • Test case(s) to demonstrate the difference of before/after

@github-actions github-actions bot added the size:L 100~500 LoC label Jul 18, 2025
Copy link
Contributor Author


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • flow:merge-queue - adds this PR to the back of the merge queue
  • flow:hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@agatha197 agatha197 marked this pull request as ready for review July 18, 2025 06:54
@Copilot Copilot AI review requested due to automatic review settings July 18, 2025 06:54
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements error handling and fallback UI to address storage proxy crash issues on faulty nodes. The changes focus on gracefully handling failures in storage-related components by wrapping them with error boundaries and using non-suspending queries for non-critical data.

  • Implements BAIErrorBoundary wrappers around storage-related components
  • Converts critical data fetching from suspending to non-suspending queries with fallback values
  • Adds try-catch blocks for storage proxy operations with console warnings

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
VFolderNodeListPage.tsx Wraps StorageStatusPanelCard and QuotaPerStorageVolumePanelCard with error boundaries
useCurrentProject.tsx Separates scaling groups and vhost info fetching with fallback handling for vhost failures
StorageSelect.tsx Converts to useTanQuery with null fallback and defensive programming
ResourceGroupSelect.tsx Splits queries and adds graceful error handling for hosts data
NoResourceGroupAlert.tsx Wraps component with error boundary to prevent crashes
MainLayout.tsx Adds minimal fallback for non-critical alert components
FolderCreateModal.tsx Wraps StorageSelect with error boundary in modal context

Copy link

github-actions bot commented Jul 18, 2025

Coverage report for ./react

Caution

Test run failed

St.
Category Percentage Covered / Total
🔴 Statements
4.45% (-0% 🔻)
491/11042
🔴 Branches
3.91% (-0% 🔻)
303/7751
🔴 Functions
2.58% (-0% 🔻)
90/3483
🔴 Lines
4.41% (-0% 🔻)
476/10804
Show files with reduced coverage 🔻
St.
File Statements Branches Functions Lines
🔴
... / useCurrentProject.tsx
20.83% (-1.39% 🔻)
0% 0%
20.83% (-1.39% 🔻)
🔴
... / ResourceGroupSelect.tsx
2.5% (-0.2% 🔻)
0% 0%
2.56% (-0.21% 🔻)

Test suite run failed

Failed tests: 1/158. Failed suites: 1/15.
  ● DomainSelect › default render

    Invariant Violation: graphql: Unexpected invocation at runtime. Either the Babel transform was not set up, or it failed to identify this call site. Make sure it is being used verbatim as `graphql`. Note also that there cannot be a space between graphql and the backtick that follows.

      16 |
      17 |   const { domains } = useLazyLoadQuery<DomainSelectorQuery>(
    > 18 |     graphql`
         |            ^
      19 |       query DomainSelectorQuery {
      20 |         domains(is_active: true) {
      21 |           name

      at invariant (../node_modules/.pnpm/[email protected]/node_modules/invariant/invariant.js:40:15)
      at graphql (../node_modules/.pnpm/[email protected]/node_modules/relay-runtime/lib/query/GraphQLTag.js:7:52)
      at DomainSelector (src/components/DomainSelector.tsx:18:12)
      at Object.react-stack-bottom-frame (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:22428:20)
      at renderWithHooks (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:5757:22)
      at updateFunctionComponent (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:8018:19)
      at beginWork (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:9683:18)
      at runWithFiberInDEV (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:543:16)
      at performUnitOfWork (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:15052:22)
      at workLoopSync (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:14870:41)
      at renderRootSync (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:14850:11)
      at performWorkOnRoot (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:14384:44)
      at performWorkOnRootViaSchedulerTask (../node_modules/.pnpm/[email protected][email protected]/node_modules/react-dom/cjs/react-dom-client.development.js:15931:7)
      at flushActQueue (../node_modules/.pnpm/[email protected]/node_modules/react/cjs/react.development.js:862:34)
      at Object.<anonymous>.process.env.NODE_ENV.exports.act (../node_modules/.pnpm/[email protected]/node_modules/react/cjs/react.development.js:1151:10)
      at ../node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_@type_5aef3ee8b2555a079c2b9ce844db8ed7/node_modules/@testing-library/react/dist/act-compat.js:47:25
      at renderRoot (../node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_@type_5aef3ee8b2555a079c2b9ce844db8ed7/node_modules/@testing-library/react/dist/pure.js:188:26)
      at render (../node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_@type_5aef3ee8b2555a079c2b9ce844db8ed7/node_modules/@testing-library/react/dist/pure.js:287:10)
      at Object.<anonymous> (src/components/DomainSelector.test.tsx:28:34)

Report generated by 🧪jest coverage report action from 6a6c2ae

@agatha197 agatha197 force-pushed the fix_FR-1261_/show-fallback-ui-for-storage-proxy-crash branch from f29fb75 to 5f5fee5 Compare July 18, 2025 06:59
Copy link
Contributor

@nowgnuesLee nowgnuesLee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding error handling to the part that calls list_hosts()?

// useCurrentProject
const resourceGroupsForCurrentProjectAtom = atom(async (get) => {
  // NOTE: cannot use hook inside atom
  const currentProject = get(currentProjectAtom);
  const [resourceGroups, vhostInfo] = await Promise.all([
    // @ts-ignore
    globalThis.backendaiclient.scalingGroup.list(currentProject.name) as {
      scaling_groups: {
        name: string;
      }[];
    },
    // @ts-ignore
    globalThis.backendaiclient.vfolder.list_hosts(currentProject.id) as {
      allowed: string[];
      default: string;
      volume_info: {
        [key: string]: {
          backend: string;
          capabilities: string[];
          usage: {
            percentage: number;
          };
          sftp_scaling_groups?: string[];
        };
      };
    },
  ]).catch((error) => {
    console.error(error);
    return [
      { scaling_groups: [] },
      {
        allowed: [],
        default: '',
        volume_info: {},
      },
    ];
  });

@agatha197 agatha197 force-pushed the fix_FR-1261_/show-fallback-ui-for-storage-proxy-crash branch from 5f5fee5 to a766cae Compare July 18, 2025 08:15
@agatha197 agatha197 requested a review from nowgnuesLee July 18, 2025 08:16
@agatha197 agatha197 force-pushed the fix_FR-1261_/show-fallback-ui-for-storage-proxy-crash branch from a766cae to d4ae30a Compare July 18, 2025 08:27
@agatha197 agatha197 force-pushed the fix_FR-1261_/show-fallback-ui-for-storage-proxy-crash branch 3 times, most recently from 4c8ff79 to c090b90 Compare July 22, 2025 08:10
@agatha197 agatha197 requested a review from nowgnuesLee July 22, 2025 08:11
Copy link
Contributor

@nowgnuesLee nowgnuesLee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@agatha197 agatha197 force-pushed the fix_FR-1261_/show-fallback-ui-for-storage-proxy-crash branch from c090b90 to 6a6c2ae Compare August 6, 2025 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L 100~500 LoC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show fallback UI if there is storage proxy crash issue on faulty node
2 participants