feat(image-adapter): improve error handling and status codes #886

Closed

iDVB commented May 30, 2025

Fixes #885

changeset-bot bot commented May 30, 2025

⚠️ No Changeset found

Latest commit: de77cff

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

pkg-pr-new bot commented May 30, 2025

pnpm add https://pkg.pr.new/@opennextjs/aws@886

commit: eeea47c

conico974 (Contributor) left a comment

I have one big issue with this one: in general we don't want to return the error message. If we do want to, it should be behind an env variable like OPEN_NEXT_FORWARD_ERROR_MESSAGE that defaults to false, or only in debug mode.

Even forwarding the status code could be bad; we need to look into what next start does here. We should do the same, and if it doesn't forward the status code, we could put that behind an env variable as well.
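
A minimal sketch of the opt-in idea, assuming the OPEN_NEXT_FORWARD_ERROR_MESSAGE name suggested above (it is only a proposal in this thread, not an existing OpenNext setting). The raw message reaches the client only when the variable is explicitly "true" or a debug flag is set; the generic fallback text and the helper name are placeholders.

```typescript
// Hypothetical: OPEN_NEXT_FORWARD_ERROR_MESSAGE is the variable proposed in this review,
// not an existing OpenNext option. The fallback text below is a placeholder.
type Env = Record<string, string | undefined>;

const GENERIC_IMAGE_ERROR = "Image optimization failed";

// Returns the message that is safe to send back to the client.
function clientErrorMessage(err: unknown, env: Env, debug = false): string {
  // Opt-in: forwarding defaults to false unless explicitly enabled or in debug mode.
  const forward = env.OPEN_NEXT_FORWARD_ERROR_MESSAGE === "true" || debug;
  if (forward && err instanceof Error) {
    return err.message;
  }
  return GENERIC_IMAGE_ERROR;
}
```

On Node the call would be clientErrorMessage(err, process.env); passing the environment in as a parameter keeps the helper easy to test.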

iDVB (Author) commented May 30, 2025

Agreed, @conico974, we should be selective about what info we pass down.

That said, the main focus of this PR is that nothing a client passes as a parameter to this endpoint, like the request below, should return a 5xx. Otherwise any malicious user could trigger a 5xx error simply by requesting that endpoint with a flawed query string. At the end of the day, this is a client issue: 400 Bad Request.

/_next/image?url=myurl
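
For illustration, a sketch of rejecting bad url input with a 400 before any fetch happens, so a request like the one above cannot surface as a 5xx. The helper name, the result shape and the message strings are assumptions, and the rule used here (root-relative path, or absolute http/https URL) is a simplification of what Next.js actually checks.

```typescript
interface ValidationResult {
  ok: boolean;
  statusCode: number; // 200 when the input is usable, 400 for bad client input
  message?: string;
}

// Hypothetical helper: validate the "url" search param of an /_next/image request
// before fetching anything, so bad input ends in a 400 rather than an uncaught 5xx.
function validateImageUrlParam(search: string): ValidationResult {
  const url = new URLSearchParams(search).get("url");
  if (!url) {
    return { ok: false, statusCode: 400, message: '"url" parameter is required' };
  }
  // Reject protocol-relative URLs such as "//evil.example" before the "/" check below.
  if (url.startsWith("//")) {
    return { ok: false, statusCode: 400, message: '"url" parameter cannot be protocol-relative' };
  }
  // Internal images are referenced by root-relative paths.
  if (url.startsWith("/")) {
    return { ok: true, statusCode: 200 };
  }
  try {
    const parsed = new URL(url);
    if (parsed.protocol === "http:" || parsed.protocol === "https:") {
      return { ok: true, statusCode: 200 };
    }
  } catch {
    // new URL() throws for strings such as "myurl"; fall through to the 400 below.
  }
  return { ok: false, statusCode: 400, message: '"url" parameter is invalid' };
}
```

With this sketch, validateImageUrlParam("?url=myurl") yields statusCode 400 instead of an exception that would otherwise bubble up as a server error.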

iDVB (Author) commented May 30, 2025

With the above commit:

Image Optimization Adapter Error Handling Improvements

Overview

The changes we've made to the image optimization adapter make it behave like the standard Next.js image optimizer, while also addressing concerns about sensitive data leakage and preventing false alarms in monitoring.

Key Improvements

1. Preserves Next.js's Default Error Behavior:

  • The adapter now passes through the standard Next.js error messages, such as: "url" parameter is valid but upstream response is invalid
  • This keeps behavior consistent with how next start handles these errors

2. Preserves Correct HTTP Status Codes:

  • When an upstream image returns a 403 or 404, that same status code is returned to the client
  • This ensures accurate error reporting to clients

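3. Prevents Sensitive Data Leakage:

  • Raw upstream error details are not exposed to the client; only Next.js's sanitized messages are passed through (see Security Considerations below)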

4. Improves Monitoring Accuracy:

  • Client errors (4xx) are now logged at debug level instead of error level
  • This prevents false alarms in monitoring systems like Datadog
  • Genuine server errors (5xx originating in our own code) are still logged as errors
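
A minimal sketch of points 2 and 4 above, assuming the upstream status is already known: 4xx statuses are passed through and logged at debug level, anything else is treated as a server error. The names are hypothetical, and passing through every 4xx (the text only names 403 and 404) is an assumption.

```typescript
type LogLevel = "debug" | "error";

interface ImageErrorOutcome {
  statusCode: number; // status sent to the client
  logLevel: LogLevel; // how loudly the failure is logged server-side
}

// Assumption: every upstream 4xx is passed through unchanged; anything else
// is treated as a genuine server error.
function outcomeForUpstreamStatus(upstreamStatus: number): ImageErrorOutcome {
  if (upstreamStatus >= 400 && upstreamStatus < 500) {
    return { statusCode: upstreamStatus, logLevel: "debug" };
  }
  return { statusCode: 500, logLevel: "error" };
}

interface Logger {
  debug(...args: unknown[]): void;
  error(...args: unknown[]): void;
}

// Client errors go to debug so monitoring does not alert on them.
function logImageError(outcome: ImageErrorOutcome, err: unknown, logger: Logger): void {
  if (outcome.logLevel === "debug") {
    logger.debug("Image optimization client error", outcome.statusCode, err);
  } else {
    logger.error("Image optimization server error", outcome.statusCode, err);
  }
}
```

In use this would be logImageError(outcomeForUpstreamStatus(404), err, console); console satisfies the Logger shape.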

Security Considerations

The security of this approach comes from two layers:

1. Next.js's Built-in Sanitization:

  • Next.js already sanitizes error messages from upstream servers
  • It converts them to generic messages like: "url" parameter is valid but upstream response is invalid
  • These messages don't contain sensitive information from the upstream server

2. Our Additional Error Handling:

  • We classify errors based on status codes and error types
  • We preserve Next.js's sanitized messages rather than exposing raw error details
  • We wrap client errors in IgnorableError to prevent them from being treated as server failures
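
The two layers can be sketched roughly as follows: the client only ever sees the sanitized Next.js message, while the full detail is kept for server-side logs, wrapped in an IgnorableError when it is a client error. The IgnorableError class here is a simplified local stand-in, not OpenNext's actual class, and sanitizeUpstreamError is a hypothetical helper.

```typescript
// Local stand-in for OpenNext's IgnorableError (simplified for this sketch).
class IgnorableError extends Error {
  readonly canIgnore = true;
  constructor(message: string) {
    super(message);
    this.name = "IgnorableError";
  }
}

// The sanitized message quoted above; no upstream detail is ever appended to it.
const UPSTREAM_INVALID = '"url" parameter is valid but upstream response is invalid';

interface SanitizedError {
  clientBody: string; // safe, generic text returned to the caller
  serverError: Error; // full detail, kept for server-side logs only
}

function sanitizeUpstreamError(upstreamStatus: number, raw: unknown): SanitizedError {
  const detail = raw instanceof Error ? raw.message : String(raw);
  const isClientError = upstreamStatus >= 400 && upstreamStatus < 500;
  return {
    clientBody: UPSTREAM_INVALID,
    // Client errors are wrapped so they are not treated as server failures.
    serverError: isClientError ? new IgnorableError(detail) : new Error(detail),
  };
}
```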

sommeeeer (Contributor) commented May 31, 2025

  error(
    "Error during validation of image params",
    imageParams.errorMessage,
  );

Could you use a RecoverableError for this one as well?

iDVB (Author) commented Jun 2, 2025

  1. Error Classes in error.ts

    • Added statusCode property to BaseOpenNextError interface
    • Added statusCode field to all error classes
    • Updated constructors to accept an optional statusCode parameter with appropriate defaults:
      • IgnorableError: default 400 (client error)
      • RecoverableError: default 400 (client error, but logged as warning)
      • FatalError: default 500 (server error)
  2. Update IgnorableError class to include status code
    Implemented as suggested

  3. Create a function to classify errors instead of repeating logic
    Added classifyError() function that centralizes error classification
    This function returns the appropriate error type with correct status code

  4. Use error logger with proper error types instead of debug logger
    Now using error() with appropriate error types that control the log level
    Removed debug() calls for client errors

  5. For validation errors, use RecoverableError
    Image parameter validation errors now use RecoverableError
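
A simplified sketch of the error classes as described in this list, each carrying a statusCode with the stated defaults (400, 400, 500). The canIgnore and logLevel fields and the numeric log-level mapping are assumptions for illustration, not a copy of OpenNext's error.ts.

```typescript
// Simplified stand-ins for the classes described above, not OpenNext's actual error.ts.
interface BaseOpenNextError {
  readonly canIgnore: boolean;
  readonly logLevel: 0 | 1 | 2; // assumption: 0 = debug, 1 = warn, 2 = error
  readonly statusCode: number;
}

class IgnorableError extends Error implements BaseOpenNextError {
  readonly canIgnore = true;
  readonly logLevel = 0;
  readonly statusCode: number;
  constructor(message: string, statusCode = 400) {
    super(message);
    this.name = "IgnorableError";
    this.statusCode = statusCode;
  }
}

class RecoverableError extends Error implements BaseOpenNextError {
  readonly canIgnore = true;
  readonly logLevel = 1;
  readonly statusCode: number;
  constructor(message: string, statusCode = 400) {
    super(message);
    this.name = "RecoverableError";
    this.statusCode = statusCode;
  }
}

class FatalError extends Error implements BaseOpenNextError {
  readonly canIgnore = false;
  readonly logLevel = 2;
  readonly statusCode: number;
  constructor(message: string, statusCode = 500) {
    super(message);
    this.name = "FatalError";
    this.statusCode = statusCode;
  }
}

// Example: an image-param validation failure becomes a RecoverableError (400, logged as a warning).
const validationFailure = new RecoverableError('"url" parameter is invalid');
```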

      statusCode = e.statusCode;
    } else if ("code" in e) {
      const code = e.code as string;
      if (code === "ENOTFOUND" || code === "ECONNREFUSED") {
Contributor

Isn't ECONNREFUSED usually a 500 level error?

iDVB (Author)

@khuezy I think you might be missing the original PR context?
Most of these errors are fundamentally caused by the server acting on the URL the client provided.
If a client provides a broken URL (e.g. an invalid URL, a missing image, access denied, connection refused), then that's a client error (4xx), not a server fault (5xx).

This is indeed how the native Next.js image optimizer (next start) behaves.
You can test this by going to
/_next/image?url=https://someUrl.com
where https://someUrl.com is a URL that returns ECONNREFUSED, ENOTFOUND, Access Denied, etc.
The OpenNext server will return a 500.
Next.js (next start) will return a 4xx with the standard message "url" parameter is valid but upstream response is invalid.
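
A sketch of the classification being debated here, assuming that network failures while fetching the client-supplied URL should map to a 4xx. The helper name, the choice of 400 and the exact set of codes are assumptions (the reviewer questions ECONNREFUSED in particular). Node's built-in fetch typically reports such failures as a TypeError("fetch failed") whose cause carries the system error code, so the sketch also follows cause.

```typescript
// Assumption: these Node.js system error codes, raised while fetching the
// client-supplied URL, are treated as client errors. The thread debates this list.
const CLIENT_CAUSED_NETWORK_CODES = new Set(["ENOTFOUND", "ECONNREFUSED"]);

interface Classification {
  statusCode: number;
  isClientError: boolean;
}

function classifyFetchError(e: unknown): Classification {
  if (typeof e === "object" && e !== null) {
    // 1. An explicit HTTP status wins.
    if ("statusCode" in e && typeof e.statusCode === "number") {
      const isClientError = e.statusCode >= 400 && e.statusCode < 500;
      return { statusCode: e.statusCode, isClientError };
    }
    // 2. A known network error code on the error itself.
    if ("code" in e && typeof e.code === "string" && CLIENT_CAUSED_NETWORK_CODES.has(e.code)) {
      return { statusCode: 400, isClientError: true };
    }
    // 3. Node's fetch wraps the underlying system error in `cause`; look there too.
    if ("cause" in e && e.cause !== e) {
      return classifyFetchError(e.cause);
    }
  }
  // Anything unrecognised is treated as a genuine server error.
  return { statusCode: 500, isClientError: false };
}
```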

iDVB (Author) commented Jun 2, 2025

I think an important question is to what extent the native Next.js image optimizer disregards the error codes and messages it gets back from this adapter and simply responds to the client in that generic way.

If it does, perhaps all we need to do in the adapter is make sure we don't throw or console.error on anything, while still returning accurate error messages?

That would change how this code works now, since it currently rewrites the error codes and messages (lowering 5xx to 4xx) so as not to crash or throw on the server because of a client's bad request (URL).

Contributor

Gotcha. Very weird that Next.js decided to swallow the 500s into 400s. Oof.

iDVB (Author)

Really? That's exactly what this PR proposes.

Otherwise you let end users, malicious or otherwise, cause a 5xx on your servers simply by giving the endpoint a faulty URL, i.e. a "Bad Request".

It's no fault of the image optimizer that the end user passed it a URL to a ZIP file instead of an image, a 404 link, an access-denied one, etc.

Contributor

You'd have that problem no matter what, right? You're supposed to whitelist the endpoints, e.g. even for server actions and images (remotePatterns).
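
For reference, the image allowlist mentioned here is configured in next.config through images.remotePatterns; remote URLs that match no pattern are not optimized by /_next/image. The hostname and path below are placeholders, not values from this project.

```typescript
// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  images: {
    // Placeholder host and path: only remote images matching a pattern can be optimized.
    remotePatterns: [
      {
        protocol: "https",
        hostname: "images.example.com",
        pathname: "/uploads/**",
      },
    ],
  },
};

export default nextConfig;
```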

iDVB (Author)

Next.js's approach of normalizing upstream 5xx errors to 4xx responses is actually quite sensible for this use case. Here's why:

Monitoring clarity: By converting upstream 5xx errors to 4xx, Next.js ensures that your monitoring systems only alert on problems you can actually fix, not issues with third-party services.

Semantic correctness: From Next.js's perspective, if a user provides a URL that leads to a broken resource, that's fundamentally a client input problem - the client provided a "bad" URL, even if it's "bad" because the upstream server failed.

Practical operations: Development and operations teams need clean separation between "our systems are broken" vs. "external dependencies are broken." The conversion to 4xx helps maintain this boundary.

Contributor

Monitoring clarity: I think you can argue the opposite too. If Next.js converts the specific error to a generic one, it's hard to determine what the problem is.

Semantic correctness: most of the time, the "end user" is not the consumer of the image optimization request; the app is. This is why remotePatterns exists. Your app is not a service for people to arbitrarily optimize their images.

Practical operations: Development and operations teams need clean separation between "our systems are broken" vs. "external dependencies are broken." The conversion to 4xx helps maintain this boundary.

I'm confused by this statement; it sounds contradictory. If the error is swallowed, how would you know whether it's "our system" or "their system" that's broken?

iDVB (Author) commented Jun 2, 2025

I've simplified the classifyError function to focus on the core principle: distinguishing between client errors and server errors for proper logging, while preserving the original error information.

The new implementation:

  • Preserves original error details: we keep the original error message and status code when available
  • Minimal classification: we only classify errors as client (4xx) or server (5xx)
  • Passes through to Next.js: we don't try to normalize error messages or create complex mappings

This approach has several benefits:

  • Simplicity - The code is more straightforward and easier to maintain
  • Less error-prone - Fewer code paths means fewer places for bugs to hide
  • Better alignment with Next.js - We let Next.js handle the client response formatting

The key insight is that our primary goal is proper error logging and monitoring (avoiding alerts for client errors), while letting Next.js handle the actual client-facing error responses. We're not trying to override Next.js's error handling behavior, just making sure our monitoring systems don't get noisy with expected client errors.
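
A sketch of what such a simplified classifyError could look like under these principles: the original message and status code are kept untouched, and the only decision is client (4xx) versus server (5xx), which drives the log level. This is an illustration, not the code in the PR; defaulting to 500 when no status is present, and the logLevelFor helper, are assumptions.

```typescript
interface ClassifiedError {
  statusCode: number; // original status when available, otherwise 500
  message: string; // original message, passed through unchanged
  isClientError: boolean; // used only to pick the log level
}

function classifyError(e: unknown): ClassifiedError {
  const message = e instanceof Error ? e.message : String(e);
  const rawStatus =
    typeof e === "object" && e !== null && "statusCode" in e ? e.statusCode : undefined;
  const statusCode = typeof rawStatus === "number" ? rawStatus : 500;
  return { statusCode, message, isClientError: statusCode >= 400 && statusCode < 500 };
}

// Logging is the only thing the classification changes; the response is left to Next.js.
function logLevelFor(e: unknown): "debug" | "error" {
  return classifyError(e).isClientError ? "debug" : "error";
}
```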

iDVB requested a review from conico974 on June 2, 2025, 19:14
iDVB (Author) commented Jun 5, 2025

Core Changes

  • Consolidated error handling: Refactored into a single handleImageError helper function for clarity and reuse
  • Enhanced error detection: Improved S3 ListBucket permission error detection by checking multiple error paths in AWS SDK v3
  • Preserved status codes: Maintained appropriate HTTP status codes (403 for access denied) while avoiding server error status codes
  • Secure messaging: Generic error messages for clients while preserving detailed server-side logging
  • Better monitoring: Used IgnorableError to classify client errors and prevent false monitoring alerts
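
A rough sketch of the access-denied detection and the consolidated helper. The checked fields (name, Code and $metadata.httpStatusCode) follow the general shape of AWS SDK v3 service exceptions, but which of them is populated can vary, hence checking several. The background fact is that when the caller lacks s3:ListBucket, S3 answers a request for a missing key with 403 Access Denied rather than 404. The response bodies, the 400 fallback and the function shapes are assumptions, not the PR's handleImageError.

```typescript
// Loose view of an AWS SDK v3 S3 error; every field is optional because not all are always set.
interface MaybeS3Error {
  name?: string;
  Code?: string;
  $metadata?: { httpStatusCode?: number };
}

// Without s3:ListBucket, S3 returns 403 AccessDenied (not 404 NoSuchKey) for a missing key,
// so several places are checked for the access-denied signal.
function isS3AccessDenied(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as MaybeS3Error;
  return (
    e.name === "AccessDenied" ||
    e.Code === "AccessDenied" ||
    e.$metadata?.httpStatusCode === 403
  );
}

interface ImageErrorResponse {
  statusCode: number;
  body: string; // generic text only, never the raw SDK message
  logDetail: string; // full detail for server-side logs
}

// Hypothetical consolidated helper in the spirit of handleImageError.
function handleImageError(err: unknown): ImageErrorResponse {
  const logDetail = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
  if (isS3AccessDenied(err)) {
    return { statusCode: 403, body: "Forbidden", logDetail };
  }
  // Assumption: other failures are reported as a generic client error rather than a 5xx.
  return { statusCode: 400, body: "Bad Request", logDetail };
}
```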

Internal vs External Image Handling

Next.js treats internal and external image errors differently, requiring separate approaches:

  • For internal images (S3):

    • Left Content-Type unset, which triggers Next.js's "URL valid but internal response invalid" error message
    • Added debugging headers while maintaining secure client responses
  • For external images:

    • Preserved existing behavior with appropriate text/plain responses
    • Maintained proper error propagation that Next.js expects for external resources

This approach ensures consistent error behavior across image types while providing appropriate client feedback and maintaining security.
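
The split between internal and external images might be sketched like this. The function and its parameters are hypothetical; the one behavior taken from the description is that internal (S3) error responses leave Content-Type unset while external ones use text/plain. The debugging headers mentioned above are omitted because their names are not given.

```typescript
interface ImageErrorResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: shape the adapter's error response by image source.
function buildImageErrorResult(
  source: "internal" | "external",
  statusCode: number,
  message: string,
): ImageErrorResult {
  if (source === "internal") {
    // No Content-Type: per the description, Next.js then reports its own
    // "URL valid but internal response invalid" message to the client.
    return { statusCode, headers: {}, body: message };
  }
  // External images keep the existing plain-text error response.
  return { statusCode, headers: { "Content-Type": "text/plain" }, body: message };
}
```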

khuezy (Contributor) left a comment

LGTM but I'd like some more eyes on this since I'm not currently using OpenNext.

I think we have some image optimization e2e tests already.

Successfully merging this pull request may close these issues.

Image Optimization Adapter Returns Improper HTTP Status Codes
4 participants