Skip to content

[Data][LLM] Support OpenAI's nested image_url schema in PrepareImageStage#56584

Merged
kouroshHakha merged 4 commits intoray-project:masterfrom
GuyStone:support-openai-nested
Oct 2, 2025
Merged

[Data][LLM] Support OpenAI's nested image_url schema in PrepareImageStage#56584
kouroshHakha merged 4 commits intoray-project:masterfrom
GuyStone:support-openai-nested

Conversation

@GuyStone
Copy link
Contributor

@GuyStone GuyStone commented Sep 16, 2025

Why are these changes needed?

  • Add support for OpenAI-compatible nested image_url schema in PrepareImageStage.
  • Currently, the documentation and code indicate that inputs should follow the OpenAI chat messages format. However, the accepted image format differs from what OpenAI supports. This inconsistency creates confusion and increases complexity when switching between Batch OpenAI and Batch with Ray.
The first stage of the processor is ChatTemplateStage.
Required input columns:
        messages: A list of messages in OpenAI chat format. See https://platform.openai.com/docs/api-reference/chat/create for details.

Related issue number

  • N/A

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…tage

Signed-off-by: Guy Stone <guys@spotify.com>
@GuyStone GuyStone requested a review from a team as a code owner September 16, 2025 15:10
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for OpenAI's nested image_url schema in PrepareImageStage, which is a great improvement for consistency and user experience. The implementation is clean and correctly handles the new format, including validation for malformed inputs. A new unit test is also added to cover the happy path. My only suggestion is to also add a test case for the error path to ensure the new validation logic is fully covered.

Comment on lines 346 to 347
if image is None:
raise ValueError("image_url dict must contain 'url' key")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This is good validation to ensure the image_url dictionary is well-formed. To make this more robust, it would be beneficial to add a unit test case that verifies this ValueError is raised when the url key is missing. This would improve test coverage for error handling paths.

@richardliaw richardliaw added the go add ONLY when ready to merge, run all tests label Sep 16, 2025
@ray-gardener ray-gardener bot added data Ray Data-related issues llm community-contribution Contributed by the community labels Sep 16, 2025
@gvspraveen gvspraveen removed the data Ray Data-related issues label Sep 21, 2025
@kouroshHakha kouroshHakha requested a review from nrghosh October 1, 2025 15:05
Copy link
Contributor

@nrghosh nrghosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting nested image_url makes perfect sense, thanks for the contribution @GuyStone

Few quick nits

  • there's an optional param called detail (From openai schema that we're dropping here. I don't think the downstream templates / etc would currently use it, but would be good to just leave a note in the docstring that we're not passing it forward
  • Would be good to have at least one or two unhappy path tests - ex. missing or non-string image_url
  • type-guard + clarify error message, something like
if content_item["type"] == "image_url" and isinstance(image_data, dict):
    url = image_data.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("image_url must be an object with a non-empty 'url' string")
    image = url

otherwise looks good - kicked off a release test in the meantime

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Copy link
Contributor

@nrghosh nrghosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after review

Not re-running release tests, so good to merge cc @kouroshHakha

@kouroshHakha kouroshHakha enabled auto-merge (squash) October 2, 2025 21:36
@kouroshHakha kouroshHakha merged commit 12970ea into ray-project:master Oct 2, 2025
7 checks passed
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
…tage (#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
liulehui pushed a commit to liulehui/ray that referenced this pull request Oct 9, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
joshkodi pushed a commit to joshkodi/ray that referenced this pull request Oct 13, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Josh Kodi <joshkodi@gmail.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…tage (ray-project#56584)

Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community go add ONLY when ready to merge, run all tests llm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants