chore(deps): bump transformers from 5.7.0 to 5.8.0 in /cmd/runtimes/deepspeed by dependabot[bot] · Pull Request #3498 · kubeflow/trainer

dependabot · 2026-05-12T05:30:05Z

Bumps transformers from 5.7.0 to 5.8.0.
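
Going from 5.7.0 to 5.8.0 changes only the middle version component, which makes this a minor update under semantic versioning. As a minimal sketch (the helper name `classify_bump` is hypothetical and not part of Dependabot or transformers), the classification can be reproduced like this:

```python
def classify_bump(old: str, new: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for two X.Y.Z version strings.

    Hypothetical helper for illustration; assumes plain numeric X.Y.Z versions
    with no pre-release or build suffixes.
    """
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    # The first component that differs decides the update type.
    for name, a, b in zip(("major", "minor", "patch"), old_parts, new_parts):
        if a != b:
            return name
    return "none"


print(classify_bump("5.7.0", "5.8.0"))  # minor
```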

Release notes

Release v5.8.0

New Model additions

DeepSeek-V4

DeepSeek-V4 is the next-generation MoE (Mixture of Experts) language model from DeepSeek that introduces several architectural innovations over DeepSeek-V3. The architecture replaces Multi-head Latent Attention (MLA) with a hybrid local + long-range attention design, swaps residual connections for Manifold-Constrained Hyper-Connections (mHC), and bootstraps the first few MoE layers with a static token-id → expert-id hash table. This implementation covers DeepSeek-V4-Flash, DeepSeek-V4-Pro, and their -Base pretrained variants, which share the same architecture but differ in width, depth, expert count and weights.

Links: Documentation | Paper

Add DeepSeek V4 (#45643) by @ArthurZucker in #45643
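
The static token-id → expert-id hash table mentioned above can be illustrated with a toy lookup. This is only a sketch of the general idea of hash-based routing: the release notes do not say how DeepSeek-V4 builds its table, so the modulo mapping and the names `build_hash_table` and `route` are assumptions made for illustration.

```python
def build_hash_table(vocab_size: int, num_experts: int) -> list[int]:
    """Build a fixed token-id -> expert-id table.

    Assumption: a simple modulo mapping stands in for whatever static
    mapping the real model uses.
    """
    return [token_id % num_experts for token_id in range(vocab_size)]


def route(token_ids: list[int], table: list[int]) -> list[int]:
    """Look up the expert for each token; no learned router is involved."""
    return [table[t] for t in token_ids]


table = build_hash_table(vocab_size=16, num_experts=4)
print(route([0, 5, 10, 15], table))  # [0, 1, 2, 3]
```

Because the table is static, the same token id always reaches the same expert, regardless of context.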

Gemma 4 Assistant

Gemma 4 Assistant is a small, text-only model that enables speculative decoding for Gemma 4 models using the Multi-Token Prediction (MTP) method and associated candidate generator. The model shares the same Gemma4TextModel backbone as other Gemma 4 models but uses KV sharing throughout the entire model, allowing it to reuse the KV cache populated by the target model and skip the pre-fill phase entirely. This architecture includes cross-attention to make the most of the target model's context, allowing the assistant to accurately predict more drafted tokens per drafting round.

Links: Documentation

First model (#45788) by @SindhuRaghuram97 in #45788
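
For readers unfamiliar with speculative decoding, the draft-and-verify loop can be sketched with toy next-token functions. This is a generic greedy variant and not the Gemma 4 Assistant implementation: it has no KV sharing, cross-attention, or MTP, and `speculative_step` and the toy models are hypothetical names.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Run one greedy draft-and-verify round and return the accepted tokens."""
    # Draft k candidate tokens autoregressively with the small model.
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # Verify: keep drafted tokens while the target agrees. Real systems score
    # all drafted positions in one batched forward pass; this sketch calls the
    # target once per position for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every draft was accepted, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


# Toy models: the target counts upward mod 10; the draft errs after a 3.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
print(speculative_step(target, draft, prefix=[1], k=4))  # [2, 3, 4]
```

The more drafted tokens the target accepts per round, the fewer target passes are needed per generated token, which is the gain the release notes refer to.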

GraniteSpeechPlus

Granite Speech Plus is a variant of Granite Speech that enhances the projector by consuming the concatenation of the encoder's final hidden states with an arbitrary subset of its intermediate hidden states along the feature dimension. It is a multimodal speech-to-text model that can transcribe audio, provide speaker annotation and word level timestamps by responding to text prompts. The model inherits the same architecture components as Granite Speech including the speech encoder, query transformer projector, language model, and optional LoRA adapter.

Links: Documentation

Support for a new Granite-Speech-Plus model (#45695) by @zvik in #45695
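
The projector input described above, the final hidden states concatenated with selected intermediate hidden states along the feature dimension, can be sketched with nested lists. The function name, the ordering (final states first), and the toy shapes are assumptions; the actual model operates on tensors.

```python
from typing import List

Frames = List[List[float]]  # [time][features]


def concat_hidden_states(layers: List[Frames], selected: List[int]) -> Frames:
    """Concatenate the last layer's features with those of selected layers.

    Assumption: final-layer features come first, followed by the selected
    intermediate layers in the order given.
    """
    final = layers[-1]
    out: Frames = []
    for t in range(len(final)):
        frame = list(final[t])
        for idx in selected:
            frame.extend(layers[idx][t])  # grow along the feature dimension
        out.append(frame)
    return out


# Three encoder layers, two time steps, two features each.
layers = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
print(concat_hidden_states(layers, selected=[0]))  # [[9, 10, 1, 2], [11, 12, 3, 4]]
```

The time dimension is unchanged; only the per-frame feature width grows, by one layer's width for each selected intermediate layer.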

Granite4Vision

Granite Vision 4.1 is a vision-language model from IBM Research designed for enterprise-grade document data extraction. It specializes in chart extraction (Chart2CSV, Chart2Summary, Chart2Code), table extraction (JSON, HTML, OTSL), and semantic key-value pair extraction. The model builds on LLaVA-NeXT with architectural innovations including SigLIP2 Vision Encoder, Window Q-Former Projectors, and DeepStack Feature Injection with 8 vision-to-LLM injection points.

Links: Documentation

Add Granite 4.1 Vision (granite4_vision) (#45597) by @artem-spector in #45597

EXAONE-4.5

EXAONE 4.5 is the first open-weight vision language model developed by LG AI Research, integrating a dedicated visual encoder into the existing EXAONE 4.0 framework to expand multimodal capabilities. The model features 33 billion parameters in total, including 1.2 billion parameters from the vision encoder, and achieves competitive performance in general benchmarks while outperforming similar-sized models in document understanding and Korean contextual reasoning. It builds on EXAONE 4.0 with key enhancements including an expanded vocabulary of 153,600 tokens, support for up to 256K token context windows, and a Multi-Token Prediction (MTP) mechanism.

Links: Documentation | Paper | Blog Post

Add EXAONE 4.5 implementations (#45471) by @nuxlear in #45471

PP-FormulaNet

... (truncated)

Commits

049d2bf v5.8.0
2871caf Add Granite 4.1 Vision (granite4_vision) (#45597)
aaec109 fix: correct spelling in continuous_api docstring (#45749)
7050d0e Fix link to modular transformers documentation (#45746)
df2f2b5 Gemma4: fix failed test cases (#45568)
2c7d385 First model (#45788)
a6ccf93 Fix CI: Allow more artifacts to be download in CI (#45785)
2c432d7 Add concurrency to PR CI workflow file (pr-ci-caller.yml) (#45786)
3db570f Reorder decorators for autodoc and dataclass (#45702)
136befe Unwrap text_config in AutoModelFor*.from_config (#45770)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [transformers](https://github.com/huggingface/transformers) from 5.7.0 to 5.8.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](huggingface/transformers@v5.7.0...v5.8.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 5.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

github-actions · 2026-05-12T05:30:18Z

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Join the community:

Slack: Join our #kubeflow-trainer Slack channel.
Meetings: Attend the Kubeflow AutoML and Training Working Group bi-weekly meetings.

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

tenzen-y

/lgtm
/approve

google-oss-prow · 2026-05-12T08:13:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dependabot Bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels May 12, 2026

Copilot AI review requested due to automatic review settings May 12, 2026 05:30

Copilot AI reviewed May 12, 2026

google-oss-prow Bot requested review from akshaychitneni and jinchihe May 12, 2026 05:30

google-oss-prow Bot added the size/XS label May 12, 2026

tenzen-y reviewed May 12, 2026

google-oss-prow Bot assigned tenzen-y May 12, 2026

google-oss-prow Bot added the lgtm label May 12, 2026

google-oss-prow Bot added the approved label May 12, 2026

google-oss-prow Bot merged commit 809c404 into master May 12, 2026
35 of 38 checks passed

google-oss-prow Bot deleted the dependabot/pip/cmd/runtimes/deepspeed/transformers-5.8.0 branch May 12, 2026 08:15

google-oss-prow Bot added this to the v2.3 milestone May 12, 2026
