docs: add continuous batching page #41847

McPatate · 2025-10-24T15:47:40Z

No description provided.

ArthurZucker · 2025-10-29T11:01:27Z

docs/source/en/continuous_batching.md

+Nothing to do, it comes built-in with `transformers`! :nice:
+
+## API Reference
+


Maybe we can explain here what we do in the codebase. We have:

- a scheduler
- a manager
- a mixin

How does each interact with the others, and, at a high level, what are the motivations for each class / function?

will add in a subsequent PR
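
For readers who want a picture of the three pieces named in the comment above, here is a toy sketch of how a scheduler, a manager, and a mixin could split the work in a continuous batching loop. Everything in it is an assumption made for illustration: the class names (`ToyScheduler`, `ToyManager`, `ToyBatchingMixin`, `ToyModel`), the method names, the FIFO slot policy, and the fake "last token plus one" model are hypothetical and are not the actual `transformers` implementation or API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """One generation request (hypothetical structure)."""
    request_id: str
    prompt: list
    max_new_tokens: int
    generated: list = field(default_factory=list)

    @property
    def done(self):
        return len(self.generated) >= self.max_new_tokens


class ToyScheduler:
    """Decides which requests run in the next step (FIFO, fixed number of slots)."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.waiting = deque()
        self.active = []

    def add(self, request):
        self.waiting.append(request)

    def schedule(self):
        # Free the slots of finished requests, then refill them from the queue.
        self.active = [r for r in self.active if not r.done]
        while self.waiting and len(self.active) < self.max_active:
            self.active.append(self.waiting.popleft())
        return self.active


class ToyManager:
    """Owns the generation loop: asks the scheduler for a batch, runs one step."""

    def __init__(self, step_fn, max_active):
        self.step_fn = step_fn
        self.scheduler = ToyScheduler(max_active)
        self.results = {}

    def add_request(self, request):
        self.scheduler.add(request)

    def run(self):
        steps = 0
        while True:
            batch = self.scheduler.schedule()
            if not batch:
                return steps
            for request in batch:
                # One new token per active request per step.
                request.generated.append(self.step_fn(request))
                if request.done:
                    self.results[request.request_id] = request.generated
            steps += 1


class ToyBatchingMixin:
    """Exposes a user-facing entry point on any class that defines `next_token`."""

    def generate_batch_toy(self, prompts, max_new_tokens, max_active=2):
        manager = ToyManager(self.next_token, max_active)
        for i, (prompt, budget) in enumerate(zip(prompts, max_new_tokens)):
            manager.add_request(Request(f"req_{i}", prompt, budget))
        steps = manager.run()
        return manager.results, steps


class ToyModel(ToyBatchingMixin):
    def next_token(self, request):
        # Stand-in for a forward pass: the "next token" is the last token plus one.
        return (request.prompt + request.generated)[-1] + 1
```

In this sketch, calling `ToyModel().generate_batch_toy([[1], [10], [100]], [1, 3, 2], max_active=2)` finishes in 3 steps, because the third request takes over the slot freed by the first one after a single step. Static batches of two would need 3 + 2 = 5 steps for the same work, which is the motivation for scheduling at the step level rather than the batch level.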

LysandreJik

Cool, I think it's good for a first PR actually! I'd merge quick and open other PRs to complete what's lacking afterwards

LysandreJik · 2025-10-31T14:12:26Z

docs/source/en/continuous_batching.md

+- [ ] CB usage examples
+- [ ] CB API reference
+- [x] light refresher on what is CB + links to blog post
+
+- [x] installation / setup instructions
+
+- [x] open telemetry support
+
+- [ ] subsection in Transformers > Inference
+
+- [x] supported & unsupported features
+
+- [ ] performance considerations
+  - [ ] note on benchmarks (CI + space)
+  - [ ] cuda graphs
+  - [ ] compile
+  - [ ] attn impl
+
+- [x] explicit intended use cases, the why of CB in transformers
+
+- [x] integration with serving


yes, but let's do it in multiple PRs -> done is better than perfect here; I'd focus on usage examples/API reference before perf considerations

HuggingFaceDocBuilderDev · 2025-11-03T14:12:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

* docs: add continuous batching page
* docs(cb): add `generate_batch` example
* docs(cb): add `opentelemtry` and `serving` section
* feat: add `TODO` note about opentelemetry dependency
* docs(cb): add supported features
* docs(cb): add unsupported features
* docs(cb): add `ContinuousBatchingManager` example
* docs(cb): x reference CB in optimizing inference


McPatate requested review from ArthurZucker and remi-or October 24, 2025 15:47

McPatate force-pushed the docs/add_continuous_batching branch from ce9d16b to 58e1942 Compare October 27, 2025 13:04

McPatate added 4 commits October 27, 2025 14:47

docs: add continuous batching page

b84acca

docs(cb): add generate_batch example

35feaf3

docs(cb): add opentelemtry and serving section

25e0625

feat: add TODO note about opentelemetry dependency

3dbc372

McPatate force-pushed the docs/add_continuous_batching branch from db7d68a to 3dbc372 Compare October 27, 2025 13:47

McPatate added 2 commits October 27, 2025 14:54

docs(cb): add supported features

4ba9208

docs(cb): add unsupported features

cb436b6

ArthurZucker reviewed Oct 29, 2025

LysandreJik approved these changes Oct 31, 2025

docs(cb): add ContinuousBatchingManager example

7023aeb

McPatate force-pushed the docs/add_continuous_batching branch from 236f630 to 7023aeb Compare November 3, 2025 13:55

docs(cb): x reference CB in optimizing inference

6a1a3bb

McPatate marked this pull request as ready for review November 3, 2025 14:04

McPatate merged commit 22e39df into main Nov 3, 2025
16 checks passed

McPatate deleted the docs/add_continuous_batching branch November 3, 2025 14:19

5 participants
