[Core] Dynamic image size support for VLMs #5276
Merged: youkaichao merged 242 commits into vllm-project:main from DarkLight1337:mm-image-tokenizer-2 on Jul 3, 2024.

Changes from 147 commits

Commits (242)
34bfa79
Introduce a higher level `INPUT_REGISTRY`
DarkLight1337 df2aa19
Move dummy data generation to input registry
DarkLight1337 c72d2b3
Update docs
DarkLight1337 d8c6488
Rename `process_input` to `map_input`
DarkLight1337 f18de48
Reorder arguments
DarkLight1337 653537d
Apply input processor
DarkLight1337 a2f5a3c
Remove `VisionLanguageConfig` from input mapper
DarkLight1337 378ad80
Fix bad use of `functools.partial`
DarkLight1337 7aa3778
Use default input processor
DarkLight1337 c774168
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 532f863
Fix wrong arguments
DarkLight1337 080d40c
Use pillow image instead of tensor to avoid bypassing the processor b…
DarkLight1337 662693a
Update interface of dummy data factory and input processor
DarkLight1337 9bc5fcc
Use `InputContext` to handle checked type cast of config types
DarkLight1337 911cac7
Add input processor for injecting image tokens; fix docs
DarkLight1337 a38b347
Add new documentation pages
DarkLight1337 29c3bb3
Fix LLaVA-NeXT input processor and cleanup code
DarkLight1337 9cfbcce
Fix LLaVA-NeXT input processor and cleanup code
DarkLight1337 7bb6cbf
Add sanity check
DarkLight1337 ccf49c4
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 3482d32
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 8ea8468
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 be3d64f
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 2ff5be6
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 8e2ff86
Update LLaVA-NeXT
DarkLight1337 553f684
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 b134dfc
Update name
DarkLight1337 1efa480
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 1a08444
Update LLaVA-NeXT
DarkLight1337 7e33706
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 cfc31fd
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 3fb622c
Remove `MULTIMODAL` convenience property as it was causing some (impo…
DarkLight1337 da85ab2
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 383bea1
Update docs
DarkLight1337 80a09f2
Remove double processing of image tokens
DarkLight1337 6a70e4f
Add docs
DarkLight1337 8322ecb
Add docs
DarkLight1337 52a0116
Add docs
DarkLight1337 c1733dd
Add docs
DarkLight1337 b7a8683
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 9fb5e72
Remove more instances of double processing; update docs
DarkLight1337 25f9949
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 03c7e65
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 3932b3f
Remove xfail
DarkLight1337 7fa877a
Fix missing image token in OpenAI API serving
DarkLight1337 092e550
Fix LLaVA-NeXT test
DarkLight1337 7a19862
Remove duplicate processing in async engine
DarkLight1337 fd7d954
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 49dac3e
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 b2c6832
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 0104218
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 18cc7e0
Set up dummy data factory for phi3v
DarkLight1337 2291617
Move dummy data factories to model files
DarkLight1337 adf5503
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 e5a94e4
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 9b0386d
Move input processors to model files
DarkLight1337 4e656e7
Set up input processor for phi3v
DarkLight1337 fecf1f0
Fix wrong feature size
DarkLight1337 086e0fe
Fix wrong feature size
DarkLight1337 8c26a18
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 81522fe
Fix wrong feature size
DarkLight1337 c036b86
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 f75e1ab
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 b24e8d9
Update validation
DarkLight1337 8569d35
Fix image feature calculation for phi3v
DarkLight1337 bfa5aa9
Remove redundant code
DarkLight1337 dc34121
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 07e695d
Apply isort
DarkLight1337 8a43a77
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 825401d
Apply yapf
DarkLight1337 4a0d4d1
Reduce `max_tokens` so that test still passes
DarkLight1337 8d22fe0
Fix vllm to hf output (+ rename)
DarkLight1337 2e1ee2f
Fix wrong arguments
DarkLight1337 7229b07
Move `DummyImageDataFactories` into CLIP model file
DarkLight1337 17800fd
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 50f994b
Move `input_processor_for_clip` into CLIP
DarkLight1337 838aa9b
Remove some magic numbers
DarkLight1337 e7a5564
Test multiscale inputs for LLaVA-NeXT
DarkLight1337 36e8001
Handle multiscale inputs (different number of patches per batch) in L…
DarkLight1337 39e6d42
Fix wrong feature size
DarkLight1337 0d7f18f
Apply formatter
DarkLight1337 8e5dc7c
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 d9a4150
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 6849236
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 6d02491
Revert max_tokens
DarkLight1337 76ddea4
Add more tests for input mapper
DarkLight1337 4b20e66
Sanity check: Also test multiscale inputs for LLaVA-1.5
DarkLight1337 784af1a
Do not auto-convert image dtype to model's dtype
DarkLight1337 8e5fb12
Update prompts
DarkLight1337 4b947ad
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 e7397ee
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 865be7a
Fix mapper tests w.r.t. dtype change
DarkLight1337 9e82a26
Clarify docs and add todo
DarkLight1337 46391de
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 a4733f9
Remove TODO since vision config will be removed soon
DarkLight1337 6b19e6c
Expand docs
DarkLight1337 be326f2
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 f451668
Add ref
DarkLight1337 5c0c8cf
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 3d7b795
Update docs
DarkLight1337 1abb8a7
Add docs
DarkLight1337 428d420
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 698830f
Fix name
DarkLight1337 ac9ea9a
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 334b1a9
Add `MultiModalInputs` to docs
DarkLight1337 36ab12d
Fix and add links
DarkLight1337 af01e97
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 c303421
Fix `is_multiscale` not provided anymore
DarkLight1337 0a0c0e3
Also test multiscale input for phi3v
DarkLight1337 60517a7
Revert max_tokens for phi3v as numerical error still persists
DarkLight1337 57df434
Improve error message
DarkLight1337 ffe0675
Log the full output for easier reference
DarkLight1337 4f7b210
[VLM] Remove support for pixel_values and image_features.
xwjiang2010 c7a2a66
Update xfail to be more efficient
DarkLight1337 598e0e3
Also xfail llava test
DarkLight1337 174ca90
address comments
xwjiang2010 5b3e9aa
remove image_input_type altogether.
xwjiang2010 b7acf3a
types
xwjiang2010 f22b219
format
xwjiang2010 f84d87a
Update comment
DarkLight1337 5dfb6fc
Update docs
DarkLight1337 bbeff03
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 bf3281c
modify llava_next
ywang96 56e2d3b
Update comment
DarkLight1337 d2f8c6d
Update docs
DarkLight1337 7c197d2
Use dynamic image feature size calculation
DarkLight1337 f5ffd3e
Fix phi3v not handling `image_sizes` correctly
DarkLight1337 66aad21
Apply formatter
DarkLight1337 d1c68c0
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 5f32d53
Add see also
DarkLight1337 15df4ef
Update examples prompt format
DarkLight1337 f2e4633
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 095e008
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 a6e3162
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 28922af
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 ce06541
Fix config
DarkLight1337 cdcc2d4
Fix config
DarkLight1337 4212abf
Update docs
DarkLight1337 07c08e3
Update docs
DarkLight1337 f3f5854
Fix `MultiModalInputs` not working in Python 3.8
DarkLight1337 bebf9e7
Fix `_ImageAssets` not working in Python 3.8
DarkLight1337 7e80ecc
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 487d742
Merge branch 'upstream' into mm-image-tokenizer
DarkLight1337 36f72b6
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 43350b8
update example
ywang96 57791de
update doc
ywang96 b2b1e11
Merge branch 'mm-image-tokenizer' into mm-image-tokenizer-2
DarkLight1337 fbc5f70
Update docs
DarkLight1337 4292ccb
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 5d23a96
Apply formatter
DarkLight1337 78064e0
Fix OpenAI server not working for phi3v
DarkLight1337 4cb809c
Preemptively handle upcoming models
DarkLight1337 754e238
Add more models
DarkLight1337 9edb53c
Update feature size for dummy data
DarkLight1337 91d6c1e
Merge branch 'main' of https://github.com/vllm-project/vllm into remo…
xwjiang2010 f84b793
format
xwjiang2010 a934663
ExternalMultiModalDataDict
xwjiang2010 2144d3a
mention schema
xwjiang2010 2795b16
Use a less strict check
DarkLight1337 86ffd60
Fix phi3v test
DarkLight1337 f339dd1
Update default length as the dummy image feature size is increased
DarkLight1337 59a7a4c
Raise full error if output is completely different
DarkLight1337 62952e1
Fix phi3v not using input processor
DarkLight1337 0ce3ecb
Move size factors outside
DarkLight1337 b43e8c3
Apply formatter
DarkLight1337 9023794
Fix some outputs not being checked
DarkLight1337 fc5549c
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 f6c8061
Also test no image
DarkLight1337 15cc847
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 235c8a9
Batch by size factors
DarkLight1337 b98d924
Factor out xfail code
DarkLight1337 2c2558b
Fix unused args
DarkLight1337 ec28eca
Check logprobs instead of xfailing
DarkLight1337 5a337f5
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 2eb3490
Fix different scales not being in the same batch
DarkLight1337 6301a52
Apply suggestions from code review
DarkLight1337 14f10fc
Add link
DarkLight1337 7c335c3
Use `self.multi_modal_projector` directly
DarkLight1337 33c860e
Allow users to send image token formatted prompt directly
DarkLight1337 e03bc57
Factor out the code for placeholder token IDs
DarkLight1337 b270ac3
Remove `-rx` flag
DarkLight1337 3161221
Fix distributed tests
DarkLight1337 85d108a
Fix string mismatch warning
DarkLight1337 d648e32
Relax phi3v test; add TODO for llava tests
DarkLight1337 fde5f26
Fix distributed tests
DarkLight1337 d432934
address comments
xwjiang2010 83cfada
Merge branch 'main' of https://github.com/vllm-project/vllm into remo…
xwjiang2010 ab347bc
format
xwjiang2010 404700f
rm ctx
xwjiang2010 6a4014e
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 95a1fc5
Fix distributed test
DarkLight1337 1e87823
Update docs about prompt formatting
DarkLight1337 55ab3e4
Remove unused parameter
DarkLight1337 21da5b8
Remove unused import
DarkLight1337 525fe8f
Fix distributed test
DarkLight1337 04ebb68
rm ImageData and MultiModalData
xwjiang2010 31b8b09
rm external
xwjiang2010 a4b5617
comments
xwjiang2010 045674d
fix dist gpu test.
xwjiang2010 c8fa150
address comments
xwjiang2010 58ab8e9
Further avoid cuda init
DarkLight1337 6975caa
Add warnings for repeated image tokens
DarkLight1337 b1f1813
docs
xwjiang2010 b8b636d
Update vllm/multimodal/base.py
xwjiang2010 2c1d291
format
xwjiang2010 b6401d3
Reword
DarkLight1337 0f6f64c
Merge branch 'remove_image_features_2' of https://github.com/xwjiang2…
DarkLight1337 89f1103
Remove useless test
DarkLight1337 47fbdba
Unify test API between HfRunner and VllmRunner
DarkLight1337 c1c5a4d
Fix import error
DarkLight1337 fde4b25
Fix attribute error
DarkLight1337 4278fed
fix import error
ywang96 d9a2908
update llava next example
ywang96 d61e8af
Merge branch 'remove_image_features_2' of https://github.com/xwjiang2…
DarkLight1337 abd56fc
Update comments
DarkLight1337 ce2516e
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 38042ab
Remove some unnecessary deferred imports
DarkLight1337 7a6d895
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 9a49d2c
Use more precise type annotation
DarkLight1337 ac6f4fa
Fix wrong feature size
DarkLight1337 3f95778
Fix wrong image
DarkLight1337 90e80c4
Remove unnecessary lazy import
DarkLight1337 ea622c7
Check for conflicting kwargs in `map_input`
DarkLight1337 18740c2
Avoid unnecessary processing
DarkLight1337 a0db2c7
Update doc
DarkLight1337 526a871
Avoid cuda init
DarkLight1337 a5174da
Remove unused logger
DarkLight1337 6cf34e4
Remove unnecessary deferred imports
DarkLight1337 feff395
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 aacb5d0
Fix typo
DarkLight1337 13f43bd
Address comments
DarkLight1337 00e9e39
Add comment
DarkLight1337 288bfb9
Merge branch 'main' into mm-image-tokenizer-2
ywang96 284fca8
Merge branch 'upstream' into mm-image-tokenizer-2
DarkLight1337 a231eaf
Update XPU runner's multimodal logic
DarkLight1337 ec74121
Fix unused import
DarkLight1337 d16d3c8
Fix feature size calculation
DarkLight1337 aaa0f1f
Add extra image to test
DarkLight1337 cc540c3
Support multimodal data for neuron and tpu
DarkLight1337 48489ef
Fix broadcasting
DarkLight1337 2adc41f
Fix OpenVINO model runner for multimodal data
DarkLight1337 0e6845f
Cleanup
DarkLight1337
New documentation file (123 lines added):
.. _adding_a_new_multimodal_model:

Adding a New Multimodal Model
=============================

This document provides a high-level guide on integrating a :ref:`multi-modal model <multi_modality>` into vLLM.

.. note::
    The complexity of adding a new model depends heavily on the model's architecture.
    The process is considerably more straightforward if the model shares a similar architecture with an existing model in vLLM.
    However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.

.. tip::
    If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
    We will be happy to help you out!

1. Set up the base vLLM model
-----------------------------

As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model in vLLM, but note the following:

- You should additionally implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.

  .. code-block:: diff

      + from vllm.model_executor.models.interfaces import SupportsVision

      - class YourModelForImage2Seq(nn.Module):
      + class YourModelForImage2Seq(nn.Module, SupportsVision):

  .. note::
      The model class does not have to be named :code:`*ForCausalLM`.
      Check out `the HuggingFace Transformers documentation <https://huggingface.co/docs/transformers/model_doc/auto#multimodal>`__ for some examples.

- While implementing the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter
  for each input tensor that corresponds to a multi-modal input, as shown in the following example:

  .. code-block:: diff

        def forward(
            self,
            input_ids: torch.Tensor,
            positions: torch.Tensor,
            kv_caches: List[torch.Tensor],
            attn_metadata: AttentionMetadata,
      +     pixel_values: torch.Tensor,
        ) -> SamplerOutput:
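
As a rough illustration, the reserved keyword parameter is typically consumed inside :meth:`forward` by encoding the image and merging the resulting embeddings into the text embeddings at the placeholder-token positions. The sketch below is a hedged example rather than code from this PR: the vision tower, projector, ``image_token_id`` attribute, and the language model's ``inputs_embeds`` pathway are assumptions modelled on the LLaVA-style models referenced later in this guide.

.. code-block:: python

    from typing import List, Optional

    import torch

    def forward(
        self,
        input_ids: torch.Tensor,
        positions: torch.Tensor,
        kv_caches: List[torch.Tensor],
        attn_metadata: "AttentionMetadata",
        pixel_values: Optional[torch.Tensor] = None,
    ) -> "SamplerOutput":
        # Embed the text tokens first.
        inputs_embeds = self.language_model.get_input_embeddings(input_ids)

        if pixel_values is not None:
            # Encode the image and project it into the LLM's embedding space.
            image_features = self.vision_tower(pixel_values)
            image_embeds = self.multi_modal_projector(image_features)

            # Overwrite the embeddings at the image placeholder positions.
            mask = input_ids == self.image_token_id
            inputs_embeds[mask] = image_embeds.view(-1, image_embeds.shape[-1])

        # Run the language model on the merged embeddings.
        return self.language_model(input_ids=None,
                                   positions=positions,
                                   kv_caches=kv_caches,
                                   attn_metadata=attn_metadata,
                                   inputs_embeds=inputs_embeds)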

2. Register input mappers
-------------------------

For each modality type to support, decorate the model class with :meth:`MULTIMODAL_REGISTRY.register_input_mapper <vllm.multimodal.MultiModalRegistry.register_input_mapper>`.
This decorator accepts a function that maps multi-modal inputs to the keyword arguments you have previously defined in :meth:`~torch.nn.Module.forward`.

.. code-block:: diff

      from vllm.model_executor.models.interfaces import SupportsVision
    + from vllm.multimodal import MULTIMODAL_REGISTRY

    + @MULTIMODAL_REGISTRY.register_image_feature_input_mapper()
    + @MULTIMODAL_REGISTRY.register_image_pixel_input_mapper()
      class YourModelForImage2Seq(nn.Module, SupportsVision):

A default mapper is available for each modality in the core vLLM library. This input mapper will be used if you do not provide your own function.
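
If the default mapper does not fit your model, you can pass a custom function to the decorator instead. The sketch below is a hedged example: it assumes the mapper receives an ``InputContext`` together with the raw image (here a PIL image) and returns a ``MultiModalInputs`` dict whose keys match the keyword arguments of :meth:`forward`. The fixed 336x336 resolution and the normalization are placeholder preprocessing, and exact import paths may differ between vLLM versions.

.. code-block:: python

    import numpy as np
    import torch
    from PIL import Image
    from torch import nn

    from vllm.inputs.registry import InputContext
    from vllm.model_executor.models.interfaces import SupportsVision
    from vllm.multimodal import MULTIMODAL_REGISTRY, MultiModalInputs

    def map_image_pixels(ctx: InputContext, data: object) -> MultiModalInputs:
        # Assumption: the user passed a PIL image as the multi-modal data.
        assert isinstance(data, Image.Image)

        # Placeholder preprocessing: resize to a fixed resolution and scale
        # to [0, 1]. A real mapper would follow the model's HF processor,
        # which can be looked up through the model config held by `ctx`.
        image = data.convert("RGB").resize((336, 336))
        pixel_values = torch.from_numpy(np.asarray(image)).permute(2, 0, 1)
        pixel_values = pixel_values.float() / 255.0

        # Keys must match the keyword arguments of the model's forward().
        return MultiModalInputs({"pixel_values": pixel_values})

    @MULTIMODAL_REGISTRY.register_image_pixel_input_mapper(map_image_pixels)
    class YourModelForImage2Seq(nn.Module, SupportsVision):
        ...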

.. seealso::
    :ref:`input_processing_pipeline`

3. (Optional) Register dummy data
---------------------------------

During startup, dummy data is passed to the vLLM model to allocate memory. This only consists of text input by default, which may not be applicable to multi-modal models.
In such cases, you can define your own dummy data by registering a factory method via :meth:`INPUT_REGISTRY.register_dummy_data <vllm.inputs.registry.InputRegistry.register_dummy_data>`.

.. code-block:: diff

      from vllm.inputs import INPUT_REGISTRY
      from vllm.model_executor.models.interfaces import SupportsVision
      from vllm.multimodal import MULTIMODAL_REGISTRY

      @MULTIMODAL_REGISTRY.register_image_feature_input_mapper()
      @MULTIMODAL_REGISTRY.register_image_pixel_input_mapper()
    + @INPUT_REGISTRY.register_dummy_data(<your_dummy_data_factory>)
      class YourModelForImage2Seq(nn.Module, SupportsVision):
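
For concreteness, a dummy data factory might look roughly like the following hedged sketch. It assumes the factory receives an ``InputContext`` and the sequence length used for memory profiling, and returns a ``SequenceData`` together with the dummy multi-modal data (here a blank PIL image). The placeholder token ID, feature size, and image resolution are hypothetical and model-specific, and the exact return type may differ between vLLM versions.

.. code-block:: python

    from PIL import Image
    from torch import nn

    from vllm.inputs import INPUT_REGISTRY
    from vllm.inputs.registry import InputContext
    from vllm.model_executor.models.interfaces import SupportsVision
    from vllm.sequence import SequenceData

    # Hypothetical, model-specific constants.
    IMAGE_TOKEN_ID = 32000
    IMAGE_FEATURE_SIZE = 576  # e.g. (336 // 14) ** 2 patches for a ViT-L/14-336 tower
    IMAGE_SIZE = 336

    def dummy_data_for_your_model(ctx: InputContext, seq_len: int):
        # Reserve one position per image feature, then pad with token 0
        # up to the profiled sequence length.
        token_ids = [IMAGE_TOKEN_ID] * IMAGE_FEATURE_SIZE
        token_ids += [0] * (seq_len - IMAGE_FEATURE_SIZE)
        seq_data = SequenceData(token_ids)

        # Dummy image matching the largest input the model is expected to see.
        dummy_image = Image.new("RGB", (IMAGE_SIZE, IMAGE_SIZE), color=0)
        return seq_data, dummy_image

    @INPUT_REGISTRY.register_dummy_data(dummy_data_for_your_model)
    class YourModelForImage2Seq(nn.Module, SupportsVision):
        ...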

Here are some examples:

- Image inputs (static feature size): `LLaVA-1.5 Model <https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava.py>`__
- Image inputs (dynamic feature size): `LLaVA-NeXT Model <https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava_next.py>`__

.. seealso::
    :ref:`input_processing_pipeline`

4. (Optional) Register input processor
--------------------------------------

Sometimes, there is a need to process inputs at the :class:`~vllm.LLMEngine` level before they are passed to the model executor.
You can register input processors via :meth:`INPUT_REGISTRY.register_input_processor <vllm.inputs.registry.InputRegistry.register_input_processor>`.

.. code-block:: diff

      from vllm.inputs import INPUT_REGISTRY
      from vllm.model_executor.models.interfaces import SupportsVision
      from vllm.multimodal import MULTIMODAL_REGISTRY

      @MULTIMODAL_REGISTRY.register_image_feature_input_mapper()
      @MULTIMODAL_REGISTRY.register_image_pixel_input_mapper()
      @INPUT_REGISTRY.register_dummy_data(<your_dummy_data_factory>)
    + @INPUT_REGISTRY.register_input_processor(<your_input_processor>)
      class YourModelForImage2Seq(nn.Module, SupportsVision):

A common use case of input processors is inserting placeholder tokens to leverage the vLLM framework for attention mask generation.
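
As a hedged illustration, an input processor that expands a single image placeholder token into one token per image feature position could look roughly as follows. It assumes ``LLMInputs`` is a dict-like structure carrying ``prompt_token_ids``, ``prompt``, and ``multi_modal_data``; the placeholder token ID and feature size are hypothetical, and a dynamic-resolution model such as LLaVA-NeXT would compute the feature size from the actual image size instead of using a constant.

.. code-block:: python

    from typing import List

    from vllm.inputs import INPUT_REGISTRY, LLMInputs
    from vllm.inputs.registry import InputContext

    # Hypothetical, model-specific constants (see the dummy-data sketch above).
    IMAGE_TOKEN_ID = 32000
    IMAGE_FEATURE_SIZE = 576

    def input_processor_for_your_model(ctx: InputContext,
                                       llm_inputs: LLMInputs) -> LLMInputs:
        multi_modal_data = llm_inputs.get("multi_modal_data")
        if multi_modal_data is None:
            # Text-only prompt: nothing to insert.
            return llm_inputs

        # Repeat the image placeholder so that one token is reserved for
        # every image feature position; vLLM then builds the attention mask
        # from the expanded token sequence.
        new_token_ids: List[int] = []
        for token_id in llm_inputs["prompt_token_ids"]:
            if token_id == IMAGE_TOKEN_ID:
                new_token_ids.extend([IMAGE_TOKEN_ID] * IMAGE_FEATURE_SIZE)
            else:
                new_token_ids.append(token_id)

        return LLMInputs(prompt_token_ids=new_token_ids,
                         prompt=llm_inputs.get("prompt"),
                         multi_modal_data=multi_modal_data)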

Here are some examples:

- Insert static number of image tokens: `LLaVA-1.5 Model <https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava.py>`__
- Insert dynamic number of image tokens: `LLaVA-NeXT Model <https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava_next.py>`__

.. seealso::
    :ref:`input_processing_pipeline`