@yonigozlan
Member

@yonigozlan yonigozlan commented Feb 13, 2025

What does this PR do?

Refactors the slow image processor of GOT-OCR 2.0 so that everything happens in a single call to preprocess, with no separate call to crop_to_patches.
Also adds a fast image processor, with some nice speedups :)

import time

import requests
from PIL import Image
from transformers import AutoImageProcessor


def benchmark_image_processor(image_processor, images, benchmark_it=10, warmup_it=10):
    # warm up so one-time setup costs are not included in the measurement
    for _ in range(warmup_it):
        _ = image_processor(images=images, return_tensors="pt", device=device)
    # benchmark
    start_time = time.time()
    for _ in range(benchmark_it):
        _ = image_processor(images=images, return_tensors="pt", device=device)
    end_time = time.time()

    # mean wall-clock seconds per call
    return (end_time - start_time) / benchmark_it


image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
checkpoint = "stepfun-ai/GOT-OCR-2.0-hf"
image_processor_fast = AutoImageProcessor.from_pretrained(checkpoint, use_fast=True)
# request the slow processor explicitly so the comparison does not depend on the library default
image_processor_slow = AutoImageProcessor.from_pretrained(checkpoint, use_fast=False)
device = "cuda"  # module-level global read inside benchmark_image_processor
batch_size = 4

slow_time_one = benchmark_image_processor(image_processor_slow, image, benchmark_it=10)
fast_time_one = benchmark_image_processor(image_processor_fast, image, benchmark_it=10)
slow_time_batch = benchmark_image_processor(image_processor_slow, [image] * batch_size, benchmark_it=10)
fast_time_batch = benchmark_image_processor(image_processor_fast, [image] * batch_size, benchmark_it=10)

print(f"slow_time_one: {slow_time_one}, fast_time_one: {fast_time_one}, speedup: {slow_time_one/fast_time_one}")
print(f"slow_time_batch: {slow_time_batch}, fast_time_batch: {fast_time_batch}, speedup: {slow_time_batch/fast_time_batch}")

CPU:

slow_time_one: 0.03414120674133301, fast_time_one: 0.007321953773498535, speedup: 4.662854723954348
slow_time_batch: 0.1300074815750122, fast_time_batch: 0.030838775634765624, speedup: 4.215714758417654

CUDA:

slow_time_one: 0.03233206272125244, fast_time_one: 0.0006550788879394531, speedup: 49.35598340369778
slow_time_batch: 0.1255408763885498, fast_time_batch: 0.0017477989196777344, speedup: 71.82798603153816
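
A note on measuring the CUDA case: GPU work is generally queued asynchronously, so a wall-clock timer that stops before the device has finished may under-report the time per call. The sketch below is one possible way to account for that, and it is not part of this PR. The helper name `time_per_call` and its `sync` parameter are hypothetical, and the stand-in workload only exists so the snippet runs without a GPU or a model download.

```python
import time
from typing import Callable, Optional


def time_per_call(
    fn: Callable[[], object],
    iters: int = 10,
    warmup: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock seconds per call of `fn`.

    `sync` is an optional (hypothetical) hook that blocks until queued
    device work has finished; it is called once before the clock starts
    and once before it stops.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain work queued during warm-up
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for queued work before reading the clock
    return (time.perf_counter() - start) / iters


# Stand-in workload and sync hook, used only to exercise the helper.
calls = {"fn": 0, "sync": 0}


def fake_preprocess() -> None:
    calls["fn"] += 1


def fake_sync() -> None:
    calls["sync"] += 1


mean_seconds = time_per_call(fake_preprocess, iters=5, warmup=2, sync=fake_sync)
print(f"mean seconds per call: {mean_seconds}")
```

With PyTorch, one would presumably pass `sync=torch.cuda.synchronize` and wrap the processor call in a lambda, for example `lambda: image_processor(images=images, return_tensors="pt", device="cuda")`. The figures reported above come from the original script, not from this helper.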

Would be great to merge soon as this image processor will also be used for InternVL!

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Just one comment. Sorry that it took so long.

@yonigozlan yonigozlan merged commit 2c5d038 into huggingface:main Mar 1, 2025
23 checks passed
garrett361 pushed a commit to garrett361/transformers that referenced this pull request Mar 4, 2025
…#36185)

* refactor image processor slow got ocr

* add working image processor fast

* fix fast image processor, update doc

* use one big loop for processing patches