
Add FAST #35476


Open

wants to merge 162 commits into main

Conversation


@jadechoghari jadechoghari commented Jan 1, 2025

What does this PR do?

This PR adds FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.

It should be merged after the first PR for its backbone, TextNet (#34979), is merged.

Colab to replicate the author's logits: https://colab.research.google.com/drive/1bdkNiRI2bl7rBcgGYXe2UeobX78TUGYY?usp=sharing

What's left:

  • Fix `make quality` failing due to a documentation issue
  • Complete full model documentation

@jadechoghari jadechoghari requested a review from qubvel April 16, 2025 19:50
Comment on lines 35 to 93
rename_key_mappings = {
    "module.backbone": "backbone.textnet",
    "first_conv": "stem",
    "bn": "batch_norm",
    "ver": "vertical",
    "hor": "horizontal",
    "module.neck": "neck",
    "module.det_head": "text_detection_head",
    "neck.reduce_layer1": "neck.reduce_layers.0",
    "neck.reduce_layer2": "neck.reduce_layers.1",
    "neck.reduce_layer3": "neck.reduce_layers.2",
    "neck.reduce_layer4": "neck.reduce_layers.3",
    "final.conv.weight": "final_conv.weight",
    "neck.reduce_layers.1.rbr_identity.weight": "neck.reduce_layers.1.identity.weight",
    "neck.reduce_layers.1.rbr_identity.bias": "neck.reduce_layers.1.identity.bias",
    "neck.reduce_layers.1.rbr_identity.running_mean": "neck.reduce_layers.1.identity.running_mean",
    "neck.reduce_layers.1.rbr_identity.running_var": "neck.reduce_layers.1.identity.running_var",
    "neck.reduce_layers.1.rbr_identity.num_batches_tracked": "neck.reduce_layers.1.identity.num_batches_tracked",
}


def get_model_config(model_config, model_type, size, min_area, bounding_box_type, loss_bg):
    model_config_map = {
        "tiny": {
            "config_url": tiny_config_url,
            "expected_logits": torch.tensor([-9.9181, -13.0701, -12.5045, -12.6523]),
            "expected_boxes": [(151, 151), (160, 56), (355, 74), (346, 169)],
        },
        "small": {
            "config_url": small_config_url,
            "expected_logits": torch.tensor([-13.1852, -17.2011, -16.9553, -16.8269]),
            "expected_boxes": [(154, 151), (155, 61), (351, 63), (350, 153)],
        },
        "base": {
            "config_url": base_config_url,
            "expected_logits": torch.tensor([-28.7481, -34.1635, -25.7430, -22.0260]),
            "expected_boxes": [(157, 149), (158, 66), (348, 68), (347, 151)],
        },
    }

    if model_type not in model_config_map:
        raise ValueError(f"Unknown model type: {model_type}")

    logits_config = model_config_map[model_type]
    config = prepare_config(
        logits_config["config_url"],
        size,
        model_config["detection_head"]["pooling_size"],
        min_area,
        bounding_box_type,
        loss_bg,
    )

    return config, logits_config["expected_logits"], logits_config["expected_boxes"]


def prepare_config(size_config_url, size, pooling_size, min_area, bounding_box_type, loss_bg):
    config_dict = json.loads(requests.get(size_config_url).text)

Contributor Author

There is a much simpler way to convert weights now, see convert_mllama:

ORIGINAL_TO_CONVERTED_KEY_MAPPING = {

that is now standard in transformers!
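For illustration, a minimal sketch of that regex-based convention, with patterns borrowed from this PR's explicit `rename_key_mappings` dict (the exact patterns in the final script are an assumption):

```python
import re

# Hedged sketch of the ORIGINAL_TO_CONVERTED_KEY_MAPPING convention used in
# newer transformers conversion scripts (e.g. convert_mllama): regex
# substitutions are applied in order to rename every checkpoint key.
ORIGINAL_TO_CONVERTED_KEY_MAPPING = {
    r"module\.backbone": "backbone.textnet",
    r"module\.neck": "neck",
    r"module\.det_head": "text_detection_head",
    r"first_conv": "stem",
    r"\bbn\b": "batch_norm",
    # reduce_layerN is 1-indexed in the original checkpoint but 0-indexed
    # in the HF model, so a callable replacement shifts the index.
    r"neck\.reduce_layer(\d+)": lambda m: f"neck.reduce_layers.{int(m.group(1)) - 1}",
}


def convert_key(old_key: str) -> str:
    """Apply every substitution in order to rename one state-dict key."""
    for pattern, replacement in ORIGINAL_TO_CONVERTED_KEY_MAPPING.items():
        old_key = re.sub(pattern, replacement, old_key)
    return old_key


print(convert_key("module.backbone.first_conv.weight"))
# backbone.textnet.stem.weight
```

This replaces the long literal dict with a handful of patterns, which is why the convention scales better across models.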

@jadechoghari jadechoghari left a comment

Have you tested the convert file?

python src/transformers/models/fast/convert_fast_original_to_hf.py --checkpoint_url https://github.com/czczup/FAST/releases/download/release/fast_tiny_ic17mlt_640.pth --checkpoint_config_filename fast_tiny_ic17mlt_640.py

And for the other sizes: replace `tiny` with `small` and `base` to test.
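A small helper can build the three commands; this sketch assumes the `small` and `base` release filenames follow the same `fast_{size}_ic17mlt_640` pattern as `tiny` (verify against the FAST release page before running):

```python
# Hypothetical helper: construct the conversion command for each checkpoint
# size. The small/base URLs are an assumption extrapolated from the tiny one.
BASE_URL = "https://github.com/czczup/FAST/releases/download/release"


def conversion_command(size):
    name = f"fast_{size}_ic17mlt_640"
    return [
        "python",
        "src/transformers/models/fast/convert_fast_original_to_hf.py",
        "--checkpoint_url", f"{BASE_URL}/{name}.pth",
        "--checkpoint_config_filename", f"{name}.py",
    ]


for size in ("tiny", "small", "base"):
    print(" ".join(conversion_command(size)))
```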

@jadechoghari

And have you tested that the convert file works on all three checkpoints? :)

Comment on lines 84 to 97
        "expected_boxes": [(148, 151), (157, 53), (357, 72), (347, 170)],
    },
    "small": {
        "config_url": small_config_url,
        "expected_logits": torch.tensor([-13.1852, -17.2011, -16.9553, -16.8269]),
-       "expected_boxes": [(154, 151), (155, 61), (351, 63), (350, 153)],
+       "expected_boxes": [(151, 152), (152, 58), (352, 60), (351, 154)],
    },
    "base": {
        "config_url": base_config_url,
        "expected_logits": torch.tensor([-28.7481, -34.1635, -25.7430, -22.0260]),
-       "expected_boxes": [(157, 149), (158, 66), (348, 68), (347, 151)],
+       "expected_boxes": [(154, 150), (155, 63), (349, 65), (349, 152)],
    },
}

Contributor Author

Why are the expected_boxes changed here? We must make sure the boxes match the original implementation, and I recall the ones you changed used to match the original FAST repo!

Contributor Author

It's mentioned at the top of the PR:
Colab to replicate the author's logits: https://colab.research.google.com/drive/1bdkNiRI2bl7rBcgGYXe2UeobX78TUGYY?usp=sharing

github-actions bot commented Jul 4, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, fast

@TeddyLiang01

For each of the 3 sizes I am off by a little bit. The logits are correct; could it be that the post-processing and rounding differ at the end? Do the results have to match the Colab results exactly?

python src/transformers/models/fast/convert_fast_original_to_hf.py --checkpoint_url https://github.com/czczup/FAST/releases/download/release/fast_tiny_ic17mlt_640.pth --checkpoint_config_filename fast_tiny_ic17mlt_640.py
Traceback (most recent call last):
  File "/Users/teddy/transformers/src/transformers/models/fast/convert_fast_original_to_hf.py", line 355, in
    convert_fast_checkpoint(
  File "/Users/teddy/transformers/src/transformers/models/fast/convert_fast_original_to_hf.py", line 311, in convert_fast_checkpoint
    raise ValueError(f"Expected {expected_slice_boxes}, but got {text_locations[0]['boxes'][0]}")
ValueError: Expected [(151, 151), (160, 56), (355, 74), (346, 169)], but got [(148, 151), (157, 53), (357, 72), (347, 170)]

@jadechoghari

If they don't match, you should look at the post-processing logic!
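To tell rounding drift in post-processing apart from a real regression, the corners can be compared with a pixel tolerance instead of exact equality. This is a debugging sketch, not the convert script's actual check, and the `tol` threshold is an assumption:

```python
# Sketch: compare detected box corners against expected ones within a pixel
# tolerance, so small rounding differences in post-processing don't trip an
# exact-equality check. The tol=5 default is an assumed threshold.
def boxes_close(expected, actual, tol=5):
    """True if every corner is within `tol` pixels of its expected position."""
    return len(expected) == len(actual) and all(
        abs(ex - ax) <= tol and abs(ey - ay) <= tol
        for (ex, ey), (ax, ay) in zip(expected, actual)
    )


# The tiny-checkpoint mismatch from the traceback above:
expected = [(151, 151), (160, 56), (355, 74), (346, 169)]
actual = [(148, 151), (157, 53), (357, 72), (347, 170)]
print(boxes_close(expected, actual))  # True: every corner is within 5 px
```

If the corners stay within a few pixels like this, the logits agree and only the rounding differs; a larger gap would point at a genuine post-processing bug.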

6 participants