
Partition Mutable Buffer as Core ML State #5165


Merged: 7 commits merged into pytorch:main on Sep 10, 2024

Conversation

YifanShenSZ
Collaborator

@YifanShenSZ commented Sep 8, 2024

#4830 opened the door toward delegating mutable buffers. In this PR, we tag the mutable buffers in the Core ML partitioner so that they are delegated as Core ML state.

With #5143, we are able to run the stateful Core ML delegate.
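The tagging step described here can be sketched as a toy simplification. Note that `tag_mutable_buffers`, the `"coreml_state"` tag, and the plain-dict representation are all illustrative assumptions, not the actual ExecuTorch partitioner API (which operates on an exported program's graph signature):

```python
# Hypothetical sketch of tagging mutable buffers as Core ML state.
# All names here are illustrative, not the real ExecuTorch API.

def tag_mutable_buffers(placeholders, buffers_to_mutate):
    """Partition placeholder names: mutable buffers become Core ML
    state; everything else stays an ordinary model input."""
    tags = {}
    for name in placeholders:
        if name in buffers_to_mutate:
            tags[name] = "coreml_state"  # delegated as Core ML state
        else:
            tags[name] = "input"         # regular input to the delegate
    return tags

tags = tag_mutable_buffers(
    placeholders=["x", "kv_cache"],
    buffers_to_mutate={"kv_cache"},  # e.g. a KV cache updated in place
)
print(tags)  # {'x': 'input', 'kv_cache': 'coreml_state'}
```

The key idea is only the partitioning decision: state-like placeholders get a distinct tag so the lowering step can treat them differently from plain inputs.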


pytorch-bot bot commented Sep 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5165

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 0bfa422 with merge base 13da62b (image):

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Sep 8, 2024
@YifanShenSZ
Collaborator Author

@cymbalrush @cccclai

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Sep 8, 2024

Thanks! lintrunner seems to be failing. Could you help fix it?

@YifanShenSZ
Collaborator Author

> Thanks! lintrunner seems to be failing. Could you help fix it?

Fixed

@cccclai
Contributor

cccclai commented Sep 8, 2024

The Core ML end-to-end CI breaks in this PR; is that expected?

2024-09-08T04:30:41.7032910Z I 00:00:00.006578 executorch:cpuinfo_utils.cpp:62] Reading file /sys/devices/soc0/image_version
2024-09-08T04:30:41.7033530Z I 00:00:00.006641 executorch:cpuinfo_utils.cpp:78] Failed to open midr file /sys/devices/soc0/image_version
2024-09-08T04:30:41.7034130Z I 00:00:00.006656 executorch:cpuinfo_utils.cpp:158] Number of efficient cores 4
2024-09-08T04:30:41.7034630Z I 00:00:00.006658 executorch:main.cpp:65] Resetting threadpool with num threads = 4
2024-09-08T04:30:41.7035260Z I 00:00:00.010802 executorch:runner.cpp:58] Creating LLaMa runner: model_path=llama2.pte, tokenizer_path=tokenizer.bin
2024-09-08T04:30:41.7040040Z E 00:00:00.907857 executorch:coreml_backend_delegate.mm:163] CoreMLBackend: Failed to init the model.
2024-09-08T04:30:41.7040640Z E 00:00:00.907866 executorch:method.cpp:106] Init failed for backend CoreMLBackend: 0x23
2024-09-08T04:30:41.7041030Z ++ date +%H:%M:%S

Does that mean the CI job needs @cymbalrush's PR?

@YifanShenSZ
Collaborator Author

YifanShenSZ commented Sep 8, 2024

The Llama runner runs well on my local machine. Is the CI machine on macOS 14?

State is a new feature in macOS 15, so failure on older macOS is expected. I can make the stateful llama no longer the default option.
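The fallback described here, disabling the stateful path on hosts older than macOS 15, could look roughly like the following. This is a hypothetical sketch: `choose_llama_config` and the version constant are illustrative, not actual code from this PR.

```python
# Hypothetical sketch of gating the stateful llama export on OS version.
# Core ML state requires macOS 15; older hosts fall back to stateless.

MACOS_STATE_MIN_MAJOR = 15  # first macOS release supporting Core ML state

def choose_llama_config(host_macos_major, want_stateful=True):
    """Pick the stateful export only when the host can run it."""
    if want_stateful and host_macos_major >= MACOS_STATE_MIN_MAJOR:
        return "stateful"
    return "stateless"  # safe default for older hosts (e.g. macOS 14 CI)

print(choose_llama_config(14))  # -> stateless (the CI machine's case)
print(choose_llama_config(15))  # -> stateful
```

The design point is simply that feature availability is decided at export/run time from the host version, rather than assuming the newest OS.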

@cccclai
Contributor

cccclai commented Sep 8, 2024

> Is the CI machine on macOS 14?

Yeah, the CI machine is still macOS 14. Looks like it's passing now. Thanks!

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@cccclai left a comment


Looks great, thanks! Just some minor comments.

@cccclai
Contributor

cccclai commented Sep 8, 2024

Looks like there are still lint errors:

>>> Lint for backends/apple/coreml/test/test_coreml_partitioner.py:

  Warning (FLAKE8) F401
    'pytest' imported but unused
    See https://www.flake8rules.com/rules/F401.html.

          8  |
          9  |import executorch.exir
         10  |
    >>>  11  |import pytest
         12  |
         13  |import torch
         14  |import torchvision

…eful llama until CI machine upgraded to MacOS 15
@YifanShenSZ
Collaborator Author

YifanShenSZ commented Sep 8, 2024

Lint error fixed; comments addressed.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Sep 9, 2024

The test failing in trunk / test-coreml-model / macos-job (push) seems legit. Could you help fix it?

+ python -m examples.apple.coreml.scripts.export --model_name=mobilebert
  Torch version 2.5.0.dev20240901 has not been tested with coremltools. You may run into unexpected errors. Torch 2.3.0 is the most recent version that has been tested.
  Traceback (most recent call last):
    File "<frozen runpy>", line 198, in _run_module_as_main
    File "<frozen runpy>", line 88, in _run_code
    File "/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/examples/apple/coreml/scripts/export.py", line 160, in <module>
      model, example_inputs, _ = EagerModelFactory.create_model(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/examples/models/model_factory.py", line 38, in create_model
      module = importlib.import_module(
               ^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Users/ec2-user/runner/_work/_temp/conda_environment_10762964118/lib/python3.11/importlib/__init__.py", line 126, in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
    File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
    File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
    File "<frozen importlib._bootstrap_external>", line 940, in exec_module
    File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
    File "/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/examples/models/mobilebert/__init__.py", line 7, in <module>
      from .model import MobileBertModelExample
    File "/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/examples/models/mobilebert/model.py", line 11, in <module>
      from transformers import AutoTokenizer, MobileBertModel  # @manual
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@YifanShenSZ
Collaborator Author

YifanShenSZ commented Sep 9, 2024

Fixed. It's because coremltools 8.0b2 started supporting numpy 2.0, but the mobilebert test uses an old transformers version that requires numpy 1.x.

Temporarily added a numpy downgrade in the coreml install_requirements.sh. Will remove it once executorch migrates to numpy 2.0.
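The constraint being worked around can be illustrated with a small sketch. The helper below is purely hypothetical (the real fix is a version pin in the install script); it just encodes the compatibility rule stated above:

```python
# Hypothetical sketch of the version conflict described above:
# old transformers needs numpy 1.x, while coremltools 8.0b2 also
# accepts numpy 2.0 -- so the environment must be pinned to 1.x.

def numpy_ok_for(numpy_version, needs_numpy1):
    """Return True when the installed numpy satisfies the consumer."""
    major = int(numpy_version.split(".")[0])
    if needs_numpy1:
        return major < 2  # old transformers: numpy 1.x only
    return True           # coremltools 8.0b2: 1.x or 2.x both fine

print(numpy_ok_for("2.0.1", needs_numpy1=True))   # -> False: downgrade needed
print(numpy_ok_for("1.26.4", needs_numpy1=True))  # -> True
```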

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai merged commit f471556 into pytorch:main on Sep 10, 2024
100 of 104 checks passed