@@ -82,7 +82,7 @@ For more information, see [Setting Up ExecuTorch](https://pytorch.org/executorch
## Running a Large Language Model Locally
- This example uses Karpathy’s [NanoGPT](https://github.com/karpathy/nanoGPT), which is a minimal implementation of
+ This example uses Karpathy’s [nanoGPT](https://github.com/karpathy/nanoGPT), which is a minimal implementation of
GPT-2 124M. This guide is applicable to other language models, as ExecuTorch is model-invariant.
There are two steps to running a model with ExecuTorch:
@@ -100,7 +100,7 @@ ExecuTorch runtime.
Exporting takes a PyTorch model and converts it into a format that can run efficiently on consumer devices.
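In rough outline (a hedged sketch rather than the commit's code; the full nanoGPT export script, including backend delegation, appears further down in this guide, and exact module paths can shift between ExecuTorch releases), the export step traces the model, lowers it to the Edge dialect, and serializes it to a `.pte` file:

```python
# Rough sketch of the export step.  The output filename and example input
# shape are illustrative assumptions.
import torch
from torch._export import capture_pre_autograd_graph
from torch.export import export
from executorch.exir import to_edge

from model import GPT  # nanoGPT model definition used throughout this guide

model = GPT.from_pretrained('gpt2')
example_inputs = (torch.randint(0, 100, (1, 8), dtype=torch.long),)

m = capture_pre_autograd_graph(model, example_inputs)   # pre-autograd ATen graph
traced_model = export(m, example_inputs)                 # ExportedProgram
edge_manager = to_edge(traced_model)                     # Edge dialect
et_program = edge_manager.to_executorch()                # ExecuTorch program

with open("nanogpt.pte", "wb") as f:
    f.write(et_program.buffer)
```
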
- For this example, you will need the NanoGPT model and the corresponding tokenizer vocabulary.
+ For this example, you will need the nanoGPT model and the corresponding tokenizer vocabulary.
::::{tab-set}
:::{tab-item} curl
@@ -377,12 +377,12 @@ specific hardware (delegation), and because it is doing all of the calculations
While ExecuTorch provides a portable, cross-platform implementation for all
operators, it also provides specialized backends for a number of different
targets. These include, but are not limited to, x86 and ARM CPU acceleration via
- the XNNPACK backend, Apple acceleration via the CoreML backend and Metal
+ the XNNPACK backend, Apple acceleration via the Core ML backend and Metal
Performance Shaders (MPS) backend, and GPU acceleration via the Vulkan backend.
Because optimizations are specific to a given backend, each pte file is specific
to the backend(s) targeted at export. To support multiple devices, such as
- XNNPACK acceleration for Android and CoreML for iOS, export a separate PTE file
+ XNNPACK acceleration for Android and Core ML for iOS, export a separate PTE file
for each backend.
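As an illustrative sketch of that last point (not part of this commit; the full XNNPACK example follows below, and the Core ML partitioner would be imported analogously from the ExecuTorch codebase), each target backend gets its own lowering pass and its own output file:

```python
# Hedged sketch: one .pte per target backend.  `traced_model` is the
# ExportedProgram from the export step sketched earlier in this guide;
# filenames are illustrative assumptions.
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

android_program = to_edge(traced_model).to_backend(XnnpackPartitioner()).to_executorch()
with open("nanogpt_xnnpack.pte", "wb") as f:
    f.write(android_program.buffer)

# Repeating these steps with the Core ML partitioner (not shown here) would
# produce a second file, e.g. nanogpt_coreml.pte, for iOS.
```
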
To delegate to a backend at export time, ExecuTorch provides the `to_backend()`
@@ -394,11 +394,11 @@ acceleration and optimization. Any portions of the computation graph not
delegated will be executed by the ExecuTorch operator implementations.
To delegate the exported model to the specific backend, we need to import its
- partitioner as well as edge compile config from ExecuTorch Codebase first, then
+ partitioner as well as edge compile config from ExecuTorch codebase first, then
call `to_backend` with an instance of the partitioner on the `EdgeProgramManager`
object the `to_edge` function created.
- Here's an example of how to delegate NanoGPT to XNNPACK (if you're deploying to an Android Phone for instance):
+ Here's an example of how to delegate nanoGPT to XNNPACK (if you're deploying to an Android phone for instance):
```python
# export_nanogpt.py
@@ -417,7 +417,7 @@ from torch._export import capture_pre_autograd_graph
from model import GPT
- # Load the NanoGPT model.
+ # Load the nanoGPT model.
model = GPT.from_pretrained('gpt2')
# Create example inputs. This is used in the export process to provide
@@ -523,7 +523,7 @@ For more information regarding backend delegation, see the ExecuTorch guides
for the
[XNNPACK Backend](https://pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html)
and
- [CoreML Backend](https://pytorch.org/executorch/stable/build-run-coreml.html).
+ [Core ML Backend](https://pytorch.org/executorch/stable/build-run-coreml.html).
## Quantization
@@ -633,15 +633,15 @@ df = delegation_info.get_operator_delegation_dataframe()
print(tabulate(df, headers="keys", tablefmt="fancy_grid"))
```
- For NanoGPT targeting the XNNPACK backend, you might see the following:
+ For nanoGPT targeting the XNNPACK backend, you might see the following:
```
Total delegated subgraphs: 86
Number of delegated nodes: 473
Number of non-delegated nodes: 430
```
- | | op_type | occurrences_in_delegated_graphs | occurrences_in_non_delegated_graphs |
+ | | op_type | # in_delegated_graphs | # in_non_delegated_graphs |
|----|---------------------------------|-------|-----|
| 0 | aten__softmax_default | 12 | 0 |
| 1 | aten_add_tensor | 37 | 0 |
@@ -663,7 +663,7 @@ print(print_delegated_graph(graph_module))
This may generate a large amount of output for large models. Consider using "Control+F" or "Command+F" to locate the operator you’re interested in
(e.g. “aten_view_copy_default”). Observe which instances are not under lowered graphs.
- In the fragment of the output for NanoGPT below, observe that embedding and add operators are delegated to XNNPACK while the sub operator is not.
+ In the fragment of the output for nanoGPT below, observe that embedding and add operators are delegated to XNNPACK while the sub operator is not.
```
%aten_unsqueeze_copy_default_22 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.unsqueeze_copy.default](args = (%aten_arange_start_step_23, -2), kwargs = {})
@@ -879,7 +879,7 @@ def replace_linear_with_custom_linear(module):
The remaining steps are the same as the normal flow. Now you can run this module in eager mode as well as export to ExecuTorch.
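For instance, a hypothetical usage sketch (assuming the `replace_linear_with_custom_linear` helper defined above and the nanoGPT model used throughout this guide; the input shape is an illustrative assumption):

```python
# Hypothetical usage sketch: swap in the custom linear module before running
# in eager mode or exporting.
import torch
from model import GPT  # nanoGPT model definition

model = GPT.from_pretrained('gpt2')
replace_linear_with_custom_linear(model)   # recursively swaps nn.Linear modules

# Eager mode still works with the swapped-in modules...
tokens = torch.randint(0, 100, (1, 8), dtype=torch.long)
logits, _ = model(tokens)

# ...and the model can then go through the same export flow shown earlier.
```
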
- ## How to build Mobile Apps
+ ## How to Build Mobile Apps
You can execute an LLM using ExecuTorch on iOS and Android.
**For iOS see the [iLLaMA App](https://pytorch.org/executorch/main/llm/llama-demo-ios.html).**