Commit 400c45a

Gasoonjia authored and facebook-github-bot committed
add more instructions and examples on Delegation (#2973)
Summary: as title. Differential Revision: D55988177
1 parent c322685 commit 400c45a

1 file changed: docs/source/llm/getting-started.md (+120 -23 lines)

## Delegation

While ExecuTorch provides a portable, cross-platform implementation for all
operators, it also provides specialized backends for a number of different
targets. These include, but are not limited to, x86 and ARM CPU acceleration via
the XNNPACK backend, Apple acceleration via the CoreML backend and Metal
Performance Shaders (MPS) backend, and GPU acceleration via the Vulkan backend.

Because optimizations are specific to a given backend, each PTE file is specific
to the backend(s) targeted at export. To support multiple devices, such as
XNNPACK acceleration for Android and CoreML for iOS, export a separate PTE file
for each backend, as sketched below.
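
As a minimal sketch of what per-backend export looks like, the snippet below
wraps the XNNPACK lowering flow (shown in full later in this section) in a
small helper so the same traced model can be lowered once per target. The
`export_xnnpack_pte` helper name is ours, and the commented CoreML partitioner
import path is an assumption that may differ between ExecuTorch releases.

```python
# A minimal sketch of exporting one PTE file per backend. `traced_model` is
# assumed to come from the tracing steps shown in the full example below.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.backends.xnnpack.utils.configs import get_xnnpack_edge_compile_config
from executorch.exir import to_edge

def export_xnnpack_pte(traced_model, path="nanogpt_xnnpack.pte"):
    # Lower with the XNNPACK-specific edge compile config and partitioner.
    edge_manager = to_edge(
        traced_model, compile_config=get_xnnpack_edge_compile_config()
    )
    edge_manager = edge_manager.to_backend(XnnpackPartitioner())
    et_program = edge_manager.to_executorch()
    with open(path, "wb") as f:
        f.write(et_program.buffer)

# An iOS build would repeat the same flow with the CoreML partitioner instead,
# e.g. (hypothetical import path):
# from executorch.backends.apple.coreml.partition.coreml_partitioner import CoreMLPartitioner
# edge_manager = edge_manager.to_backend(CoreMLPartitioner())
```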

To delegate to a backend at export time, ExecuTorch provides the `to_backend()`
function on the `EdgeProgramManager` object, which takes a backend-specific
partitioner object. The partitioner is responsible for finding the parts of the
computation graph that can be accelerated by the target backend, and
`to_backend()` then delegates each matched portion to that backend for
acceleration and optimization. Any portions of the computation graph not
delegated will be executed by the ExecuTorch operator implementations.

To delegate the exported model to a specific backend, first import its
partitioner and edge compile config from the ExecuTorch codebase, then call
`to_backend` with an instance of that partitioner on the `EdgeProgramManager`
object created by the `to_edge` function.

Here's an example of how to delegate NanoGPT to XNNPACK (if you're deploying to
an Android phone, for instance):

```python
# export_nanogpt.py

# Load the partitioner for the XNNPACK backend.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# A model to be delegated to a specific backend should use that backend's edge compile config.
from executorch.backends.xnnpack.utils.configs import get_xnnpack_edge_compile_config
from executorch.exir import EdgeCompileConfig, to_edge

import torch
from torch.export import export
from torch.nn.attention import sdpa_kernel, SDPBackend
from torch._export import capture_pre_autograd_graph

from model import GPT

# Load the NanoGPT model.
model = GPT.from_pretrained('gpt2')

# Create example inputs. This is used in the export process to provide
# hints on the expected shape of the model input.
example_inputs = (
    torch.randint(0, 100, (1, 8), dtype=torch.long),
)

# Trace the model, converting it to a portable intermediate representation.
# The torch.no_grad() call tells PyTorch to exclude training-specific logic.
with sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
    m = capture_pre_autograd_graph(model, example_inputs)
    traced_model = export(m, example_inputs)

# Convert the model into a runnable ExecuTorch program.
# To be further lowered to the XNNPACK backend, `traced_model` needs the
# XNNPACK-specific edge compile config.
edge_config = get_xnnpack_edge_compile_config()
edge_manager = to_edge(traced_model, compile_config=edge_config)

# Delegate the exported model to the XNNPACK backend by invoking `to_backend`
# with the XNNPACK partitioner.
edge_manager = edge_manager.to_backend(XnnpackPartitioner())
et_program = edge_manager.to_executorch()

# Save the XNNPACK-delegated ExecuTorch program to a file.
with open("nanogpt.pte", "wb") as file:
    file.write(et_program.buffer)
```
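
Before serializing, it can be useful to check how much of the graph was
actually delegated. The sketch below assumes a `get_delegation_info` utility is
available from `executorch.exir.backend.utils`; treat the exact import path as
an assumption rather than part of this tutorial.

```python
# A minimal sketch for inspecting delegation results. The import path of
# `get_delegation_info` is an assumption and may vary across releases.
from executorch.exir.backend.utils import get_delegation_info

# `edge_manager` is the EdgeProgramManager returned by `to_backend()` above.
graph_module = edge_manager.exported_program().graph_module
delegation_info = get_delegation_info(graph_module)

# Summarizes how many operators were delegated vs. left to the default kernels.
print(delegation_info.get_summary())
```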

Additionally, update CMakeLists.txt to build and link the XNNPACK backend into
the ExecuTorch runner.

```
cmake_minimum_required(VERSION 3.19)
project(nanogpt_runner)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

# Set options for the executorch build.
option(EXECUTORCH_BUILD_EXTENSION_DATA_LOADER "" ON)
option(EXECUTORCH_BUILD_EXTENSION_MODULE "" ON)
option(EXECUTORCH_BUILD_OPTIMIZED "" ON)
option(EXECUTORCH_BUILD_XNNPACK "" ON) # Build with the XNNPACK backend

# Include the executorch subdirectory.
add_subdirectory(
    ${CMAKE_CURRENT_SOURCE_DIR}/third-party/executorch
    ${CMAKE_BINARY_DIR}/executorch)

add_executable(nanogpt_runner main.cpp)
target_link_libraries(
    nanogpt_runner
    PRIVATE
    executorch
    extension_module_static # Provides the Module class
    optimized_native_cpu_ops_lib # Provides baseline cross-platform kernels
    xnnpack_backend) # Provides the XNNPACK CPU acceleration backend
```

The rest of the code remains the same as in the non-delegated example. Please
refer to
[Exporting to ExecuTorch](https://pytorch.org/executorch/main/llm/getting-started.html#step-1-exporting-to-executorch)
and
[Invoking the Runtime](https://pytorch.org/executorch/main/llm/getting-started.html#step-2-invoking-the-runtime)
for more details.

At this point, the working directory should contain the following files:

- CMakeLists.txt
- main.cpp
- basic_tokenizer.h
- basic_sampler.h
- managed_tensor.h
- export_nanogpt.py
- model.py
- vocab.json

If all of these are present, you can now export the XNNPACK-delegated PTE model:

```bash
python export_nanogpt.py
```

This will generate `nanogpt.pte` in the same working directory.

Then build and run the model with:

```bash
(rm -rf cmake-out && mkdir cmake-out && cd cmake-out && cmake ..)
cmake --build cmake-out -j10
./cmake-out/nanogpt_runner
```

You should see something like the following:

```
Once upon a time, there was a man who was a member of the military...
```

For more information regarding backend delegation, see the ExecuTorch guides
for the
[XNNPACK Backend](https://pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html)
and
[CoreML Backend](https://pytorch.org/executorch/stable/build-run-coreml.html).

## Quantization