Commit ea35103

Gasoonjia authored and facebook-github-bot committed
add more instructions and examples on Delegation
Summary: as title. Differential Revision: D55988177
1 parent 554cd27 commit ea35103

1 file changed: +89 −24 lines changed

docs/source/llm/getting-started.md

## Delegation

While ExecuTorch provides a portable, cross-platform implementation for all
operators, it also provides specialized backends for a number of different
targets. These include, but are not limited to, x86 and ARM CPU acceleration via
the XNNPACK backend, Apple acceleration via the CoreML backend and Metal
Performance Shader (MPS) backend, and GPU acceleration via the Vulkan backend.

Because optimizations are specific to a given backend, each PTE file is specific
to the backend(s) targeted at export. To support multiple devices, such as
XNNPACK acceleration for Android and CoreML for iOS, export a separate PTE file
for each backend.
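
For example, if an app ships on both Android and iOS, the export script can
loop over the target backends and serialize one program per target. The sketch
below is illustrative only: it reuses `traced_model` from the full example
later in this section, and the commented-out CoreML partitioner line is an
assumption about naming, not a verified API.

```python
# Sketch: write one PTE file per target backend. `traced_model` is the
# torch.export output produced in the full example below.
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

partitioners = {
    "nanogpt_xnnpack.pte": XnnpackPartitioner(),   # e.g. Android CPUs
    # "nanogpt_coreml.pte": CoreMLPartitioner(),   # e.g. iOS (assumed name)
}

for filename, partitioner in partitioners.items():
    # Re-create the edge program for each target, then delegate and serialize.
    edge_manager = to_edge(traced_model).to_backend(partitioner)
    with open(filename, "wb") as f:
        f.write(edge_manager.to_executorch().buffer)
```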

To delegate to a backend at export time, ExecuTorch provides the `to_backend()`
function on the `EdgeProgramManager` object, which takes a backend-specific
partitioner object. The partitioner is responsible for finding the parts of the
computation graph that can be accelerated by the target backend, and
`to_backend()` delegates each matched part to the given backend for
acceleration and optimization. Any portions of the computation graph not
delegated will be executed by the ExecuTorch operator implementations.

To delegate the exported model to a specific backend, first import the
backend's partitioner and edge compile config from the ExecuTorch codebase,
then call `to_backend` with an instance of the partitioner on the
`EdgeProgramManager` object created by the `to_edge` function.

Here's an example of how to delegate NanoGPT to the XNNPACK backend:

```python
# export_nanogpt.py

# Load the partitioner for the XNNPACK backend.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# A model delegated to a specific backend should use that backend's edge
# compile config.
from executorch.backends.xnnpack.utils.configs import get_xnnpack_edge_compile_config
from executorch.exir import EdgeCompileConfig, to_edge

import torch
from torch.export import export
from torch.nn.attention import sdpa_kernel, SDPBackend
from torch._export import capture_pre_autograd_graph

from model import GPT

# Load the NanoGPT model.
model = GPT.from_pretrained('gpt2')

# Create example inputs. This is used in the export process to provide
# hints on the expected shape of the model input.
example_inputs = (
    torch.randint(0, 100, (1, 8), dtype=torch.long),
)

# Trace the model, converting it to a portable intermediate representation.
# The torch.no_grad() call tells PyTorch to exclude training-specific logic.
with sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
    m = capture_pre_autograd_graph(model, example_inputs)
    traced_model = export(m, example_inputs)

# Convert the model into a runnable ExecuTorch program. To be further lowered
# to the XNNPACK backend, `traced_model` needs the XNNPACK-specific edge
# compile config.
edge_config = get_xnnpack_edge_compile_config()
edge_manager = to_edge(traced_model, compile_config=edge_config)

# Delegate the exported program to the XNNPACK backend by invoking
# `to_backend` with the XNNPACK partitioner.
edge_manager = edge_manager.to_backend(XnnpackPartitioner())
et_program = edge_manager.to_executorch()

# Save the XNNPACK-delegated ExecuTorch program to a file.
with open("nanogpt.pte", "wb") as file:
    file.write(et_program.buffer)
```
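
Not every operator is guaranteed to be claimed by the partitioner, so it can be
useful to check how much of the graph was actually delegated. Here is a minimal
sketch, under the assumption that lowered regions show up as
`executorch_call_delegate` calls in the edge program's graph; the exact node
naming may vary across ExecuTorch versions.

```python
# Sketch: count delegated calls in the edge program after `to_backend`.
# Assumes delegated regions appear as `executorch_call_delegate` nodes.
graph = edge_manager.exported_program().graph_module.graph
delegated = [
    node for node in graph.nodes
    if node.op == "call_function" and "call_delegate" in str(node.target)
]
print(f"Delegated {len(delegated)} subgraph(s) out of {len(graph.nodes)} nodes.")
```

If nothing was delegated, double-check that the edge compile config from
`get_xnnpack_edge_compile_config()` was passed to `to_edge`.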

Additionally, update CMakeLists.txt to build and link the XNNPACK backend to
the ExecuTorch runner.

```
cmake_minimum_required(VERSION 3.19)
project(nanogpt_runner)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

# Set options for the executorch build.
option(EXECUTORCH_BUILD_EXTENSION_DATA_LOADER "" ON)
option(EXECUTORCH_BUILD_EXTENSION_MODULE "" ON)
option(EXECUTORCH_BUILD_OPTIMIZED "" ON)
option(EXECUTORCH_BUILD_XNNPACK "" ON) # Build with the XNNPACK backend

# Include the executorch subdirectory.
add_subdirectory(
    ${CMAKE_CURRENT_SOURCE_DIR}/third-party/executorch
    ${CMAKE_BINARY_DIR}/executorch)

# include_directories(${CMAKE_CURRENT_SOURCE_DIR}/src)

add_executable(nanogpt_runner main.cpp)
target_link_libraries(
    nanogpt_runner
    PRIVATE
    executorch
    extension_module_static # Provides the Module class
    optimized_native_cpu_ops_lib # Provides baseline cross-platform kernels
    xnnpack_backend) # Provides the XNNPACK CPU acceleration backend
```
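
Before building the runner, you can optionally smoke-test the delegated program
from Python. This sketch assumes ExecuTorch's optional Python bindings
(`executorch.extension.pybindings.portable_lib`) are built and importable; that
binding API is an assumption here and may change between versions.

```python
# Optional sanity check: load and run the delegated program from Python.
# Assumes the ExecuTorch pybindings are available in this build.
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

et_module = _load_for_executorch("nanogpt.pte")
tokens = torch.randint(0, 100, (1, 8), dtype=torch.long)
outputs = et_module.forward([tokens])
print(outputs[0].shape)  # Expect (1, 8, vocab_size) logits for NanoGPT.
```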

The rest of the code and the execution procedure stay the same as in the
non-delegated example. Please refer to
[Exporting to ExecuTorch](https://pytorch.org/executorch/main/llm/getting-started.html#step-1-exporting-to-executorch)
and
[Invoking the Runtime](https://pytorch.org/executorch/main/llm/getting-started.html#step-2-invoking-the-runtime)
for more details.

For more information regarding backend delegation, see the ExecuTorch guides
for the
[XNNPACK Backend](https://pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html)
and
[CoreML Backend](https://pytorch.org/executorch/stable/build-run-coreml.html).

## Quantization