## Delegation

While ExecuTorch provides a portable, cross-platform implementation for all
operators, it also provides specialized backends for a number of different
targets. These include, but are not limited to, x86 and ARM CPU acceleration via
the XNNPACK backend, Apple acceleration via the CoreML backend and Metal
Performance Shader (MPS) backend, and GPU acceleration via the Vulkan backend.

Because optimizations are specific to a given backend, each PTE file is specific
to the backend(s) targeted at export. To support multiple devices, such as
XNNPACK acceleration for Android and CoreML for iOS, export a separate PTE file
for each backend.
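
For example, a build script might lower the same traced program once per
target. The following is a minimal sketch, not part of the tutorial's flow: it
assumes `traced_model` was produced by `torch.export` as in the full example
later in this section, and the CoreML partitioner import path shown here is an
assumption that may vary across ExecuTorch releases.

```python
# Sketch: export one PTE file per backend from the same traced program.
# Assumes `traced_model` exists (see the full export example below). The
# CoreML partitioner import path is an assumption and may differ by release.
# For brevity, this sketch omits the backend-specific edge compile configs
# discussed later in this section.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.backends.apple.coreml.partition.coreml_partitioner import CoreMLPartitioner
from executorch.exir import to_edge

for filename, partitioner in [
    ("nanogpt_xnnpack.pte", XnnpackPartitioner()),  # Android / Linux CPU
    ("nanogpt_coreml.pte", CoreMLPartitioner()),    # iOS / macOS
]:
    # Re-create the edge program for each backend so every export starts
    # from the same unmodified graph.
    edge_manager = to_edge(traced_model)
    et_program = edge_manager.to_backend(partitioner).to_executorch()
    with open(filename, "wb") as f:
        f.write(et_program.buffer)
```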

To delegate to a backend at export time, ExecuTorch provides the `to_backend()`
function on the `EdgeProgramManager` object, which takes a backend-specific
partitioner object. The partitioner is responsible for finding the parts of the
computation graph that can be accelerated by the target backend, and
`to_backend()` delegates each matched portion to that backend for acceleration
and optimization. Any portions of the computation graph that are not delegated
are executed by the portable ExecuTorch operator implementations.
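
To see what the partitioner actually matched, you can print the program's graph
after calling `to_backend()`: delegated regions appear as
`executorch_call_delegate` calls, while everything else falls back to the
portable kernels. A minimal sketch, assuming the `edge_manager` and imports from
the example below:

```python
# Sketch: inspect which parts of the graph were delegated.
# Assumes `edge_manager` was created via to_edge() as in the example below.
edge_manager = edge_manager.to_backend(XnnpackPartitioner())

# Delegated subgraphs show up as calls to
# torch.ops.higher_order.executorch_call_delegate in the printed graph.
print(edge_manager.exported_program().graph_module)
```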

To delegate the exported model to a specific backend, first import its
partitioner and edge compile config from the ExecuTorch codebase, then call
`to_backend` with an instance of the partitioner on the `EdgeProgramManager`
object created by the `to_edge` function.

Here's an example of how to delegate NanoGPT to the XNNPACK backend:

```python
# export_nanogpt.py

# Load the partitioner for the XNNPACK backend.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# A model delegated to a specific backend should use that backend's edge compile config.
from executorch.backends.xnnpack.utils.configs import get_xnnpack_edge_compile_config
from executorch.exir import to_edge

import torch
from torch.export import export
from torch.nn.attention import sdpa_kernel, SDPBackend
from torch._export import capture_pre_autograd_graph

from model import GPT


# Load the NanoGPT model.
model = GPT.from_pretrained('gpt2')

# Create example inputs. This is used in the export process to provide
# hints on the expected shape of the model input.
example_inputs = (
    torch.randint(0, 100, (1, 8), dtype=torch.long),
)

# Trace the model, converting it to a portable intermediate representation.
# The torch.no_grad() call tells PyTorch to exclude training-specific logic.
with sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
    m = capture_pre_autograd_graph(model, example_inputs)
    traced_model = export(m, example_inputs)

# Convert the model into a runnable ExecuTorch program.
# To be lowered to the XNNPACK backend, `traced_model` needs the
# XNNPACK-specific edge compile config.
edge_config = get_xnnpack_edge_compile_config()
edge_manager = to_edge(traced_model, compile_config=edge_config)

# Delegate the exported program to the XNNPACK backend by invoking
# `to_backend` with the XNNPACK partitioner.
edge_manager = edge_manager.to_backend(XnnpackPartitioner())
et_program = edge_manager.to_executorch()

# Save the XNNPACK-delegated ExecuTorch program to a file.
with open("nanogpt.pte", "wb") as file:
    file.write(et_program.buffer)
```

Additionally, update CMakeLists.txt to build and link the XNNPACK backend into
the ExecuTorch runner:

```
cmake_minimum_required(VERSION 3.19)
project(nanogpt_runner)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

# Set options for the executorch build.
option(EXECUTORCH_BUILD_EXTENSION_DATA_LOADER "" ON)
option(EXECUTORCH_BUILD_EXTENSION_MODULE "" ON)
option(EXECUTORCH_BUILD_OPTIMIZED "" ON)
option(EXECUTORCH_BUILD_XNNPACK "" ON) # Build with the XNNPACK backend

# Include the executorch subdirectory.
add_subdirectory(
    ${CMAKE_CURRENT_SOURCE_DIR}/third-party/executorch
    ${CMAKE_BINARY_DIR}/executorch)

add_executable(nanogpt_runner main.cpp)
target_link_libraries(
    nanogpt_runner
    PRIVATE
    executorch
    extension_module_static # Provides the Module class
    optimized_native_cpu_ops_lib # Provides baseline cross-platform kernels
    xnnpack_backend) # Provides the XNNPACK CPU acceleration backend
```

The rest of the code and the run procedure are the same as in the non-delegated
example. Please refer to
[Exporting to ExecuTorch](https://pytorch.org/executorch/main/llm/getting-started.html#step-1-exporting-to-executorch)
and
[Invoking the Runtime](https://pytorch.org/executorch/main/llm/getting-started.html#step-2-invoking-the-runtime)
for more details.

For more information regarding backend delegation, see the ExecuTorch guides
for the
[XNNPACK Backend](https://pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html)
and the
[CoreML Backend](https://pytorch.org/executorch/stable/build-run-coreml.html).

## Quantization