SDK + Inspector output time format is inconsistent with delegates #4504

Closed
Aeranthes opened this issue Aug 1, 2024 · 8 comments
Labels
module: devtools Issues related to developer tools and code under devtools/ triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@Aeranthes

Aeranthes commented Aug 1, 2024

Hi again - I think this is probably setup-agnostic, but as a quick rundown: I'm running on an Android device with the SDK enabled. I've got a binary application that can generate an etdump file while registering the XNNPack delegate, and I'm running a version of mobilenet_v2 that has been partitioned for the same delegate.

When I print out the resulting event blocks to find the operator timings, I'm seeing the following pattern:

event_block_name event_name raw p10 (ms)
Default Method::init [18.762083] 18.76208
Default Program::load_method [18.774635] 18.77464
Execute Transpose (ND, X32) #1 [1172000.0] 1172000
Execute Convolution (NHWC, F32) IGEMM #1 [802000.0] 802000
Execute Convolution (NHWC, F32) DWConv #1 [439000.0] 439000
Execute Convolution (NHWC, F32) GEMM #1 [357000.0] 357000
Execute Convolution (NHWC, F32) GEMM #2 [911000.0] 911000

The subject of this issue is specifically that the time format seems inconsistent: the ExecuTorch calls and column headers are recorded in milliseconds, while, from what I gather from XNNProfiler.cpp, the delegate calls are recorded in "PAL ticks" / system ticks?

Is this a bug, and/or is there a way to get the output in a more comparable time unit? Thanks for any assistance.

cc @Olivia-liu @Jack-Khuu

@mcr229
Contributor

mcr229 commented Aug 1, 2024

From what I understand, the recorded timings in the XNNPack events are ticks, and the values above can be interpreted as nanoseconds. But I think @Olivia-liu @GregoryComer can speak more on this, and on any plans to unify the timing units.

@DzAvril

DzAvril commented Aug 2, 2024

You can write a custom time converter like this:

from typing import Union

def convert_xnnpack_delegate_time(
    event_name: Union[str, int], input_time: Union[int, float]
) -> Union[int, float]:
    # XNNPack delegate timings are in nanoseconds; convert to milliseconds.
    return input_time / (1000 * 1000)

and pass the converter to the Inspector like this:

    inspector = Inspector(
        etdump_path=args.etdump_path,
        etrecord=args.etrecord_path,
        debug_buffer_path=args.debug_buffer_path,
        source_time_scale=TimeScale(args.source_time_scale),
        target_time_scale=TimeScale(args.target_time_scale),
        delegate_time_scale_converter=convert_xnnpack_delegate_time,
    )

This should work.
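As a quick sanity check (my own sketch, not from the thread): under the ns-to-ms interpretation, the converter above should turn the first delegate op's raw value from the table, 1172000.0, into roughly 1.172 ms.

```python
from typing import Union

def convert_xnnpack_delegate_time(
    event_name: Union[str, int], input_time: Union[int, float]
) -> Union[int, float]:
    # Treat the raw delegate timing as nanoseconds and convert to milliseconds.
    return input_time / (1000 * 1000)

# First delegate op from the table above.
ms = convert_xnnpack_delegate_time("Transpose (ND, X32) #1", 1172000.0)
print(ms)  # 1.172
```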

@Aeranthes
Author

Thanks for the prompt response.

From the structure of the profiling output, I generally see that a block of delegate calls is followed by a "DELEGATE_CALL" event. Maybe I'm mistaken that this represents the time spent in the delegate from ExecuTorch's side.

If I add up the delegate profiling times for the inference and interpret them as nanoseconds, the preceding block of XNNPack operators' overall time is quite different from the time given by DELEGATE_CALL (20.25ms vs 28.1ms).

This large chunk of inference time missing from the overall Method::execute time was one of the reasons I thought I couldn't interpret the results as nanoseconds.

Full table below:
  event_block_name event_name raw p10 (ms)
0 Default Method::init [18.762083] 18.76208
1 Default Program::load_method [18.774635] 18.77464
2 Execute Transpose (ND, X32) #1 [1172000.0] 1172000
3 Execute Convolution (NHWC, F32) IGEMM #1 [802000.0] 802000
4 Execute Convolution (NHWC, F32) DWConv #1 [439000.0] 439000
5 Execute Convolution (NHWC, F32) GEMM #1 [357000.0] 357000
6 Execute Convolution (NHWC, F32) GEMM #2 [911000.0] 911000
7 Execute Convolution (NHWC, F32) DWConv #2 [471000.0] 471000
8 Execute Convolution (NHWC, F32) GEMM #3 [306000.0] 306000
9 Execute Convolution (NHWC, F32) GEMM #4 [646000.0] 646000
10 Execute Convolution (NHWC, F32) DWConv #3 [521000.0] 521000
11 Execute Convolution (NHWC, F32) GEMM #5 [445000.0] 445000
12 Execute Add (ND, F32) #1 [98000.0] 98000
13 Execute Convolution (NHWC, F32) GEMM #6 [491000.0] 491000
14 Execute Convolution (NHWC, F32) DWConv #4 [194000.0] 194000
15 Execute Convolution (NHWC, F32) GEMM #7 [154000.0] 154000
16 Execute Convolution (NHWC, F32) GEMM #8 [218000.0] 218000
17 Execute Convolution (NHWC, F32) DWConv #5 [155000.0] 155000
18 Execute Convolution (NHWC, F32) GEMM #9 [203000.0] 203000
19 Execute Add (ND, F32) #2 [37000.0] 37000
20 Execute Convolution (NHWC, F32) GEMM #10 [218000.0] 218000
21 Execute Convolution (NHWC, F32) DWConv #6 [152000.0] 152000
22 Execute Convolution (NHWC, F32) GEMM #11 [202000.0] 202000
23 Execute Add (ND, F32) #3 [34000.0] 34000
24 Execute Convolution (NHWC, F32) GEMM #12 [216000.0] 216000
25 Execute Convolution (NHWC, F32) DWConv #7 [111000.0] 111000
26 Execute Convolution (NHWC, F32) GEMM #13 [113000.0] 113000
27 Execute Convolution (NHWC, F32) GEMM #14 [578000.0] 578000
28 Execute Convolution (NHWC, F32) DWConv #8 [190000.0] 190000
29 Execute Convolution (NHWC, F32) GEMM #15 [209000.0] 209000
30 Execute Add (ND, F32) #4 [26000.0] 26000
31 Execute Convolution (NHWC, F32) GEMM #16 [245000.0] 245000
32 Execute Convolution (NHWC, F32) DWConv #9 [181000.0] 181000
33 Execute Convolution (NHWC, F32) GEMM #17 [209000.0] 209000
34 Execute Add (ND, F32) #5 [25000.0] 25000
35 Execute Convolution (NHWC, F32) GEMM #18 [236000.0] 236000
36 Execute Convolution (NHWC, F32) DWConv #10 [185000.0] 185000
37 Execute Convolution (NHWC, F32) GEMM #19 [213000.0] 213000
38 Execute Add (ND, F32) #6 [23000.0] 23000
39 Execute Convolution (NHWC, F32) GEMM #20 [236000.0] 236000
40 Execute Convolution (NHWC, F32) DWConv #11 [183000.0] 183000
41 Execute Convolution (NHWC, F32) GEMM #21 [306000.0] 306000
42 Execute Convolution (NHWC, F32) GEMM #22 [478000.0] 478000
43 Execute Convolution (NHWC, F32) DWConv #12 [283000.0] 283000
44 Execute Convolution (NHWC, F32) GEMM #23 [522000.0] 522000
45 Execute Add (ND, F32) #7 [33000.0] 33000
46 Execute Convolution (NHWC, F32) GEMM #24 [482000.0] 482000
47 Execute Convolution (NHWC, F32) DWConv #13 [291000.0] 291000
48 Execute Convolution (NHWC, F32) GEMM #25 [452000.0] 452000
49 Execute Add (ND, F32) #8 [30000.0] 30000
50 Execute Convolution (NHWC, F32) GEMM #26 [477000.0] 477000
51 Execute Convolution (NHWC, F32) DWConv #14 [151000.0] 151000
52 Execute Convolution (NHWC, F32) GEMM #27 [224000.0] 224000
53 Execute Convolution (NHWC, F32) GEMM #28 [370000.0] 370000
54 Execute Convolution (NHWC, F32) DWConv #15 [244000.0] 244000
55 Execute Convolution (NHWC, F32) GEMM #29 [365000.0] 365000
56 Execute Add (ND, F32) #9 [18000.0] 18000
57 Execute Convolution (NHWC, F32) GEMM #30 [369000.0] 369000
58 Execute Convolution (NHWC, F32) DWConv #16 [1001000.0] 1001000
59 Execute Convolution (NHWC, F32) GEMM #31 [375000.0] 375000
60 Execute Add (ND, F32) #10 [17000.0] 17000
61 Execute Convolution (NHWC, F32) GEMM #32 [382000.0] 382000
62 Execute Convolution (NHWC, F32) DWConv #17 [281000.0] 281000
63 Execute Convolution (NHWC, F32) GEMM #33 [717000.0] 717000
64 Execute Convolution (NHWC, F32) GEMM #34 [935000.0] 935000
65 Execute Global Average Pooling (NWC, F32) #1 [43000.0] 43000
66 Execute Transpose (ND, X32) #2 [3000.0] 3000
67 Execute DELEGATE_CALL [28.097344] 28.09734
68 Execute OPERATOR_CALL [0.003646] 0.003646
69 Execute native_call_clone.out [0.122396] 0.122396
70 Execute OPERATOR_CALL [0.126719] 0.126719
71 Execute Fully Connected (NC, F32) GEMM #1 [889000.0] 889000
72 Execute DELEGATE_CALL [0.92] 0.92
73 Execute Method::execute [29.22375] 29.22375
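As an arithmetic check (my own, not part of the thread): summing the raw timings of the first delegate block (rows 2-66 of the table above) and treating them as nanoseconds reproduces the ~20.25 ms figure, versus the 28.097 ms reported by DELEGATE_CALL.

```python
# Raw timings for rows 2-66 above, i.e. the first block of XNNPack delegate
# ops preceding the DELEGATE_CALL event, interpreted as nanoseconds.
raw_ns = [
    1172000, 802000, 439000, 357000, 911000, 471000, 306000, 646000,
    521000, 445000, 98000, 491000, 194000, 154000, 218000, 155000,
    203000, 37000, 218000, 152000, 202000, 34000, 216000, 111000,
    113000, 578000, 190000, 209000, 26000, 245000, 181000, 209000,
    25000, 236000, 185000, 213000, 23000, 236000, 183000, 306000,
    478000, 283000, 522000, 33000, 482000, 291000, 452000, 30000,
    477000, 151000, 224000, 370000, 244000, 365000, 18000, 369000,
    1001000, 375000, 17000, 382000, 281000, 717000, 935000, 43000, 3000,
]
total_ms = sum(raw_ns) / 1e6
print(f"{total_ms:.3f} ms")  # 20.254 ms, vs 28.097 ms for DELEGATE_CALL
```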

@Olivia-liu
Contributor

Hey @Aeranthes thanks again for your question! Always glad to see someone trying out the tool :). I think there's some overhead in making the delegate call, so that's why the total time is greater than the sum of the operators. @cccclai do you have any knowledge on this?

@cccclai
Contributor

cccclai commented Aug 5, 2024

Method::execute is the total time for all DELEGATE_CALL and OPERATOR_CALL events. I don't think the delegate call itself would have 8 ms of overhead.

@Olivia-liu
Contributor

It should be correct to interpret those values of the delegated operators as nanoseconds for xnnpack. We're working on making the time units consistent for delegates in the Inspector output. The 20.25ms vs 28.1ms gap is indeed weird. There might be some bug unrelated to the time scale that's causing it. I don't think you did anything wrong, since I can reproduce this on my side as well.

Olivia-liu added a commit to Olivia-liu/executorch-1 that referenced this issue Sep 4, 2024
Summary:
Because the time unit of delegate ops is different from that of non-delegate ops, this has caused confusion for users: [post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1562687731268181/), [Issue pytorch#4504](pytorch#4504).

This diff gives a quick solution for the problem to unblock Beta. The rationale is that since for non-delegated ops, we have the default time scale:
```
source_time_scale: TimeScale = TimeScale.NS,
target_time_scale: TimeScale = TimeScale.MS,
```
And the conversion rate is 10^6 (because 1 MS = 10^6 NS), so it makes sense to set a default converter function for delegated ops to be `input_time/(1000 * 1000)` as well. Similar converter functions have been used by [coreml](https://www.internalfb.com/code/fbsource/[76be64aab17c]/fbcode/executorch/examples/apple/coreml/scripts/inspector_utils.py?lines=259-262) and [vulkan](https://fburl.com/code/5cu6fx2b) already.

The proper solution, writing the conversion into etdump, is tracked by T198369344 and T198369419

Differential Revision: D62160650
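The fallback described in this commit message could be sketched roughly as follows. This is a hypothetical illustration, not the actual ExecuTorch code; `SCALE_TO_NS` and `default_delegate_converter` are names I made up.

```python
# Hypothetical sketch: when no delegate_time_scale_converter is supplied,
# fall back to the same ratio used for non-delegated events, derived from
# the source and target time scales.
SCALE_TO_NS = {"s": 1e9, "ms": 1e6, "us": 1e3, "ns": 1.0}

def default_delegate_converter(source_scale: str, target_scale: str):
    # Ratio that maps a value in source_scale units to target_scale units.
    ratio = SCALE_TO_NS[source_scale] / SCALE_TO_NS[target_scale]
    return lambda event_name, t: t * ratio

# With the default NS -> MS scales, the ratio is 1/10^6,
# i.e. the same input_time / (1000 * 1000) as the manual converter.
convert = default_delegate_converter("ns", "ms")
print(convert("GEMM #1", 1172000.0))  # ~1.172 ms
```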
Olivia-liu added a commit to Olivia-liu/executorch-1 that referenced this issue Sep 9, 2024
Summary:
Pull Request resolved: pytorch#5076

By default, the non-delegated events have a time scale converter, but delegated events do not, and this has caused confusion for users: [post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1562687731268181/), [Issue pytorch#4504](pytorch#4504).

This diff gives a quick solution for the problem to unblock ET Beta. When no `delegate_time_scale_converter` is provided by the user, just use the same conversion ratio as the non-delegated events.

The proper solution, writing the conversion into etdump, is tracked by T198369344 and T198369419

Reviewed By: dbort

Differential Revision: D62160650
@digantdesai
Contributor

@Olivia-liu is this resolved?

@digantdesai digantdesai added module: devtools Issues related to developer tools and code under devtools/ triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Feb 4, 2025
@Olivia-liu
Contributor

This should be solved by #5076
