-
Notifications
You must be signed in to change notification settings - Fork 2
Description
I profiled the powerletrics -i 1000
command with py-spy.
Once with RAPL and once where I segemented the file open calls for reading memory and cmdline.
Without RAPL and with segemented file open
%Own %Total OwnTime TotalTime Function (filename)
25.00% 25.00% 34.34s 34.34s _shutdown (threading.py)
5.00% 5.00% 13.21s 24.03s get_data (powerletrics/powerletrics.py)
0.00% 0.00% 3.23s 3.23s next (bcc/table.py)
0.00% 0.00% 1.63s 1.63s __getitem__ (bcc/table.py)
0.00% 0.00% 1.04s 1.04s get_cmdline (powerletrics/powerletrics.py)
0.00% 0.00% 0.980s 0.980s __delitem__ (bcc/table.py)
0.00% 0.00% 0.960s 2.92s get_table (bcc/__init__.py)
0.00% 0.00% 0.550s 0.820s print_text (powerletrics/powerletrics.py)
0.00% 0.00% 0.440s 0.470s _assert_is_bytes (bcc/utils.py)
0.00% 0.00% 0.420s 0.420s __init__ (bcc/table.py)
0.00% 0.00% 0.350s 0.460s decode (json/decoder.py)
0.00% 0.00% 0.230s 0.740s Table (bcc/table.py)
0.00% 0.00% 0.220s 1.13s __contains__ (<frozen _collections_abc>)
0.00% 0.00% 0.190s 0.190s _decode_table_type (bcc/__init__.py)
0.00% 0.00% 0.110s 0.110s raw_decode (json/decoder.py)
0.00% 0.00% 0.100s 0.560s loads (json/__init__.py)
0.00% 0.00% 0.100s 0.100s <genexpr> (powerletrics/powerletrics.py)
0.00% 0.00% 0.060s 0.090s __eq__ (<frozen _collections_abc>)
0.00% 0.00% 0.050s 3.94s clear (bcc/table.py)
0.00% 0.00% 0.030s 0.050s energy_impact (powerletrics/powerletrics.py)
0.00% 0.00% 0.030s 0.170s cpu_utilization (powerletrics/powerletrics.py)
0.00% 0.00% 0.030s 0.030s __instancecheck__ (<frozen abc>)
0.00% 0.00% 0.020s 0.050s <lambda> (powerletrics/powerletrics.py)
0.00% 0.00% 0.020s 0.020s __bytes__ (bcc/utils.py)
0.00% 0.00% 0.010s 0.010s __init__ (bcc/utils.py)
0.00% 0.00% 0.010s 0.010s __iter__ (bcc/table.py)
0.00% 0.00% 0.010s 3.24s __next__ (bcc/table.py)
0.00% 0.00% 0.000s 0.140s get (<frozen _collections_abc>)
0.00% 5.00% 0.000s 24.03s run (threading.py)
0.00% 5.00% 0.000s 24.03s _bootstrap (threading.py)
0.00% 0.00% 0.000s 0.010s keys (bcc/table.py)
0.00% 5.00% 0.000s 24.03s _bootstrap_inner (threading.py)
What stands out is that the threading has either a large overhead or all of eBPF is counted into it.
py-spy
definitely has an issue with applying percentages as none of those make any sense since they do not add up to 100%.
I am thus only looking on TotalTime.
In the user space get_data
uses the most time. Suprisingly the file open calls are not that costly.
Also interesting is that get_table
takes a lot of time. Why? For a simple lookup / memory copy? How big are these tables? can we maybe clear them more often?
The sorting also does not seem to be an issue.
So all in all: Looks fine, apart from the wild eBPF functions and threading overhead.
With RAPL data
%Own %Total OwnTime TotalTime Function (filename)
25.00% 25.00% 41.29s 41.29s _shutdown (threading.py)
9.00% 9.00% 16.04s 20.30s get_data (powerletrics/powerletrics.py)
7.00% 7.00% 12.54s 12.54s rapl_metrics_provider_thread (powerletrics/powerletrics.py)
0.00% 0.00% 1.20s 1.20s next (bcc/table.py)
0.00% 0.00% 0.850s 0.850s __getitem__ (bcc/table.py)
0.00% 0.00% 0.440s 0.610s print_text (powerletrics/powerletrics.py)
0.00% 0.00% 0.380s 0.380s __delitem__ (bcc/table.py)
0.00% 0.00% 0.330s 1.05s get_table (bcc/__init__.py)
0.00% 0.00% 0.180s 0.180s __init__ (bcc/table.py)
0.00% 0.00% 0.150s 0.150s _assert_is_bytes (bcc/utils.py)
0.00% 0.00% 0.130s 0.190s decode (json/decoder.py)
0.00% 0.00% 0.110s 0.350s Table (bcc/table.py)
0.00% 0.00% 0.090s 0.090s <genexpr> (powerletrics/powerletrics.py)
0.00% 0.00% 0.070s 0.620s __contains__ (<frozen _collections_abc>)
0.00% 0.00% 0.060s 1.55s clear (bcc/table.py)
0.00% 0.00% 0.060s 0.060s raw_decode (json/decoder.py)
0.00% 0.00% 0.050s 1.25s __next__ (bcc/table.py)
0.00% 0.00% 0.050s 0.060s energy_impact (powerletrics/powerletrics.py)
0.00% 0.00% 0.030s 0.060s __eq__ (<frozen _collections_abc>)
0.00% 0.00% 0.030s 0.030s __instancecheck__ (<frozen abc>)
0.00% 0.00% 0.020s 0.050s cpu_utilization (powerletrics/powerletrics.py)
0.00% 0.00% 0.020s 0.020s _decode_table_type (bcc/__init__.py)
0.00% 0.00% 0.010s 0.200s loads (json/__init__.py)
0.00% 0.00% 0.010s 0.010s (珂)
0.00% 16.00% 0.000s 32.84s _bootstrap (threading.py)
0.00% 0.00% 0.000s 0.120s get (<frozen _collections_abc>)
0.00% 16.00% 0.000s 32.84s _bootstrap_inner (threading.py)
0.00% 16.00% 0.000s 32.84s run (threading.py)
0.00% 0.00% 0.000s 0.020s <lambda> (powerletrics/powerletrics.py)
The rapl_metrics_provider_thread
costs a lot of CPU time. Something is wrong here I believe ...