Profiling with py-spy revelations #14

@ArneTR

Description

I profiled the powerletrics -i 1000 command with py-spy.

Once with RAPL and once where I segmented the file open calls for reading memory and cmdline into their own functions.

Without RAPL and with segmented file open calls

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 25.00%  25.00%   34.34s    34.34s   _shutdown (threading.py)
  5.00%   5.00%   13.21s    24.03s   get_data (powerletrics/powerletrics.py)
  0.00%   0.00%    3.23s     3.23s   next (bcc/table.py)
  0.00%   0.00%    1.63s     1.63s   __getitem__ (bcc/table.py)
  0.00%   0.00%    1.04s     1.04s   get_cmdline (powerletrics/powerletrics.py)
  0.00%   0.00%   0.980s    0.980s   __delitem__ (bcc/table.py)
  0.00%   0.00%   0.960s     2.92s   get_table (bcc/__init__.py)
  0.00%   0.00%   0.550s    0.820s   print_text (powerletrics/powerletrics.py)
  0.00%   0.00%   0.440s    0.470s   _assert_is_bytes (bcc/utils.py)
  0.00%   0.00%   0.420s    0.420s   __init__ (bcc/table.py)
  0.00%   0.00%   0.350s    0.460s   decode (json/decoder.py)
  0.00%   0.00%   0.230s    0.740s   Table (bcc/table.py)
  0.00%   0.00%   0.220s     1.13s   __contains__ (<frozen _collections_abc>)
  0.00%   0.00%   0.190s    0.190s   _decode_table_type (bcc/__init__.py)
  0.00%   0.00%   0.110s    0.110s   raw_decode (json/decoder.py)
  0.00%   0.00%   0.100s    0.560s   loads (json/__init__.py)
  0.00%   0.00%   0.100s    0.100s   <genexpr> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.060s    0.090s   __eq__ (<frozen _collections_abc>)
  0.00%   0.00%   0.050s     3.94s   clear (bcc/table.py)
  0.00%   0.00%   0.030s    0.050s   energy_impact (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.170s   cpu_utilization (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.030s   __instancecheck__ (<frozen abc>)
  0.00%   0.00%   0.020s    0.050s   <lambda> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.020s    0.020s   __bytes__ (bcc/utils.py)
  0.00%   0.00%   0.010s    0.010s   __init__ (bcc/utils.py)
  0.00%   0.00%   0.010s    0.010s   __iter__ (bcc/table.py)
  0.00%   0.00%   0.010s     3.24s   __next__ (bcc/table.py)
  0.00%   0.00%   0.000s    0.140s   get (<frozen _collections_abc>)
  0.00%   5.00%   0.000s    24.03s   run (threading.py)
  0.00%   5.00%   0.000s    24.03s   _bootstrap (threading.py)
  0.00%   0.00%   0.000s    0.010s   keys (bcc/table.py)
  0.00%   5.00%   0.000s    24.03s   _bootstrap_inner (threading.py)

What stands out is that threading either has a large overhead, or all of the eBPF time is attributed to it.

py-spy seems to have an issue with its percentage columns: the values make no sense, as they do not add up to 100%.

I am thus only looking at TotalTime.

In user space, get_data uses the most time. Surprisingly, the file open calls are not that costly.
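To sanity-check that, the per-process file reads can be micro-benchmarked directly. A rough sketch (the path and loop count are illustrative, not necessarily what powerletrics actually reads):

```python
import time

def time_proc_reads(pid="self", n=1000):
    """Time n open+read cycles of /proc/<pid>/cmdline; returns total seconds."""
    path = f"/proc/{pid}/cmdline"
    start = time.perf_counter()
    for _ in range(n):
        with open(path, "rb") as f:
            f.read()
    return time.perf_counter() - start
```

On a typical system this stays in the low milliseconds for hundreds of reads, which would match the small get_cmdline share in the profile.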

Also interesting is that get_table takes a lot of time. Why? For a simple lookup / memory copy? How big are these tables? Can we maybe clear them more often?
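One possible explanation, judging from the Table() and _decode_table_type entries in the profile: bcc's get_table() appears to construct a fresh Table wrapper (re-decoding key/leaf types) on every call, so caching the handle once and draining it per interval might help. A minimal runnable sketch, using stubs in place of bcc.BPF so it runs without root or eBPF (FakeBPF, FakeTable, and the map name proc_stats are hypothetical; real bcc tables support items() and clear() the same way):

```python
class FakeTable(dict):
    """Stand-in for a bcc Table (items() and clear() mirror the real API)."""

class FakeBPF:
    """Stub for bcc.BPF, only here to make the sketch self-contained."""
    def __init__(self):
        self._tables = {"proc_stats": FakeTable()}

    def get_table(self, name):
        # Real bcc builds a new Table object (decoding key/leaf types)
        # on every call -- hence caching the handle outside the loop.
        return self._tables[name]

b = FakeBPF()
stats = b.get_table("proc_stats")  # resolve once, reuse every interval

for interval in range(3):
    stats[interval] = interval * 10  # pretend the eBPF side filled the map
    snapshot = dict(stats.items())   # copy out this interval's data
    stats.clear()                    # drain so the map never grows large
```

If bcc really does rebuild the Table wrapper on each get_table() call, hoisting the lookup out of the per-interval loop should shave off most of the get_table time seen above.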

The sorting also does not seem to be an issue.

So all in all: looks fine, apart from the puzzling eBPF functions and the threading overhead.

With RAPL data

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 25.00%  25.00%   41.29s    41.29s   _shutdown (threading.py)
  9.00%   9.00%   16.04s    20.30s   get_data (powerletrics/powerletrics.py)
  7.00%   7.00%   12.54s    12.54s   rapl_metrics_provider_thread (powerletrics/powerletrics.py)
  0.00%   0.00%    1.20s     1.20s   next (bcc/table.py)
  0.00%   0.00%   0.850s    0.850s   __getitem__ (bcc/table.py)
  0.00%   0.00%   0.440s    0.610s   print_text (powerletrics/powerletrics.py)
  0.00%   0.00%   0.380s    0.380s   __delitem__ (bcc/table.py)
  0.00%   0.00%   0.330s     1.05s   get_table (bcc/__init__.py)
  0.00%   0.00%   0.180s    0.180s   __init__ (bcc/table.py)
  0.00%   0.00%   0.150s    0.150s   _assert_is_bytes (bcc/utils.py)
  0.00%   0.00%   0.130s    0.190s   decode (json/decoder.py)
  0.00%   0.00%   0.110s    0.350s   Table (bcc/table.py)
  0.00%   0.00%   0.090s    0.090s   <genexpr> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.070s    0.620s   __contains__ (<frozen _collections_abc>)
  0.00%   0.00%   0.060s     1.55s   clear (bcc/table.py)
  0.00%   0.00%   0.060s    0.060s   raw_decode (json/decoder.py)
  0.00%   0.00%   0.050s     1.25s   __next__ (bcc/table.py)
  0.00%   0.00%   0.050s    0.060s   energy_impact (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.060s   __eq__ (<frozen _collections_abc>)
  0.00%   0.00%   0.030s    0.030s   __instancecheck__ (<frozen abc>)
  0.00%   0.00%   0.020s    0.050s   cpu_utilization (powerletrics/powerletrics.py)
  0.00%   0.00%   0.020s    0.020s   _decode_table_type (bcc/__init__.py)
  0.00%   0.00%   0.010s    0.200s   loads (json/__init__.py)
  0.00%  16.00%   0.000s    32.84s   _bootstrap (threading.py)
  0.00%   0.00%   0.000s    0.120s   get (<frozen _collections_abc>)
  0.00%  16.00%   0.000s    32.84s   _bootstrap_inner (threading.py)
  0.00%  16.00%   0.000s    32.84s   run (threading.py)
  0.00%   0.00%   0.000s    0.020s   <lambda> (powerletrics/powerletrics.py)

The rapl_metrics_provider_thread costs a lot of CPU time. I believe something is wrong here ...
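If the provider thread polls RAPL in a tight loop, that alone would explain the ~12.5s of OwnTime. A hedged sketch of what a fixed-interval poll could look like (the real signature of rapl_metrics_provider_thread is unknown; read_rapl and stop_event are hypothetical stand-ins):

```python
import threading

def rapl_metrics_provider_thread(stop_event, read_rapl, interval_s=1.0):
    """Poll RAPL counters at a fixed interval instead of busy-looping.

    read_rapl is a hypothetical callable reading the energy counter
    (e.g. from /sys/class/powercap/intel-rapl/.../energy_uj).
    """
    samples = []
    while not stop_event.is_set():
        samples.append(read_rapl())
        # Without this wait, the loop spins and burns a full core --
        # which would match the OwnTime seen in the profile above.
        stop_event.wait(interval_s)
    return samples
```

Event.wait() doubles as both the sleep and the shutdown signal, so the thread stays idle between samples but still exits promptly when asked.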
