Skip to content

routine performance checks part 1 #2184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 28, 2025
Merged

routine performance checks part 1 #2184

merged 11 commits into from
Mar 28, 2025

Conversation

zackattack01
Copy link
Contributor

@zackattack01 zackattack01 commented Mar 28, 2025

I'm breaking off this bit from the larger effort to routinely collect cpu/mem profiles so that we can get some discussion going about criteria for triggering these and ensure we're gathering all of the statistics we want correctly.

This does a few things:

  • upgrades our gopsutil package to v4. I didn't see any significant changes for our purposes but there were some bug fixes and memory leaks noted that I thought would make it worth upgrading now while we're looking at all of this closely. there were two small backwards compatibility breaks (updated in osquerylogs/logs.go and desktop/runner/runner_linux.go)
  • adds a new performance package for us to collect some documentation and helper functionality for handling these statistics. we do a lot of similar things in several places but if we're going to start taking actions based on this data we will likely need some stricter patterns/expectations/types to work off of. Let me know if there is more you'd like to see added here, it was just meant as a starting point
    • this is very intentional in the data it collects for the PerformanceStats it exposes. All fields are expected to be supported on all platforms, and it does not include noisier things like the runtime GC stats, to allow us to add this to a log checkpoint without too much noise
  • Adds a new checkup which reports this data. I had originally only meant it to be a logCheckup since a lot of this information is covered in our existing runtime checkup, but it seemed helpful as a quicker reference while looking at flares/doctor so I included it there too. happy to remove if people disagree, I could have gone either way

In it's current form, this will emit checkup logs like this once an hour:

checkup log

{
  "time": "2025-03-28T16:21:22.834035Z",
  "level": "DEBUG",
  "source": {
    "function": "github.com/kolide/launcher/ee/debug/checkups.(*logCheckPointer).Once",
    "file": "/Users/zackolson/go/src/launcher/ee/debug/checkups/checkpoint.go",
    "line": 68
  },
  "msg": "ran checkup",
  "run_id": "01JQERXN2V9FF2ZVJ7GMS0G1QE",
  "component": "log_checkpoint",
  "checkup": "Performance",
  "summary": "process 52226 is using 13.84% CPU, RSS: 53.34 MB (0.08% memory). Note CPU will be higher while running flare.",
  "data": {
    "stats": {
      "pid": 52226,
      "exe": "/Users/zackolson/go/src/launcher/build/launcher",
      "mem_info": {
        "rss_bytes": 55934976,
        "vms_bytes": 422118621184,
        "heap_total_bytes": 11067392,
        "go_mem_bytes": 21505288,
        "non_go_mem_bytes": 34429688,
        "mem_percent": 0.0813961
      },
      "cpu_percent": 13.839578399624044
    }
  },
  "status": "Informational"
}

An informational line will also be added to doctor like so:
Performance: process 50853 is using 16.88% CPU, RSS: 40.66 MB (0.06% memory). Note CPU will be higher while running flare.

@zackattack01
Copy link
Contributor Author

there seems to be a data race introduced from the gopsutil update, it looks related to their shift from cgo to purego and difficult to avoid. I am going to try upgrading us to the release just before that happened instead

@@ -109,6 +109,7 @@ func checkupsFor(k types.Knapsack, target targetBits) []checkupInt {
{&desktopMenu{k: k}, flareSupported},
{&coredumpCheckup{}, doctorSupported | flareSupported},
{&downloadDirectory{}, flareSupported},
{&perfCheckup{}, doctorSupported | flareSupported | logSupported},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we maybe either don't want this in doctor, or would only want a modified version in doctor? Since doctor is always run via the command line, the runtime stats will never be accurate, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was thinking it would be nice in the doctor output in flares and didn't even consider that 🤦 you're absolutely right, will pull it

- process.MemoryInfoStat (e.g. RSS, VMS). these help to give a picture of the usage from the OS perspective
- runtime.MemStats (e.g. heap stats). these help to give a picture of the go runtime usage
Used together, we are able to estimate things like allocations outside of our golang runtime (see nonGoMemUsage).
The runtime MemStats can be confusing - here is an attempt at outlining some of the fields we're interested in:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this, really helpful

@zackattack01 zackattack01 added this pull request to the merge queue Mar 28, 2025
Merged via the queue into main with commit 5f4ce67 Mar 28, 2025
36 checks passed
@zackattack01 zackattack01 deleted the zack/periodic_mem_checks branch March 28, 2025 21:03
Copy link
Contributor

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Used together, we are able to estimate things like allocations outside of our golang runtime (see nonGoMemUsage).
The runtime MemStats can be confusing - here is an attempt at outlining some of the fields we're interested in:

Sys is total bytes of memory obtained from the OS. The virtual address space reserved by the Go runtime for:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love the comment, the whitespace is a little fishy...

CPUPercent float64 `json:"cpu_percent"`
}

func CurrentProcessStats(ctx context.Context) (*PerformanceStats, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One caveat here, is that this gathers stats for the current process. Which omits any desktop processes, or any osquery processes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants