routine performance checks part 1 #2184
There seems to be a data race introduced by the gopsutil update. It looks related to their shift from cgo to purego and is difficult to avoid, so I am going to try upgrading us to the release just before that happened instead.
ee/debug/checkups/checkups.go
Outdated
@@ -109,6 +109,7 @@ func checkupsFor(k types.Knapsack, target targetBits) []checkupInt {
{&desktopMenu{k: k}, flareSupported},
{&coredumpCheckup{}, doctorSupported | flareSupported},
{&downloadDirectory{}, flareSupported},
{&perfCheckup{}, doctorSupported | flareSupported | logSupported},
I think we maybe either don't want this in doctor, or would only want a modified version in doctor? Since doctor is always run via the command line, the runtime stats will never be accurate, right?
was thinking it would be nice in the doctor output in flares and didn't even consider that 🤦 you're absolutely right, will pull it
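To make the target flags in the hunk above concrete, here is a minimal sketch of how a bitmask like `doctorSupported | flareSupported | logSupported` can gate which checkups run for a given target. The flag names come from the diff; the `checkup` struct, the `namesFor` helper, and the flag values are assumptions made for illustration and are not the repository's actual implementation.

```go
package main

import "fmt"

// targetBits is a stand-in for the flag type named in the diff.
// The concrete values below are assumed for this sketch.
type targetBits uint8

const (
	doctorSupported targetBits = 1 << iota
	flareSupported
	logSupported
)

// checkup is a hypothetical pairing of a checkup name with the
// targets it is allowed to run under.
type checkup struct {
	name    string
	targets targetBits
}

// namesFor (hypothetical helper) returns the names of all checkups
// whose target mask includes the requested target.
func namesFor(all []checkup, target targetBits) []string {
	var out []string
	for _, c := range all {
		if c.targets&target != 0 {
			out = append(out, c.name)
		}
	}
	return out
}

func main() {
	all := []checkup{
		{"coredump", doctorSupported | flareSupported},
		{"downloadDirectory", flareSupported},
		// doctorSupported is omitted here, matching the outcome of the review thread.
		{"perf", flareSupported | logSupported},
	}
	fmt.Println("doctor:", namesFor(all, doctorSupported))
	fmt.Println("log:", namesFor(all, logSupported))
}
```

With the doctor bit removed from the perf entry, a doctor run skips it while flare and log runs still pick it up.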
- process.MemoryInfoStat (e.g. RSS, VMS). these help to give a picture of the usage from the OS perspective
- runtime.MemStats (e.g. heap stats). these help to give a picture of the go runtime usage
Used together, we are able to estimate things like allocations outside of our golang runtime (see nonGoMemUsage).
The runtime MemStats can be confusing - here is an attempt at outlining some of the fields we're interested in:
Thank you for this, really helpful
nice
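The doc comment quoted above combines OS-level numbers (RSS) with `runtime.MemStats` to estimate memory allocated outside the Go runtime. A rough sketch of that idea follows. The helper name `nonGoMemEstimate`, the subtraction it performs, and the placeholder RSS value are assumptions; the PR's real `nonGoMemUsage` may compute this differently. Because `Sys` counts virtual address space obtained from the OS rather than resident pages, it can exceed RSS, so the sketch clamps at zero.

```go
package main

import (
	"fmt"
	"runtime"
)

// nonGoMemEstimate is a hypothetical helper: it treats whatever part of
// RSS is not accounted for by the Go runtime's Sys as "non-Go" memory.
// It clamps at zero because Sys can be larger than RSS.
func nonGoMemEstimate(rssBytes, goSysBytes uint64) uint64 {
	if rssBytes <= goSysBytes {
		return 0
	}
	return rssBytes - goSysBytes
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// A few of the runtime fields that describe Go-managed memory.
	fmt.Printf("Sys=%d HeapAlloc=%d HeapSys=%d HeapIdle=%d HeapReleased=%d\n",
		m.Sys, m.HeapAlloc, m.HeapSys, m.HeapIdle, m.HeapReleased)

	// Placeholder RSS (64 MiB); in practice this would come from the OS,
	// e.g. the RSS field of process.MemoryInfoStat mentioned above.
	const exampleRSS uint64 = 64 << 20
	fmt.Println("estimated non-Go bytes:", nonGoMemEstimate(exampleRSS, m.Sys))
}
```

The result is only an estimate and is best read as a trend over time rather than an exact accounting.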
Used together, we are able to estimate things like allocations outside of our golang runtime (see nonGoMemUsage).
The runtime MemStats can be confusing - here is an attempt at outlining some of the fields we're interested in:

Sys is total bytes of memory obtained from the OS. The virtual address space reserved by the Go runtime for:
love the comment, the whitespace is a little fishy...
CPUPercent float64 `json:"cpu_percent"`
}

func CurrentProcessStats(ctx context.Context) (*PerformanceStats, error) {
One caveat here is that this gathers stats only for the current process, which omits any desktop processes and any osquery processes.
I'm breaking off this bit from the larger effort to routinely collect cpu/mem profiles so that we can get some discussion going about criteria for triggering these and ensure we're gathering all of the statistics we want correctly.
This does a few things:
- PerformanceStats it exposes. All fields are expected to be supported on all platforms, and it does not include noisier things like the runtime GC stats, to allow us to add this to a log checkpoint without too much noise
- logCheckup since a lot of this information is covered in our existing runtime checkup, but it seemed helpful as a quicker reference while looking at flares/doctor so I included it there too. happy to remove if people disagree, I could have gone either way

In its current form, this will emit checkup logs like this once an hour:
checkup log
An informational line will also be added to doctor like so:
Performance: process 50853 is using 16.88% CPU, RSS: 40.66 MB (0.06% memory). Note CPU will be higher while running flare.
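For reference, a line of that shape could be produced by something like the sketch below. The function name `doctorLine`, its parameters, and the choice of 1024 * 1024 bytes per MB are assumptions for illustration; the PR's actual formatting code is not shown here.

```go
package main

import "fmt"

// doctorLine is a hypothetical formatter that mirrors the shape of the
// informational line shown above.
func doctorLine(pid int, cpuPercent float64, rssBytes uint64, memPercent float64) string {
	// Assumption: "MB" means 2^20 bytes in this sketch.
	rssMB := float64(rssBytes) / (1024 * 1024)
	return fmt.Sprintf(
		"Performance: process %d is using %.2f%% CPU, RSS: %.2f MB (%.2f%% memory).",
		pid, cpuPercent, rssMB, memPercent,
	)
}

func main() {
	// Example inputs chosen to resemble the sample line; they are not real measurements.
	fmt.Println(doctorLine(50853, 16.88, 42635100, 0.06))
}
```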