-
Notifications
You must be signed in to change notification settings - Fork 95
Profiling tools and initial optimizations #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2350047 to
66e486c
Compare
ahesford
reviewed
Jan 10, 2022
Member
ahesford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, this will be nice to have in the test setup.
6d76f0c to
7cd4af7
Compare
ahesford
requested changes
Jan 12, 2022
848558e to
45b8f3c
Compare
ahesford
approved these changes
Jan 13, 2022
Member
ahesford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even without the profiling stuff, this refactors a lot of things quite nicely.
Create /lib/profiling-lib.sh, which can be sourced by any Bash script. Once sourced, a debug trap is installed that dumps the stack (but not the command being executed) to a serial port, which collects the data on the host side into a log file. After the VM exits, the performance data is converted into a format expected by flamegraph.pl (https://github.com/brendangregg/FlameGraph) and then both a flamegraph and a flamechart are created. If profiling isn't enabled, a no-op /lib/profiling-lib.sh is created. This file must always exist, and must always be sourceable. Optimizations/performance changes include: - setting the loglevel as from the KCL as early as possible, to preserve all log messages > zwarn - validating that the loglevel is an integer, otherwise defaulting to 4 - validating that zbm.timeout/timeout is an integer, otherwise defaulting to 10 - adding re-import guards to libraries - removing the zlog() function and doing the logging work in the zerror/zwarn/... function instead. This means that everything but zdebug can be as lean as possible. - gating a couple serial zdebug calls behind a single zdebug check - added a fast-fail to timed_prompt to return 0 if delay isn't greater than 0. Any default values for delay/prompt are assigned after this test. - add a short-circuit to get_zbm_arg that uses bashre to check if the kcl option string is present at all in the KCL. If it's not, the next option passed to get_zbm_arg is evaluated. - Avoid pre-computing a return value until it's actually needed - cut i18n out of testing builds unless a graphical interface is requested - cut crypt-ssh and network-legacy out of testing builds unless SSH access is requested - cut out 'nvdimm fs-lib rootfs-block dm dmraid lunmask' from testing builds. These can likely be cut from release builds as well.
45b8f3c to
60771c8
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
If other languages can do performance profiling, why can't
bash? Our debug log format already had the bulk of the information we need to generate per-function timing information.run.sh -Forrun.sh -Ecan now introduce a build-time variable,trace_term, which points to a serial device in the VM. On the host side, the serial port is attached to a FIFO which dumps data into a log file in the test environment directory. On the ZBM side, adebugtrap points to the serial device. Every command executed in the interpreter causes the trap to fire, which dumps the current function stack and source (file) stack along with the current unix TS with nanosecond precision to the serial port.Once the VM exits, a small perl tool re-assembles the trace/performance data to the format expected by Flamegraph.
flamegraph.plshould be in $PATH on your local install. It'll generate${FLAMECHART}/flamechart.svgand upload it ifwebifyis on your system.To protect against profiling code landing in a production release, a no-op
/lib/profiling-lib.shis installed by default that simply executesreturn 0when sourced. The variablezfsbootmenu_enable_tracingmust be enabled in a Dracut configuration file for the full-weight profiling library to be used.Based on the flamechart generated, I've made the following changes:
network-legacyin the test environment, it spends a bunch of time messing with the KCL, etcrootfs-block,dm,dmraidin the test environment - nothing we do needs what they provide and they're slowget_zbm_arg. When one lookup key is sent, use bashre to check if it's present in the KCL and return immediately if it's not.zlog- preferring to directly echo to/dev/kmsgin anything but zdebug .zinfohas gone from ~24ms to 4ms.getarg loglevelcheck to earlier in the KCL parsing, so we don't potentially miss zinfo logs (controlling terminal zinfo now works again!)Gate all zdebug calls in functions in the quick boot path withzdebug &&, avoiding building a stringReplace aget_zbm_bool primary secondarycall with two separate calls, to take advantage of fast failsDon't go into the countdown prompt iftimeout=0(zbm.skip).Before: https://somebits.link/u/p/d62bdcd0da/stream
After: Flamegraph Flamechart
This is generally tested, but it could use a serious review of how the timing diff is generated and if it's valid. The downside to the profiler is that the trap, even as minimal as it is, adds a computation overhead. The timing values are consistent between runs, but they shouldn't be taken as absolute truth - they're inflated due to the time the trap takes to execute.
There are a number of Dracut optimizations that can probably still be done.