-
Notifications
You must be signed in to change notification settings - Fork 183
netkvm: add fastinit regarding lazy memory allocation #4334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
98e220a to
7001b6b
Compare
|
Related to: #4175 |
|
Depends on: avocado-framework/avocado-vt#4160 |
7001b6b to
5f2aed4
Compare
5f2aed4 to
6b065a6
Compare
This test case wants to add some checkpoints to verify long initialization time of NETKVM device driver (NDIS miniport). Signed-off-by: Wenkang Ji <[email protected]>
6b065a6 to
6426d5a
Compare
WalkthroughAdds a new Windows-focused NetKVM FastInit test: introduces a qemu test config (netkvm_fastinit.cfg) defining variants, NIC counts/queues, thresholds, and verifier/WMI settings; and a Python test (netkvm_fastinit.py) executing cold boot, optional Driver Verifier toggle, per-NIC FastInit configuration, WMI log parsing, and timing/threshold validation. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Tester as Avocado Test
participant VM as Guest VM
participant GuestOS as Windows (Guest)
participant WMI as NetKVM WMI
participant DV as Driver Verifier
Tester->>VM: Cold boot (create/start)
VM-->>Tester: Boot complete
alt Driver Verifier toggle configured
Tester->>GuestOS: Query DV state
GuestOS->>DV: Enable/Disable as needed
DV-->>GuestOS: Applied
Tester->>VM: Reboot
VM-->>Tester: Up with DV state verified
end
Tester->>GuestOS: Inspect NIC FastInit on NIC0
alt Mismatch to desired FastInit
GuestOS->>GuestOS: Set FastInit on NIC0 (and propagate to others if needed)
end
Tester->>WMI: Dump NetKVM log/config
WMI-->>Tester: InitTimeMs, LazyAllocTimeMs, Active NICs
Tester->>Tester: Validate thresholds and counts
alt Violations found
Tester-->>Tester: Fail test with error_context
else Success
Tester-->>Tester: Log metrics and pass
end
Tester->>VM: Optional hot reboot and re-validate (same steps)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (4)
qemu/tests/cfg/netkvm_fastinit.cfg (1)
21-46: Ensure queue size parameters align with expected RX/TX symmetry.For
queue4_size1024you setnics_paramtorx_queue_size=1024,tx_queue_size=256, whilenic_extra_params_nic1mirrors the same asymmetry. If the intention is a symmetric 1024×1024 queue, adjust the TX size; otherwise, document why TX stays at 256 so future updates do not “fix” it inadvertently.qemu/tests/netkvm_fastinit.py (3)
92-135: Track multiple lazy allocation breaches deterministically.
lazy_failurestriggerstest.erroronly after the second breach, but the loop keeps running, resulting in redundant errors for the same NIC on subsequent iterations. Collect offending NICs and emit a single consolidated failure once the loop completes to keep logs actionable.
197-201: Stop duplicating NIC extra parameters for nic1.
params["nic_extra_params_" + str(nics)] = nics_paramoverwrites the value for the first NIC each time through the loop. Use the loop index (nic%salready includes the number) to assign per-NIC parameters and leavenic_extra_params_nic1untouched.
223-237: Log boot timing deltas with explicit units.The three
time.strftimecalls show absolute timestamps, but the derived duration prints asHH:MM:SSwithout context. Append “(cold-boot duration)” or similar so automation parsing the log can differentiate between timestamps and elapsed time.
| nics_num = 27 | ||
| start_vm = no | ||
| nics_num_checking_cmd = 'ipconfig | find /c "Ethernet adapter"' | ||
| fastinit_name = "FastInit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the NIC count check resilient to localized guests.
Hard-coding 'ipconfig | find /c "Ethernet adapter"' will undercount adapters on non-English Windows builds and on systems where the interface name differs (e.g., “Carte Ethernet”). If this count gates the test, consider querying WMI (wmic nic where "Name like 'Red Hat VirtIO%'" get Name) or PowerShell (Get-NetAdapter) instead so the validation works across locales.
🤖 Prompt for AI Agents
In qemu/tests/cfg/netkvm_fastinit.cfg around lines 14 to 17, the
nics_num_checking_cmd currently uses a hard-coded, English-only ipconfig string
which will undercount adapters on localized Windows guests; replace this with a
locale-independent query (e.g., use WMI to filter by adapter Name matching the
VirtIO driver or use PowerShell Get-NetAdapter to enumerate adapters and count
them), update nics_num_checking_cmd to run that command and return the adapter
count, and optionally add a fallback path so the test still works if
PowerShell/WMI isn't available.
| enable_verifier = bool(params.get_numeric("enable_verifier", 1)) | ||
| query_cmd = params.get("driver_verifier_query", "verifier /querysettings") | ||
| enable_cmd = params.get( | ||
| "driver_verifier_enable", "verifier /standard /flags netkvm.sys ndis.sys" | ||
| ) | ||
| reset_cmd = params.get("driver_verifier_reset", "verifier /reset") | ||
|
|
||
| def verifier_is_on(session, query_cmd=query_cmd): | ||
| """ | ||
| Return True if Driver Verifier is currently enabled | ||
|
|
||
| Driver Verifier is considered *OFF* when the mask is | ||
| **0x00000000**; otherwise it is *ON*. | ||
| """ | ||
| output = session.cmd_output(query_cmd) | ||
| return "0x00000000" not in output | ||
|
|
||
| session = vm.wait_for_serial_login(timeout) | ||
| active = verifier_is_on(session) | ||
| if enable_verifier is not active: | ||
| if enable_verifier is True: | ||
| cmd = enable_cmd | ||
| else: | ||
| cmd = reset_cmd | ||
| session.cmd_status_output(cmd) | ||
| vm.reboot(method="shell", serial=True, timeout=timeout, session=session) | ||
| else: | ||
| return | ||
|
|
||
| session = vm.wait_for_serial_login(timeout) | ||
| test.log.info("Current Driver Verifier: %s", session.cmd_output(query_cmd)) | ||
| active = verifier_is_on(session) | ||
| if enable_verifier is not active: | ||
| test.error("Driver Verifier state MISmatch after reboot") | ||
| else: | ||
| test.log.info("Driver Verifier state MATCH after reboot") | ||
| session.close() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refresh the verifier status helper to avoid stale sessions.
verifier_is_on() calls session.cmd_output() without retry/backoff, but you close the session at the end of disable_driver_verifier. If the guest reboots slower than expected, you risk using a session tied to the pre-reboot shell. Reacquire a fresh session (or reopen via vm.wait_for_serial_login) before checking the status post-reboot to avoid false mismatches.
| lines = log_text.strip().splitlines() | ||
| active_count = 0 | ||
| lazy_failures = 0 | ||
| expected_nics_count = params.get_numeric("nics_num", 27) | ||
| lazy_alloc_threshold = params.get_numeric("lazy_alloc_threshold", 0) | ||
| init_alloc_threshold = params.get_numeric("init_alloc_threshold", 0) | ||
|
|
||
| for i, ln in enumerate(lines): | ||
| if "Active=TRUE" in ln: | ||
| active_count += 1 | ||
| start = i | ||
| end = min(len(lines), i + 11) | ||
| block = "\n".join(lines[start:end]) | ||
|
|
||
| name_match = re.search(r"InstanceName=([^\n]+)", block) | ||
| name = (name_match or [None, "UNKNOWN"])[1].strip() | ||
| init = int((re.search(r"InitTimeMs=(-?\d+)", block) or [None, "0"])[1]) | ||
| lazy = int( | ||
| (re.search(r"LazyAllocTimeMs=(-?\d+)", block) or [None, "0"])[1] | ||
| ) | ||
| total = init + lazy | ||
|
|
||
| if lazy == -1: | ||
| test.error( | ||
| "test.fail: %s | LazyAllocTimeMs = -1 " | ||
| "(should not happen after wait)" % name | ||
| ) | ||
| elif init > init_alloc_threshold: | ||
| test.error( | ||
| "test.fail: %s | InitTimeMs(%s) > %d" | ||
| % (name, init, init_alloc_threshold) | ||
| ) | ||
| elif lazy > lazy_alloc_threshold: | ||
| lazy_failures += 1 | ||
| if lazy_failures > 1: | ||
| test.error( | ||
| "test.fail: %s | LazyAllocTimeMs(%s) > %d" | ||
| % (name, lazy, lazy_alloc_threshold) | ||
| ) | ||
| else: | ||
| test.log.warning( | ||
| "Warning (allowed): %s | LazyAllocTimeMs(%s) > %d", | ||
| name, | ||
| lazy, | ||
| lazy_alloc_threshold, | ||
| ) | ||
| else: | ||
| test.log.info( | ||
| "[OK] %s | Init=%s Lazy=%s Sum=%s", name, init, lazy, total | ||
| ) | ||
|
|
||
| # Verify the count of Active=TRUE NICs matches expected | ||
| if active_count != expected_nics_count: | ||
| test.error( | ||
| "Expected %s NICs with Active=TRUE, but found %s" | ||
| % (expected_nics_count, active_count) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle non-numeric timing fields defensively.
int(re.search(...)[1]) will raise when the WMI output emits blank strings or “N/A” (observed on older virtio builds). Wrap the parse in a try/except and surface a clear TestError pointing to the offending NIC so the test fails gracefully instead of crashing.
🤖 Prompt for AI Agents
In qemu/tests/netkvm_fastinit.py around lines 85 to 141, the direct int(...)
conversions for InitTimeMs and LazyAllocTimeMs will raise on blank/"N/A" WMI
output; wrap parsing in a small try/except that validates the matched string is
numeric before converting, and if conversion fails raise a clear
test.error/TestError that names the offending NIC and includes the raw value.
Compute total only after successful parses, and keep the existing threshold
logic; the except should show which field (InitTimeMs/LazyAllocTimeMs), the raw
value, and the NIC name so the test fails gracefully instead of crashing.
| def fastinit_nics_operations(session, vm, params, test, timeout): | ||
| """ | ||
| Check first NIC FastInit status only. | ||
|
|
||
| :param session: VM session info | ||
| :param vm: QEMU test object | ||
| :param params: Dictionary with the test parameters | ||
| :param test: QEMU test object | ||
| :param timeout: VM login time value | ||
| """ | ||
| error_context.context("Checking first NIC FastInit status...", test.log.info) | ||
| fastinit_value = params.get_numeric("fastinit_value", 1) | ||
| first_nic_num = 0 # Only check first NIC to save time | ||
|
|
||
| # Only verify the first NIC's fastinit status to save time | ||
| current_value = utils_net.get_netkvm_param_value( | ||
| vm, fastinit_name, nic_index=first_nic_num | ||
| ) | ||
| test.log.info("NIC %d current FastInit value: %s", first_nic_num, current_value) | ||
|
|
||
| if str(current_value).strip() != str(fastinit_value): | ||
| for nic_num in range(0, nics_num - 1): | ||
| utils_net.set_netkvm_param_value( | ||
| vm, fastinit_name, fastinit_value, nic_index=nic_num | ||
| ) | ||
| output = utils_net.get_netkvm_param_value( | ||
| vm, fastinit_name, nic_index=nic_num | ||
| ) | ||
| test.log.info("NIC %d new FastInit value is %s", nic_num, output) | ||
|
|
||
| login_timeout = params.get_numeric("login_timeout", 3600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid reusing the same session after toggling FastInit.
utils_net.set_netkvm_param_value() can reboot the guest when Driver Verifier is active. Because you do not re-login after each call, later commands may act on a dead session. Acquire a fresh session inside the loop (or exit early if a reboot is detected) to keep operations robust.
🤖 Prompt for AI Agents
In qemu/tests/netkvm_fastinit.py around lines 162 to 192, the loop toggles
FastInit with utils_net.set_netkvm_param_value() which can trigger a guest
reboot (e.g. when Driver Verifier is active) but the code continues to reuse the
original session; update the code to obtain a fresh session after each set
operation (or detect a reboot and re-login) before calling any further guest
commands: after calling set_netkvm_param_value() call the appropriate VM
login/wait method to re-establish session (or break/return if reboot is
detected) and replace the session variable used for subsequent
get_netkvm_param_value() and logging so you never operate on a dead session.
| for i in range(2, nics_num + 1): | ||
| nics = "nic%s" % i | ||
| params["nics"] = " ".join([params["nics"], nics]) | ||
| params["nic_extra_params_" + str(nics)] = nics_param | ||
| params["start_vm"] = "yes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against missing nics baseline before concatenation.
If params["nics"] is empty or undefined, the " ".join(...) call raises a KeyError. Initialize params.setdefault("nics", "nic1") (or similar) before the loop so the test config doesn’t have to predefine it.
🤖 Prompt for AI Agents
In qemu/tests/netkvm_fastinit.py around lines 197 to 201, the loop concatenates
onto params["nics"] without ensuring it exists; this can raise a KeyError if
params lacks a baseline. Before the loop, ensure a baseline by calling
params.setdefault("nics", "nic1") (or another sensible default) so subsequent "
".join operations are safe, and keep adding per-NIC extra params as currently
implemented.
This test case wants to add some checkpoints to verify the long initialization time of the NETKVM device driver (NDIS miniport).
ID: 2267
Signed-off-by: wji [email protected]
Summary by CodeRabbit
New Features
Tests