netkvm: add fastinit regarding lazy memory allocation #4334

heywji · 2025-07-08T06:43:59Z

This test case wants to add some checkpoints to verify the long initialization time of the NETKVM device driver (NDIS miniport).

ID: 2267

Signed-off-by: wji [email protected]

Summary by CodeRabbit

New Features
- Introduced NetKVM FastInit validation for Windows VMs, including enable/disable FastInit per NIC, boot-time measurements, and threshold checks. Supports Driver Verifier toggling and WMI-based reporting.
Tests
- Added comprehensive test coverage with variants for NIC counts, multiqueue settings, and queue sizes. Automates cold/hot boot flows, per-NIC FastInit synchronization, and metrics parsing. Provides detailed diagnostics and failure reporting when thresholds are exceeded.

heywji · 2025-07-08T07:41:53Z

Related to: #4175

heywji · 2025-07-08T08:06:16Z

Depends on: avocado-framework/avocado-vt#4160

This test case wants to add some checkpoints to verify long initialization time of NETKVM device driver (NDIS miniport). Signed-off-by: Wenkang Ji <[email protected]>

coderabbitai · 2025-09-29T03:52:19Z

Walkthrough

Adds a new Windows-focused NetKVM FastInit test: introduces a qemu test config (netkvm_fastinit.cfg) defining variants, NIC counts/queues, thresholds, and verifier/WMI settings; and a Python test (netkvm_fastinit.py) executing cold boot, optional Driver Verifier toggle, per-NIC FastInit configuration, WMI log parsing, and timing/threshold validation.

Changes

Cohort / File(s)	Summary of Changes
Test Configuration `qemu/tests/cfg/netkvm_fastinit.cfg`	New test config for Windows NetKVM FastInit. Sets host/type metadata, cdroms, backup/restore, timeouts, WMI path, nics=27, restricts to virtio_net, counts NICs via command, defines FastInit value wiring, and provides variant families for queue sizes (1x256, 4x1024) with per-variant thresholds and fastinit on/off sub-variants.
Functional Test `qemu/tests/netkvm_fastinit.py`	New test implementing VM cold boot, optional Driver Verifier disable/enable with reboot, per-NIC FastInit check/set (primarily first NIC with propagation if mismatched), WMI dump and log parsing to validate InitTimeMs/LazyAllocTimeMs against thresholds, NIC Active count checks, and extensive logging/error handling. Adds `run(test, params, env)`.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Tester as Avocado Test
  participant VM as Guest VM
  participant GuestOS as Windows (Guest)
  participant WMI as NetKVM WMI
  participant DV as Driver Verifier

  Tester->>VM: Cold boot (create/start)
  VM-->>Tester: Boot complete

  alt Driver Verifier toggle configured
    Tester->>GuestOS: Query DV state
    GuestOS->>DV: Enable/Disable as needed
    DV-->>GuestOS: Applied
    Tester->>VM: Reboot
    VM-->>Tester: Up with DV state verified
  end

  Tester->>GuestOS: Inspect NIC FastInit on NIC0
  alt Mismatch to desired FastInit
    GuestOS->>GuestOS: Set FastInit on NIC0 (and propagate to others if needed)
  end

  Tester->>WMI: Dump NetKVM log/config
  WMI-->>Tester: InitTimeMs, LazyAllocTimeMs, Active NICs

  Tester->>Tester: Validate thresholds and counts
  alt Violations found
    Tester-->>Tester: Fail test with error_context
  else Success
    Tester-->>Tester: Log metrics and pass
  end

  Tester->>VM: Optional hot reboot and re-validate (same steps)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I toggled FastInit with a twitch of my ear,
Counted 27 NICs—oh dear, oh dear!
WMI whispers times in milliseconds neat,
Queues align, thresholds meet.
I thump the log—no flakes, all right!
Burrowed through bytes, the metrics are tight. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title “netkvm: add fastinit regarding lazy memory allocation” concisely indicates the primary change of adding FastInit support for NetKVM’s lazy memory allocation and aligns with the modifications to the qemu test configurations and Python test logic described in the PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (4)

qemu/tests/cfg/netkvm_fastinit.cfg (1)

21-46: Ensure queue size parameters align with expected RX/TX symmetry.

For queue4_size1024 you set nics_param to rx_queue_size=1024,tx_queue_size=256, while nic_extra_params_nic1 mirrors the same asymmetry. If the intention is a symmetric 1024×1024 queue, adjust the TX size; otherwise, document why TX stays at 256 so future updates do not “fix” it inadvertently.

qemu/tests/netkvm_fastinit.py (3)

92-135: Track multiple lazy allocation breaches deterministically.

lazy_failures triggers test.error only after the second breach, but the loop keeps running, resulting in redundant errors for the same NIC on subsequent iterations. Collect offending NICs and emit a single consolidated failure once the loop completes to keep logs actionable.

197-201: Stop duplicating NIC extra parameters for nic1.

params["nic_extra_params_" + str(nics)] = nics_param overwrites the value for the first NIC each time through the loop. Use the loop index (nic%s already includes the number) to assign per-NIC parameters and leave nic_extra_params_nic1 untouched.

223-237: Log boot timing deltas with explicit units.

The three time.strftime calls show absolute timestamps, but the derived duration prints as HH:MM:SS without context. Append “(cold-boot duration)” or similar so automation parsing the log can differentiate between timestamps and elapsed time.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4fface and 6426d5a.

📒 Files selected for processing (2)

qemu/tests/cfg/netkvm_fastinit.cfg (1 hunks)
qemu/tests/netkvm_fastinit.py (1 hunks)

coderabbitai · 2025-09-29T03:54:53Z

qemu/tests/cfg/netkvm_fastinit.cfg

+    nics_num = 27
+    start_vm = no
+    nics_num_checking_cmd = 'ipconfig | find /c "Ethernet adapter"'
+    fastinit_name = "FastInit"


⚠️ Potential issue | 🟠 Major

Make the NIC count check resilient to localized guests.

Hard-coding 'ipconfig | find /c "Ethernet adapter"' will undercount adapters on non-English Windows builds and on systems where the interface name differs (e.g., “Carte Ethernet”). If this count gates the test, consider querying WMI (wmic nic where "Name like 'Red Hat VirtIO%'" get Name) or PowerShell (Get-NetAdapter) instead so the validation works across locales.

🤖 Prompt for AI Agents

In qemu/tests/cfg/netkvm_fastinit.cfg around lines 14 to 17, the nics_num_checking_cmd currently uses a hard-coded, English-only ipconfig string which will undercount adapters on localized Windows guests; replace this with a locale-independent query (e.g., use WMI to filter by adapter Name matching the VirtIO driver or use PowerShell Get-NetAdapter to enumerate adapters and count them), update nics_num_checking_cmd to run that command and return the adapter count, and optionally add a fallback path so the test still works if PowerShell/WMI isn't available.

coderabbitai · 2025-09-29T03:54:53Z

qemu/tests/netkvm_fastinit.py

+        enable_verifier = bool(params.get_numeric("enable_verifier", 1))
+        query_cmd = params.get("driver_verifier_query", "verifier /querysettings")
+        enable_cmd = params.get(
+            "driver_verifier_enable", "verifier /standard /flags netkvm.sys ndis.sys"
+        )
+        reset_cmd = params.get("driver_verifier_reset", "verifier /reset")
+
+        def verifier_is_on(session, query_cmd=query_cmd):
+            """
+            Return True if Driver Verifier is currently enabled
+
+            Driver Verifier is considered *OFF* when the mask is
+            **0x00000000**; otherwise it is *ON*.
+            """
+            output = session.cmd_output(query_cmd)
+            return "0x00000000" not in output
+
+        session = vm.wait_for_serial_login(timeout)
+        active = verifier_is_on(session)
+        if enable_verifier is not active:
+            if enable_verifier is True:
+                cmd = enable_cmd
+            else:
+                cmd = reset_cmd
+            session.cmd_status_output(cmd)
+            vm.reboot(method="shell", serial=True, timeout=timeout, session=session)
+        else:
+            return
+
+        session = vm.wait_for_serial_login(timeout)
+        test.log.info("Current Driver Verifier: %s", session.cmd_output(query_cmd))
+        active = verifier_is_on(session)
+        if enable_verifier is not active:
+            test.error("Driver Verifier state MISmatch after reboot")
+        else:
+            test.log.info("Driver Verifier state MATCH after reboot")
+        session.close()
+


⚠️ Potential issue | 🟠 Major

Refresh the verifier status helper to avoid stale sessions.

verifier_is_on() calls session.cmd_output() without retry/backoff, but you close the session at the end of disable_driver_verifier. If the guest reboots slower than expected, you risk using a session tied to the pre-reboot shell. Reacquire a fresh session (or reopen via vm.wait_for_serial_login) before checking the status post-reboot to avoid false mismatches.

coderabbitai · 2025-09-29T03:54:53Z

qemu/tests/netkvm_fastinit.py

+        lines = log_text.strip().splitlines()
+        active_count = 0
+        lazy_failures = 0
+        expected_nics_count = params.get_numeric("nics_num", 27)
+        lazy_alloc_threshold = params.get_numeric("lazy_alloc_threshold", 0)
+        init_alloc_threshold = params.get_numeric("init_alloc_threshold", 0)
+
+        for i, ln in enumerate(lines):
+            if "Active=TRUE" in ln:
+                active_count += 1
+                start = i
+                end = min(len(lines), i + 11)
+                block = "\n".join(lines[start:end])
+
+                name_match = re.search(r"InstanceName=([^\n]+)", block)
+                name = (name_match or [None, "UNKNOWN"])[1].strip()
+                init = int((re.search(r"InitTimeMs=(-?\d+)", block) or [None, "0"])[1])
+                lazy = int(
+                    (re.search(r"LazyAllocTimeMs=(-?\d+)", block) or [None, "0"])[1]
+                )
+                total = init + lazy
+
+                if lazy == -1:
+                    test.error(
+                        "test.fail: %s | LazyAllocTimeMs = -1 "
+                        "(should not happen after wait)" % name
+                    )
+                elif init > init_alloc_threshold:
+                    test.error(
+                        "test.fail: %s | InitTimeMs(%s) > %d"
+                        % (name, init, init_alloc_threshold)
+                    )
+                elif lazy > lazy_alloc_threshold:
+                    lazy_failures += 1
+                    if lazy_failures > 1:
+                        test.error(
+                            "test.fail: %s | LazyAllocTimeMs(%s) > %d"
+                            % (name, lazy, lazy_alloc_threshold)
+                        )
+                    else:
+                        test.log.warning(
+                            "Warning (allowed): %s | LazyAllocTimeMs(%s) > %d",
+                            name,
+                            lazy,
+                            lazy_alloc_threshold,
+                        )
+                else:
+                    test.log.info(
+                        "[OK] %s | Init=%s Lazy=%s Sum=%s", name, init, lazy, total
+                    )
+
+        # Verify the count of Active=TRUE NICs matches expected
+        if active_count != expected_nics_count:
+            test.error(
+                "Expected %s NICs with Active=TRUE, but found %s"
+                % (expected_nics_count, active_count)
+            )


⚠️ Potential issue | 🟠 Major

Handle non-numeric timing fields defensively.

int(re.search(...)[1]) will raise when the WMI output emits blank strings or “N/A” (observed on older virtio builds). Wrap the parse in a try/except and surface a clear TestError pointing to the offending NIC so the test fails gracefully instead of crashing.

🤖 Prompt for AI Agents

In qemu/tests/netkvm_fastinit.py around lines 85 to 141, the direct int(...) conversions for InitTimeMs and LazyAllocTimeMs will raise on blank/"N/A" WMI output; wrap parsing in a small try/except that validates the matched string is numeric before converting, and if conversion fails raise a clear test.error/TestError that names the offending NIC and includes the raw value. Compute total only after successful parses, and keep the existing threshold logic; the except should show which field (InitTimeMs/LazyAllocTimeMs), the raw value, and the NIC name so the test fails gracefully instead of crashing.

coderabbitai · 2025-09-29T03:54:53Z

qemu/tests/netkvm_fastinit.py

+    def fastinit_nics_operations(session, vm, params, test, timeout):
+        """
+        Check first NIC FastInit status only.
+
+        :param session: VM session info
+        :param vm: QEMU test object
+        :param params: Dictionary with the test parameters
+        :param test: QEMU test object
+        :param timeout: VM login time value
+        """
+        error_context.context("Checking first NIC FastInit status...", test.log.info)
+        fastinit_value = params.get_numeric("fastinit_value", 1)
+        first_nic_num = 0  # Only check first NIC to save time
+
+        # Only verify the first NIC's fastinit status to save time
+        current_value = utils_net.get_netkvm_param_value(
+            vm, fastinit_name, nic_index=first_nic_num
+        )
+        test.log.info("NIC %d current FastInit value: %s", first_nic_num, current_value)
+
+        if str(current_value).strip() != str(fastinit_value):
+            for nic_num in range(0, nics_num - 1):
+                utils_net.set_netkvm_param_value(
+                    vm, fastinit_name, fastinit_value, nic_index=nic_num
+                )
+                output = utils_net.get_netkvm_param_value(
+                    vm, fastinit_name, nic_index=nic_num
+                )
+                test.log.info("NIC %d new FastInit value is %s", nic_num, output)
+
+    login_timeout = params.get_numeric("login_timeout", 3600)


⚠️ Potential issue | 🟠 Major

Avoid reusing the same session after toggling FastInit.

utils_net.set_netkvm_param_value() can reboot the guest when Driver Verifier is active. Because you do not re-login after each call, later commands may act on a dead session. Acquire a fresh session inside the loop (or exit early if a reboot is detected) to keep operations robust.

🤖 Prompt for AI Agents

In qemu/tests/netkvm_fastinit.py around lines 162 to 192, the loop toggles FastInit with utils_net.set_netkvm_param_value() which can trigger a guest reboot (e.g. when Driver Verifier is active) but the code continues to reuse the original session; update the code to obtain a fresh session after each set operation (or detect a reboot and re-login) before calling any further guest commands: after calling set_netkvm_param_value() call the appropriate VM login/wait method to re-establish session (or break/return if reboot is detected) and replace the session variable used for subsequent get_netkvm_param_value() and logging so you never operate on a dead session.

coderabbitai · 2025-09-29T03:54:53Z

qemu/tests/netkvm_fastinit.py

+    for i in range(2, nics_num + 1):
+        nics = "nic%s" % i
+        params["nics"] = " ".join([params["nics"], nics])
+        params["nic_extra_params_" + str(nics)] = nics_param
+    params["start_vm"] = "yes"


⚠️ Potential issue | 🟠 Major

Guard against missing nics baseline before concatenation.

If params["nics"] is empty or undefined, the " ".join(...) call raises a KeyError. Initialize params.setdefault("nics", "nic1") (or similar) before the loop so the test config doesn’t have to predefine it.

🤖 Prompt for AI Agents

In qemu/tests/netkvm_fastinit.py around lines 197 to 201, the loop concatenates onto params["nics"] without ensuring it exists; this can raise a KeyError if params lacks a baseline. Before the loop, ensure a baseline by calling params.setdefault("nics", "nic1") (or another sensible default) so subsequent " ".join operations are safe, and keep adding per-NIC extra params as currently implemented.

heywji force-pushed the add_fastinit_case2 branch 3 times, most recently from 98e220a to 7001b6b Compare July 8, 2025 07:20

heywji force-pushed the add_fastinit_case2 branch from 7001b6b to 5f2aed4 Compare July 14, 2025 02:32

heywji force-pushed the add_fastinit_case2 branch from 5f2aed4 to 6b065a6 Compare July 24, 2025 08:19

netkvm: add fastinit regarding lazy memory allocation

6426d5a

This test case wants to add some checkpoints to verify long initialization time of NETKVM device driver (NDIS miniport). Signed-off-by: Wenkang Ji <[email protected]>

heywji force-pushed the add_fastinit_case2 branch from 6b065a6 to 6426d5a Compare September 29, 2025 03:52

coderabbitai bot reviewed Sep 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

netkvm: add fastinit regarding lazy memory allocation #4334

netkvm: add fastinit regarding lazy memory allocation #4334

Uh oh!

heywji commented Jul 8, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

heywji commented Jul 8, 2025

Uh oh!

heywji commented Jul 8, 2025

Uh oh!

coderabbitai bot commented Sep 29, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

netkvm: add fastinit regarding lazy memory allocation #4334

Are you sure you want to change the base?

netkvm: add fastinit regarding lazy memory allocation #4334

Uh oh!

Conversation

heywji commented Jul 8, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

heywji commented Jul 8, 2025

Uh oh!

heywji commented Jul 8, 2025

Uh oh!

coderabbitai bot commented Sep 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

heywji commented Jul 8, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 29, 2025 •

edited

Loading