CRASH: tool.drcacheoff.analysis_unit_tests fails the "Testing tool errors" subtest with a timeout #7741

@palmer-dabbelt

Describe the bug
I have two Arm machines: one is a GCP machine and one is internal. This test passes on the internal machine and fails on the GCP machine. As far as I can tell, this failure is what the test is looking for: the scheduler isn't cleaning up one of the inputs, and thus it hangs until the test times out.

I'm not sure what other debugging I should do here.

To Reproduce
Steps to reproduce the behavior:

  1. make test

Expected behavior
The test should pass.

Screenshots or Pasted Text
Here's the "tool errors" subtest from the log on the broken machine. Note that I've added

diff --git a/clients/drcachesim/scheduler/scheduler_impl.cpp b/clients/drcachesim/scheduler/scheduler_impl.cpp
index 3c570b6e5..d04623bc0 100644
--- a/clients/drcachesim/scheduler/scheduler_impl.cpp
+++ b/clients/drcachesim/scheduler/scheduler_impl.cpp
@@ -3445,7 +3445,7 @@ scheduler_impl_tmpl_t<RecordType, ReaderType>::mark_input_eof(input_info_t &inpu
         live_input_count_.fetch_add(-1, std::memory_order_release);
     assert(old_count > 0);
     int live_inputs = live_input_count_.load(std::memory_order_acquire);
-    VPRINT(this, 2, "input %d at eof; %d live inputs left\n", input.index, live_inputs);
+    VPRINT(this, 1, "input %d at eof; %d live inputs left\n", input.index, live_inputs);
     if (options_.mapping == sched_type_t::MAP_TO_ANY_OUTPUT &&
         live_inputs <=
             static_cast<int>(inputs_.size() * options_.exit_if_fraction_inputs_left)) {

so that the eof message prints at verbosity level 1 instead of level 2, as I was trying to see what's going on.

----------------
Testing tool errors
[scheduler] Scheduler configuration:
[scheduler]   Inputs                    : 5
[scheduler]   Outputs                   : 2
[scheduler]   mapping                   : 2
[scheduler]   deps                      : 0
[scheduler]   flags                     : 0x00000002
[scheduler]   quantum_unit              : 0
[scheduler]   quantum_duration          : 0
[scheduler]   verbosity                 : 1
[scheduler]   schedule_record_ostream   : (nil)
[scheduler]   schedule_replay_istream   : (nil)
[scheduler]   replay_as_traced_istream  : (nil)
[scheduler]   syscall_switch_threshold  : 30000000
[scheduler]   blocking_switch_threshold : 500
[scheduler]   block_time_scale          : 0.000000
[scheduler]   block_time_max            : 0
[scheduler]   kernel_switch_trace_path  :
[scheduler]   kernel_switch_reader      : (nil)
[scheduler]   kernel_switch_reader_end  : (nil)
[scheduler]   single_lockstep_output    : 0
[scheduler]   randomize_next_input      : 0
[scheduler]   read_inputs_in_init       : 1
[scheduler]   honor_direct_switches     : 1
[scheduler]   time_units_per_us         : 1000.000000
[scheduler]   quantum_duration_us       : 5000
[scheduler]   quantum_duration_instrs   : 10000000
[scheduler]   block_time_multiplier     : 0.100000
[scheduler]   block_time_max_us         : 2500
[scheduler]   migration_threshold_us    : 500
[scheduler]   rebalance_period_us       : 50000
[scheduler]   honor_infinite_timeouts   : 0
[scheduler]   exit_if_fraction_inputs_left : 0.100000
[scheduler]   kernel_syscall_trace_path :
[scheduler]   kernel_syscall_reader     : (nil)
[scheduler]   kernel_syscall_reader_end : (nil)
[scheduler] Reading headers from inputs to find filetypes
[scheduler] Output 0 triggered a rebalance @0:
[analyzer] Creating 2 worker threads
[analyzer] Worker 0 starting on trace shard 0 stream is 0xaaaac49b8390
[analyzer] Worker 1 starting on trace shard 1 stream is 0xaaaac49b8638
[scheduler] input 0 at eof; 4 live inputs left
[scheduler] input 1 at eof; 3 live inputs left
[scheduler] input 2 at eof; 2 live inputs left
[scheduler] input 3 at eof; 1 live inputs left
[analyzer] Worker 0 hit shard memref error cpuid not supported on trace shard
[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @500013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @1000013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @1500013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @2000013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @2500013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @3000013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @3500013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @4000013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @4500013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @5000013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @5500013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @6000013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @6500013: running #-1; 0 in queue; 0 blocked

[scheduler] Queue snapshot: inputs: 1 schedulable, 0 unscheduled, 4 eof
  out #0 @28: running #-1; 0 in queue; 0 blocked
  out #1 @7000013: running #-1; 0 in queue; 0 blocked

Versions

$ git log --pretty=oneline | head -n1
1cec2631a8a289e08c3070c57d568a9587a4cee8 i#7685 DrPoints: add inline counter update for AARCH64 (#7737)
$ uname -a
Linux dynamorio-ubuntu-20-arm 5.15.0-1096-gcp #105~20.04.1-Ubuntu SMP Wed Oct 22 06:50:03 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
