Skip to content

ObjectReference is required to be word aligned. addr: 0x8000026 #291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
qinsoon opened this issue Nov 25, 2024 · 9 comments · Fixed by #294
Closed

ObjectReference is required to be word aligned. addr: 0x8000026 #291

qinsoon opened this issue Nov 25, 2024 · 9 comments · Fixed by #294
Labels
C-bug Category: Bug

Comments

@qinsoon
Copy link
Member

qinsoon commented Nov 25, 2024

Reproduciable with v0.29.0. It is a bit strange that I didn't see the issue before. It would be good if anyone can try reproduce it and see if the issue actually exists.

MMTk OpenJDK: 1fcc9f1 (v0.29.0)
OpenJDK: 28e56ee32525c32c5a88391d0b01f24e5cd16c0f (as recorded in Cargo.toml with the biding commit)

Build:

export DEBUG_LEVEL=slowdebug && sh configure --disable-warnings-as-errors --with-debug-level=$DEBUG_LEVEL && make CONF=linux-x86_64-normal-server-$DEBUG_LEVEL THIRD_PARTY_HEAP=/home/yilin/Code/openjdk_workspace/mmtk-openjdk/openjdk

Run:

yilin@rat:~/Code/openjdk_workspace/openjdk$ RUST_BACKTRACE=1 build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -XX:+UseThirdPartyHeap -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar fop
[2024-11-25T06:20:07Z INFO  mmtk::memory_manager] Initialized MMTk with GenImmix (DynamicHeapSize(6815744, 8391753728))
[2024-11-25T06:20:07Z INFO  mmtk::util::heap::gc_trigger] [POLL] nursery: Triggering collection (1670/1664 pages)
[2024-11-25T06:20:07Z INFO  mmtk::plan::generational::global] Nursery GC
[2024-11-25T06:20:07Z INFO  mmtk::plan::generational::immix::global] Nursery GC
[2024-11-25T06:20:07Z WARN  mmtk::util::heap::gc_trigger] Proportional nursery with min size 0.25 (1703936) is smaller than DEFAULT_MIN_NURSERY (2097152). Use DEFAULT_MIN_NURSERY instead.
[2024-11-25T06:20:07Z INFO  mmtk::scheduler::scheduler] End of GC (276/1664 pages, took 9 ms)
thread '<unnamed>' panicked at /home/yilin/.cargo/git/checkouts/mmtk-core-3306bdeb8eb4322b/8640ab8/src/util/address.rs:589:9:
ObjectReference is required to be word aligned.  addr: 0x817606c
stack backtrace:
   0: rust_begin_unwind
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:647:5
   1: core::panicking::panic_fmt
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/panicking.rs:72:14
   2: mmtk::util::address::ObjectReference::from_raw_address
             at /home/yilin/.cargo/git/checkouts/mmtk-core-3306bdeb8eb4322b/8640ab8/src/util/address.rs:589:9
   3: mmtk::util::api_util::<impl core::convert::From<mmtk::util::api_util::NullableObjectReference> for core::option::Option<mmtk::util::address::ObjectReference>>::from
             at /home/yilin/.cargo/git/checkouts/mmtk-core-3306bdeb8eb4322b/8640ab8/src/util/api_util.rs:22:9
   4: <T as core::convert::Into<U>>::into
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/convert/mod.rs:758:9
   5: mmtk_object_reference_write_slow
             at /home/yilin/Code/openjdk_workspace/mmtk-openjdk/mmtk/src/api.rs:431:60
   6: _ZN21MMTkBarrierSetRuntime32object_reference_write_slow_callEPvS0_S0_
             at /home/yilin/Code/openjdk_workspace/mmtk-openjdk/openjdk/mmtkBarrierSet.cpp:144:37
   7: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)
@qinsoon qinsoon added the C-bug Category: Bug label Nov 25, 2024
@qinsoon
Copy link
Member Author

qinsoon commented Nov 25, 2024

This issue can also be reproduced on a different machine, and with fop from dacapo-2006-10-MR2.

RUST_BACKTRACE=1 build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -XX:+UseThirdPartyHeap -jar /usr/share/benchmarks/dacapo/dacapo-2006-10-MR2.jar fop
[2024-11-25T23:49:16Z INFO  mmtk::memory_manager] Initialized MMTk with GenImmix (DynamicHeapSize(6815744, 8391753728))
[2024-11-25T23:49:17Z INFO  mmtk::util::heap::gc_trigger] [POLL] nursery: Triggering collection (1670/1664 pages)
[2024-11-25T23:49:17Z INFO  mmtk::plan::generational::global] Nursery GC
[2024-11-25T23:49:17Z INFO  mmtk::plan::generational::immix::global] Nursery GC
[2024-11-25T23:49:17Z WARN  mmtk::util::heap::gc_trigger] Proportional nursery with min size 0.25 (1703936) is smaller than DEFAULT_MIN_NURSERY (2097152). Use DEFAULT_MIN_NURSERY instead.
[2024-11-25T23:49:17Z INFO  mmtk::scheduler::scheduler] End of GC (320/1664 pages, took 20 ms)
thread '<unnamed>' panicked at /home/yilin/.cargo/git/checkouts/mmtk-core-3306bdeb8eb4322b/8640ab8/src/util/address.rs:589:9:
ObjectReference is required to be word aligned.  addr: 0x80000cc
stack backtrace:
   0: rust_begin_unwind
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:647:5
   1: core::panicking::panic_fmt
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/panicking.rs:72:14
   2: mmtk::util::address::ObjectReference::from_raw_address
             at /home/yilin/.cargo/git/checkouts/mmtk-core-3306bdeb8eb4322b/8640ab8/src/util/address.rs:589:9
   3: mmtk::util::api_util::<impl core::convert::From<mmtk::util::api_util::NullableObjectReference> for core::option::Option<mmtk::util::address::ObjectReference>>::from
             at /home/yilin/.cargo/git/checkouts/mmtk-core-3306bdeb8eb4322b/8640ab8/src/util/api_util.rs:22:9
   4: <T as core::convert::Into<U>>::into
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/convert/mod.rs:758:9
   5: mmtk_object_reference_write_slow
             at /home/yilin/Code/mmtk-openjdk/mmtk/src/api.rs:431:60
   6: _ZN21MMTkBarrierSetRuntime32object_reference_write_slow_callEPvS0_S0_
             at /home/yilin/Code/mmtk-openjdk/openjdk/mmtkBarrierSet.cpp:144:37
   7: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)

@qinsoon
Copy link
Member Author

qinsoon commented Nov 26, 2024

This issue is only reproducible when we use a maximum heap size that is larger than 4G, and when compressed oops is enabled. We are not seeing this issue in our CI tests, as we test with heap sizes that are smaller than 4G.

So the problem is that after encoding, a compressed oop is not word aligned. The encoding for compressed oops includes right-shifting a word aligned 64-bit pointer by 3 bits, and there seems no guarantee that the compressed oops are word aligned.

We cannot directly pass compressed oops to MMTk as ObjectReference. One possible solution is to use the Slot type in the write barrier API so we will decode the compressed oops back to a 64-bit address which can be passed to MMTk as an ObjectReference.

@wks
Copy link
Contributor

wks commented Nov 26, 2024

In void MMTkBarrierSetRuntime::object_reference_write_slow_call(void* src, void* slot, void* target) (implemented in C++), src, slot and target are all compressed pointers. It simply calls ::mmtk_object_reference_write_slow (implemented in Rust) and passes src, slot and target verbatim.

However, the Rust function is defined as following:

pub extern "C" fn mmtk_object_reference_write_slow(
    mutator: *mut libc::c_void,
    src: ObjectReference,
    slot: Address,
    target: NullableObjectReference,
) {
    with_mutator!(|mutator| {
        mutator
            .barrier()
            .object_reference_write_slow(src, slot.into(), target.into());
    })
}

This means src and target must be MMTk-level (uncompressed) ObjectReference values, and slot must be an interior pointer (also uncompressed) pointing to a field. But all of those parameters get wrong values.

The way to fix it is introducing a Rust function that includes the decoding

pub extern "C" fn mmtk_object_reference_write_slow_compressedoops(
    mutator: *mut libc::c_void,
    src_compressed: u32,
    slot_compressed: u32,
    target_compressed: u32,
) {
    let src: ObjectReference = decompress(src_compressed);
    let slot: OpenJDKSlot<true> = decompress(slot_compressed);
    let target: Option<ObjectReference> = decompress(target_compressed);
    with_mutator!(|mutator| {
        mutator
            .barrier()
            .object_reference_write_slow(src, slot, target);
    })
}

And let the C++ part MMTkBarrierSet choose whether to call the _compressedoops version or the uncompressed version. Maybe we can parameterize MMTkBarrierSet with a COMPRESSED parameter and let get_selected_barrier() instantiate the right one according to whether the UseCompressedOops option is enabled or not.

@k-sareen
Copy link
Collaborator

Does this not mean the write barrier was broken for OpenJDK? As in it was adding incorrect object references to the remset?

@wks
Copy link
Contributor

wks commented Nov 26, 2024

Does this not mean the write barrier was broken for OpenJDK? As in it was adding incorrect object references to the remset?

Yes. I wonder why this has been working for so long.

@qinsoon
Copy link
Member Author

qinsoon commented Nov 26, 2024

I tried the version right after introducing compressed pointer support (v0.20): a1a8bdf. It 'worked' in a very hacky way.

The object reference we get in write barriers are still compressed pointers. If we add an alignment assertion in write barriers, we still see the assertion fail. The object references (compressed pointers) in MMTk are stored in mod buffers and only used for calling the binding to scan the objects during a GC. The binding knows that it is a compressed pointer, and knows how to decode to get klass information to scan the objects.

All the methods related with ObjectReference in ObjectModel such as ref_to_address, ref_to_object_start, and ref_to_header simply cast the object reference to the raw address, which do not handle the cases for compressed pointers. But they are not used for the references in the mod buffer.

@wks
Copy link
Contributor

wks commented Dec 2, 2024

Note: When using CompressedOops, the heap start address is 0x4000_0000, so 0x800_0026 must be a compressed pointer. After shifting 3 bits to the left, 0x800_0026 becomes 0x4000_0130, and it is in the heap.

@wks
Copy link
Contributor

wks commented Dec 2, 2024

Actually the mmtk-openjdk binding calls the write barriers with uncompressed pointers most of the time.

[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_array_copy_post(0x40041050, 0x40cbf3e0, 0)
[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_array_copy_post(0x40cbf3e0, 0x40cbfa58, 3)
[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_array_copy_post(0x40041050, 0x40d8be30, 0)
[2024-12-02T08:38:06Z INFO  mmtk::mmtk] User triggering collection
[2024-12-02T08:38:06Z INFO  mmtk::plan::generational::global] Nursery GC
[2024-12-02T08:38:06Z INFO  mmtk::plan::generational::immix::global] Nursery GC
[2024-12-02T08:38:06Z INFO  mmtk::scheduler::scheduler] End of GC (649/131072 pages, took 10 ms)
immix_mature    0x41000000->0x413fffff
nursery    
immortal IN 
los  N 0x40400000->0x407fffff
nonmoving IN 
[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_object_reference_write_slow_2(0x413d9c58, 0x413d9c80, 0x40800038)
===== DaCapo 9.12-MR1 lusearch starting =====
[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_object_reference_write_slow_2(0x413d8338, 0x413d8374, 0x413ff2d8)
[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_object_reference_write_slow_1(0x413f48b8, 0x413f48d0, 0x408002b0)
[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_object_reference_write_slow_1(0x41395f48, 0x413960e4, 0x40800360)
[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_object_reference_write_slow_1(0x413a8dc0, 0x413a8f5c, 0x408003d0)
[2024-12-02T08:38:06Z INFO  mmtk_openjdk::api] mmtk_object_reference_write_slow_3(0x413f0128, 0x413f0129, 0x810008d)
thread '<unnamed>' panicked at /home/wks/projects/mmtk-github/parallels/fix/openjdk-wb-coops/mmtk-core/src/util/address.rs:589:9:
ObjectReference is required to be word aligned.  addr: 0x810008d
stack backtrace:
   0: rust_begin_unwind
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:647:5
   1: core::panicking::panic_fmt
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/panicking.rs:72:14
   2: mmtk::util::address::ObjectReference::from_raw_address
             at /home/wks/projects/mmtk-github/parallels/fix/openjdk-wb-coops/mmtk-core/src/util/address.rs:589:9
   3: mmtk::util::api_util::<impl core::convert::From<mmtk::util::api_util::NullableObjectReference> for core::option::Option<mmtk::util::address::ObjectReference>>::from
             at /home/wks/projects/mmtk-github/parallels/fix/openjdk-wb-coops/mmtk-core/src/util/api_util.rs:23:9
   4: <T as core::convert::Into<U>>::into
             at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/convert/mod.rs:758:9
   5: mmtk_object_reference_write_slow
             at /home/wks/projects/mmtk-github/parallels/fix/openjdk-wb-coops/mmtk-openjdk/mmtk/src/api.rs:435:60
   6: _ZN21MMTkBarrierSetRuntime34object_reference_write_slow_call_3EPvS0_S0_
             at ./../mmtk-openjdk/openjdk/mmtkBarrierSet.cpp:152:37
   7: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
fish: Job 1, './build/linux-x86_64-normal-ser…' terminated by signal SIGABRT (Abort)

I made several copies of the MMTkBarrierSetRuntime::object_reference_write_slow_call function, with suffixes _1, _2, _3 and _4, so that each call site calls a different function. The error comes from the call site in MMTkObjectBarrierSetAssembler::object_reference_write_post.

@wks
Copy link
Contributor

wks commented Dec 2, 2024

I also implemented MMTkObjectBarrierSetAssembler::object_reference_write_pre which simply calls mmtk_object_reference_write_pre which does nothing for ObjectBarrier.

[2024-12-02T09:31:31Z INFO  mmtk_openjdk::api] mmtk_object_reference_write_pre(0x40000980, 0x400009f0, 0x40006368)
[2024-12-02T09:31:31Z INFO  mmtk_openjdk::api] mmtk_object_reference_write_post(0x40000980, 0x400009f0, 0x8000c6d)
thread '<unnamed>' panicked at /home/wks/projects/mmtk-github/parallels/fix/openjdk-wb-coops/mmtk-core/src/util/address.rs:589:9:
ObjectReference is required to be word aligned.  addr: 0x8000c6d

We can see that the "target" value was uncompressed in the pre barrier, but compressed in the post barrier. Why?

The real problem is in the implementation of BarrierSetAssembler::store_at for x86_64.

class MMTkBarrierSetAssembler: public BarrierSetAssembler {
  // ...
  virtual void store_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type, Address dst, Register val, Register tmp1, Register tmp2) override {
    if (type == T_OBJECT || type == T_ARRAY) object_reference_write_pre(masm, decorators, dst, val, tmp1, tmp2);
    BarrierSetAssembler::store_at(masm, decorators, type, dst, val, tmp1, tmp2);
    if (type == T_OBJECT || type == T_ARRAY) object_reference_write_post(masm, decorators, dst, val, tmp1, tmp2);
  }
  // ...
};

And in OpenJDK (barrierSetAssembler_x86.cpp):

void BarrierSetAssembler::store_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
                                   Address dst, Register val, Register tmp1, Register tmp2) {
  bool in_heap = (decorators & IN_HEAP) != 0;
  // ...
  switch (type) {
  case T_OBJECT:
  case T_ARRAY: {
    if (in_heap) {
      if (val == noreg) {
        // ... storing constant null reference to the field
      } else {
#ifdef _LP64
        if (UseCompressedOops) {
          assert(!dst.uses(val), "not enough registers");
          if (is_not_null) {
            __ encode_heap_oop_not_null(val);
          } else {
            __ encode_heap_oop(val);
          }
          __ movl(dst, val);
        } else
#endif
        {
          __ movptr(dst, val);
        }
      }
    } else {
      // ... error handling
    }
    break;

When UseCompressedOops is true, it will call MacroAssembler::encode_heap_oop on val. That has the side effect of changing the value in val and make it compressed.

// Algorithm must match oop.inline.hpp encode_heap_oop.
void MacroAssembler::encode_heap_oop(Register r) {
  // ... assertions go here
  if (Universe::narrow_oop_base() == NULL) {
    // ... handle null oop.
    return;
  }
  testq(r, r);
  cmovq(Assembler::equal, r, r12_heapbase);
  subq(r, r12_heapbase);
  shrq(r, LogMinObjAlignmentInBytes);
}

The subq and shrq instructions will mutate the register r in place. This will change the register value the post barrier can see. In this case, we see compressed pointer in the val register, which is passed verbatim as the target argument of mmtk_object_reference_write_post or mmtk_object_reference_write_slow.

Solution

Knowing what's going wrong, I can think of three solutions.

  1. Don't depend on the behavior of BarrierSetAssembler::store_at in OpenJDK. In MMTkBarrierSetAssembler::store_at, we save the val register before calling BarrierSetAssembler::store_at, and restore the val register after it returns.
  2. Depend on the behavior of BarrierSetAssembler::store_at in OpenJDK, and decompress the target argument in the post barrier.
  3. Hack mmtk-core and introduce a form of barrier that don't take the slot or target arguments (See this issue). We add something like
fn mmtk_object_reference_write_no_slot_no_target(mutator: Mutator, src: ObjectReference);

And use it specifically for the ObjectBarrier. Because the assembler is already aware of the ObjectBarrier, we can let MMTkObjectBarrierSetAssembler depend on mmtk_object_reference_write_no_slot_no_target.

Option (1) has the fewest assumptions, and it is relatively easy to implement

@@ -38,9 +38,20 @@ void MMTkObjectBarrierSetRuntime::object_reference_write_post(oop src, oop* slot
 
 #define __ masm->
 
+void MMTkObjectBarrierSetAssembler::object_reference_write_pre(MacroAssembler* masm, DecoratorSet decorators, Address dst, Register val, Register tmp1, Register tmp2) const {
+  if (can_remove_barrier(decorators, val, /* skip_const_null */ true)) return;
+  Register obj = dst.base();
+
+  __ push(val);
+  __ push(val);
+}
+
 void MMTkObjectBarrierSetAssembler::object_reference_write_post(MacroAssembler* masm, DecoratorSet decorators, Address dst, Register val, Register tmp1, Register tmp2) const {
   if (can_remove_barrier(decorators, val, /* skip_const_null */ true)) return;
   Register obj = dst.base();
+
+  __ pop(val);
+  __ pop(val);
 #if MMTK_ENABLE_BARRIER_FASTPATH
   Label done;

But the down side is that it adds two push instructions on the store fast path.

Option (2) should perform better because it adds nothing on the fast path. But we now relies on the behavior of BarrierSetAssembler::store_at in OpenJDK.

But Option (3) should perform even better because it omits the slot and the target arguments completely. But that needs change on the mmtk-core side, and we haven't discussed that through.

@wks wks closed this as completed in #294 Dec 5, 2024
wks added a commit that referenced this issue Dec 5, 2024
`MMTkBarrierSetAssembler::store_at` calls `object_reference_write_post`
after calling `BarrierSetAssembler::store_at`. However,
`BarrierSetAssembler::store_at` modifies the `val` register to compress
its value in place. Consequently, `object_reference_write_post` will see
a compressed pointer in the `val` register.

This PR makes two changes.

Firstly, in
`MMTkObjectBarrierSetAssembler::object_reference_write_post`, we simply
set both `c_rarg1` and `c_rarg2` to 0 before calling the write barrier
slow path. We exploit the semantics of the `ObjectBarrier` that it
simply logs the object without looking at the slot or the target. This
will fix the [assertion
error](#291) because 0 will
be interpreted as a `None` of type `Option<ObjectReference>`.

Secondly, we add a `bool compensate_val_reg` parameter to
`MMTkBarrierSetAssembler::object_reference_write_post` so that if we
call it after `BarrierSetAssembler::store_at`, we can give
`object_reference_write_post` a chance to decompress the compressed oop
in the `val`. This is intended for implementing *other barriers
introduced in the future* that may use the `val` register, and keep the
developers informed that the `val` register is mutated in
`BarrierSetAssembler::store_at`.

Fixes: #291
Related PR: #293
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: Bug
Projects
None yet
3 participants