@JamesKim2998

Summary

This PR contains multiple fixes for the ARM64 JIT compiler:

  • V8-V15 callee-saved FP register handling: Never allocate V8-V15 since the prologue doesn't save them (AAPCS64 compliance)
  • RTMP/RTMP2 (X27/X28) save/restore: Save callee-saved scratch registers in prologue/epilogue (frame size 80→96 bytes)
  • Switch statement optimization: Replace the O(n) linear scan with an O(1) jump table using an ADR/ADD/BR sequence
  • op_get_mem register clobbering fix: Use RTMP2 for offset calculation to avoid clobbering base register
  • Exception type filtering: Properly dereference global pointers for exception type checks
  • hl_jit_free hardening: Add double-free and use-after-free protection
  • W^X memory protection: Proper write/execute toggling for Apple Silicon
  • Register spilling safety: Free scratch registers (X9, V16) before reuse
  • Size encoding fix: Handle all memory access sizes (1/2/4/8 bytes) correctly
  • OAssert call sequence: Fix literal pool pattern for indirect calls
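
To illustrate the size encoding item above: in A64 load/store instructions with an unsigned immediate offset, bits 31:30 hold log2 of the access size, and the 12-bit immediate is scaled by that size. The following is a minimal sketch of that mapping, not the code from `src/jit_aarch64.c`; the names `size_field` and `encode_ldr_uimm` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map an access size in bytes to the 2-bit
 * size field (bits 31:30). Returns -1 for unsupported sizes. */
static int size_field(int bytes) {
    switch (bytes) {
    case 1: return 0; /* LDRB */
    case 2: return 1; /* LDRH */
    case 4: return 2; /* LDR Wt */
    case 8: return 3; /* LDR Xt */
    default: return -1;
    }
}

/* Hypothetical encoder for LDR{B,H}/LDR Rt, [Rn, #offset] (unsigned
 * immediate form). The offset must be a multiple of the access size
 * because imm12 is stored scaled by that size. */
static uint32_t encode_ldr_uimm(int bytes, unsigned rt, unsigned rn,
                                unsigned offset) {
    int sz = size_field(bytes);
    assert(sz >= 0);
    assert(offset % (unsigned)bytes == 0);
    unsigned imm12 = offset / (unsigned)bytes;
    assert(imm12 < 4096);
    return 0x39400000u            /* LDRB base opcode, size field = 0 */
         | ((uint32_t)sz << 30)   /* access size */
         | (imm12 << 10)          /* scaled offset */
         | ((rn & 31u) << 5)      /* base register */
         | (rt & 31u);            /* destination register */
}
```

For example, `encode_ldr_uimm(8, 0, 1, 8)` yields the word for `ldr x0, [x1, #8]`. Hard-coding the size field for one width is the kind of mistake a per-size mapping like this avoids.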

Test plan

  • Verified instruction encodings against ARM64 reference manual
  • Verified AAPCS64 calling convention compliance
  • Run HashLink test suite on ARM64

🤖 Generated with Claude Code

bmdhacks and others added 16 commits December 17, 2025 21:19
All unit tests pass except stack traces
- implement detailed null access errors
- fix memory size encoding in loads/stores
- properly save/restore RTMP/RTMP2 registers
- improve jit_ctx cleanup and register allocation
- fix dynamic field setters and assertion implementation
- Add freed flag to jit_ctx for double-free protection
- NULL pointers before free() to eliminate use-after-free window
- Poison freed memory with 0xDD pattern in debug builds
- Clear freed flag in hl_jit_reset for context reuse
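
The hardening steps listed above could look roughly like the sketch below. It is an illustration under assumptions, not the actual `hl_jit_free` code: the struct `demo_ctx`, its fields, the function names, and the `DEMO_DEBUG` macro are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for jit_ctx with a single owned buffer. */
typedef struct {
    unsigned char *buf;
    size_t buf_size;
    bool freed; /* set once the context's resources are released */
} demo_ctx;

/* Release the buffer at most once. */
static void demo_ctx_free(demo_ctx *ctx) {
    if (ctx == NULL || ctx->freed)
        return; /* second call is a no-op: double-free protection */

    /* Detach the pointer first so no stale copy stays reachable
     * through the context while free() runs. */
    unsigned char *buf = ctx->buf;
    size_t size = ctx->buf_size;
    ctx->buf = NULL;
    ctx->buf_size = 0;

#ifdef DEMO_DEBUG
    /* Poison in debug builds so a use-after-free reads 0xDD bytes. */
    if (buf != NULL)
        memset(buf, 0xDD, size);
#else
    (void)size;
#endif
    free(buf);
    ctx->freed = true;
}

/* Make the context usable again, mirroring the reset step. */
static void demo_ctx_reset(demo_ctx *ctx) {
    assert(ctx != NULL);
    ctx->freed = false;
}
```

Calling `demo_ctx_free` twice on the same context is harmless in this sketch, and `demo_ctx_reset` clears the flag so the context can be reused.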

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The alloc_fpu() function could allocate V8-V15, which are callee-saved
per AAPCS64, but our prologue/epilogue didn't save/restore them. This
could corrupt caller's floating-point values.

Fix: Only allocate caller-saved FP registers (V0-V7, V16-V31), giving
24 available registers. The eviction pass now also skips V8-V15.
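
The allocation rule described in this commit message can be sketched as follows. This is not the real `alloc_fpu()`; the function names and the bitmask representation of in-use registers are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* V8-V15 are callee-saved under AAPCS64, so a JIT whose prologue does
 * not save them must never hand them out. */
static bool fpu_is_allocatable(int v) {
    return v >= 0 && v <= 31 && !(v >= 8 && v <= 15);
}

/* Count the registers the allocator may use (V0-V7 and V16-V31). */
static int fpu_allocatable_count(void) {
    int n = 0;
    for (int v = 0; v < 32; v++)
        if (fpu_is_allocatable(v))
            n++;
    return n;
}

/* Hypothetical picker: bit v of in_use is set when Vv holds a value.
 * Returns the first free allocatable register, or -1 when the caller
 * has to evict (spill) one. */
static int pick_fpu(uint32_t in_use) {
    for (int v = 0; v < 32; v++)
        if (fpu_is_allocatable(v) && !(in_use & (UINT32_C(1) << v)))
            return v;
    return -1;
}
```

With V0-V7 occupied, `pick_fpu` skips directly to V16. An eviction pass would apply the same `fpu_is_allocatable` filter so that it never selects V8-V15 either.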

Also fixes CMakeLists.txt to detect 'arm64' (macOS) in addition to
'aarch64' (Linux) for architecture selection.

Adds test_fp_pressure.c to verify register spilling works correctly
under high FP register pressure (25+ simultaneous float values).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Fixed op_set_array to use V16 instead of V0 as temporary FP register, preventing argument register corruption.
- Added safety checks for X9 usage in multiple operations (op_get_mem_reg, op_set_mem_reg, op_get_array, etc.) to ensure the register is freed if currently holding a value.
- Refactored zero-initialization of local variables to use str_stack with XZR for better efficiency and alignment with existing patterns.
- Added Arm64JitTest.hx to verify register pressure and memory operations.
- Removed completed improvement plan phase 2 and added phase 1 roadmap.
- Ported exception type filtering logic from x86 to AArch64 OTrap, ensuring catch(e:Type) correctly filters exceptions.
- Optimized OSwitch to use a jump table (ADR/ADD/BR sequence) instead of linear search for better performance.
- Fixed a potential register conflict in OSwitch by using RTMP2 for stack-loaded values.
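
One way an ADR/ADD/BR jump table can be laid out is shown below: ADR loads the table address, ADD indexes it by `index * 4`, and BR jumps into a table of `B` instructions, one per case. This is a sketch under assumptions. The layout, the use of X16 as scratch, and the function names are hypothetical and may differ from what OSwitch emits in this PR. A real sequence also needs a bounds check on the index before the ADR.

```c
#include <stddef.h>
#include <stdint.h>

/* ADR Xd, <pc + byte_off> (21-bit signed byte offset). */
static uint32_t enc_adr(unsigned rd, int32_t byte_off) {
    uint32_t imm = (uint32_t)byte_off & 0x1FFFFFu;
    return 0x10000000u | ((imm & 3u) << 29)
         | (((imm >> 2) & 0x7FFFFu) << 5) | (rd & 31u);
}

/* ADD Xd, Xn, Xm, LSL #shift (64-bit shifted-register form). */
static uint32_t enc_add_lsl(unsigned rd, unsigned rn, unsigned rm,
                            unsigned shift) {
    return 0x8B000000u | ((rm & 31u) << 16) | ((shift & 63u) << 10)
         | ((rn & 31u) << 5) | (rd & 31u);
}

/* BR Xn. */
static uint32_t enc_br(unsigned rn) {
    return 0xD61F0000u | ((rn & 31u) << 5);
}

/* B <pc + byte_off> (26-bit signed word offset). */
static uint32_t enc_b(int32_t byte_off) {
    return 0x14000000u | (((uint32_t)byte_off >> 2) & 0x03FFFFFFu);
}

/* Emit the dispatch sequence into out[]. targets[i] is the byte offset
 * of case i relative to out[0]. Returns the number of words written.
 * X16 (IP0) is used as scratch here purely for illustration. */
static size_t emit_switch(uint32_t *out, unsigned idx_reg,
                          const int32_t *targets, size_t n) {
    const unsigned tmp = 16;
    out[0] = enc_adr(tmp, 12);                 /* table starts 3 words on */
    out[1] = enc_add_lsl(tmp, tmp, idx_reg, 2); /* tmp += index * 4 */
    out[2] = enc_br(tmp);
    for (size_t i = 0; i < n; i++) {
        int32_t here = (int32_t)(12 + 4 * i);  /* offset of this B */
        out[3 + i] = enc_b(targets[i] - here);
    }
    return 3 + n;
}
```

Dispatch cost is constant regardless of the number of cases, at the price of one table word per case.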
Added an LDR instruction that dereferences the global address to obtain the
actual type object pointer for tcheck, matching x86 behavior.
- Add 'com.apple.security.cs.allow-jit' entitlement to 'other/osx/entitlements.xml' to allow W^X memory protection changes.
- Remove redundant 'mprotect' workaround in 'src/jit_aarch64.c' as the entitlement ensures 'pthread_jit_write_protect_np' works correctly.
@tobil4sk
Member

Did you mean to open this PR against https://github.com/bmdhacks/hashlink?

JamesKim2998 and others added 4 commits December 21, 2025 09:06
- Add semaphore_signal() to EXC_BAD_INSTRUCTION and EXC_BAD_ACCESS
  handlers so session_wait() returns immediately instead of timing out
- Fix memory leaks in read_register() and write_register() by freeing
  allocated thread/debug state structs after use
- Add REG_DR4-DR7 cases to get_register_name() for complete debug
  register name mapping
- Document that EXC_ARM_SINGLE_STEP and EXC_ARM_HW_BREAKPOINT are
  empirically determined values that are not defined in the official macOS SDK

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add entitlements required for debugging on Apple Silicon:
- allow-unsigned-executable-memory: for JIT code without MAP_JIT
- disable-library-validation: for loading debugger native modules

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Tests verify that ARM64 single-stepping uses MDSCR_EL1.SS (bit 0)
via DR6, not a CPSR/EFLAGS trap flag, which ARM64 does not have.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

3 participants