c-print-results: Different spill order of arguments #291

Closed
blackgeorge-boom opened this issue Sep 20, 2023 · 2 comments

@blackgeorge-boom
Collaborator

blackgeorge-boom commented Sep 20, 2023

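#include <stdio.h>

/* CLASS, NPBVERSION, COMPILETIME, CC, CLINK, C_LIB, C_INC, CFLAGS and
 * CLINKFLAGS are build-time macros whose definitions are not shown here. */
void results(char *name, char class, int n1, int n2, int n3, int niter,
					 double t, double mops, char *optype,
					 int passed_verification, char *npbversion,
					 char *compiletime, char *cc, char *clink, char *c_lib,
					 char *c_inc, char *cflags, char *clinkflags)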
{
	printf("%c\n", class);
	if (n3 == 0) {
		n3++;
	}
	else
		printf("%4dx%4dx%4d\n", n1, n2, n3);
}

int main() {

	double timecounter = 0.0;

	results(
			"IS", CLASS, 1, 64, 0, 3, timecounter, 1.0,
			"keys ranked", 1, NPBVERSION, COMPILETIME, CC, CLINK,
			C_LIB, C_INC, CFLAGS, CLINKFLAGS);

	return 0;

}

The regalloc debug info shows that the corresponding virtual registers below are allocated differently on the two targets. The cause is different live interval weights (calculated as UseDefFreq / (Size + 25 * SlotIndex::InstrDist)), which lead to different evictions:

AArch64

%4: 4.38 / (400 + 400) = 4.38 * 1.25e-3 = 5.47e-3
%14: 3.03 / (168 + 400) = 3.03 * 1.76e-3 = 5.33e-3

X86

%4: 4.38 / (512 + 400) = 4.38 * 1.10e-3 = 4.80e-3
%18: 3.03 / (216 + 400) = 3.03 * 1.62e-3 = 4.92e-3
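The weight comparison can be reproduced with a small sketch in C. It only evaluates the formula quoted in this comment with the numbers listed here; the function names are hypothetical, and `INSTR_DIST = 16` is an assumption inferred from the `+ 400` term (25 * 16), not a value read from LLVM.

```c
#include <stdio.h>

/* Assumption: SlotIndex::InstrDist is 16, inferred from the "+ 400" term
 * (25 * 16) in the figures quoted in the issue. */
#define INSTR_DIST 16.0

/* weight = UseDefFreq / (Size + 25 * InstrDist), as quoted in the issue. */
double live_interval_weight(double use_def_freq, double size)
{
	return use_def_freq / (size + 25.0 * INSTR_DIST);
}

/* Returns 1 if the first interval outweighs the second, else 0. */
int outweighs(double freq_a, double size_a, double freq_b, double size_b)
{
	return live_interval_weight(freq_a, size_a) >
	       live_interval_weight(freq_b, size_b);
}

/* Prints both pairs of weights using the numbers from the issue. */
void print_weight_comparison(void)
{
	printf("AArch64: %%4 = %.5f, %%14 = %.5f\n",
	       live_interval_weight(4.38, 400.0),
	       live_interval_weight(3.03, 168.0));
	printf("X86:     %%4 = %.5f, %%18 = %.5f\n",
	       live_interval_weight(4.38, 512.0),
	       live_interval_weight(3.03, 216.0));
}
```

With these inputs, `%4` outweighs its counterpart on AArch64 but not on X86. The use/def frequencies are the same on both targets, so the flip comes entirely from the larger Size values on X86.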
@blackgeorge-boom
Collaborator Author

The different live interval weights are probably caused by gaps in the slot numbering, left behind when the passes that precede greedy remove different instructions on each target.

AArch64:

16B	  %4:gpr32 = COPY $w4
32B	  %3:gpr32 = COPY $w3
48B	  %2:gpr32 = COPY $w2
64B	  %1:gpr32 = COPY $w1
80B	  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
112B	  $x0 = MOVaddr target-flags(aarch64-page) @main__str__c__, target-flags(aarch64-pageoff, aarch64-nc) @main__str__c__

X86:

0B	bb.0.entry:
	  successors: %bb.2(0x30000000), %bb.1(0x50000000); %bb.2(37.50%), %bb.1(62.50%)
	  liveins: $esi, $edx, $ecx, $r8d
16B	  MOV32mr %stack.0, 1, $noreg, 0, $noreg, $r8d :: (store 4 into %stack.0)
32B	  MOV32mr %stack.1, 1, $noreg, 0, $noreg, $ecx :: (store 4 into %stack.1)
48B	  %23:gr32 = COPY $edx
64B	  %1:gr32 = COPY $esi
80B	  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
144B	  $rdi = LEA64r $rip, 1, $noreg, @main__str__c__, $noreg

The last instruction appears further from its predecessor on X86 (80B to 144B) than on AArch64 (80B to 112B), even though the two instructions are adjacent in both cases. This could be fixed by "packing" the slot indexes of instructions right before greedy.
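A minimal sketch of what packing means for the two listings above, written in C. This is a simplified model, not the upstream implementation: it treats one basic block as a flat array of indexes, keeps the first index, and spaces the rest evenly. The function name and the distance of 16 are assumptions taken from the 16-unit steps visible in the dumps.

```c
#include <stddef.h>

/* Simplified model: renumber the slot indexes of consecutive instructions
 * so that neighbours are exactly `dist` apart, keeping the first index.
 * Real slot indexes also cover block boundaries and sub-instruction slots,
 * which this sketch ignores. */
void pack_slot_indexes(unsigned *indexes, size_t count, unsigned dist)
{
	for (size_t i = 1; i < count; i++)
		indexes[i] = indexes[0] + (unsigned)i * dist;
}
```

Applied to the AArch64 indexes {16, 32, 48, 64, 80, 112} and the X86 indexes {16, 32, 48, 64, 80, 144} with `dist = 16`, both sequences end at 96, so the final instruction sits at the same distance from its predecessor on both targets.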

@blackgeorge-boom
Collaborator Author

Work on this is already underway in upstream LLVM, and we will try to add it to our LLVM as well:
llvm/llvm-project#67038

blackgeorge-boom added a commit to blackgeorge-boom/llvm-project that referenced this issue Sep 29, 2023
Applies:
llvm#66334
llvm#67038

Packing the slot indexes before register allocation is useful for us
because it evens out the gaps between slots. The optimization passes
that run before `greedy` may remove a different number of instructions
on AArch64 and X86, which leads to different slot gaps and, hence,
slightly different regalloc in some cases.

We backport the above patches for our LLVM, with the main difference
being the absence of some convenient data structure iterators, which we
had to convert to be compatible with our ADT infrastructure.

We add the `-pack-indexes` flag to activate this.

Addresses: systems-nuts/unifico#291