Skip to content

Reducing switch range on powers of two #70756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DKay7 opened this issue Oct 31, 2023 · 4 comments · Fixed by #70977
Closed

Reducing switch range on powers of two #70756

DKay7 opened this issue Oct 31, 2023 · 4 comments · Fixed by #70977

Comments

@DKay7
Copy link
Contributor

DKay7 commented Oct 31, 2023

I recently found FIXME:

// FIXME: It's possible that optimizing a switch on powers of two might also
// be beneficial - flag values are often powers of two and we could use a CLZ
// as the key function.

I decided to implement it. My Idea was, as described in FIXME, to check if switch contains only powers of two, and if so, to replace each key with countl_zero(key). When I'm almost done, I ran into a problem that I also need to change the switch condition to countl_zero(condition). I tried to find some simple and fast algorithm to count leading zeros in runtime, but all algorithms are too slow or too complex. Should I try to implement something complicated at all, or is this optimization not worth it?

My current approach looks like that:

  SmallVector<int64_t,4> Values;
  for (const auto &Case : SI->cases())
    Values.push_back(Case.getCaseValue()->getValue().getSExtValue());
  
  if (!IsSwitchOfPowersOfTwo(Values))
    return false;


  Builder.SetInsertPoint(SI);
  auto *Ty = cast<IntegerType>(SI->getCondition()->getType());
  
  SI->replaceUsesOfWith(SI->getCondition(), /* TODO: count leading zeros of SI->getCondition() */);
  
  for (auto Case : SI->cases()) {
    auto OrigValue = Case.getCaseValue()->getValue();
    Case.setValue(
        cast<ConstantInt>(ConstantInt::get(Ty, OrigValue.countl_zero())));
  }
@dtcxzyw
Copy link
Member

dtcxzyw commented Oct 31, 2023

You should emit a call to @llvm.ctlz.* intrinsic.
See also https://llvm.org/docs/LangRef.html#llvm-ctlz-intrinsic.

BTW, I guess TLI->isCtlzFast/isCheapToSpeculateCtlz may help if you want to avoid introducing slow ctlz implementation on targets without native support for ctlz (e.g., RISC-V without zbb).

@dtcxzyw dtcxzyw added question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! and removed new issue labels Oct 31, 2023
@DKay7
Copy link
Contributor Author

DKay7 commented Nov 1, 2023

I ran into another problem, which is that we need to convert the condition correctly in order to redirect all non-powers of two to the default case. I solved this with the following bitwise transformation:

a = condition & (condition - 1)
condition |= a >> count_trailing_zeros(a)
new_condition = count_trailing_zeros(condition)

The idea of transformations is to make the number of trailing zeros equal to 0 if the "condition" is not a power of two (since zero case is forbidden in this optimization), and otherwise not to change this number.
So, in the first two lines, the least significant bit is assigned the value 1 if the "condition" is not a power of two.

My question is: isn't it too slow for such optimization, even if we only perform it on systems with built-in support for @llvm.cttz? And if so, do you have any ideas how to do it faster?

Upd:

I have come up with another solution that
seems to me faster:

a = condition & (condition - 1)
condition |= (a == 0)
new_condition = count_trailing_zeros(condition)

But the question remains open.

@dtcxzyw
Copy link
Member

dtcxzyw commented Nov 1, 2023

I ran into another problem, which is that we need to convert the condition correctly in order to redirect all non-powers of two to the default case. I solved this with the following bitwise transformation:

a = condition & (condition - 1)
condition |= a >> count_trailing_zeros(a)
new_condition = count_trailing_zeros(condition)

The idea of transformations is to make the number of trailing zeros equal to 0 if the "condition" is not a power of two (since zero case is forbidden in this optimization), and otherwise not to change this number. So, in the first two lines, the least significant bit is assigned the value 1 if the "condition" is not a power of two.

My question is: isn't it too slow for such optimization, even if we only perform it on systems with built-in support for @llvm.cttz? And if so, do you have any ideas how to do it faster?

Upd:

I have come up with another solution that seems to me faster:

a = condition & (condition - 1)
condition |= (a == 0)
new_condition = count_trailing_zeros(condition)

But the question remains open.

IMHO, I recommend using ctpop(x) == 1 if the backend allows. If you have concerns about the cost of handling default cases, you can start with handling switches that with unreachable default cases.

bool HasDefault =
      !isa<UnreachableInst>(SI->getDefaultDest()->getFirstNonPHIOrDbg());
  if (HasDefault)
    return false;

// Handle switches with power-of-2 case values
...

@dtcxzyw dtcxzyw added missed-optimization and removed question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! labels Nov 1, 2023
@DKay7
Copy link
Contributor Author

DKay7 commented Nov 1, 2023

I believe I need help with failed test:

; The switch is lowered with a jump table for cases 1--32 and case 64 handled
; separately. Even though the default of the switch is unreachable, the
; fall-through for the jump table *is* reachable so the range check must be
; emitted.

Proposed optimization doesn't check range for value not to fall through the jump table. It's just reduce switch range and only for switches with unreachable default case. Should I add range-check anyway?

swift-ci pushed a commit to swiftlang/llvm-project that referenced this issue Nov 18, 2023
)

Optimization reduces the range for switches whose cases are positive powers
of two by replacing each case with count_trailing_zero(case).

Resolves llvm#70756
sr-tream pushed a commit to sr-tream/llvm-project that referenced this issue Nov 20, 2023
)

Optimization reduces the range for switches whose cases are positive powers
of two by replacing each case with count_trailing_zero(case).

Resolves llvm#70756
zahiraam pushed a commit to zahiraam/llvm-project that referenced this issue Nov 20, 2023
)

Optimization reduces the range for switches whose cases are positive powers
of two by replacing each case with count_trailing_zero(case).

Resolves llvm#70756
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants