Reducing switch range on powers of two #70756

DKay7 · 2023-10-31T02:10:47Z

I recently found FIXME:

llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp

Lines 6795 to 6797 in b0e00ca

    
           // FIXME: It's possible that optimizing a switch on powers of two might also 
        
           // be beneficial - flag values are often powers of two and we could use a CLZ 
        
           // as the key function.

I decided to implement it. My Idea was, as described in FIXME, to check if switch contains only powers of two, and if so, to replace each key with countl_zero(key). When I'm almost done, I ran into a problem that I also need to change the switch condition to countl_zero(condition). I tried to find some simple and fast algorithm to count leading zeros in runtime, but all algorithms are too slow or too complex. Should I try to implement something complicated at all, or is this optimization not worth it?

My current approach looks like that:

  SmallVector<int64_t,4> Values;
  for (const auto &Case : SI->cases())
    Values.push_back(Case.getCaseValue()->getValue().getSExtValue());
  
  if (!IsSwitchOfPowersOfTwo(Values))
    return false;


  Builder.SetInsertPoint(SI);
  auto *Ty = cast<IntegerType>(SI->getCondition()->getType());
  
  SI->replaceUsesOfWith(SI->getCondition(), /* TODO: count leading zeros of SI->getCondition() */);
  
  for (auto Case : SI->cases()) {
    auto OrigValue = Case.getCaseValue()->getValue();
    Case.setValue(
        cast<ConstantInt>(ConstantInt::get(Ty, OrigValue.countl_zero())));
  }

The text was updated successfully, but these errors were encountered:

dtcxzyw · 2023-10-31T03:26:23Z

You should emit a call to @llvm.ctlz.* intrinsic.
See also https://llvm.org/docs/LangRef.html#llvm-ctlz-intrinsic.

BTW, I guess TLI->isCtlzFast/isCheapToSpeculateCtlz may help if you want to avoid introducing slow ctlz implementation on targets without native support for ctlz (e.g., RISC-V without zbb).

DKay7 · 2023-11-01T01:29:13Z

I ran into another problem, which is that we need to convert the condition correctly in order to redirect all non-powers of two to the default case. I solved this with the following bitwise transformation:

a = condition & (condition - 1)
condition |= a >> count_trailing_zeros(a)
new_condition = count_trailing_zeros(condition)

The idea of transformations is to make the number of trailing zeros equal to 0 if the "condition" is not a power of two (since zero case is forbidden in this optimization), and otherwise not to change this number.
So, in the first two lines, the least significant bit is assigned the value 1 if the "condition" is not a power of two.

My question is: isn't it too slow for such optimization, even if we only perform it on systems with built-in support for @llvm.cttz? And if so, do you have any ideas how to do it faster?

Upd:

I have come up with another solution that
seems to me faster:

a = condition & (condition - 1)
condition |= (a == 0)
new_condition = count_trailing_zeros(condition)

But the question remains open.

dtcxzyw · 2023-11-01T14:38:09Z

I ran into another problem, which is that we need to convert the condition correctly in order to redirect all non-powers of two to the default case. I solved this with the following bitwise transformation:
a = condition & (condition - 1)
condition |= a >> count_trailing_zeros(a)
new_condition = count_trailing_zeros(condition)
The idea of transformations is to make the number of trailing zeros equal to 0 if the "condition" is not a power of two (since zero case is forbidden in this optimization), and otherwise not to change this number. So, in the first two lines, the least significant bit is assigned the value 1 if the "condition" is not a power of two.

My question is: isn't it too slow for such optimization, even if we only perform it on systems with built-in support for @llvm.cttz? And if so, do you have any ideas how to do it faster?

Upd:

I have come up with another solution that seems to me faster:
a = condition & (condition - 1)
condition |= (a == 0)
new_condition = count_trailing_zeros(condition)
But the question remains open.

IMHO, I recommend using ctpop(x) == 1 if the backend allows. If you have concerns about the cost of handling default cases, you can start with handling switches that with unreachable default cases.

bool HasDefault =
      !isa<UnreachableInst>(SI->getDefaultDest()->getFirstNonPHIOrDbg());
  if (HasDefault)
    return false;

// Handle switches with power-of-2 case values
...

DKay7 · 2023-11-01T23:01:08Z

I believe I need help with failed test:

llvm-project/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll

Lines 77 to 80 in 801c78d

    
           ; The switch is lowered with a jump table for cases 1--32 and case 64 handled 
        
           ; separately. Even though the default of the switch is unreachable, the 
        
           ; fall-through for the jump table *is* reachable so the range check must be 
        
           ; emitted.

Proposed optimization doesn't check range for value not to fall through the jump table. It's just reduce switch range and only for switches with unreachable default case. Should I add range-check anyway?

) Optimization reduces the range for switches whose cases are positive powers of two by replacing each case with count_trailing_zero(case). Resolves llvm#70756

github-actions bot added the new issue label Oct 31, 2023

dtcxzyw added question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! and removed new issue labels Oct 31, 2023

dtcxzyw assigned DKay7 Oct 31, 2023

dtcxzyw added the llvm:optimizations label Oct 31, 2023

dtcxzyw added missed-optimization and removed question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! labels Nov 1, 2023

DKay7 mentioned this issue Nov 1, 2023

[SimplifyCFG] Add optimization for switches of powers of two #70977

Merged

dtcxzyw closed this as completed in #70977 Nov 18, 2023

EugeneZelenko added llvm:transforms and removed llvm:optimizations labels Nov 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reducing switch range on powers of two #70756

Reducing switch range on powers of two #70756

DKay7 commented Oct 31, 2023

dtcxzyw commented Oct 31, 2023

DKay7 commented Nov 1, 2023 •

edited

Loading

dtcxzyw commented Nov 1, 2023

DKay7 commented Nov 1, 2023 •

edited

Loading

Reducing switch range on powers of two #70756

Reducing switch range on powers of two #70756

Comments

DKay7 commented Oct 31, 2023

dtcxzyw commented Oct 31, 2023

DKay7 commented Nov 1, 2023 • edited Loading

dtcxzyw commented Nov 1, 2023

DKay7 commented Nov 1, 2023 • edited Loading

DKay7 commented Nov 1, 2023 •

edited

Loading

DKay7 commented Nov 1, 2023 •

edited

Loading