Description
- Implement `WavePrefixCountBits` clang builtin
- Link `WavePrefixCountBits` clang builtin with `hlsl_intrinsics.h`
- Add sema checks for `WavePrefixCountBits` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- Add codegen for `WavePrefixCountBits` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- Add codegen tests to `clang/test/CodeGenHLSL/builtins/WavePrefixCountBits.hlsl`
- Add sema tests to `clang/test/SemaHLSL/BuiltIns/WavePrefixCountBits-errors.hlsl`
- Create the `int_dx_WavePrefixCountBits` intrinsic in `IntrinsicsDirectX.td`
- Create the `DXILOpMapping` of `int_dx_WavePrefixCountBits` to `136` in `DXIL.td`
- Create the `WavePrefixCountBits.ll` and `WavePrefixCountBits_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- Create the `int_spv_WavePrefixCountBits` intrinsic in `IntrinsicsSPIRV.td`
- In SPIRVInstructionSelector.cpp create the `WavePrefixCountBits` lowering and map it to `int_spv_WavePrefixCountBits` in `SPIRVInstructionSelector::selectIntrinsic`.
- Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WavePrefixCountBits.ll`
DirectX
DXIL Opcode | DXIL OpName | Shader Model | Shader Stages |
---|---|---|---|
136 | WavePrefixBitCount | 6.0 | ('library', 'compute', 'amplification', 'mesh', 'pixel', 'vertex', 'hull', 'domain', 'geometry', 'raygeneration', 'intersection', 'anyhit', 'closesthit', 'miss', 'callable', 'node') |
SPIR-V
OpGroupNonUniformBallotBitCount:
Description:
Result is the number of bits that are set to 1 in Value, considering
only the bits in Value required to represent all bits of the
group's invocations.
Result Type must be a scalar of integer type, whose
Signedness operand is 0.
Execution is a Scope that identifies the group of
invocations affected by this command. It must be Subgroup.
The identity I for Operation is 0.
Value must be a vector of four components of integer
type scalar, whose Width operand is 32 and whose
Signedness operand is 0.
Value is a set of bitfields where the first invocation is represented
in the lowest bit of the first vector component and the last (up to the
size of the group) is the higher bit number of the last bitmask needed
to represent all bits of the group invocations.
Capability:
GroupNonUniformBallot
Missing before version 1.3.
Word Count | Opcode | Results | Operands | | |
---|---|---|---|---|---|
6 | 342 | <id> | Scope <id> | Group Operation | <id> |
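The bitfield layout described above can be modelled on the CPU. The following is a minimal sketch, not the backend lowering: it assumes the Group Operation is an exclusive scan (the issue does not state which operation the lowering should use), and the names `Ballot` and `ballotBitCountExclusive` are hypothetical.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>

// Ballot in the layout described above: lane 0 is the lowest bit of
// component 0, lane 32 is the lowest bit of component 1, and so on.
using Ballot = std::array<std::uint32_t, 4>;

// Counts the set bits belonging to lanes with an index smaller than `lane`.
// Assumption: this models an exclusive-scan bit count over the ballot.
std::uint32_t ballotBitCountExclusive(const Ballot &ballot, unsigned lane) {
  assert(lane < 128 && "a four-component ballot covers at most 128 lanes");
  std::uint32_t count = 0;
  for (unsigned word = 0; word < 4; ++word) {
    const unsigned firstLane = word * 32;
    if (lane <= firstLane)
      break; // no lower-indexed lanes live in this component or later ones
    const unsigned bitsBelow = lane - firstLane;
    // Keep only the bits of lanes below `lane`; avoid shifting by 32.
    const std::uint32_t mask =
        bitsBelow >= 32 ? 0xFFFFFFFFu : ((1u << bitsBelow) - 1u);
    count += static_cast<std::uint32_t>(
        std::bitset<32>(ballot[word] & mask).count());
  }
  return count;
}
```

Masking per component keeps the current lane's own bit out of the count, which is what distinguishes a prefix count from a whole-group count.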
Test Case(s)
Example 1
//dxc WavePrefixCountBits_test.hlsl -T lib_6_8 -enable-16bit-types -O0
export uint fn(bool p1) {
return WavePrefixCountBits(p1);
}
HLSL:
Returns the sum of all the specified boolean variables set to true across all active lanes with indices smaller than the current lane.
Syntax
uint WavePrefixCountBits(
bool bBit
);
Parameters
- bBit: The specified boolean variables.
Return value
The sum of all the specified Boolean variables set to true across all active lanes with indices smaller than the current lane.
Remarks
This function is supported from shader model 6.0 in all shader stages.
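The return value described above can be checked against a small CPU reference. This is a sketch under stated assumptions: the wave is modelled as two parallel vectors, inactive lanes are given a result of 0 purely as a modelling choice, and `wavePrefixCountBitsRef` is a hypothetical name that does not appear in clang or DXC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of one wave: active[i] says whether lane i is
// executing, bit[i] is the bBit argument that lane passes.
// Returns, per lane, the number of active lower-indexed lanes whose bit is
// true. Entries for inactive lanes are left at 0 in this model.
std::vector<std::uint32_t>
wavePrefixCountBitsRef(const std::vector<bool> &active,
                       const std::vector<bool> &bit) {
  assert(active.size() == bit.size());
  std::vector<std::uint32_t> result(active.size(), 0);
  std::uint32_t running = 0; // true bits seen so far among active lanes
  for (std::size_t lane = 0; lane < active.size(); ++lane) {
    if (!active[lane])
      continue; // inactive lanes neither contribute nor receive a count
    result[lane] = running; // exclusive: the current lane is not counted
    if (bit[lane])
      ++running;
  }
  return result;
}
```

A reference like this could be used to derive expected values when writing test inputs by hand.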
Examples
The following code describes how to implement a compacted write to an ordered stream where the number of elements written per lane is either 1 or 0.
bool bDoesThisLaneHaveAnAppendItem = <expr>;
// compute this lane's offset: the number of lower-indexed active lanes with an item
uint laneAppendOffset = WavePrefixCountBits( bDoesThisLaneHaveAnAppendItem );
// compute number of items to append for the whole wave
uint appendCount = WaveActiveCountBits( bDoesThisLaneHaveAnAppendItem );
// update the output location for this whole wave
uint appendOffset;
if ( WaveIsFirstLane () )
{
// this way, we only issue one atomic for the entire wave, which reduces contention
// and keeps the output data for each lane in this wave together in the output buffer
InterlockedAdd(bufferSize, appendCount, appendOffset);
}
appendOffset = WaveReadLaneFirst( appendOffset ); // broadcast value
appendOffset += laneAppendOffset; // and add in the offset for this lane
buffer[appendOffset] = myData; // write to the offset location for this lane
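The offset arithmetic in the example can be traced with a CPU model of a single wave. This is a rough sketch that assumes every lane is active and that only lanes with an item perform the write (the snippet above does not show that guard); `compactedAppend` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of the compacted write for one fully active wave.
// hasItem[i] / data[i] describe lane i; `bufferSize` plays the role of the
// atomic counter and `out` the output buffer.
void compactedAppend(const std::vector<bool> &hasItem,
                     const std::vector<int> &data, std::vector<int> &out,
                     std::uint32_t &bufferSize) {
  assert(hasItem.size() == data.size());

  // WaveActiveCountBits: number of items the whole wave appends.
  std::uint32_t appendCount = 0;
  for (bool b : hasItem)
    appendCount += b ? 1u : 0u;

  // InterlockedAdd issued once per wave: reserve a contiguous range.
  const std::uint32_t appendOffset = bufferSize;
  bufferSize += appendCount;
  if (out.size() < bufferSize)
    out.resize(bufferSize);

  // WavePrefixCountBits: running count of items in lower-indexed lanes.
  std::uint32_t laneAppendOffset = 0;
  for (std::size_t lane = 0; lane < hasItem.size(); ++lane) {
    if (!hasItem[lane])
      continue; // assumption: lanes without an item skip the write
    out[appendOffset + laneAppendOffset] = data[lane];
    ++laneAppendOffset;
  }
}
```

Calling it twice with the same `out` and `bufferSize` models two waves appending in sequence; each wave's items stay contiguous and in lane order.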