Inefficient integer multiplications

An ASIC will be 4 times more efficient than a GPU at these two operations, because a and b are 32-bit integers:
```
    case 1: return a * b;
    case 2: return mul_hi(a, b);
```

32-bit integer multiplications are inefficient on GPUs because GPUs only have a 24-bit-wide data path for multiplication, so a 32-bit MUL is 4 times slower than a 24-bit MUL. It's better to use mul24 here.
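To make the trade-off concrete, here is a rough host-side sketch in plain C (not kernel code from this repository) that models what the two built-ins compute. The names `mul24_model`, `mul_hi_model` and `mul32` are made up for illustration. The masking in `mul24_model` is an assumption: OpenCL only guarantees the result of mul24 when both operands fit in 24 bits and leaves it implementation-defined otherwise.

```c
#include <stdint.h>

/* Plain 32-bit multiply: low 32 bits of the product (case 1 above). */
static uint32_t mul32(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Model of OpenCL's unsigned mul_hi: high 32 bits of the 64-bit product
   (case 2 above). */
static uint32_t mul_hi_model(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Model of unsigned mul24, ASSUMING the hardware simply ignores the upper
   8 bits of each operand. Returns the low 32 bits of the 48-bit product. */
static uint32_t mul24_model(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)p;
}
```

When both operands fit in 24 bits, `mul24_model` and `mul32` return the same value, so the swap is free. Once an operand uses bit 24 or above, the results diverge, so switching to mul24 also means restricting the inputs to 24 bits.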

Side note: it's a shame that OpenCL still doesn't have mul24_hi, but CUDA has it.


Inefficient integer multiplications #16
