Inefficient integer multiplications #16

@SChernykh

An ASIC will be 4 times more efficient than a GPU at these two operations, because a and b are 32-bit integers:

    case 1: return a * b;
    case 2: return mul_hi(a, b);
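For reference, here is a plain C emulation of what the two quoted cases compute, assuming a and b are unsigned 32-bit values. The OpenCL built-ins are reproduced with 64-bit arithmetic rather than called, and the helper names `mul_lo32` and `mul_hi32` are hypothetical. Both results are halves of the same 32x32 -> 64-bit product, so both need a full 32-bit multiplier.

```c
#include <stdint.h>

/* case 1: a * b on 32-bit unsigned operands keeps the low 32 bits
 * of the 64-bit product. */
static uint32_t mul_lo32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* case 2: OpenCL mul_hi(a, b) returns the high 32 bits of the
 * 64-bit product. */
static uint32_t mul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```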

32-bit integer multiplications are inefficient on GPUs because GPUs only have a 24-bit wide data path for multiplication, which makes a 32-bit MUL 4 times slower than a 24-bit MUL. It's better to use mul24 here.
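To make the trade-off concrete, the C sketch below emulates an unsigned mul24 by multiplying only the low 24 bits of each operand, which is how CUDA documents `__umul24`. OpenCL itself defines `mul24` only for unsigned inputs in [0, 2^24 - 1] and leaves other results implementation-defined, so the masking here is an assumption. The helper names are hypothetical.

```c
#include <stdint.h>

/* Emulated unsigned mul24: low 32 bits of the product of the low
 * 24 bits of each operand (assumed out-of-range behaviour). */
static uint32_t umul24_emul(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (b & 0xFFFFFFu);
    return (uint32_t)p;
}

/* The full 32-bit product that case 1 (a * b) computes today. */
static uint32_t mul32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* mul24 gives the same result as a * b only when both operands
 * fit in 24 bits. */
static int mul24_matches_mul32(uint32_t a, uint32_t b)
{
    return a <= 0xFFFFFFu && b <= 0xFFFFFFu;
}
```

With arbitrary 32-bit a and b the two results diverge (for example, 0x1000000 * 3 gives 0x3000000 from a * b but 0 from the emulated mul24), so mul24 is only a drop-in replacement if the algorithm itself first limits the operands to 24 bits.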

Side note: it's a shame that OpenCL still doesn't have mul24_hi, while CUDA does (through the PTX mul24.hi instruction).
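For comparison, this C sketch emulates the 24-bit high multiply that PTX exposes as `mul24.hi`: the product of two 24-bit values is 48 bits wide, and `mul24.hi` returns its upper 32 bits (bits 47..16). That is a different slice from `mul_hi` on 32-bit operands (bits 63..32). Masking out-of-range inputs to 24 bits, and the function names, are assumptions for illustration only.

```c
#include <stdint.h>

/* Emulated PTX mul24.hi.u32: multiply the low 24 bits of each operand
 * into a 48-bit product and return bits 47..16. The masking of
 * out-of-range inputs is an assumption. */
static uint32_t umul24_hi_emul(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (b & 0xFFFFFFu); /* < 2^48 */
    return (uint32_t)(p >> 16);
}

/* OpenCL-style mul_hi on 32-bit operands: bits 63..32 of the product. */
static uint32_t umul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

For a = b = 0xFFFFFF the 48-bit product is 0xFFFFFE000001, so the emulated mul24.hi gives 0xFFFFFE00 while the 32-bit mul_hi gives only 0xFFFF.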
