SChernykh/CryptonightR#1 (comment)
In fact, a carefully designed ASIC could still outperform a GPU by spending more resources/area on the bottlenecks. The memory bandwidth can be greatly improved by using more, smaller DRAM partitions and parallel memory controllers with address interleaving. The random math cannot utilize the GPU's floating-point ALUs, tensor cores and certain on-chip memories, which occupy much more area than the tiny integer ALUs. An ASIC implementation could just build more simplified integer ALUs and multi-bank RFs with a very simple decoder for better TLP. It is also possible to achieve chained operations with a reconfigurable ALU array.
What's IfDefElse's take on this?
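For anyone unfamiliar with the address-interleaving point in the quoted comment, here is a minimal sketch of the idea. It is only an illustration: the 64-byte line size, the 8 partitions, and all function names are assumptions of mine, not anything taken from the comment or from a real ASIC or GPU design.

```python
from collections import Counter

# Assumed parameters, chosen only for illustration.
LINE_BYTES = 64      # access granularity (one cache line)
NUM_PARTITIONS = 8   # number of independent DRAM partitions/controllers


def partition_of(addr: int) -> int:
    """Low-order interleaving: consecutive lines map to consecutive partitions."""
    return (addr // LINE_BYTES) % NUM_PARTITIONS


def offset_in_partition(addr: int) -> int:
    """Byte offset of the address inside the partition it maps to."""
    line = addr // LINE_BYTES
    return (line // NUM_PARTITIONS) * LINE_BYTES + addr % LINE_BYTES


def load_per_partition(addresses):
    """Count how many accesses each partition would have to serve."""
    counts = Counter(partition_of(a) for a in addresses)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


# 64 back-to-back line accesses are spread evenly over the 8 partitions,
# so 8 controllers could in principle serve them in parallel.
sequential = [i * LINE_BYTES for i in range(64)]
print(load_per_partition(sequential))  # [8, 8, 8, 8, 8, 8, 8, 8]
```

With more, smaller partitions the same stream is split across more controllers, which is the bandwidth argument the comment makes. The sketch says nothing about latency, bank conflicts, or die area.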