This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Unsigned narrowing #94

Closed
penzn opened this issue Aug 14, 2019 · 5 comments

Comments

@penzn
Contributor

penzn commented Aug 14, 2019

Widening and narrowing operations were proposed in #21 and added in #89. There is a discussion in #91 about what the input to unsigned narrowing instructions should be. The issue is that x86(64) SIMD narrowing instructions treat their input as signed, even when the output is unsigned. ARM supports both signed-to-unsigned and unsigned-to-unsigned narrowing.

I can see value in "signed to unsigned" narrowing for things like RGBA graphics, where results of signed integer arithmetic are packed into unsigned RGBA output. The Sobel operator is one example, and other image filters would use it as well. Would "unsigned to unsigned" narrowing be equally useful? Where would it be used?
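To make the "signed to unsigned" semantics concrete, here is a minimal scalar sketch of what one lane of such a narrowing would compute, assuming 16-bit signed input and 8-bit unsigned output with saturation. The function names are hypothetical and are not taken from the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* One lane: saturate a signed 16-bit value to the unsigned 8-bit range.
 * Negative inputs clamp to 0, inputs above 255 clamp to 255.
 * (Hypothetical helper name, for illustration only.) */
uint8_t narrow_s16_to_u8(int16_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Apply the lane operation to n lanes, e.g. signed filter results
 * that are being packed into unsigned 8-bit color channels. */
void narrow_s16_to_u8_lanes(const int16_t *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = narrow_s16_to_u8(in[i]);
    }
}
```

A filter such as Sobel can produce negative or out-of-range intermediate values, so clamping at both ends is what makes the result safe to store as a color channel.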

Another problem is that emulating "unsigned to unsigned" narrowing on x86(64) requires about 4 instructions, which is not a good value proposition for an operation that would work on 2*, 4, or 8 lanes, as only the 8-lane version would be faster than scalar.

*if 64-bit lanes are supported

@penzn
Contributor Author

penzn commented Aug 14, 2019

My back-of-the-envelope instruction sequence to emulate unsigned-unsigned narrowing in SSE would look something like the following. There are other ways as well, but I can't seem to find anything substantially shorter:

pcmpeq(w/d/q) ;; on the same register, get lanes filled with 0xFF..F
psrl(w/d/q)   ;; shift right by one to get 0x7FF..F
pminu(w/d/q)  ;; replace negative lanes with 0x7FF..F (as they are bigger when cast to unsigned)
packus(dw/wb) ;; get the final result
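As a sanity check on the sequence above, here is a scalar sketch that models each step for a single 16-bit lane narrowed to 8 bits. It is an illustration of the idea only, not real SIMD code, and all function names are hypothetical.

```c
#include <stdint.h>

/* Models one lane of packuswb: the input is treated as signed 16-bit
 * and saturated to the unsigned 8-bit range. */
uint8_t packus_lane(int16_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Models the four-step emulation for one unsigned 16-bit lane. */
uint8_t narrow_u16_to_u8_emulated(uint16_t v) {
    uint16_t all_ones = 0xFFFF;                         /* pcmpeqw x, x  */
    uint16_t max_signed = (uint16_t)(all_ones >> 1);    /* psrlw by 1    */
    uint16_t clamped = v < max_signed ? v : max_signed; /* pminuw        */
    /* clamped is at most 0x7FFF, so it is non-negative as int16_t. */
    return packus_lane((int16_t)clamped);               /* packuswb      */
}

/* Reference: direct unsigned-to-unsigned saturation. */
uint8_t narrow_u16_to_u8_reference(uint16_t v) {
    return v > 255 ? 255 : (uint8_t)v;
}
```

Without the clamp, an input such as 0x8000 would be seen as negative by the signed pack step and saturate to 0 instead of 255, which is why the extra three instructions are needed.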

@arunetm
Collaborator

arunetm commented Aug 14, 2019

Can we have a spec PR with these recommendations and seek feedback there? Unless there are use cases that benefit from unsigned-to-unsigned narrowing, it is justifiable to drop it from the initial proposal.

@penzn
Contributor Author

penzn commented Aug 14, 2019

#91 does this by stating that the inputs are signed.

@penzn
Contributor Author

penzn commented Aug 15, 2019

Sorry, a longer response now. Another way to look at it is that there are two questions: the performance of unsigned narrowing with unsigned inputs, and the benefits of unsigned narrowing with signed inputs.

#91 has the spec change to substitute the former with the latter.

Alternatively, we can add the other variant alongside the ones that have already been added. The advantages of unsigned narrowing with signed inputs are described above, and it can be done in one instruction on both x86 and ARM.

penzn added a commit to penzn/simd that referenced this issue Aug 15, 2019
penzn added a commit to penzn/simd that referenced this issue Aug 15, 2019
@penzn
Contributor Author

penzn commented Sep 3, 2019

Closing, as we merged the alternative in PR #91.

@penzn penzn closed this as completed Sep 3, 2019