Skip to content

ARROW-17584: [Go] Use unsafe.Slice from Go 1.17 #14026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

tschaub
Copy link
Contributor

@tschaub tschaub commented Sep 1, 2022

The current use of reflect.SliceHeader fields makes the code less portable than it could be. For example, it is not possible to build with TinyGo where the slice header fields have different types.

This change proposes using unsafe.Slice from Go 1.17 as an alternative.

The tests are still failing for me, so I'm guessing that the changes aren't yet entirely correct. I'll open as a draft PR in hopes of working out any issues.

cc @zeroshade

@github-actions
Copy link

github-actions bot commented Sep 1, 2022

@github-actions
Copy link

github-actions bot commented Sep 1, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@zeroshade
Copy link
Member

@tschaub you're going to have to update the github workflows to run using go1.17 instead of go1.16 for this to work.

@tschaub
Copy link
Contributor Author

tschaub commented Sep 1, 2022

@tschaub you're going to have to update the github workflows to run using go1.17 instead of go1.16 for this to work.

Thanks, @zeroshade. Updated.

@zeroshade
Copy link
Member

sigh It looks like the issue is that we actually sometimes call this with a 0 length slice and my suggestion of avoiding the reflect.SliceHeader was incorrect because of this.

Can you try shifting back to grabbing the slice header of the incoming slice and then using the .Data as the uintptr passed to unsafe.Slice ?

@tschaub
Copy link
Contributor Author

tschaub commented Sep 1, 2022

Will do. It feels like the length may then need adjusting (in cases where the slice is not aligned to the start of the array). But I may be thinking about it wrong.

@tschaub
Copy link
Contributor Author

tschaub commented Sep 1, 2022

@zeroshade - I updated things to use the h.Data from the slice header. I'm seeing the same test failures as before (with 9978861). I'll dig a bit deeper.

@zeroshade
Copy link
Member

@tschaub I figured out the issue.

If you look at the original versions, we ensured that the capacity was h.Cap/(nbytes) and Len was h.Len/(nbytes) In your implementation here you're only using the length. This is specifically relevant because of how concatenation handles offset types like variable length binary, where we gather the buffers sized to the length of the Array, but then expand it by one to ensure it encompasses the final offset.

Essentially instead of this:

return unsafe.Slice((*int32)(unsafe.Pointer(h.Data)), len(b)/Int32SizeBytes)

Do this:

return unsafe.Slice((*int32)(unsafe.Pointer(h.Data)), cap(b)/Int32SizeBytes)[:len(b)/Int32SizeBytes]

Note the usage of cap and the slice of it at the end.

@tschaub
Copy link
Contributor Author

tschaub commented Sep 8, 2022

Thanks for digging into this, @zeroshade.

I pushed a commit that doesn't make use of reflect.SliceHeader, but it does require an extra check for zero length slices. Considering just the bytesToUint64 function, I think the two variants look like this:

func bytesToUint64V1(b []byte) []uint64 {
	if len(b) == 0 {
		return nil
	}

	return unsafe.Slice((*uint64)(unsafe.Pointer(&b[0])), cap(b)/uint64SizeBytes)[:len(b)/uint64SizeBytes]
}

func bytesToUint64V2(b []byte) []uint64 {
	h := (*reflect.SliceHeader)(unsafe.Pointer(&b))

	return unsafe.Slice((*uint64)(unsafe.Pointer(h.Data)), cap(b)/uint64SizeBytes)[:len(b)/uint64SizeBytes]
}

Maybe you'd only notice the difference if you were passing zero length slices with non-zero capacity. Let me know if you think V2 above would be preferable.

@zeroshade
Copy link
Member

@tschaub the V2 version is preferable and should solve your unit test issues

@tschaub
Copy link
Contributor Author

tschaub commented Sep 8, 2022

Thanks, @zeroshade. I pushed an updated commit with the V2 flavor.

I have been running make test - which passed with the V1 flavor. But I see that that target doesn't cover everything run in CI.

@tschaub
Copy link
Contributor Author

tschaub commented Sep 9, 2022

It looks like there are the same test failures with the V1 implementation (4fc2e1f) and the V2 implementation (5b90c88).

@zeroshade
Copy link
Member

@tschaub Not quite, notice the difference in the error messages and the fact that several of the versions of the tests succeeded. It looks like this is an interesting checkptr error, i'll try to take a look at it and see what i can figure out.

@zeroshade
Copy link
Member

Ah, i see the issue. Stay with me here....

The CI runs our tests using the -race option which automatically enables the checkptr option to validate pointer usage. In this case the slice being passed to bytesToUint64 is less than 8 bytes but we're using unsafe.Slice to assert it as a *uint64 and treat it as such. This ends up invoking checkptr when the option is set (since it is) and it looks at the whole 8 bytes. If it turns out that the place it ended up in memory is straddling multiple memory allocations (since we're less than 8 bytes) we end up failing the checkptr and it crashes.

My preferred solution would this:

func CountSetBits(buf []byte, offset, n int) int {
    if offset > 0 {
        return countSetBitsWithOffset(buf, offset, n)
    }

   count := 0
   uint64Bytes := n / uint64SizeBits * 8
   if uint64Bytes > 0 {  // <-- add this check to wrap the bytesToUint64 call so we don't call it with less than 8 bytes
       for _, v := range bytesToUint64(buf[:uint64Bytes]) {
             count += bits.OnesCount64(v)
       }
   }

  ...

Adding that if check to the function makes checkptr happy as we stop trying to look at one byte and treat it like a uint64 pointer.

@tschaub
Copy link
Contributor Author

tschaub commented Sep 9, 2022

Thanks for the explanation, @zeroshade. Latest commit adds that condition to CountSetBits.

@zeroshade
Copy link
Member

Wow it really doesn't like you 😦 I'll see if i can dig into these failures.

@zeroshade
Copy link
Member

@tschaub looks like a similar issue in countSetBitsWithOffset which, at that point, is both places that it is called sigh.

I guess the solution is just to push the condition down into bytesToUint64 directly. Just add

if h.Cap < uint64SizeBytes {
    return nil
}

there instead of the if you added to CountSetBits previously

@tschaub
Copy link
Contributor Author

tschaub commented Sep 9, 2022

if h.Cap < uint64SizeBytes {
    return nil
}

This would only work where h.Cap is an int (which probably works everywhere except TinyGo). We could int(h.Cap) - though I imagine that would seem mysterious to future readers (perhaps a comment could explain).

Would a check for cap(b) do it? Let me know which way you'd like to go.

@zeroshade
Copy link
Member

sure, cap(b) should probably work fine, mainly i was thinking of h.Cap since it was already there, but you can do the cap(b) check before you even grab the slice header.

@tschaub
Copy link
Contributor Author

tschaub commented Sep 9, 2022

Ok, let's see what's next.

@zeroshade
Copy link
Member

woohoo, the integration test failure is not related to this change and is being tracked by a different PR. so this looks good. When you're ready make this an actual (instead of draft) PR and i'll give it a last lookover.

@tschaub tschaub marked this pull request as ready for review September 9, 2022 22:05
@tschaub
Copy link
Contributor Author

tschaub commented Sep 9, 2022

@zeroshade - Thank you for your work on this. I appreciate all the digging you did to resolve the issues.

I fear that it might not be achievable to use TinyGo to build WASM binaries using arrow. I've found additional incompatibilities in github.com/apache/thrift - where TinyGo's implementation of the net package doesn't cover all thrift's usage.

So if you feel like the change in this branch is not the direction you want to go, or if you think it may introduce additional issues, please don't feel pressured to merge it in. I'd be happy to be able to build small WASM binaries that used arrow, but I'm not 100% sure it will be possible.

@zeroshade
Copy link
Member

zeroshade commented Sep 11, 2022

@tschaub any particular reason you're using TinyGo over just using go's GOOS=js GOARCH=wasm and building a wasm binary that way? I was able to build a simple wasm binary using Arrow Go by doing:

GOOS=js GOARCH=wasm go build -tags noasm -o arrow.wasm ./main.go

Sure it won't be as small as TinyGo might be able to create, but the arrow.wasm this created was only around 8M uncompressed. Compressing with gzip --best dropped it down to about 2M.

@tschaub
Copy link
Contributor Author

tschaub commented Sep 14, 2022

@zeroshade - Yeah, I'll stick with the go build for now. Hoping at some point to get a more web-friendly build out of TinyGo.

@amol-
Copy link
Member

amol- commented Mar 30, 2023

Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍

@amol- amol- closed this Mar 30, 2023
@tschaub tschaub deleted the unsafe-slice branch March 30, 2023 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants