Buffer.byteLength and simdutf #66

Closed
ronag opened this issue Mar 18, 2023 · 4 comments

Comments

@ronag
Member

ronag commented Mar 18, 2023

Buffer.byteLength can be implemented more efficiently with simdutf.

@joyeecheung
Member

To use simdutf to calculate the byte length of a string, we first need access to its contents, which usually means copying them to an address we provide. I wonder whether that copy already defeats any speedup simdutf brings. If simdutf provides Latin-1 to UTF-8 transcoding or length calculation, we might still be able to use it on known one-byte strings, where the speedup from SIMD outweighs the copying overhead. But for strings that might not be one-byte (V8 only knows after traversing the string), the penalty of copying, and even of letting V8 transcode a not-so-certain one-byte string into UTF-16 before copying the contents for us, is probably too significant.
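For the strings that are not one-byte, the length calculation has to walk UTF-16 code units once the contents are accessible. A minimal scalar sketch of that calculation follows. `Utf8LengthFromUtf16` is a hypothetical helper name, not a V8, Node.js, or simdutf function, and it assumes an unpaired surrogate is counted as 3 bytes (as if replaced by U+FFFD).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustration only): number of UTF-8 bytes needed to
// encode a buffer of UTF-16 code units.
inline size_t Utf8LengthFromUtf16(const char16_t* data, size_t length) {
  size_t bytes = 0;
  for (size_t i = 0; i < length; ++i) {
    const char16_t c = data[i];
    if (c < 0x80) {
      bytes += 1;  // ASCII
    } else if (c < 0x800) {
      bytes += 2;  // U+0080..U+07FF
    } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < length &&
               data[i + 1] >= 0xDC00 && data[i + 1] <= 0xDFFF) {
      bytes += 4;  // valid surrogate pair encodes one supplementary code point
      ++i;         // consume the low surrogate as well
    } else {
      // Remaining BMP code units. Assumption: an unpaired surrogate is
      // also counted as 3 bytes, as if it were replaced by U+FFFD.
      bytes += 3;
    }
  }
  return bytes;
}
```

The branches per code unit are what a SIMD implementation would replace with vector comparisons, but that only pays off once the code units are already in a buffer the native side can read.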

@ronag
Member Author

ronag commented Mar 22, 2023

I don't understand. The following code should be able to use simdutf without any additional copying. You already have direct access to the data through source.data.

uint32_t FastByteLengthUtf8(Local<Value> receiver,
                            const v8::FastOneByteString& source) {
  uint32_t result = 0;
  uint32_t length = source.length;
  const uint8_t* data = reinterpret_cast<const uint8_t*>(source.data);
  // A Latin-1 byte >= 0x80 takes two bytes in UTF-8, so count the bytes
  // whose high bit is set.
  for (uint32_t i = 0; i < length; ++i) {
    result += (data[i] >> 7);
  }
  // Every input byte contributes at least one UTF-8 byte.
  result += length;
  return result;
}

https://github.com/joyeecheung/node/blob/ee1ce1872ff38fc5a2fd3b2e3a97600e5d6b2e14/src/node_buffer.cc#L798-L808
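
To illustrate what a wider-than-one-byte version of that loop could look like without depending on any library, here is a sketch that counts the high bits eight bytes at a time using plain 64-bit arithmetic. `Utf8LengthFromLatin1` is a hypothetical name for illustration; it is not the simdutf API and not code from Node.js, and a real SIMD kernel would use vector registers rather than this word-at-a-time trick.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (illustration only): UTF-8 byte length of a Latin-1
// buffer, reading the input in place with no intermediate copy of the string.
inline size_t Utf8LengthFromLatin1(const uint8_t* data, size_t length) {
  constexpr uint64_t kOnes = 0x0101010101010101ULL;
  size_t extra = 0;  // number of bytes >= 0x80, each needs one extra byte
  size_t i = 0;
  for (; i + 8 <= length; i += 8) {
    uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));  // safe unaligned load
    // Move each byte's high bit into that byte's lowest bit (0 or 1).
    const uint64_t high = (word >> 7) & kOnes;
    // Multiplying by kOnes sums all eight bytes into the top byte (max 8).
    extra += static_cast<size_t>((high * kOnes) >> 56);
  }
  // Scalar tail, same as the original loop.
  for (; i < length; ++i) {
    extra += data[i] >> 7;
  }
  return length + extra;
}
```

For the four Latin-1 bytes of "café" (0x63 0x61 0x66 0xE9), this returns 5, since only the final byte has its high bit set.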

@joyeecheung
Member

joyeecheung commented Mar 27, 2023

That's only hit when the string is known to be one-byte and sequential (and on the premise that the fast API call path is hit). But even in this case we can't use simdutf yet, because simdutf does not have Latin-1 to UTF-8 transcoding (it should have it soon, but not yet).

@ronag
Member Author

ronag commented Apr 3, 2023

Probably a duplicate of #52.

ronag closed this as completed Apr 3, 2023