Vectorize string deserialization #20

Closed
k0ekk0ek opened this issue Feb 8, 2023 · 1 comment
Labels: enhancement, help wanted, question


k0ekk0ek commented Feb 8, 2023

For character-strings (tokens), simdzone writes two indexes in order to have the length available. simdjson, by contrast, copies strings by doing an _mm_loadu_si128 followed by a _mm_storeu_si128. To determine how far to advance the pointer, it then also does a _mm_cmpeq_epi8 to check for the end of the string (closing quote) and for escape sequences (backslash). Once we grab a token from the tape, we return ZONE_QUOTED or ZONE_CONTIGUOUS; that information could potentially be used to select one of two "tables" to detect the end of the string. This looks like a promising way to speed up copying strings. If it works out, it may no longer be necessary to store a second index, which would simplify the critical path and could potentially give a nice increase in performance.
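A minimal sketch of the copy-then-scan idea described above, for the quoted case only. This is not simdzone's or simdjson's actual code: the names `block_mask_t`, `copy_and_scan`, `copy_quoted` and `trailing_zeros` are hypothetical, only the `\X` escape form is handled (`\DDD` is left out), and the caller is assumed to guarantee a closing quote plus 16 bytes of padding on both buffers. A scalar path is included so the sketch also builds without SSE2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical result of scanning one 16-byte block. */
typedef struct {
  uint32_t delimiter; /* bit i set if byte i equals the delimiter */
  uint32_t backslash; /* bit i set if byte i is a backslash */
} block_mask_t;

/* Copy 16 bytes unconditionally and report delimiter/backslash positions.
   Assumes 16 readable bytes at src and 16 writable bytes at dst. */
static block_mask_t copy_and_scan(uint8_t *dst, const uint8_t *src, char delimiter)
{
  block_mask_t m;
#if defined(__SSE2__)
  const __m128i input = _mm_loadu_si128((const __m128i *)src);
  _mm_storeu_si128((__m128i *)dst, input);
  m.delimiter = (uint32_t)_mm_movemask_epi8(
    _mm_cmpeq_epi8(input, _mm_set1_epi8(delimiter)));
  m.backslash = (uint32_t)_mm_movemask_epi8(
    _mm_cmpeq_epi8(input, _mm_set1_epi8('\\')));
#else
  /* Scalar path for targets without SSE2. */
  memcpy(dst, src, 16);
  m.delimiter = 0;
  m.backslash = 0;
  for (unsigned i = 0; i < 16; i++) {
    m.delimiter |= (uint32_t)(src[i] == (uint8_t)delimiter) << i;
    m.backslash |= (uint32_t)(src[i] == '\\') << i;
  }
#endif
  return m;
}

/* Index of the lowest set bit; x must be non-zero. */
static unsigned trailing_zeros(uint32_t x)
{
  unsigned n = 0;
  while (!(x & 1u)) {
    x >>= 1;
    n++;
  }
  return n;
}

/* Copy the body of a quoted string (src points just past the opening
   quote) and return its unescaped length. No second index is needed:
   the end is found while copying. */
static size_t copy_quoted(uint8_t *dst, const uint8_t *src)
{
  uint8_t *out = dst;
  for (;;) {
    const block_mask_t m = copy_and_scan(out, src, '"');
    const uint32_t stop = m.delimiter | m.backslash;
    if (!stop) { /* plain text only, keep the whole block */
      src += 16;
      out += 16;
      continue;
    }
    const unsigned n = trailing_zeros(stop);
    src += n;
    out += n;
    if (*src == '"') /* closing quote comes first: done */
      return (size_t)(out - dst);
    *out++ = src[1]; /* backslash first: keep the escaped byte as-is */
    src += 2;
  }
}
```

For contiguous (unquoted) tokens the single-character compare would have to be replaced by a check against a set of delimiters, which is where the choice between two "tables" would come in.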

k0ekk0ek added the enhancement and question labels on Feb 8, 2023
k0ekk0ek changed the title from "Copying of text (reevaluate necessity of 2nd index too)" to "Copying of strings (reevaluate necessity of 2nd index too)" on Feb 17, 2023
k0ekk0ek changed the title from "Copying of strings (reevaluate necessity of 2nd index too)" to "Vectorization of string deserialization (reevaluate necessity of 2nd index too)" on Feb 20, 2023
k0ekk0ek changed the title from "Vectorization of string deserialization (reevaluate necessity of 2nd index too)" to "Vectorize string deserialization (reevaluate necessity of 2nd index too)" on Feb 20, 2023
k0ekk0ek added the help wanted label on Feb 20, 2023
k0ekk0ek commented

Re-evaluation of second index is moved to #30.

k0ekk0ek changed the title from "Vectorize string deserialization (reevaluate necessity of 2nd index too)" to "Vectorize string deserialization" on Feb 22, 2023
k0ekk0ek self-assigned this on Feb 23, 2023
k0ekk0ek added a commit to k0ekk0ek/simdzone that referenced this issue Mar 1, 2023
Add a fallback implementation to support architectures for which no SIMD
implementation is available yet. Sources have been reorganized to allow
for easy optimization per target. Closes NLnetLabs#4.

Use longjmp for error handling. Closes NLnetLabs#1.

Add zone-bench to allow for convenient benchmarking.

Vectorize string to wire conversion. Closes NLnetLabs#20.

Vectorize name to wire conversion. Closes NLnetLabs#29.