Re-evaluate second (length) index (yet again) #95

Closed
k0ekk0ek opened this issue Aug 28, 2023 · 1 comment

k0ekk0ek commented Aug 28, 2023

The second index was dropped in #64 (#30) because it proved quite a bit faster to simplify the lex function. My reasoning was that we have to validate the input anyway, and that domain names (very likely the bulk of the data) are parsed like strings, using a scalar loop to calculate label lengths. However, now that the domain name parser is being optimized (#66, thanks @lemire!), that assumption may no longer hold.
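For illustration, here is a minimal sketch of the kind of scalar, per-byte loop referred to above: walking a contiguous domain name and computing each label's length. The function name and shape are hypothetical, not simdzone's actual code.

```c
#include <stddef.h>

/* Hypothetical sketch: a scalar loop over a domain name token that
 * computes the length of each label (the runs between dots). This is
 * the per-byte work a branch-free, length-aware lexer helps avoid. */
static size_t count_labels(const char *name, size_t length, size_t *labels)
{
  size_t count = 0, label = 0;
  for (size_t i = 0; i < length; i++) {
    if (name[i] == '.') {
      labels[count++] = label;  /* close the current label */
      label = 0;
    } else {
      label++;
    }
  }
  labels[count++] = label;  /* final label (zero-length for a root dot) */
  return count;
}
```

Every byte of the name passes through a data-dependent branch here, which is exactly why it benefits from knowing the token length up front.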

Tests showed that the data dependencies and branches in the lex function reduced scanner performance by about 40%. The branches and dependencies were required because all indexes were written to the same tape (vector). Now that we know most parse functions will benefit from having the length available (mentioned by both @lemire and @aqrit on multiple occasions), it may be worth looking into this again. It would also simplify parsing of fields that allow both contiguous and quoted character strings (domain names and text).

The new plan is to keep two separate tapes: one containing the start indexes and one containing the delimiter indexes. The counts should generally be the same, so select the bigger of the two and write that many indexes, simdjson-style, to both vectors (the CPU should handle the writes in parallel). Then, when we encounter a CONTIGUOUS or QUOTED token, the delimiter tape is guaranteed to hold the delimiting index, so we can quickly calculate the length and pop the index off the stack without adding branches. All of this assumes that writing out two indexes does not add too much overhead.
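A rough sketch of what the two-tape write could look like, assuming the scanner summarizes each 64-byte block as two bitmasks (as simdjson does). `write_indexes`, the parameter names, and the tape layout are illustrative assumptions, not simdzone's actual interface:

```c
#include <stdint.h>

/* Hedged sketch of the two-tape idea: given a bitmask of start
 * positions and a bitmask of delimiter positions for one 64-byte
 * block, write the set-bit indexes to two separate tapes in simdjson
 * fashion. Both tapes are written every iteration so there are no
 * data-dependent branches; each pointer only advances by its own true
 * popcount, so trailing garbage slots are overwritten next call. */
static void write_indexes(
  uint64_t starts, uint64_t delimiters, uint32_t base,
  uint32_t **start_tape, uint32_t **delimiter_tape)
{
  const int start_count = __builtin_popcountll(starts);
  const int delim_count = __builtin_popcountll(delimiters);
  /* select the bigger count and write that many indexes to both tapes */
  const int count = start_count > delim_count ? start_count : delim_count;

  for (int i = 0; i < count; i++) {
    /* OR in bit 63 so ctz stays defined once a mask runs out; the
       bogus value lands past that tape's popcount and is overwritten */
    (*start_tape)[i] =
      base + (uint32_t)__builtin_ctzll(starts | (1ull << 63));
    (*delimiter_tape)[i] =
      base + (uint32_t)__builtin_ctzll(delimiters | (1ull << 63));
    starts &= starts - 1;          /* clear lowest set bit */
    delimiters &= delimiters - 1;
  }

  *start_tape += start_count;
  *delimiter_tape += delim_count;
}
```

The two stores per iteration are independent, which is why the writes should pipeline well; the cost relative to a single tape is what the benchmarks below measure.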

Initial results look promising:

Current main in release mode:

$ time ./zone-bench lex ../../zones/com.zone
Selected target haswell
Lexed 1405612413 tokens

real	0m7.737s
user	0m6.595s
sys	0m1.132s

Quick hack in release mode:

$ time ./zone-bench lex ../../zones/com.zone
Selected target haswell
Lexed 1405612413 tokens

real	0m8.333s
user	0m7.111s
sys	0m1.212s

Note that yes, scanning gets a bit slower, but nowhere near 40%. To see whether the approach is actually viable, I'll need to update some functions to leverage the length. The plan is to update RRTYPE and name parsing; we should see an improvement when actually parsing all the data.
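To make "leveraging the length" concrete, here is a hypothetical consumer of the two tapes; `token_t` and `next_token` are illustrative names, not simdzone's actual API. With both tapes available, the token length is a single subtraction instead of a scan for the delimiter:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token view a parse function could receive once both
 * tapes exist: a pointer into the input plus a precomputed length. */
typedef struct { const char *data; size_t length; } token_t;

static token_t next_token(const char *input,
                          const uint32_t **start_tape,
                          const uint32_t **delimiter_tape)
{
  token_t token;
  token.data = input + **start_tape;
  /* delimiter index minus start index; no per-byte scan required */
  token.length = (size_t)(**delimiter_tape - **start_tape);
  (*start_tape)++;
  (*delimiter_tape)++;
  return token;
}
```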

@k0ekk0ek (Contributor, Author) commented:

Providing the length slows down the scanner a bit (obviously, as it has to do more work), but slightly improves overall parser performance. The results below are for the same .com zone (parsing this time, not just lexing as above).

Without length:

$ time ./zone-bench parse ../../zones/com.zone
Selected target haswell
Parsed 341535548 records

real	0m16.528s
user	0m15.348s
sys	0m1.129s

With length:

$ time ./zone-bench parse ../../zones/com.zone
Selected target haswell
Parsed 341535548 records

real	0m16.154s
user	0m14.922s
sys	0m1.200s

It's consistently faster for .se (which contains more DNSSEC data) too. I'll open a draft PR ASAP (it needs polishing), but it's a speedup, and hopefully it further improves @lemire's new approach since we no longer have to determine the length first.

k0ekk0ek added a commit to k0ekk0ek/simdzone that referenced this issue Sep 4, 2023
k0ekk0ek added a commit to k0ekk0ek/simdzone that referenced this issue Sep 14, 2023
k0ekk0ek added a commit to k0ekk0ek/simdzone that referenced this issue Sep 14, 2023
k0ekk0ek added a commit to k0ekk0ek/simdzone that referenced this issue Sep 14, 2023