Re-evaluate second (length) index (yet again) #95
Providing the length slows down the scanner a bit (obviously as it has to do more work), but slightly improves the performance of the parser overall. For the same Without length:
With length:
It's consistently faster for |
The second index was dropped by #64 (#30) as it proved quite a bit faster to simplify the `lex` function. My reasoning was that we have to validate the input anyway, and domain names (very likely the bulk of the data) are parsed like strings, using a scalar loop to calculate label lengths. However, now that the domain name parser is being optimized (#66, thanks @lemire!), it seems my assumption may be less correct.
Tests showed that the data dependencies and branches in the `lex` function reduced performance of the scanner by about 40%. The branches and dependencies were required because all indexes were written to the same tape (vector). Now that we know most parse functions will benefit from having the length available (mentioned by both @lemire and @aqrit on multiple occasions), it may be worth looking into this again. It will also simplify parsing of fields that allow for both contiguous and quoted character strings (domain names and text).
The new plan is to have two separate tapes, one that contains the start index and one that contains the delimiter index. The counts should generally be the same, so select the bigger number and write that many indexes in simdjson fashion to both vectors (the CPU should handle this in parallel). Now, when we encounter a `CONTIGUOUS` or `QUOTED` token, it's guaranteed that the delimiter tape has the delimiting index, so we can quickly calculate the length and pop the index off the stack without adding branches. All of this assumes that writing out two indexes does indeed not add too much overhead.
Initial results look promising:
Current main in release mode:
Quick hack in release mode:
Note that yes, there's a delay in scanning, but nowhere near 40%. To see if it's actually viable, I'll need to update some functions to leverage the length. The plan is to update RRTYPE and name parsing. We should see an improvement when actually parsing all the data.
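To make the two-tape plan a bit more concrete, here is a minimal scalar sketch in C. It is not the actual scanner: the names `tapes_t`, `write_indexes`, `scan` and `token_length` are hypothetical, the bitmasks are built with a plain loop instead of SIMD, and escapes, comments and parentheses are ignored. It only illustrates writing the larger of the two counts to both tapes unconditionally and then deriving the length without a per-token branch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TOKENS 256 /* assumed limit for this sketch */

/* Hypothetical layout: one tape per kind of index, each with 64 slots of
   slack so a whole block can be written without bounds checks. */
typedef struct {
  uint32_t fields[MAX_TOKENS + 64];     /* start index of each token */
  uint32_t delimiters[MAX_TOKENS + 64]; /* index of the delimiting character */
  size_t field_count, delimiter_count;
} tapes_t;

static unsigned trailing_zeros(uint64_t x) {
  unsigned n = 0;
  if (x == 0) return 64; /* empty mask, the written slot is a throwaway */
  while (!(x & 1)) { x >>= 1; n++; }
  return n;
}

static unsigned popcount(uint64_t x) {
  unsigned n = 0;
  for (; x; x &= x - 1) n++;
  return n;
}

/* Write max(#starts, #delimiters) indexes to BOTH tapes. The surplus writes
   on the shorter tape land past its count and are overwritten later. */
static void write_indexes(tapes_t *t, uint64_t starts, uint64_t delims, uint32_t base) {
  size_t ns = popcount(starts), nd = popcount(delims);
  size_t n = ns > nd ? ns : nd;
  for (size_t i = 0; i < n; i++) {
    t->fields[t->field_count + i] = base + trailing_zeros(starts);
    t->delimiters[t->delimiter_count + i] = base + trailing_zeros(delims);
    starts &= starts - 1; /* clear lowest set bit */
    delims &= delims - 1;
  }
  t->field_count += ns;
  t->delimiter_count += nd;
}

/* Scalar stand-in for the SIMD classifier. Returns 0 on success, -1 if the
   tapes overflow or a quoted token is not terminated. */
static int scan(tapes_t *t, const char *input, size_t len) {
  int in_contiguous = 0, in_quoted = 0;
  t->field_count = t->delimiter_count = 0;
  for (size_t base = 0; base <= len; base += 64) {
    uint64_t starts = 0, delims = 0;
    if (t->field_count > MAX_TOKENS || t->delimiter_count > MAX_TOKENS)
      return -1;
    for (size_t bit = 0; bit < 64 && base + bit <= len; bit++) {
      /* index len acts as a virtual blank ending a trailing contiguous token */
      char c = base + bit < len ? input[base + bit] : ' ';
      int blank = c == ' ' || c == '\t' || c == '\n';
      if (in_quoted) {
        if (c == '"') { delims |= 1ull << bit; in_quoted = 0; }
      } else if (in_contiguous) {
        if (blank) { delims |= 1ull << bit; in_contiguous = 0; }
      } else if (!blank) {
        starts |= 1ull << bit; /* for quoted tokens this is the opening quote */
        in_quoted = c == '"';
        in_contiguous = !in_quoted;
      }
    }
    write_indexes(t, starts, delims, (uint32_t)base);
  }
  return t->field_count == t->delimiter_count ? 0 : -1;
}

/* Length of token i without branching on the token type: a quoted token
   starts one past the opening quote and is one character shorter. */
static size_t token_length(const tapes_t *t, size_t i, const char *input, const char **data) {
  uint32_t start = t->fields[i], end = t->delimiters[i];
  size_t quoted = input[start] == '"'; /* 0 or 1 */
  *data = input + start + quoted;
  return end - start - quoted;
}
```

Because the i-th delimiter always belongs to the i-th start, a parse function can read both tapes at the same position. Whether the extra store per token is cheap enough in the real SIMD scanner is exactly what the measurements in this issue are meant to establish.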