Description
The second index was dropped by #64 (#30) because simplifying the lex function proved quite a bit faster. My reasoning was that we have to validate the input anyway, and domain names (very likely the bulk of the data) are parsed like strings, using a scalar loop to calculate label lengths. However, now that the domain name parser is being optimized (#66, thanks @lemire!), it seems that assumption may no longer hold.
Tests showed that the data dependencies and branches in the lex function reduced scanner performance by about 40%. The branches and dependencies were required because all indexes were written to the same tape (vector). Now that we know most parse functions benefit from having the length available (mentioned by both @lemire and @aqrit on multiple occasions), it may be worth looking into this again. It would also simplify parsing of fields that allow both contiguous and quoted character strings (domain names and text).
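For context, a hypothetical illustration (not the old simdzone code) of why a single shared tape forces branches: with starts and delimiters interleaved, each pop needs a data-dependent branch to classify the next entry, and the cursor update depends on the branch outcome, serializing the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified delimiter test, for illustration only. */
static inline int is_delimiter(char c)
{
  return c == ' ' || c == '\t' || c == '\n' || c == '\0';
}

typedef struct { const char *data; size_t length; } token_t;

/* Pop one token from a single interleaved tape. The branch on the class of
 * the next tape entry is taken per token and feeds the cursor update. */
static token_t lex_interleaved(
  const char *input, const uint32_t *tape, size_t *cursor)
{
  const uint32_t start = tape[(*cursor)++];
  uint32_t end = start;
  if (is_delimiter(input[tape[*cursor]]))  /* data-dependent branch */
    end = tape[(*cursor)++];
  return (token_t){ input + start, end - start };
}
```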
The new plan is to have two separate tapes: one containing the start indexes and one containing the delimiter indexes. The counts should generally be the same, so we select the bigger count and write that many indexes, simdjson-fashion, to both vectors (the CPU should handle this in parallel). Then, when we encounter a CONTIGUOUS or QUOTED token, the delimiter tape is guaranteed to hold the delimiting index, so we can quickly calculate the length and pop the index off the stack without adding branches. All of this assumes that writing out two indexes does not add too much overhead.
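A minimal sketch of what the branch-free dual-tape write could look like (hypothetical names, not the actual implementation; it assumes the classifier produces 64-bit masks per 64-byte block, as in simdjson):

```c
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h> /* _tzcnt_u64, _mm_popcnt_u64 (haswell has BMI1/POPCNT) */

/* Write the start and delimiter indexes found in one 64-byte block to two
 * separate tapes. start_bits/delim_bits are the classifier's bitmasks for
 * the block at offset base; *start_count/*delim_count track tape sizes. */
static inline void write_indexes(
  uint32_t *start_tape, size_t *start_count,
  uint32_t *delim_tape, size_t *delim_count,
  uint64_t start_bits, uint64_t delim_bits, uint32_t base)
{
  const size_t n_start = (size_t)_mm_popcnt_u64(start_bits);
  const size_t n_delim = (size_t)_mm_popcnt_u64(delim_bits);
  /* Select the bigger count and write that many indexes to both tapes.
   * Surplus slots receive base + 64 (_tzcnt_u64(0) == 64) and are simply
   * overwritten by the next block, so the loop body is branch-free. */
  const size_t count = n_start > n_delim ? n_start : n_delim;

  for (size_t i = 0; i < count; i++) {
    start_tape[*start_count + i] = base + (uint32_t)_tzcnt_u64(start_bits);
    delim_tape[*delim_count + i] = base + (uint32_t)_tzcnt_u64(delim_bits);
    start_bits &= start_bits - 1; /* clear lowest set bit */
    delim_bits &= delim_bits - 1;
  }
  /* Advance each tape only by its true count. */
  *start_count += n_start;
  *delim_count += n_delim;
}
```

The two stores per iteration have no dependency on each other, which is why the CPU should be able to execute them in parallel.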
Initial results look promising:
Current main in release mode:
```
$ time ./zone-bench lex ../../zones/com.zone
Selected target haswell
Lexed 1405612413 tokens

real    0m7.737s
user    0m6.595s
sys     0m1.132s
```
Quick hack in release mode:
```
$ time ./zone-bench lex ../../zones/com.zone
Selected target haswell
Lexed 1405612413 tokens

real    0m8.333s
user    0m7.111s
sys     0m1.212s
```
Note that yes, scanning is slower, but nowhere near 40% (roughly 8% on wall-clock time here). To see whether it's actually viable, I'll need to update some functions to leverage the length. The plan is to update RRTYPE and name parsing; we should see an improvement when actually parsing all the data.
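For illustration, a length-aware RRTYPE lookup could gate candidates on length before comparing bytes. The table and function names below are hypothetical, not simdzone's actual code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *name; size_t length; uint16_t code; } type_entry_t;

/* Tiny illustrative subset of the RRTYPE table. */
static const type_entry_t types[] = {
  { "A", 1, 1 }, { "NS", 2, 2 }, { "SOA", 3, 6 }, { "TXT", 3, 16 },
};

/* With the token length known from the two tapes, each candidate is rejected
 * by a length compare or confirmed by a single memcmp; no scalar scan for
 * the delimiter is needed. Returns 1 on success, 0 if the type is unknown. */
static int parse_rrtype(const char *token, size_t length, uint16_t *code)
{
  for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
    if (types[i].length == length &&
        memcmp(types[i].name, token, length) == 0) {
      *code = types[i].code;
      return 1;
    }
  }
  return 0;
}
```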