
Re-evaluate second (length) index (yet again) #95

Closed
@k0ekk0ek

Description


The second index was dropped by #64 (#30) as simplifying the lex function proved quite a bit faster. My reasoning was that we have to validate the input anyway, and domain names (very likely the bulk of the data) are parsed like strings, using a scalar loop to calculate label lengths. However, now that the domain name parser is being optimized (#66, thanks @lemire!), it seems my assumption may no longer hold.
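For context, that scalar loop looks roughly like the sketch below (the function name and layout are mine, not the actual simdzone code): each label's length octet has to be backfilled one byte at a time, because a label's length isn't known until the next separator is found.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a contiguous name like "www.example.com" to wire format
 * (\x03www\x07example\x03com\x00). Returns octets written, 0 on error.
 * (Sketch only; the root name "." is not handled.) */
static size_t scan_name(const char *text, size_t length, uint8_t wire[256])
{
  size_t label = 0;   /* offset of the current label's length octet */
  size_t octets = 1;  /* next write position (0 is reserved for the length) */

  for (size_t i = 0; i < length; i++) {
    if (text[i] == '.') {
      const size_t n = octets - label - 1;
      if (n == 0 || n > 63)
        return 0;                    /* empty or oversized label */
      wire[label] = (uint8_t)n;
      label = octets++;              /* reserve the next length octet */
    } else {
      if (octets >= 254)
        return 0;                    /* name would exceed 255 octets */
      wire[octets++] = (uint8_t)text[i];
    }
  }

  const size_t n = octets - label - 1;
  if (n > 63)
    return 0;
  wire[label] = (uint8_t)n;
  if (n)                             /* no trailing dot, append root label */
    wire[octets++] = 0;
  return octets;
}
```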

Tests showed that the data dependencies and branches in the lex function reduced performance of the scanner by about 40%. The branches and dependencies were required because all indexes were written to the same tape (vector). Now that we know most parse functions will benefit from having the length available (as mentioned by both @lemire and @aqrit on multiple occasions), it may be worth looking into this again. It will also simplify parsing of fields that allow for both contiguous and quoted character strings (domain names and text).

The new plan is to have two separate tapes: one that contains the start index and one that contains the delimiter index. The counts should generally be the same, so select the bigger number and write that many indexes in simdjson fashion to both vectors (the CPU should handle this in parallel). Now, when we encounter a CONTIGUOUS or QUOTED token, it's guaranteed the delimiter tape has the delimiting index, so we can quickly calculate the length and pop the index off the stack without adding branches. All of this assumes that writing out two indexes does indeed not add too much overhead.
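A minimal sketch of the branch-free dual write, assuming simdjson-style bitmasks over 64-byte blocks (all identifiers are illustrative, not the actual implementation):

```c
#include <stdint.h>

/* Well-defined for a zero mask, unlike __builtin_ctzll (tzcnt semantics). */
static inline uint64_t trailing_zeroes(uint64_t mask)
{
  return mask ? (uint64_t)__builtin_ctzll(mask) : 64;
}

/* Write the indexes of all set bits in `starts` and `delimiters` (bitmasks
 * over a 64-byte block at offset `base`) to two separate tapes. Both writes
 * run for the larger popcount so no per-bit branching is needed; surplus
 * entries are garbage, but each tape pointer only advances by its own count,
 * so they are overwritten on the next call. */
static void write_indexes(
  uint64_t starts, uint64_t delimiters, uint64_t base,
  uint32_t **start_tape, uint32_t **delimiter_tape)
{
  const int start_count = __builtin_popcountll(starts);
  const int delimiter_count = __builtin_popcountll(delimiters);
  const int count =
    start_count > delimiter_count ? start_count : delimiter_count;

  for (int i = 0; i < count; i++) {
    (*start_tape)[i] = (uint32_t)(base + trailing_zeroes(starts));
    (*delimiter_tape)[i] = (uint32_t)(base + trailing_zeroes(delimiters));
    starts &= starts - 1;            /* clear lowest set bit */
    delimiters &= delimiters - 1;
  }
  *start_tape += start_count;
  *delimiter_tape += delimiter_count;
}
```

On the consuming side, the length of a CONTIGUOUS or QUOTED token is then just the difference between the matching entries on the two tapes; no scan and no extra branch.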

Initial results look promising:

Current main in release mode:

$ time ./zone-bench lex ../../zones/com.zone
Selected target haswell
Lexed 1405612413 tokens

real	0m7.737s
user	0m6.595s
sys	0m1.132s

Quick hack in release mode:

$ time ./zone-bench lex ../../zones/com.zone
Selected target haswell
Lexed 1405612413 tokens

real	0m8.333s
user	0m7.111s
sys	0m1.212s

Note that yes, there's a slowdown in scanning, but at roughly 8% it's nowhere near 40%. To see if the approach is actually viable, I'll need to update some functions to leverage the length. The plan is to update RRTYPE and name parsing. We should see an improvement when actually parsing all the data.
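To illustrate what leveraging the length might look like for RRTYPE parsing (a hypothetical sketch, not the actual implementation): with the delimiter index on the second tape, the token length is known up front, so the lookup can switch on length and compare whole tokens instead of scanning for the delimiter byte by byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a type mnemonic to its RR type code, dispatching on the precomputed
 * token length. Returns -1 for types not covered by this sketch. */
static int32_t parse_rrtype(const char *token, size_t length)
{
  switch (length) {
    case 1:
      if (token[0] == 'A') return 1;                /* A */
      break;
    case 2:
      if (memcmp(token, "NS", 2) == 0) return 2;    /* NS */
      if (memcmp(token, "MX", 2) == 0) return 15;   /* MX */
      break;
    case 4:
      if (memcmp(token, "AAAA", 4) == 0) return 28; /* AAAA */
      break;
    case 5:
      if (memcmp(token, "CNAME", 5) == 0) return 5; /* CNAME */
      break;
  }
  return -1;
}
```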
