Vectorize string deserialization #20
Labels: enhancement, help wanted, question
For character strings (tokens), simdzone writes two indexes in order to have the length available. simdjson, by contrast, copies strings by doing a `_mm_loadu_si128` followed by a `_mm_storeu_si128`. To work out how far to advance the pointer, it then also does a `_mm_cmpeq_epi8` to check for the end of the string (closing quote) and for escape sequences (backslash). Once we grab a token from the tape, we return `ZONE_QUOTED` or `ZONE_CONTIGUOUS`; that information could potentially be used to select one of two "tables" to detect the end of the string. This seems like a really nice speedup for copying strings. If it works, it may no longer be necessary to store a second index, which may simplify the critical path and could give a nice increase in performance.
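To make the idea concrete, here is a minimal sketch of the copy-and-scan loop with a per-kind delimiter table. It is not simdzone or simdjson code: `token_kind`, `delimiters`, `tables`, and `copy_token` are hypothetical names, and the delimiter sets (in particular the one for contiguous tokens) are assumptions chosen for illustration. It also assumes SSE2, GCC or Clang (for `__builtin_ctz`), an input that stays readable for the whole 16-byte block containing the delimiter, and 16 bytes of slack in the output buffer.

```c
#include <stddef.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical stand-ins for ZONE_CONTIGUOUS / ZONE_QUOTED. */
enum token_kind { KIND_CONTIGUOUS = 0, KIND_QUOTED = 1 };

/* Hypothetical "table": the bytes that end a token or start an escape. */
struct delimiters { int count; char bytes[4]; };

/* Assumed delimiter sets, indexed by token kind. */
static const struct delimiters tables[2] = {
  { 4, { ' ', '\t', '\n', '\\' } }, /* contiguous: blank, newline, backslash */
  { 2, { '"', '\\' } }              /* quoted: closing quote, backslash */
};

/* Copy 16-byte blocks from src to dst until a block contains a delimiter.
 * Returns the number of bytes that precede the first delimiter. Bytes past
 * that point may also have been written to dst (up to the end of the block). */
static size_t copy_token(char *dst, const char *src, enum token_kind kind)
{
  const struct delimiters *table = &tables[kind];
  size_t length = 0;

  for (;;) {
    /* Unconditionally copy one block. */
    const __m128i block = _mm_loadu_si128((const __m128i *)(src + length));
    _mm_storeu_si128((__m128i *)(dst + length), block);

    /* Compare the block against every delimiter in the selected table. */
    __m128i match = _mm_setzero_si128();
    for (int i = 0; i < table->count; i++)
      match = _mm_or_si128(
        match, _mm_cmpeq_epi8(block, _mm_set1_epi8(table->bytes[i])));

    /* One bit per byte; the lowest set bit is the first delimiter. */
    const unsigned mask = (unsigned)_mm_movemask_epi8(match);
    if (mask)
      return length + (size_t)__builtin_ctz(mask);
    length += 16;
  }
}
```

The caller would inspect the byte at the returned offset to tell a terminator from a backslash and handle the escape before resuming. Because the length falls out of the scan itself, a sketch like this would not need the second index.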