Conversation
breaking test fix #1575 - clear() scalars in precompute and get_dtype to prevent reading unitialized union members prevent nulls from hitting strcmp entirely
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
During column
expressionscomputation, Perspective maintains a linear, growable memory block for end-to-end C-string storage to minimize heap allocation, in the form of at_vocabstruct. In investigating failing tests around string values inexpressions, @sc1f discovered a bug in the expression engine's use oft_vocab's reallocation method, which is called when the memory block is exhausted. In this case, after an expression was in the process of being calculated, any already-allocatedt_scalarstructs interned on thist_vocabblock would contain invalid pointers to the just-deallocated block.We discussed this issue and decided to implement a paged-vocab approach that simply allocates a new
t_vocabon exhaustion rather than reallocating. Interning strings in the first place is to prevent allocating intermediate to calculating a column; so we should minimize these anyway. There is quite a bit of potentially wasted memory here at the expensive of lookup-time in the string block and other issues; however, in the interest of correctness, we chose this implementation due to its simplicity, practicality and lack of worst-case performance corners.An alternate approach, updating the scalars in-place by incrementing their pointers by the offset of the reallocated base address, was not implemented; its a place for potential investigation in the future. I expect this will perform better for expressions with a small set strings, but its performance will degrade linearly as this set expands; whether that cost is of practical consequence versus the latter strategy would need performance profiles to know for sure.