Fix partial updates in Python using dicts #1298
Merged
Add test to repro #1268 WIP add tests
b0f69ad to bc85acd
texodus (Member) approved these changes on Jan 28, 2021
Looks good! Thanks for the PR!
Fixes #1268 by refactoring the data loading path for dict-shaped datasets.
There are some inconsistencies between how `None` values are handled in JS and in Python. In JS, `null` and `undefined` in a dataset mean different things in a partial update: `undefined` resolves to a no-op, so values that are `undefined` will not be modified or overwritten. However, if the value is replaced with `null`:

Python has no equivalent of `undefined`. To remedy this, the Python library uses a method called `_has_column`, which ascertains whether a given row contains a given column name. If the column name does not exist, the update is a no-op, similar to `undefined`; if the column name exists, the value is overwritten with the new value.

Thus, a row-oriented update of `[{a: undefined}]` that would work as a no-op update in JavaScript does not work in Python. In the column-oriented case, #1268 illustrates the issue where a missing column in a columnar dataset would be treated as an overwrite, and not a no-op. This PR fixes the behavior by treating missing columns in columnar datasets as no-ops, and has been tested. This behavior is now equivalent between JS and Python.

There will always remain Python-specific idiosyncrasies around partial updates. For example, this update works in JS:

but it would not work in Python, because one cannot satisfy the "all columns must have the same number of rows" requirement and also mark a `column[row]` as a no-op, and there is no way to reconcile this behavior:
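
As a rough illustration of the semantics described above, here is a minimal pure-Python sketch. It is not Perspective's implementation and does not use its API: the table is modelled as a plain dict keyed by an index value, and `apply_row_update` and `apply_column_update` are hypothetical helpers invented for this example. The only rule it encodes is the one stated in the description: a column that is absent from the update is a no-op, while a column that is present (even with `None`) overwrites.

```python
# Hypothetical model: a "table" is a dict mapping index value -> row dict.
# None of these names come from Perspective; they only illustrate the rule
# "missing column = no-op, present column = overwrite".


def apply_row_update(table, index, rows):
    """Apply a row-oriented partial update (a list of dicts)."""
    for row in rows:
        existing = table.setdefault(row[index], {})
        for column, value in row.items():
            # Only columns present in this row are touched; None overwrites.
            existing[column] = value


def apply_column_update(table, index, columns):
    """Apply a column-oriented partial update (a dict of lists)."""
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("all columns must have the same number of rows")
    for i, key in enumerate(columns[index]):
        existing = table.setdefault(key, {})
        for column, values in columns.items():
            # Columns missing from the update dict are never visited (no-op).
            existing[column] = values[i]


table = {
    1: {"id": 1, "a": "x", "b": 10},
    2: {"id": 2, "a": "y", "b": 20},
}

# Row-oriented: "a" is absent, so it is left alone; "b" is overwritten with None.
apply_row_update(table, "id", [{"id": 1, "b": None}])

# Column-oriented: the "a" column is missing entirely, so it is a no-op.
apply_column_update(table, "id", {"id": [2], "b": [99]})

# Column-oriented idiosyncrasy: to update "a" for row 2 only, row 1 still
# needs some value to keep the column lengths equal, and None overwrites.
apply_column_update(table, "id", {"id": [1, 2], "a": [None, "z"]})
```

The last call shows why a per-cell no-op cannot be expressed in a columnar Python update under this model: every column must supply a value for every row, and any supplied value, including `None`, counts as an overwrite.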