- Add unit context - ctx0 with no configuration at all that reads straight from gstate
- WIP: use unit ctx in JS, indexed updates/removes still broken
- WIP: fix JS tests
- WIP: get_pkeys no longer push_back
- column order no longer matters for unit context as long as num_columns == table.num_columns
- more tests, print inside traversal::step_end
- unit context = no pivot/sort/filter/computed, any column order/num of columns
- read m_delta_pkeys instead of get_delta_pkeys()
- cleanup
- Implement unit context in python
- add more python tests, make get_row_expanded return bool
- fix windows build
texodus (Member) approved these changes on Oct 28, 2020 and left a comment:
Looks good! Thanks for the PR!
I can independently confirm the benchmark results, too. Some improvements I'd like to look into in the future from review:
- In Emscripten I believe there is quite a bit of code generation associated with these repeated Context APIs, which leads to larger client assets in JS and WASM. Is there? If so, does embind support virtual dispatch? If not, can we perform a switch within a single dispatch C++ function so we do not need embind to generate the entire context API for each of 4 (eventually 5) context types?
- Contexts could use a cleanup, e.g. FMODE_SIMPLE_CLAUSES, combiner, etc.
- I concur with e.g. size() -> num_rows(), and IMO this is worth just applying consistently across the board.
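The single-dispatch idea from the first bullet could look roughly like the sketch below. Every name in it (ContextKind, UnitCtx, ZeroCtx, OneCtx, AnyContext, dispatch_num_rows) is a hypothetical stand-in rather than Perspective's or embind's actual API; it only illustrates one exported function switching on a type tag, so that a binding layer would need to wrap a single entry point instead of the full API of every context type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the real context classes.
struct UnitCtx { std::size_t rows; std::size_t num_rows() const { return rows; } };
struct ZeroCtx { std::size_t rows; std::size_t num_rows() const { return rows; } };
struct OneCtx  { std::size_t rows; std::size_t num_rows() const { return rows; } };

// Tag identifying which concrete context a handle points at.
enum class ContextKind { Unit, Zero, One };

// Type-erased handle: the tag says how to interpret the pointer.
struct AnyContext {
    ContextKind kind;
    const void* ptr;
};

// One entry point for all context types: only this function would need
// to be exposed to the binding layer (assumption, not verified against embind).
std::size_t dispatch_num_rows(const AnyContext& ctx) {
    assert(ctx.ptr != nullptr);
    switch (ctx.kind) {
        case ContextKind::Unit:
            return static_cast<const UnitCtx*>(ctx.ptr)->num_rows();
        case ContextKind::Zero:
            return static_cast<const ZeroCtx*>(ctx.ptr)->num_rows();
        case ContextKind::One:
            return static_cast<const OneCtx*>(ctx.ptr)->num_rows();
    }
    return 0; // unreachable for valid tags; keeps compilers quiet
}
```

Whether this actually shrinks the generated JS and WASM would have to be measured; the sketch only shows the shape of the switch.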
    auto columns = view_config->get_columns();
    auto filter_op = view_config->get_filter_op();
    auto fterm = view_config->get_fterm();
    auto computed_columns = view_config->get_computed_columns();
Member
Can we add a t_config which initializes these to the empty values we already know these to be?
    // TODO: int/float/date/datetime pkeys are already sorted here, so if
    // there was a way to assert that `psp_pkey` is a string typed column,
    // we can conditional the sort on whether m_sortby.size() > 0 or if
    // psp_pkey is a string column.
Member
I think this still needs to be re-sorted - this std::sort just guarantees overlapping indices will be contiguous.
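A small sketch of the contiguity point, under stated assumptions: the helper name collapse_overlapping_indices and the use of std::unique are illustrative only and are not taken from this PR. Sorting places equal indices next to each other, which is what makes a single pass over adjacent elements sufficient to collapse them; it says nothing about whether the result is in the order the View's own sort requires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: collapse repeated row indices into one entry each.
std::vector<std::size_t> collapse_overlapping_indices(std::vector<std::size_t> indices) {
    // After sorting, any repeated index sits directly beside its duplicates.
    std::sort(indices.begin(), indices.end());
    assert(std::is_sorted(indices.begin(), indices.end()));

    // std::unique only removes *adjacent* duplicates, so it relies on the
    // contiguity established by the sort above.
    indices.erase(std::unique(indices.begin(), indices.end()), indices.end());
    return indices;
}
```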
     * @return t_uindex
     */
    t_uindex size() const;
    t_uindex num_columns() const;
Member
We may as well go all the way and apply this change to Table.size() API in JS and Python!
This PR enhances the performance of certain 0-sided Views in Perspective by 2x-10x, depending on data size.
Each View is backed by a context object, which maintains its own traversal of the underlying master table. This traversal lets the user read data out in primary key order, lets pivots traverse the underlying dataset, lets sorts be applied to the subset of data in a context, and so on. When the traversal is essentially trivial, meaning the order in which the context reads data out is the same as the order in which data is stored in the underlying table, and the context does not have to apply any sorts, filters, or computed columns, we can skip creating a traversal entirely and avoid the overhead of storing primary keys, sorting them, and converting row indices to primary keys.
The unit context is a context object that has no traversal and reads directly from the underlying master table of the gnode. Internally, it offers the same API as all other context types; all construction of unit contexts occurs in internal code and has no bearing on the public API.
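The difference between the two read paths can be sketched as follows. This is a simplified illustration, not Perspective's implementation: MasterTable, TraversalContext, and UnitContext are hypothetical types, primary keys are modelled as plain integers, and the table has a single column.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <unordered_map>
#include <vector>

// Hypothetical single-column stand-in for the gnode's master table.
struct MasterTable {
    std::vector<double> column;
    std::size_t num_rows() const { return column.size(); }
};

// Traversal-backed context: stores primary keys and a pkey -> row lookup,
// and goes through both on every read.
class TraversalContext {
public:
    explicit TraversalContext(const MasterTable& table)
        : m_table(table), m_pkeys(table.num_rows()) {
        // Implicit index: primary keys are simply 0..n-1 in stored order.
        std::iota(m_pkeys.begin(), m_pkeys.end(), std::size_t{0});
        for (std::size_t row = 0; row < m_pkeys.size(); ++row) {
            m_row_of_pkey[m_pkeys[row]] = row;
        }
    }

    double get(std::size_t view_row) const {
        assert(view_row < m_pkeys.size());
        const std::size_t pkey = m_pkeys[view_row];
        return m_table.column[m_row_of_pkey.at(pkey)];
    }

private:
    const MasterTable& m_table;
    std::vector<std::size_t> m_pkeys;
    std::unordered_map<std::size_t, std::size_t> m_row_of_pkey;
};

// Unit context: no stored keys and no lookup; a view row is a table row.
class UnitContext {
public:
    explicit UnitContext(const MasterTable& table) : m_table(table) {}

    double get(std::size_t view_row) const {
        assert(view_row < m_table.num_rows());
        return m_table.column[view_row];
    }

private:
    const MasterTable& m_table;
};
```

With a trivial traversal both classes return the same values for every row, but the unit context pays no construction cost proportional to the number of rows and keeps no per-row state.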
Externally, the unit context offers a massive performance improvement for a large class of Views: those with no pivots, sorts, filters, or computed columns, created on a Table that does not have a user-specified index. On a Table with a user-specified index, data must be read out in the same order as primary keys, which is different from the underlying stored order in the master table. However, this PR will allow for future improvements to this behavior.
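The eligibility rule described above can be written as a single predicate. The sketch below assumes a hypothetical ViewConfig struct and function name; the real view configuration type in Perspective has a different shape, and only the conditions themselves come from the description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view configuration.
struct ViewConfig {
    std::vector<std::string> row_pivots;
    std::vector<std::string> column_pivots;
    std::vector<std::string> sorts;
    std::vector<std::string> filters;
    std::vector<std::string> computed_columns;
};

// A View qualifies for the unit context only if it is 0-sided (no pivots),
// applies no sorts, filters, or computed columns, and its Table has no
// user-specified index (so stored order equals read order).
bool can_use_unit_context(const ViewConfig& config, bool table_has_explicit_index) {
    return !table_has_explicit_index
        && config.row_pivots.empty()
        && config.column_pivots.empty()
        && config.sorts.empty()
        && config.filters.empty()
        && config.computed_columns.empty();
}
```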
Changelog
Benchmarks
JavaScript benchmarks show a massive improvement in View creation time, and slight improvements in serialization time and time to create a delta.
In Python, where I've benchmarked this PR against much larger datasets (5m rows), the performance of view() is almost equivalent to the performance of open_view(), which simply provides a handle to an already-created view on the server. Over large datasets and multiple parallel clients, the unit context massively reduces the overhead of view(), resulting in a 5x-10x improvement in performance over a regular ctx0.