-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
wide table support #10243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wide table support #10243
Conversation
Actually I just fixed the issue, but I still need to know if the approach is sound (also i need to write tests) |
don't pickle things - just store as a direct carray |
don't change attribute names - leave them the same |
Wait I'm confused - I didn't think I changed any attribute names? What I meant was that some attrs are stored on the group and some on the table, but I need a place to put the carrays. I can create an attr group under the group housing the Table, but the attrs on the table are on a table dataset, there isn't a place to keep those separately, I was just going to stick them in the same attr group. Also, I don't know how to store it as a direct carray. non_index_axes is something like this:
|
you need to use a |
OK I think I get that what about values_block_kind_0 - are you ok with storing that under Is there a particular reason why values_block_kind_0 is stored on the table, rather than on the group housing the table? |
these attributes needs to be stored on the actual table node (this is actually not a table, but a |
In this case, values_block_kind_0 is a list of column names, and so if we only make the change to non_index_axes, we won't be able to support wide tables, as we'll just push the 64k limit problem to values_block_kind_x If you don't want an attrs group created, where do you want the carrays to be created? I have it setup now where the carray based attrs are created inside the attrs group, but for backwards compatability we read them from the old location if the carray does not exist |
no no no. the point of a |
I understand that the carray is a value - under which path do you want it to live? |
I think it should live UNDER the Table (so its again hidden), sorry you do need to create this group. |
I'm sorry - I understand the confusion now - I thought that the table was a dataset - it is not, it is a group containing datasets |
yes, you are free to put additional attributes there (e.g. the axis and so on, whatever). you just need to be able to back-compat read this |
@hhuuggoo want to update this? |
I do - but I won't be able to in the short term - do you want to close and I can re-submit? |
@jreback I re-did this doing wht you suggested
|
@@ -11,7 +11,8 @@ | |||
import itertools | |||
import warnings | |||
import os | |||
|
|||
from six import string_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use compat.string_types
@hhuuggoo approach looks just fine. a couple of minor corrections. also pls generate a couple of tests hdf files using 0.16.2 for back-compat testing, they go in |
cool thanks! I'll finish and update tomorrow |
@jreback so these tests are failing because on a second write, the carrays I created, we attempt to write/create again. I can mirror the semantics of the hdf5 attrs, where we basically overwrite the old value - but I wanted to verify that this was the correct thing to do? thanks |
these are normally written once only |
@hhuuggoo can you rebase / update |
closing, but if you want to update, pls reopen @hhuuggoo so close!!! |
@hhuuggoo & @jreback, I've been watching as I also have need of this. I've made the suggested changes. Currently two tests fail, test_append_frame_column_oriented and test_column_multiindex. It looks like the first errors on dtype=object and the second cannot match existing table structure. I presume the former error is a limitation of carray (but haven't looked) and the latter appears to be the tolist() extraction of the carray, which needs to generate tuples for a multiIndex. |
why don't u put up your changes as and I'll have a look |
I submitted a pull request. Legacy test included. The code still needs some cleanup and as I noted in the pull request one test still fails. Also, you suggested in your comments above that one could get _handle from the table, but I don't see any way to find that via self.table. So for now I have a the handle property as a placeholder. |
addresses #6245
@jreback this isn't quite working yet for older arrays - the reason is because I have to address values_block_kind_0 and things like that. Currently what I'm doing is creating a group under
self.group
of the Table class, named 'attr', which has all the attributes as CArrays. However the issue is that values_block_kind_0 is actually written as an attr of the table dataset, not the group. I can definitely setup the code to read that stuff for backwards compatability, but is it ok if I put all carray attrs as datasets under the attr group, or is there some namespace collisioning that I'm not aware of?