
ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497


Merged
merged 13 commits into pandas-dev:master on Dec 13, 2012

Conversation

jreback
Contributor

@jreback jreback commented Dec 12, 2012

  • ENH: changes axes_to_index keyword in table creation to axes
  • ENH: allow passing of numeric or named axes (e.g. 0 or 'minor_axis') in axes
  • BUG: create_axes now checks for current table scheme; raises if this indexing scheme is violated
  • TST: added many p4d tests for appending/selection/partial selection/and axis permutation
  • TST: added additional Term tests to include p4d
  • BUG: add __eq__ operators to IndexCol/DataCol/Table for comparisons
  • DOC: updated docs with Panel4D saving & issues relating to threading
  • ENH: supporting non-regular indexables:
    • can index a Panel4D on say [labels,major_axis,minor_axis], rather
      than the default of [items,major_axis,minor_axis]
    • support column oriented DataFrames (e.g. queryable by the columns)
  • REMOVED: support for metadata serializing via 'meta' attribute on an object
  • ENH: supports a data versioning scheme; warns if the version is non-existent or old (only when querying);
    once the data is written again, the warning goes away
  • BUG: fixed PyTables version checking (and now correctly raises the error msg)
  • BUG/DOC: added min_itemsize for strings in the values block (added TST and BUG fixes)
  • BUG: handle int/float in indexes in the Term specification correctly
  • ENH/DOC: allow changing of the table indexing scheme by calling create_table_index with new parameters

turned off memory testing: enable big_table test to turn it on
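The axes resolution and scheme check described above can be sketched in plain Python. This is a simplified, hypothetical stand-in for the create_axes logic — resolve_axes, check_scheme, and the axis names here are illustrative names, not the actual pandas internals:

```python
# Hypothetical sketch (not the pandas code) of the two behaviors above:
# numeric-or-named axis resolution, and raising when an append would
# violate the indexing scheme of an existing table.

def resolve_axes(ndim, axes, names):
    """Map numeric or named axes (e.g. 0 or 'minor_axis') to positions."""
    resolved = []
    for ax in axes:
        pos = ax if isinstance(ax, int) else names.index(ax)
        if not 0 <= pos < ndim:
            raise ValueError("axis %r out of bounds for ndim=%d" % (ax, ndim))
        resolved.append(pos)
    return resolved

def check_scheme(existing, requested):
    """Raise if an append would change the table's existing indexing scheme."""
    if existing is not None and existing != requested:
        raise ValueError(
            "cannot change indexing scheme of an existing table: "
            "have %r, got %r" % (existing, requested))
    return requested

# index a Panel4D-like object on [labels, major_axis, minor_axis]
# rather than the default [items, major_axis, minor_axis]
names = ['labels', 'items', 'major_axis', 'minor_axis']
scheme = check_scheme(
    None, resolve_axes(4, ['labels', 'major_axis', 'minor_axis'], names))
print(scheme)  # [0, 2, 3]
```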

@jreback
Contributor Author

jreback commented Dec 12, 2012

@wesm should be the last one!

@jreback
Contributor Author

jreback commented Dec 12, 2012

hmm....couple of failing tests...hold off for a bit

@ghost

ghost commented Dec 12, 2012

I opened a PR to have travis test with pytables installed (on py2.x),
you might want to install the travis hook on your fork.

@jreback
Contributor Author

jreback commented Dec 12, 2012

thanks
I'll take a look


@jreback
Contributor Author

jreback commented Dec 12, 2012

ready to merge

@jreback
Contributor Author

jreback commented Dec 12, 2012

@wesm I put some memory debugging in here (requires psutil)...and a big table to test...looks better..but can you take a look at 2 issues:

  1. concat with a single item uses memory....maybe some reindexing with copy not set to False somewhere?
  2. memory in block2d_to_blocknd seems ok..am I missing something?

thanks

@jreback
Contributor Author

jreback commented Dec 13, 2012

@y-p finally pulled in your travis changes for the full_deps build....thanks

@jreback
Contributor Author

jreback commented Dec 13, 2012

@wesm this is ready to merge - travis build is failing because pytz is a dep, and numexpr is also a dep (for tables), in full_deps

@wesm
Member

wesm commented Dec 13, 2012

Got it thanks!

@ghost

ghost commented Dec 13, 2012

travis should be good now, rebase if you like.

@ghost

ghost commented Dec 13, 2012

I'm not sure the meta bit should be merged in yet, too early to make changes before
we know how it's all supposed to work.

@jreback
Contributor Author

jreback commented Dec 13, 2012

@y-p removed meta stuff, trying new travis build.....

@jreback
Contributor Author

jreback commented Dec 13, 2012

@y-p ok new travis full_dep build seems to work for PyTables! great thanks

is sqlalchemy supposed to be installed? (for full_deps)

@jreback
Contributor Author

jreback commented Dec 13, 2012

@wesm done fixing bugs - ready to merge (I know I have said it before...but really ready now!)

  • changes axes_to_index keyword in table creation to axes
  • allow passing of numeric or named axes (e.g. 0 or 'minor_axis') in axes
  • create_axes now checks for current table scheme; raises if this indexing scheme is violated
  • added many p4d tests for appending/selection/partial selection/and axis permutation
  • added additional Term tests to include p4d
  • add __eq__ operators to IndexCol/DataCol/Table for comparisons
  • updated docs with Panel4D saving & issues relating to threading
  • supporting non-regular indexables: e.g. can index a Panel4D on say [labels,major_axis,minor_axis], rather than the default of [items,major_axis,minor_axis]
  • support column oriented DataFrames (e.g. queryable by the columns)
  • allow types in Term that are datetime-like (e.g. can provide a timetuple method)
  • added a warning if you try to select/remove with a where criteria on a legacy table (which isn't supported); you must convert to the new format
  • added versioning ability, 'pandas_version'; can't detect future format changes (not a required attribute)
  • bug in concat with single object
  • not sure about block2d_to_blocknd memory increase
  • added support for integer, float, date (via Terms)
  • removed meta data saving
  • disabled memory tests (and put a try/except around it)
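As a toy illustration of the __eq__ item above (this class is a hypothetical stand-in, not the actual pandas IndexCol/DataCol/Table), equality on column metadata is what lets an append compare incoming data against the stored scheme:

```python
# Hypothetical sketch: a metadata holder with value-based equality, so a
# stored table's columns can be compared against incoming data's columns.

class IndexColSketch:
    """Toy stand-in for a pandas-style column-metadata object."""
    def __init__(self, name, axis, pos):
        self.name, self.axis, self.pos = name, axis, pos

    def __eq__(self, other):
        # equal iff every metadata attribute matches
        return all(getattr(self, a) == getattr(other, a)
                   for a in ('name', 'axis', 'pos'))

stored = IndexColSketch('major_axis', axis=1, pos=0)
incoming = IndexColSketch('major_axis', axis=1, pos=0)
print(stored == incoming)                                  # True
print(stored == IndexColSketch('minor_axis', axis=2, pos=0))  # False
```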
@wesm wesm merged commit 77db9aa into pandas-dev:master Dec 13, 2012
@wesm
Member

wesm commented Dec 13, 2012

Merged. thanks dude

@jreback
Contributor Author

jreback commented Dec 13, 2012

did u have a chance to look at memory issues?

@wesm
Member

wesm commented Dec 14, 2012

No, can you give me an example of the memory problems?

@jreback
Contributor Author

jreback commented Dec 14, 2012

Uncomment the skipped test for big_table in test_pytables - I print out memory usage (assumes you have psutil installed)

2 issues

  1. concat with a single item leaks (I worked around this)
  2. the block2d_to_blocknd uses a bunch of memory (but might be ok)
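For reference, here is a stdlib-only sketch of this kind of per-iteration memory check. The original used psutil's RSS; tracemalloc only sees Python-level allocations, so it won't show the private-heap retention discussed below, but peaks that grow steadily across iterations would indicate a leak:

```python
# Stdlib-only leak check: run an operation repeatedly and record the
# peak traced allocation for each iteration. Flat peaks suggest no
# Python-level leak; steadily growing peaks suggest one.

import tracemalloc

def measure(fn, iterations=5):
    """Return the peak traced memory (bytes) for each iteration of fn."""
    peaks = []
    for _ in range(iterations):
        tracemalloc.start()
        fn()
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak)
    return peaks

# stand-in workload; substitute e.g. an HDFStore append/select round-trip
peaks = measure(lambda: [i * i for i in range(100_000)])
print(peaks)
```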


@ghost

ghost commented Dec 14, 2012

@jreback , nope, sqlalchemy is only installed for the VBENCH=true job.

@wesm
Member

wesm commented Dec 14, 2012

I ran the below. I don't think it's leaking memory, I blame the Python private heap

In [1]: paste
wp = Panel(np.random.randn(20, 1000, 1000),
                   items= [ 'Item%s' % i for i in xrange(20) ],
                   major_axis=date_range('1/1/2000', periods=1000),
                   minor_axis = [ 'E%s' % i for i in xrange(1000) ])

wp.ix[:,100:200,300:400] = np.nan

import psutil
import os
proc = psutil.Process(os.getpid())
print proc.get_memory_info().rss

path = 'foo.h5'
for i in range(10): 
        try:
                store = HDFStore(path)
                store._debug_memory = True
                store.append('wp',wp)
                recons = store.select('wp')
                del recons
                print proc.get_memory_info().rss
        finally:
                store.close()
                os.remove(path)
## -- End pasted text --
228184064
[start] cur_mem->566.12 (MB),per_mem->4.48
[read_axes] cur_mem->894.82 (MB),per_mem->7.09
[pre block] cur_mem->895.74 (MB),per_mem->7.09
[block created done] cur_mem->1055.74 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.74 (MB),per_mem->8.36
[post-concat] cur_mem->1055.74 (MB),per_mem->8.36
[done] cur_mem->1055.75 (MB),per_mem->8.36
737345536
[start] cur_mem->567.20 (MB),per_mem->4.49
[read_axes] cur_mem->895.89 (MB),per_mem->7.10
[pre block] cur_mem->895.89 (MB),per_mem->7.10
[block created done] cur_mem->1055.89 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.89 (MB),per_mem->8.36
[post-concat] cur_mem->1055.89 (MB),per_mem->8.36
[done] cur_mem->1055.89 (MB),per_mem->8.36
737484800
[start] cur_mem->567.20 (MB),per_mem->4.49
[read_axes] cur_mem->895.89 (MB),per_mem->7.10
[pre block] cur_mem->895.89 (MB),per_mem->7.10
[block created done] cur_mem->1055.89 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.89 (MB),per_mem->8.36
[post-concat] cur_mem->1055.89 (MB),per_mem->8.36
[done] cur_mem->1055.89 (MB),per_mem->8.36
737484800
[start] cur_mem->567.20 (MB),per_mem->4.49
[read_axes] cur_mem->895.89 (MB),per_mem->7.10
[pre block] cur_mem->895.89 (MB),per_mem->7.10
[block created done] cur_mem->1055.89 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.89 (MB),per_mem->8.36
[post-concat] cur_mem->1055.89 (MB),per_mem->8.36
[done] cur_mem->1055.89 (MB),per_mem->8.36
737484800
[start] cur_mem->567.20 (MB),per_mem->4.49
[read_axes] cur_mem->895.89 (MB),per_mem->7.10
[pre block] cur_mem->895.89 (MB),per_mem->7.10
[block created done] cur_mem->1055.89 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.89 (MB),per_mem->8.36
[post-concat] cur_mem->1055.89 (MB),per_mem->8.36
[done] cur_mem->1055.89 (MB),per_mem->8.36
737484800
[start] cur_mem->567.20 (MB),per_mem->4.49
[read_axes] cur_mem->895.89 (MB),per_mem->7.10
[pre block] cur_mem->895.89 (MB),per_mem->7.10
[block created done] cur_mem->1055.89 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.89 (MB),per_mem->8.36
[post-concat] cur_mem->1055.89 (MB),per_mem->8.36
[done] cur_mem->1055.89 (MB),per_mem->8.36
737484800
[start] cur_mem->567.20 (MB),per_mem->4.49
[read_axes] cur_mem->895.89 (MB),per_mem->7.10
[pre block] cur_mem->895.89 (MB),per_mem->7.10
[block created done] cur_mem->1055.89 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.89 (MB),per_mem->8.36
[post-concat] cur_mem->1055.89 (MB),per_mem->8.36
[done] cur_mem->1055.89 (MB),per_mem->8.36
737484800
[start] cur_mem->567.27 (MB),per_mem->4.49
[read_axes] cur_mem->895.95 (MB),per_mem->7.10
[pre block] cur_mem->895.95 (MB),per_mem->7.10
[block created done] cur_mem->1055.95 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.95 (MB),per_mem->8.36
[post-concat] cur_mem->1055.95 (MB),per_mem->8.36
[done] cur_mem->1055.95 (MB),per_mem->8.36
737550336
[start] cur_mem->567.27 (MB),per_mem->4.49
[read_axes] cur_mem->895.95 (MB),per_mem->7.10
[pre block] cur_mem->895.95 (MB),per_mem->7.10
[block created done] cur_mem->1055.95 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.95 (MB),per_mem->8.36
[post-concat] cur_mem->1055.95 (MB),per_mem->8.36
[done] cur_mem->1055.95 (MB),per_mem->8.36
737550336
[start] cur_mem->567.27 (MB),per_mem->4.49
[read_axes] cur_mem->895.95 (MB),per_mem->7.10
[pre block] cur_mem->895.95 (MB),per_mem->7.10
[block created done] cur_mem->1055.95 (MB),per_mem->8.36
[pre-concat] cur_mem->1055.95 (MB),per_mem->8.36
[post-concat] cur_mem->1055.95 (MB),per_mem->8.36
[done] cur_mem->1055.95 (MB),per_mem->8.36
737550336

@jreback
Contributor Author

jreback commented Dec 14, 2012

ok that's fine

still worth taking a look at the concat issue
(with a single item)

  • I just did that check and avoided it

