index is included in memory usage by default#11867
index is included in memory usage by default#11867max-sixty wants to merge 1 commit intopandas-dev:masterfrom
Conversation
pandas/core/base.py
Outdated
There was a problem hiding this comment.
you can put this in PandasObject I think
pandas/tests/test_frame.py
Outdated
There was a problem hiding this comment.
test this on series/Index/Categorical as well. (for Index put a test in test_index on the Base which will run for all indexes).
There was a problem hiding this comment.
I added onto the existing tests in IndexOps - lmk if that's not OK
c462b19 to
c99cda7
Compare
pandas/tests/test_base.py
Outdated
There was a problem hiding this comment.
we don't use this
use tm.assert_almost_equal
There was a problem hiding this comment.
I thought we disabled the test routines that one shouldn't use - in tm.TestCase
but maybe that's an open issue (or if not can you create one)
we are dropping 2.6 soon - but still like for there to be 1 way to do testing
There was a problem hiding this comment.
OK - although the functionality is materially worse than the nosetests one (actually kinda peculiar with the bool argument for 3 vs 5):
check_less_precise : bool, default False
Specify comparison precision.
5 digits (False) or 3 digits (True) after decimal points are compared.
51de6c4 to
36d75d8
Compare
|
@jreback |
doc/source/whatsnew/v0.18.0.txt
Outdated
There was a problem hiding this comment.
actually I think this should call with deep=True.
There was a problem hiding this comment.
I don't have enough context to know whether deep should be True or False by default, but I can see the logic behind the default for .memory_usage should being the same as that for sys.getsizeof, given the similar use cases.
Or why are the use cases different? Because a user calling from the system wants accuracy vs. speed but a user calling from pandas wants speed vs. accuracy?
There was a problem hiding this comment.
using __sizeof__ should have deep=True which gives the 'best' report of memory used. This is not the default for .memory_usage() because its expensive to compute (potentially). see #11595 where it can be somewhat slow (though the cython impl helps).
There was a problem hiding this comment.
e.g. if someone is calling sys.getsizeof(df) then I think its appropriate to give the most accurate (if maybe somewhat non-performant) answer.
There was a problem hiding this comment.
OK, if it's a material performance drag and a different enough set of use cases, sobeit
3c02600 to
6045ae7
Compare
|
@jreback updated & green |
pandas/core/base.py
Outdated
There was a problem hiding this comment.
nothing should raise a TypeError, what does? rather than catching this error, need to fix the underlying.
...and sys.getsizeof returns correct value. Closes #11597
Is this the best implementation for a common method across classes?