index is included in memory usage by default #11867

Closed
max-sixty wants to merge 1 commit into pandas-dev:master from max-sixty:get-size-of

@max-sixty
Contributor

...and sys.getsizeof returns the correct value. Closes #11597

Is this the best implementation for a common method across classes?

Contributor

you can put this in PandasObject I think
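
A minimal sketch of what a shared method on a common base class could look like. `SizedBase`, `ByteColumn`, and `Plain` are hypothetical stand-ins invented for illustration, not pandas classes, and this is not necessarily how the PR implements it:

```python
import sys


class SizedBase(object):
    """Hypothetical shared base class (a stand-in, not pandas' PandasObject)."""

    def __sizeof__(self):
        # sys.getsizeof() calls this hook. Delegate to memory_usage()
        # when the subclass defines it, otherwise use the default size.
        if hasattr(self, "memory_usage"):
            return int(self.memory_usage())
        return super(SizedBase, self).__sizeof__()


class ByteColumn(SizedBase):
    """Hypothetical subclass that knows how much data it holds."""

    def __init__(self, n):
        self.data = bytearray(n)

    def memory_usage(self):
        return sys.getsizeof(self.data)


class Plain(SizedBase):
    """Hypothetical subclass without memory_usage(); falls back to the default."""


col = ByteColumn(1000)
print(sys.getsizeof(col), sys.getsizeof(Plain()))
```

Defining the hook once on the base keeps every subclass consistent, and classes without a `memory_usage` method keep the interpreter's default behaviour.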

@jreback jreback added this to the 0.18.0 milestone Dec 19, 2015
@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Dec 19, 2015
Contributor

Test this on Series/Index/Categorical as well. (For Index, put a test in test_index on the Base class, which will run for all indexes.)

Contributor Author

I added onto the existing tests in IndexOps - lmk if that's not OK

@max-sixty max-sixty force-pushed the get-size-of branch 3 times, most recently from c462b19 to c99cda7 Compare December 21, 2015 18:45
@max-sixty
Contributor Author

@jreback Python 2.6 raises an error on the delta argument to assertAlmostEqual - should I skip the test on 2.6 or find an alternative way of doing this?

xref #7718

Contributor

we don't use this

use tm.assert_almost_equal

Contributor

I thought we had disabled the test routines that one shouldn't use in tm.TestCase,
but maybe that's still an open issue (if not, can you create one?)

we are dropping 2.6 soon - but I'd still like there to be 1 way to do testing

Contributor Author

OK - although the functionality is materially worse than the nosetests one (actually kinda peculiar with the bool argument for 3 vs 5):

check_less_precise : bool, default False
Specify comparison precision.
5 digits (False) or 3 digits (True) after decimal points are compared.
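
To illustrate the difference being discussed, here is a rough sketch of an absolute-tolerance check next to a digit-based one, written without unittest's delta keyword (which was only added in Python 2.7). Both helper names are hypothetical, and the digit-based helper mirrors unittest's places rule rather than the exact semantics of tm.assert_almost_equal:

```python
def assert_close_delta(actual, expected, delta):
    """Absolute tolerance: pass when the values differ by at most `delta`."""
    diff = abs(actual - expected)
    if diff > delta:
        raise AssertionError(
            "%r and %r differ by %r, more than delta=%r"
            % (actual, expected, diff, delta)
        )


def assert_close_places(actual, expected, places):
    """Digit-based: the difference must round to zero at `places` decimals."""
    if round(abs(actual - expected), places) != 0:
        raise AssertionError(
            "%r and %r differ at %d decimal places" % (actual, expected, places)
        )


# A byte count that is allowed to drift by a fixed overhead fits the delta style.
assert_close_delta(1016, 1000, delta=100)  # passes
```

The same pair of values fails `assert_close_places(1016, 1000, places=3)`, which is why a delta-style check is more convenient for size comparisons.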

@max-sixty max-sixty force-pushed the get-size-of branch 2 times, most recently from 51de6c4 to 36d75d8 Compare December 21, 2015 19:59
@max-sixty
Contributor Author

@jreback assertLess isn't supported either on 2.6?
But we're green

Contributor

actually I think this should call with deep=True.

Contributor Author

I don't have enough context to know whether deep should be True or False by default, but I can see the logic behind the default for .memory_usage being the same as that for sys.getsizeof, given the similar use cases.

Or why are the use cases different? Because a user calling from the system wants accuracy vs. speed but a user calling from pandas wants speed vs. accuracy?

Contributor

using __sizeof__ should have deep=True, which gives the 'best' report of memory used. This is not the default for .memory_usage() because it's expensive to compute (potentially). see #11595 where it can be somewhat slow (though the cython impl helps).

Contributor

e.g. if someone is calling sys.getsizeof(df) then I think it's appropriate to give the most accurate (if maybe somewhat non-performant) answer.
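
The accuracy versus cost trade-off can be sketched in plain Python, using a list of strings as a rough analogy for a column of Python objects. The two function names are hypothetical and this is not pandas' implementation:

```python
import sys


def shallow_size(values):
    # Counts only the list object itself: its header plus one pointer
    # per element. Cheap, but ignores what the pointers refer to.
    return sys.getsizeof(values)


def deep_size(values):
    # Also visits every element, so the work grows with the number of
    # items. Assumes the elements are distinct objects; shared objects
    # would be counted more than once.
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


names = [str(i) for i in range(1000)]
print(shallow_size(names), deep_size(names))
```

The deep figure is always larger here, and it is the one that requires touching every element, which is the cost being weighed against accuracy.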

Contributor Author

OK, if it's a material performance drag and a different enough set of use cases, so be it

@max-sixty max-sixty force-pushed the get-size-of branch 6 times, most recently from 3c02600 to 6045ae7 Compare January 2, 2016 00:08
@max-sixty
Contributor Author

@jreback updated & green

Contributor

nothing should raise a TypeError; what does? rather than catching this error, we need to fix the underlying cause.
