na_position doesn't work for sort_index() with MultiIndex #14784
Comments
So basically sorting a multi-index always puts NaNs first, which can be quite annoying. If someone wants to try to tackle this, it should be rather easy I think.
I'd like to give this a try! I have been looking at the code and when sorting the multi-index, the labels are passed to the After a quick look I see several ways of fixing the
Is the check best left as is and should I fix the data to make it work, or should I fix the check to match the data? That bit where the
you can do something like this right about here There are 2 cases when this is called.
This will correctly handle case 1), case 2) is handled by the existing code.
Simply do the above in the MultiIndex code before passing to
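The code these comments point at did not survive in this thread. As a rough illustration only, here is one way the -1 placeholder could be adjusted before an integer sort so that missing values land last. This is a plain NumPy sketch, not the pandas patch: the function name `move_na_codes`, its arguments, and the sample data are all hypothetical.

```python
import numpy as np


def move_na_codes(code_arrays, level_sizes, na_position="last"):
    """Return copies of the code arrays with the -1 placeholder remapped.

    Hypothetical helper for illustration; not part of pandas.
    """
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    shifted = []
    for codes, size in zip(code_arrays, level_sizes):
        codes = np.array(codes, dtype=np.int64)  # np.array copies its input
        if na_position == "last":
            # Valid codes run from 0 to size - 1, so `size` sorts after all of them.
            codes[codes == -1] = size
        # For "first", -1 already sorts before every valid code.
        shifted.append(codes)
    return shifted


# Made-up codes for a two-level index; row 1 has a missing second-level value.
code_arrays = [[0, 1, 2], [1, -1, 0]]
level_sizes = [3, 2]

last = move_na_codes(code_arrays, level_sizes, na_position="last")
first = move_na_codes(code_arrays, level_sizes, na_position="first")

# Sort on the second level only, to make the effect visible.
order_last = np.argsort(last[1], kind="mergesort").tolist()
order_first = np.argsort(first[1], kind="mergesort").tolist()
print(order_last, order_first)
```

With `na_position="last"` the row holding the missing value moves to the end of the ordering; with `"first"` the codes are left alone, since -1 already sorts ahead of everything else.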
Like so:
I figured the for loop was better than using either map or a list comprehension, since that is what was suggested and for loops are what is used elsewhere. I also did a few tests:
When sorting on levels
When sorting the index with the level option set, the na_position option is ignored. Is this working as intended or should the option be passed along?
@linebp It is probably easier if you open a PR with the above changes (even if you are not sure of the approach, or if it is not yet finished, just indicate so in the PR), that will make discussing it easier.
I think ideally this should also work (so pass na_position along)
thanks @linebp
thanks, @jreback, for your patience with all the overthinking
closes pandas-dev#14784 Author: Line Pedersen <[email protected]> Closes pandas-dev#15845 from linebp/json_normalize_seperator and squashes the following commits: 66f809e [Line Pedersen] BUG GH14784 na_position doesn't work for sort_index() with MultiIndex
) * TST: separate out groupby/test_nth * BUG: bug in groupby on empty frame with multi groupers xref pandas-dev#14784 closes pandas-dev#16064
Code Sample, a copy-pastable example if possible
Problem description
The na_position argument isn't used in DataFrame.sort_index() or Series.sort_index() due to the way we sort the MultiIndex. Whenever we create a MultiIndex, we store the labels as relative values. For instance, if we have the following MultiIndex:
the values get stored as
with a NaN placeholder of -1.
These label values are what get passed to the sorting algorithm for both DataFrames and Series. Since the sorting only happens on the labels, it has no notion of the NaN.
This has been discussed in #14015 and #14672.
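The original example index is missing from this page, so the following is a stand-in rather than the reporter's data. It is a minimal sketch of how the stored labels look for a small, made-up MultiIndex, and why sorting those integers alone puts the missing value first. The index contents and variable names are assumptions. The attribute is called `codes` in current pandas and `labels` in pandas 0.19, so the sketch checks for both.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index with one missing value in the second level.
mi = pd.MultiIndex.from_tuples(
    [("a", 1.0), ("b", np.nan), ("c", 0.0)], names=["key", "val"]
)

# Integer positions into each level; `labels` is the pre-rename attribute name.
codes = mi.codes if hasattr(mi, "codes") else mi.labels
val_codes = [int(c) for c in codes[1]]
print(val_codes)

# Sorting the raw integers knows nothing about NaN: the -1 placeholder
# is simply the smallest value, so that row comes out first.
order = [int(i) for i in np.argsort(val_codes, kind="mergesort")]
print(order)
```

The second level only holds the real values, and the row with the missing value gets -1, which is what the sorting routine sees.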
My original naive solution was to change these lines from:
to
This didn't break any tests, but it isn't necessarily the best approach.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.0
nose: 1.3.4
pip: 9.0.0
setuptools: 27.2.0
Cython: 0.21
numpy: 1.11.2
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.5.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.5
sqlalchemy: 0.9.7
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: 2.32.1
pandas_datareader: None