
ENH: DataFrame.interpolate limit to support all-or-none filling #42291


Open
aaronsl-hku opened this issue Jun 29, 2021 · 2 comments
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action

Comments

@aaronsl-hku

Currently, with df.interpolate(limit, limit_direction), I must choose one or both directions to fill when limiting the interpolation. What I would find more useful is an all-or-none strategy rather than filling only up to the limit count: gaps no longer than the limit are filled completely, and longer gaps are left untouched. That way I can fill short-term missing data and keep long-term missing data to be filtered out afterwards. Demonstrated here:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[0,1,2,3],[1,np.nan,np.nan,np.nan],[np.nan,np.nan,np.nan,5],[3,4,5,6]],columns=list('abcd'))
>>> df
     a    b    c    d
0  0.0  1.0  2.0  3.0
1  1.0  NaN  NaN  NaN
2  NaN  NaN  NaN  5.0
3  3.0  4.0  5.0  6.0
>>> # Current options
>>> df.interpolate(axis=0,limit=1,limit_direction='forward')
     a    b    c    d
0  0.0  1.0  2.0  3.0
1  1.0  2.0  3.0  4.0
2  2.0  NaN  NaN  5.0
3  3.0  4.0  5.0  6.0
>>> df.interpolate(axis=0,limit=1,limit_direction='backward')
     a    b    c    d
0  0.0  1.0  2.0  3.0
1  1.0  NaN  NaN  4.0
2  2.0  3.0  4.0  5.0
3  3.0  4.0  5.0  6.0
>>> df.interpolate(axis=0,limit=1,limit_direction='both')
     a    b    c    d
0  0.0  1.0  2.0  3.0
1  1.0  2.0  3.0  4.0
2  2.0  3.0  4.0  5.0
3  3.0  4.0  5.0  6.0
>>> # What is desired
>>> interpolated_df = pd.DataFrame([[0,1,2,3],[1,np.nan,np.nan,4],[2,np.nan,np.nan,5],[3,4,5,6]],columns=list('abcd'))
>>> interpolated_df # NaNs in columns b and c are left unfilled because the gap exceeds limit 1
   a    b    c  d
0  0  1.0  2.0  3
1  1  NaN  NaN  4
2  2  NaN  NaN  5
3  3  4.0  5.0  6
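
Until something like this exists in pandas, the desired behaviour can be approximated by hand. The following is only a rough sketch, not an existing pandas option: `interpolate_all_or_none` is a hypothetical helper name, it assumes numeric columns and column-wise (axis=0) interpolation, and it uses `limit_area="inside"` so that leading and trailing NaNs are never extrapolated (that choice is an assumption, not something specified in this request).

```python
import numpy as np
import pandas as pd


def interpolate_all_or_none(df, limit, **kwargs):
    """Hypothetical helper: fill a NaN gap only if its full length is <= limit."""
    # Interpolate every interior gap without a limit; results are kept
    # only for the gaps that turn out to be short enough.
    filled = df.interpolate(axis=0, limit_area="inside", **kwargs)
    out = df.astype(float)  # assumption: all columns are numeric
    for col in df.columns:
        isna = df[col].isna()
        # Give each run of consecutive NaN / non-NaN rows its own id.
        run_id = (isna != isna.shift()).cumsum()
        # Length of the run that each row belongs to.
        run_len = run_id.map(run_id.value_counts())
        # Rows that are NaN and sit in a gap no longer than `limit`.
        short_gap = isna & (run_len <= limit)
        out[col] = out[col].mask(short_gap, filled[col])
    return out


df = pd.DataFrame(
    [[0, 1, 2, 3],
     [1, np.nan, np.nan, np.nan],
     [np.nan, np.nan, np.nan, 5],
     [3, 4, 5, 6]],
    columns=list("abcd"),
)
result = interpolate_all_or_none(df, limit=1)
print(result)
```

With limit=1 on the frame above, the single-row gaps in columns a and d are filled, while the two-row gaps in columns b and c stay NaN.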
@aaronsl-hku aaronsl-hku added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 29, 2021
@joAschauer

Hi there, have a look at #36352 and #25141.

@aaronsl-hku
Author

Hi there, have a look at #36352 and #25141.

Hello, thanks, I just read through them. I find this different from #36352, as that was reported as a BUG while I am suggesting this as an ENH (extra functionality).

#25141 aligns closely with my thoughts... but was it closed to give way to #36352?

@mroeschke mroeschke added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 21, 2021