-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: pd.offsets.MonthEnd() delevers wrong month #48779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the report
You probably want Might be good to add this as an example to the docs as it seems to be a common source of confusion https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series |
Hi! :) I'd love to help with the documentation if this issue is still relevant. If so, could you please assign it to me? |
thanks, go ahead! if you comment "take" the issue will be assigned to you, do reach out if you have questions |
take |
take |
Hello, I am looking for my very first contribution. Is there still something I can help will (I would probably need a little bit of guidance). Thanks |
yup, pull requests welcome https://pandas.pydata.org/pandas-docs/stable/development/contributing.html |
take |
Thanks,
I found a equivalent solution: ts.replace(day=1) + pd.offsets.MonthEnd(0)
For me it is not logical to go to the next month when no parameters are
given.
For me it is logical that the default parameter should be '0' when no
parameters are given by the user.
But thanks for the quick response
Op ma 26 sep. 2022 om 10:51 schreef Marco Edward Gorelli <
***@***.***>:
… Thanks for the report
MonthEnd goes to the next date which is an end of the month, so this
seems correct
You probably want MonthEnd(0) to get the end of the current month
Might be good to add this as an example to the docs as it seems to be a
common source of confusion
https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series
—
Reply to this email directly, view it on GitHub
<#48779 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/A3IXQLDZWOMAEFHTFZOVUQDWAFPZDANCNFSM6AAAAAAQVRXV34>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Hello, pjousma Thanks for your comment. I've been thinking about it, and also agree, that the default n parameter should be zero. And also the same for pandas.tseries.offsets.YearEnd() function (it has the same issue). Who can decide whether we should change parameter to zero or keep it as is? Thanks, |
this'd be a breaking change and would have to go through a deprecation cycle, not sure it'd be worth it. Would be more in favour of just clearly documenting the behaviour |
Ok, ok. I am going to update the documentation in the next few days. I'll let you know in case of any questions/issues. Thanks. |
Hi @MarcoGorelli , please may I ask you for a clarification. Do I understand correctly, I should is update this page of documentation?: https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.MonthEnd.html Therefore I'll be modifying this part of code |
yeah looks fine |
I created a Pull Request #49133 However as it is my very Pull Request, therefore for now, I marked it as a draft. Please can you look at it and give me some feedback? Thanks |
"When parameter n=0, always offset to the end of month.
When parameter n=1, depend on the given date:
a) Given date is on an anchor point (last day of the month) -> offset to
the end of the following month.
b) Given date is not on an anchor point -> offset to the end of the same
month."
For me the functionality of the parameter is not logical.
My intepretation is that n the offset of the data used by this function.
n=0: end of month (of used date)
n=1: end of next month
n=2: end of 2e next month
n=-1: end of previous month
An intepration as mentioned "n=1 a if ... else b" is for me very unlogical.
Op zo 16 okt. 2022 om 22:54 schreef Jan Timko ***@***.***>:
… I created a Pull Request #49133
<#49133>
However as it is my very Pull Request, therefore for now, I marked it as a
draft. Please can you look at it and give me some feedback? Thanks
—
Reply to this email directly, view it on GitHub
<#48779 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/A3IXQLBOJSCRDFHHADNJHNDWDRTOPANCNFSM6AAAAAAQVRXV34>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
@pjousma Yes, I agree that the current behavior is little bit confusing, and what you suggested is definitely more logical. However, I believe that as quick immediate solution we can document current logic (if everything goes well we can close this PR in next several days) and in longer term we can open discussion about changing functionality of the 'n' parameter. Not sure about processes here in Panda, but it make take some time to approve an implement the change. |
Yeah changing behaviour shouldn't be taken lightly - for a start, would there be a way to communicate the change to users in such a way as to not flood them with warnings every time they use But certainly it should be clearly documented with examples so thanks @honza26 for taking this on! |
Hello, unfortunately I am facing technical issues with setting up environment and will not be able to finish (my laptop is quite old and slow, therefore cannot install a lot of tools that will slow it done further) PR in progress for anyone who want to take it from there: If any question, feel free to reach out. And @MarcoGorelli thanks a lot for help and guidance with it (although I didn't manage to cross the finishing line). Cheers, |
hey @honza26 - if you want to come to the next new contributors meeting, you could share you screen and we could resolve your technical issues with you? |
take |
Hi. I'm a long time user of pandas, but this is my first contribution. Can I update both the docstring and the docs? I plan to:
normalize is an ignored parameter docstring states (class) Docstring shows MonthEnd(n: int = ..., normalize: bool = ...), however normalize doesn't appear to do anything: for date in pd.date_range(start='2000-01-01', end='2001-01-01'):
for n in [-1, 0, 1, 2]:
a = date + pd.offsets.MonthEnd(n=n, normalize=True)
b = date + pd.offsets.MonthEnd(n=n, normalize=False)
assert a == b unintuitive logic for n in range(-2, 3):
print(f"\n{n=}")
print(f"2000-05-01 -> {(pd.Timestamp('2000-05-01') + pd.offsets.MonthEnd(n=n)).strftime('%Y-%m-%d')}")
print(f"2000-05-15 -> {(pd.Timestamp('2000-05-15') + pd.offsets.MonthEnd(n=n)).strftime('%Y-%m-%d')}")
print(f"2000-05-31 -> {(pd.Timestamp('2000-05-31') + pd.offsets.MonthEnd(n=n)).strftime('%Y-%m-%d')}") # n=-2
+ 2000-05-01 -> 2000-03-31
+ 2000-05-15 -> 2000-03-31
+ 2000-05-31 -> 2000-03-31
# n=-1
+ 2000-05-01 -> 2000-04-30
+ 2000-05-15 -> 2000-04-30
+ 2000-05-31 -> 2000-04-30
# n=0
+ 2000-05-01 -> 2000-05-31
+ 2000-05-15 -> 2000-05-31
+ 2000-05-31 -> 2000-05-31
# n=1
! 2000-05-01 -> 2000-05-31
! 2000-05-15 -> 2000-05-31
+ 2000-05-31 -> 2000-06-30
# n=2
! 2000-05-01 -> 2000-06-30
! 2000-05-15 -> 2000-06-30
+ 2000-05-31 -> 2000-07-31 I appreciate the breaking changes impact, but this really does feel like a bug to me as I would expect MonthEnd() should behave like excels =EOMONTH function. Regardless, I will at least document as clearly as possible with examples, but it's going to be hard to explain away given results are not even consistent, i.e. n = -1 Vs n = +1 (as shown above). |
Thanks for your offer. I will try to build the environment again, and potentially join the next contributors meeting. |
@justin4480 Thanks for taking this further. A good initiative! A good explanation of general offset logic is here: (in case it helps) |
That's a good reference, it's just not very easy to discover from the docstrings. I'd be all for just adding a simple example to the docstrings of each of those offsets ( |
@honza26 Thanks for sending this over. This was very helpful in understanding the logic & design intent! However, one doubt for me remains regarding the normalize parameter. The docs claim: However the below runs perfectly fine. for dt in pd.date_range(start='2000-01-01', end='2001-01-01'):
for n in range(-2, 3):
a = dt + pd.offsets.MonthEnd(n=n, normalize=True)
b = dt + pd.offsets.MonthEnd(n=n, normalize=False)
assert a == b
c = dt - pd.offsets.MonthEnd(n=n, normalize=True)
d = dt - pd.offsets.MonthEnd(n=n, normalize=False)
assert c == d |
@justin4480 My examples to show functionality: `dt_now = datetime.datetime.now() a = dt_now + pd.offsets.MonthEnd(n=1, normalize=True) print(f'Offset to MonthEnd; normalize = True: {a}') And output is: `Datetime now: 2022-10-30 20:42:39.526462 Offset to MonthEnd; normalize = True: 2022-10-31 00:00:00 |
Sorry I was determined to do this, but have spent far too long trying to get my environment setup. I tried both the recommended docker & mamba dev environment setup, but just couldn't get the pytest suite to run without tones of errors (before making any code edits). I will revisit when I have more time as I do want to contribute, but have unassigned this particular issue for now. |
Hi, I'd like to take a shot at this. Gonna be my first contribution here. I've been using Pandas on and off for my job since the past 3 years (: |
take |
@zemnly Nice :) Good luck!! |
Hi, I am working on this issue as well. |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I expect that the answer is: Timestamp('2017-01-31 00:00:00')
but I get: Timestamp('2017-02-28 00:00:00')
Expected Behavior
Timestamp('2017-01-31 00:00:00')
Installed Versions
INSTALLED VERSIONS
commit : 4bfe3d0
python : 3.9.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Dutch_Netherlands.1252
pandas : 1.4.2
numpy : 1.21.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 61.2.0
Cython : 0.29.28
pytest : 7.1.1
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.4
brotli :
fastparquet : None
fsspec : 2022.02.0
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.1
numba : 0.55.1
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy :
sqlalchemy : 1.4.32
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.20.1
xlrd : 2.0.1
xlwt : None
zstandard : None
The text was updated successfully, but these errors were encountered: