BUG: pd.offsets.MonthEnd() delevers wrong month #48779

pjousma · 2022-09-26T08:35:23Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
ts = pd.Timestamp(2017, 1, 31)
ts + pd.offsets.MonthEnd()

Issue Description

I expect that the answer is: Timestamp('2017-01-31 00:00:00')
but I get: Timestamp('2017-02-28 00:00:00')

Expected Behavior

Timestamp('2017-01-31 00:00:00')

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.9.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Dutch_Netherlands.1252

pandas : 1.4.2
numpy : 1.21.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 61.2.0
Cython : 0.29.28
pytest : 7.1.1
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.4
brotli :
fastparquet : None
fsspec : 2022.02.0
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.1
numba : 0.55.1
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy :
sqlalchemy : 1.4.32
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.20.1
xlrd : 2.0.1
xlwt : None
zstandard : None

The text was updated successfully, but these errors were encountered:

MarcoGorelli · 2022-09-26T08:51:19Z

Thanks for the report

MonthEnd goes to the next date which is an end of the month, so this seems correct

You probably want MonthEnd(0) to get the end of the current month

Might be good to add this as an example to the docs as it seems to be a common source of confusion https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series

ZDenizYStenhaug · 2022-09-26T09:37:13Z

Hi! :) I'd love to help with the documentation if this issue is still relevant. If so, could you please assign it to me?
Thanks, have a good day.

MarcoGorelli · 2022-09-26T10:18:01Z

thanks, go ahead! if you comment "take" the issue will be assigned to you, do reach out if you have questions

ZDenizYStenhaug · 2022-09-26T11:03:45Z

take

ChhaSahil · 2022-10-03T17:21:36Z

take

jtimko16 · 2022-10-09T13:27:50Z

Hello, I am looking for my very first contribution. Is there still something I can help will (I would probably need a little bit of guidance). Thanks

MarcoGorelli · 2022-10-09T13:36:42Z

yup, pull requests welcome https://pandas.pydata.org/pandas-docs/stable/development/contributing.html

jtimko16 · 2022-10-10T06:36:49Z

take

pjousma · 2022-10-11T07:53:56Z

Thanks, I found a equivalent solution: ts.replace(day=1) + pd.offsets.MonthEnd(0) For me it is not logical to go to the next month when no parameters are given. For me it is logical that the default parameter should be '0' when no parameters are given by the user. But thanks for the quick response Op ma 26 sep. 2022 om 10:51 schreef Marco Edward Gorelli < ***@***.***>:

…

Thanks for the report MonthEnd goes to the next date which is an end of the month, so this seems correct You probably want MonthEnd(0) to get the end of the current month Might be good to add this as an example to the docs as it seems to be a common source of confusion https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series — Reply to this email directly, view it on GitHub <#48779 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A3IXQLDZWOMAEFHTFZOVUQDWAFPZDANCNFSM6AAAAAAQVRXV34> . You are receiving this because you authored the thread.Message ID: ***@***.***>

jtimko16 · 2022-10-11T10:18:55Z

Thanks, I found a equivalent solution: ts.replace(day=1) + pd.offsets.MonthEnd(0) For me it is not logical to go to the next month when no parameters are given. For me it is logical that the default parameter should be '0' when no parameters are given by the user. But thanks for the quick response Op ma 26 sep. 2022 om 10:51 schreef Marco Edward Gorelli < @.>:
…
Thanks for the report MonthEnd goes to the next date which is an end of the month, so this seems correct You probably want MonthEnd(0) to get the end of the current month Might be good to add this as an example to the docs as it seems to be a common source of confusion https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series — Reply to this email directly, view it on GitHub <#48779 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/A3IXQLDZWOMAEFHTFZOVUQDWAFPZDANCNFSM6AAAAAAQVRXV34 . You are receiving this because you authored the thread.Message ID: @.>

Hello, pjousma

Thanks for your comment. I've been thinking about it, and also agree, that the default n parameter should be zero. And also the same for pandas.tseries.offsets.YearEnd() function (it has the same issue).

Who can decide whether we should change parameter to zero or keep it as is?

Thanks,
J.

MarcoGorelli · 2022-10-12T07:35:07Z

this'd be a breaking change and would have to go through a deprecation cycle, not sure it'd be worth it. Would be more in favour of just clearly documenting the behaviour

jtimko16 · 2022-10-12T09:22:35Z

this'd be a breaking change and would have to go through a deprecation cycle, not sure it'd be worth it. Would be more in favour of just clearly documenting the behaviour

Ok, ok. I am going to update the documentation in the next few days. I'll let you know in case of any questions/issues. Thanks.

jtimko16 · 2022-10-15T09:11:22Z

Hi @MarcoGorelli , please may I ask you for a clarification.

Do I understand correctly, I should is update this page of documentation?: https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.MonthEnd.html

Therefore I'll be modifying this part of code
https://github.com/pandas-dev/pandas/blob/main/pandas/_libs/tslibs/offsets.pyx (class MonthEnd; starting from row 2746 - I am going to write more examples into the comment block below). Is it correct? Thanks

MarcoGorelli · 2022-10-15T09:24:04Z

yeah looks fine

jtimko16 · 2022-10-16T20:53:48Z

I created a Pull Request #49133

However as it is my very Pull Request, therefore for now, I marked it as a draft. Please can you look at it and give me some feedback? Thanks

pjousma · 2022-10-19T13:46:53Z

"When parameter n=0, always offset to the end of month. When parameter n=1, depend on the given date: a) Given date is on an anchor point (last day of the month) -> offset to the end of the following month. b) Given date is not on an anchor point -> offset to the end of the same month." For me the functionality of the parameter is not logical. My intepretation is that n the offset of the data used by this function. n=0: end of month (of used date) n=1: end of next month n=2: end of 2e next month n=-1: end of previous month An intepration as mentioned "n=1 a if ... else b" is for me very unlogical. Op zo 16 okt. 2022 om 22:54 schreef Jan Timko ***@***.***>:

…

I created a Pull Request #49133 <#49133> However as it is my very Pull Request, therefore for now, I marked it as a draft. Please can you look at it and give me some feedback? Thanks — Reply to this email directly, view it on GitHub <#48779 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A3IXQLBOJSCRDFHHADNJHNDWDRTOPANCNFSM6AAAAAAQVRXV34> . You are receiving this because you authored the thread.Message ID: ***@***.***>

jtimko16 · 2022-10-19T20:45:23Z

@pjousma Yes, I agree that the current behavior is little bit confusing, and what you suggested is definitely more logical.

However, I believe that as quick immediate solution we can document current logic (if everything goes well we can close this PR in next several days) and in longer term we can open discussion about changing functionality of the 'n' parameter. Not sure about processes here in Panda, but it make take some time to approve an implement the change.

MarcoGorelli · 2022-10-19T21:21:19Z

Yeah changing behaviour shouldn't be taken lightly - for a start, would there be a way to communicate the change to users in such a way as to not flood them with warnings every time they use MonthEnd?

But certainly it should be clearly documented with examples so thanks @honza26 for taking this on!

jtimko16 · 2022-10-29T14:18:15Z

Hello, unfortunately I am facing technical issues with setting up environment and will not be able to finish (my laptop is quite old and slow, therefore cannot install a lot of tools that will slow it done further)

PR in progress for anyone who want to take it from there:
#49133

If any question, feel free to reach out. And @MarcoGorelli thanks a lot for help and guidance with it (although I didn't manage to cross the finishing line).

Cheers,
Jan

MarcoGorelli · 2022-10-29T14:38:51Z

hey @honza26 - if you want to come to the next new contributors meeting, you could share you screen and we could resolve your technical issues with you?

justin4480 · 2022-10-29T15:21:21Z

take

justin4480 · 2022-10-29T16:31:37Z

Hi. I'm a long time user of pandas, but this is my first contribution. Can I update both the docstring and the docs?

I plan to:

make clear that normalize is an ignored parameter
clarify the unintuitive logic when MonthEnd n > 0 and added to a Timestamp not at monthend

normalize is an ignored parameter

docstring states (class) Docstring shows MonthEnd(n: int = ..., normalize: bool = ...), however normalize doesn't appear to do anything:

for date in pd.date_range(start='2000-01-01', end='2001-01-01'):
    for n in [-1, 0, 1, 2]:
        a = date + pd.offsets.MonthEnd(n=n, normalize=True)
        b = date + pd.offsets.MonthEnd(n=n, normalize=False)
        assert a == b

unintuitive logic

for n in range(-2, 3):
    print(f"\n{n=}")
    print(f"2000-05-01 -> {(pd.Timestamp('2000-05-01') + pd.offsets.MonthEnd(n=n)).strftime('%Y-%m-%d')}")
    print(f"2000-05-15 -> {(pd.Timestamp('2000-05-15') + pd.offsets.MonthEnd(n=n)).strftime('%Y-%m-%d')}")
    print(f"2000-05-31 -> {(pd.Timestamp('2000-05-31') + pd.offsets.MonthEnd(n=n)).strftime('%Y-%m-%d')}")

# n=-2
+ 2000-05-01 -> 2000-03-31
+ 2000-05-15 -> 2000-03-31
+ 2000-05-31 -> 2000-03-31

# n=-1
+ 2000-05-01 -> 2000-04-30
+ 2000-05-15 -> 2000-04-30
+ 2000-05-31 -> 2000-04-30

# n=0
+ 2000-05-01 -> 2000-05-31
+ 2000-05-15 -> 2000-05-31
+ 2000-05-31 -> 2000-05-31

# n=1
! 2000-05-01 -> 2000-05-31
! 2000-05-15 -> 2000-05-31
+ 2000-05-31 -> 2000-06-30

# n=2
! 2000-05-01 -> 2000-06-30
! 2000-05-15 -> 2000-06-30
+ 2000-05-31 -> 2000-07-31

I appreciate the breaking changes impact, but this really does feel like a bug to me as I would expect MonthEnd() should behave like excels =EOMONTH function.

Regardless, I will at least document as clearly as possible with examples, but it's going to be hard to explain away given results are not even consistent, i.e. n = -1 Vs n = +1 (as shown above).

jtimko16 · 2022-10-29T17:56:21Z

hey @honza26 - if you want to come to the next new contributors meeting, you could share you screen and we could resolve your technical issues with you?

Thanks for your offer. I will try to build the environment again, and potentially join the next contributors meeting.

jtimko16 · 2022-10-30T09:41:10Z

@justin4480 Thanks for taking this further. A good initiative!

A good explanation of general offset logic is here: (in case it helps)
https://pandas.pydata.org/docs/user_guide/timeseries.html

MarcoGorelli · 2022-10-30T11:15:18Z

That's a good reference, it's just not very easy to discover from the docstrings.

I'd be all for just adding a simple example to the docstrings of each of those offsets (MonthEnd, MonthBegin, WeekEnd)

justin4480 · 2022-10-30T16:45:29Z

@justin4480 Thanks for taking this further. A good initiative!

A good explanation of general offset logic is here: (in case it helps) https://pandas.pydata.org/docs/user_guide/timeseries.html

@honza26 Thanks for sending this over. This was very helpful in understanding the logic & design intent!

However, one doubt for me remains regarding the normalize parameter. The docs claim:

source: https://pandas.pydata.org/docs/user_guide/timeseries.html#parametric-offsets

However the below runs perfectly fine.

for dt in pd.date_range(start='2000-01-01', end='2001-01-01'):
    for n in range(-2, 3):
        a = dt + pd.offsets.MonthEnd(n=n, normalize=True)
        b = dt + pd.offsets.MonthEnd(n=n, normalize=False)
        assert a == b
        c = dt - pd.offsets.MonthEnd(n=n, normalize=True)
        d = dt - pd.offsets.MonthEnd(n=n, normalize=False)
        assert c == d

jtimko16 · 2022-10-30T20:47:06Z

@justin4480
If normalize is True, then datetime is normalized to midnight. If you dont have any 'time' part of in your date/datetime variables, it makes no difference.

My examples to show functionality:

`dt_now = datetime.datetime.now()
print(f'Datetime now: {dt_now}\n')

a = dt_now + pd.offsets.MonthEnd(n=1, normalize=True)
b = dt_now + pd.offsets.MonthEnd(n=1, normalize=False)

print(f'Offset to MonthEnd; normalize = True: {a}')
print(f'Offset to MonthEnd; normalize = False: {b}')
`

And output is:

`Datetime now: 2022-10-30 20:42:39.526462

Offset to MonthEnd; normalize = True: 2022-10-31 00:00:00
Offset to MonthEnd; normalize = False: 2022-10-31 20:42:39.526462`

justin4480 · 2022-11-07T00:45:45Z

Sorry I was determined to do this, but have spent far too long trying to get my environment setup. I tried both the recommended docker & mamba dev environment setup, but just couldn't get the pytest suite to run without tones of errors (before making any code edits). I will revisit when I have more time as I do want to contribute, but have unassigned this particular issue for now.

zemnly · 2022-11-12T11:01:40Z

Hi, I'd like to take a shot at this. Gonna be my first contribution here. I've been using Pandas on and off for my job since the past 3 years (:

zemnly · 2022-11-12T11:02:06Z

take

jtimko16 · 2022-11-12T20:35:08Z

@zemnly Nice :) Good luck!!

natmokval · 2022-11-28T10:55:15Z

Hi, I am working on this issue as well.

pjousma added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 26, 2022

MarcoGorelli added good first issue Frequency DateOffsets and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 26, 2022

github-actions bot assigned ZDenizYStenhaug Sep 26, 2022

phofl added the Docs label Sep 29, 2022

github-actions bot assigned ChhaSahil Oct 3, 2022

github-actions bot assigned jtimko16 Oct 10, 2022

jtimko16 mentioned this issue Oct 17, 2022

BUG: pd.offsets.MonthEnd() delevers wrong month #49133

Closed

5 tasks

jtimko16 removed their assignment Oct 29, 2022

github-actions bot assigned justin4480 Oct 29, 2022

justin4480 removed their assignment Nov 7, 2022

github-actions bot assigned zemnly Nov 12, 2022

zemnly removed their assignment Nov 13, 2022

natmokval mentioned this issue Nov 29, 2022

48779-doc-MonthEnd #49958

Merged

2 tasks

MarcoGorelli closed this as completed in #49958 Nov 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: pd.offsets.MonthEnd() delevers wrong month #48779

BUG: pd.offsets.MonthEnd() delevers wrong month #48779

pjousma commented Sep 26, 2022

INSTALLED VERSIONS

MarcoGorelli commented Sep 26, 2022

ZDenizYStenhaug commented Sep 26, 2022

MarcoGorelli commented Sep 26, 2022

ZDenizYStenhaug commented Sep 26, 2022

ChhaSahil commented Oct 3, 2022

jtimko16 commented Oct 9, 2022

MarcoGorelli commented Oct 9, 2022

jtimko16 commented Oct 10, 2022

pjousma commented Oct 11, 2022 via email

jtimko16 commented Oct 11, 2022

MarcoGorelli commented Oct 12, 2022

jtimko16 commented Oct 12, 2022

jtimko16 commented Oct 15, 2022

MarcoGorelli commented Oct 15, 2022

jtimko16 commented Oct 16, 2022

pjousma commented Oct 19, 2022 via email

jtimko16 commented Oct 19, 2022

MarcoGorelli commented Oct 19, 2022

jtimko16 commented Oct 29, 2022

MarcoGorelli commented Oct 29, 2022

justin4480 commented Oct 29, 2022

justin4480 commented Oct 29, 2022 •

edited

Loading

jtimko16 commented Oct 29, 2022

jtimko16 commented Oct 30, 2022

MarcoGorelli commented Oct 30, 2022

justin4480 commented Oct 30, 2022

jtimko16 commented Oct 30, 2022 •

edited

Loading

justin4480 commented Nov 7, 2022

zemnly commented Nov 12, 2022

zemnly commented Nov 12, 2022

jtimko16 commented Nov 12, 2022

natmokval commented Nov 28, 2022

BUG: pd.offsets.MonthEnd() delevers wrong month #48779

BUG: pd.offsets.MonthEnd() delevers wrong month #48779

Comments

pjousma commented Sep 26, 2022

Pandas version checks

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

INSTALLED VERSIONS

MarcoGorelli commented Sep 26, 2022

ZDenizYStenhaug commented Sep 26, 2022

MarcoGorelli commented Sep 26, 2022

ZDenizYStenhaug commented Sep 26, 2022

ChhaSahil commented Oct 3, 2022

jtimko16 commented Oct 9, 2022

MarcoGorelli commented Oct 9, 2022

jtimko16 commented Oct 10, 2022

pjousma commented Oct 11, 2022 via email

jtimko16 commented Oct 11, 2022

MarcoGorelli commented Oct 12, 2022

jtimko16 commented Oct 12, 2022

jtimko16 commented Oct 15, 2022

MarcoGorelli commented Oct 15, 2022

jtimko16 commented Oct 16, 2022

pjousma commented Oct 19, 2022 via email

jtimko16 commented Oct 19, 2022

MarcoGorelli commented Oct 19, 2022

jtimko16 commented Oct 29, 2022

MarcoGorelli commented Oct 29, 2022

justin4480 commented Oct 29, 2022

justin4480 commented Oct 29, 2022 • edited Loading

jtimko16 commented Oct 29, 2022

jtimko16 commented Oct 30, 2022

MarcoGorelli commented Oct 30, 2022

justin4480 commented Oct 30, 2022

jtimko16 commented Oct 30, 2022 • edited Loading

justin4480 commented Nov 7, 2022

zemnly commented Nov 12, 2022

zemnly commented Nov 12, 2022

jtimko16 commented Nov 12, 2022

natmokval commented Nov 28, 2022

justin4480 commented Oct 29, 2022 •

edited

Loading

jtimko16 commented Oct 30, 2022 •

edited

Loading