Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: DatetimeIndex.is_year_start and DatetimeIndex.is_quarter_start always return False on double-digit frequencies #58549

Conversation

natmokval
Copy link
Contributor

@natmokval natmokval commented May 3, 2024

@natmokval natmokval added Bug Frequency DateOffsets labels May 3, 2024
@@ -253,8 +254,7 @@ def get_start_end_field(
# month of year. Other offsets use month, startingMonth as ending
# month of year.

if (freqstr[0:2] in ["MS", "QS", "YS"]) or (
freqstr[1:3] in ["MS", "QS", "YS"]):
if re.split("[0-9]*", freqstr, maxsplit=1)[1][0:2] in ["MS", "QS", "YS"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this is still in draft, but is there any existing function that can be re-used here? it seems like a fairly common thing to get the frequency out of a string, I'm sure there's other places that do it - can something be reused? I don't think it's really feasible to have a regex each time this needs doing

if (freqstr[0:2] in ["MS", "QS", "YS"]) or (
freqstr[1:3] in ["MS", "QS", "YS"]):
offset = to_offset(freqstr)
if offset.freqstr.replace(str(offset.n), "")[0:2] in ["MS", "QS", "YS"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about offset.name?

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating, this is getting closer

can we do to_offset(freqstr).name even further up, and pass that down to this function? as in, find where get_start_end_field is being called, calculate the frequency name from there, and pass that to the function - this way we avoid repeatedly calling to_offset, which is a bit expensive

@@ -587,7 +587,8 @@ cdef class _Timestamp(ABCTimestamp):
val = self._maybe_convert_value_to_local()

out = get_start_end_field(np.array([val], dtype=np.int64),
field, freqstr, month_kw, self._creso)
field, to_offset(freqstr).name ,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's already freq a few lines above, can we just take it from there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I am not sure if I understand correctly. I replaced freqstr = freq.freqstr with freqstr = to_offset(freq.freqstr).name a few lines above.

Comment on lines 257 to 256
if to_offset(freqstr).name[0:2] in ["MS", "QS", "YS"]:
if freqstr[0:2] in ["MS", "QS", "YS"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also rename the variable name now if something else is being passed in (freq_name?)

Comment on lines 148 to 153
if self.freqstr is not None:
freqstr = to_offset(self.freqstr).name
else:
freqstr = self.freqstr
result = fields.get_start_end_field(
values, field, self.freqstr, month_kw, reso=self._creso
values, field, freqstr, month_kw, reso=self._creso
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 256 to 257
if freqstr[0:2] in ["MS", "QS", "YS"]:
freq_name = freqstr.lstrip("B")[0:2]
if freq_name in ["MS", "QS", "YS"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i meant that the argument freqstr in get_start_end_field needs renaming, because you're no longer passing in freq.freqstr but freq.name, so function argument (line 213) needs renaming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it's clear now. I renamed the argument freqstr in get_start_end_field

Comment on lines 148 to 151
if freq is not None:
freqstr = to_offset(freq.freqstr).name
else:
freqstr = freq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs renaming too?

And in the else branch, you can just set it to =None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I agree, here freqstr needs renaming too. I replaced it with freq_name

@natmokval natmokval marked this pull request as ready for review May 9, 2024 11:21
@natmokval
Copy link
Contributor Author

@MarcoGorelli could you please take a look at this PR? I think CI failures are unrelated to my changes.

freqstr = freq.freqstr
freqstr = to_offset(freq.freqstr).name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it work to directly do freq.name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it works with freq.name indeed. I simplified to_offset(freq.freqstr).name

freqstr = freq.freqstr
freqstr = freq.name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, it's my mistake. I renamed the variable freqstr to freq_name

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me on green, thanks @natmokval !

@natmokval
Copy link
Contributor Author

looks good to me on green, thanks @natmokval !

thank you for reviewing this PR!

@MarcoGorelli MarcoGorelli added this to the 3.0 milestone May 10, 2024
@MarcoGorelli MarcoGorelli merged commit 1c23c5e into pandas-dev:main May 10, 2024
47 checks passed
@@ -419,6 +419,7 @@ Interval
Indexing
^^^^^^^^
- Bug in :meth:`DataFrame.__getitem__` returning modified columns when called with ``slice`` in Python 3.12 (:issue:`57500`)
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have been in Datetimelike, "indexing" is more like .loc / .iloc / get/setitem stuff

but OK to address this as part of https://github.com/pandas-dev/pandas/pull/58665/files, as that one needs updating anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I addressed this comment in the PR you suggested

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DatetimeIndex.is_year_start breaks on double-digit frequencies
2 participants