
Potential regression induced by "Refactored pandas_timedelta_to_timedeltastruct" #57951

Open

DeaMariaLeon opened this issue Mar 21, 2024 · 6 comments

Labels: Needs Triage, Performance, Regression

@DeaMariaLeon (Member)

PR #55999

If this is expected, please ignore this issue.

@WillAyd

```json
"eb55bca5bf91541a5c4f6213b18824589415127b": {
  "tslibs.fields.TimeGetTimedeltaField.time_get_timedelta_field (Python) with field='microseconds', size=1000000": "http://57.128.112.95:5000/compare/benchmarks/065fb93bf74c75d480006fe97e7b82d8...065fb94f7876798380009c8d16cc367e",
  "tslibs.fields.TimeGetTimedeltaField.time_get_timedelta_field (Python) with field='seconds', size=1000000": "http://57.128.112.95:5000/compare/benchmarks/065fb93bf6bf7a7d8000621519222b60...065fb94f77ee7e3e800087d3ad953647",
  "tslibs.fields.TimeGetTimedeltaField.time_get_timedelta_field (Python) with field='nanoseconds', size=1000000": "http://57.128.112.95:5000/compare/benchmarks/065fb93bf7e07ec98000d31ecf9914d4...065fb94f78fa779080005b333b82e55a"
}
```
(Screenshot: benchmark regression graph, 2024-03-21 17:00)

@lithomas1 added the Performance, Regression, and Needs Triage labels on Mar 21, 2024
@WillAyd (Member) commented Mar 22, 2024

I think the issue here is that we combined a few case statements into one:

```c
  case NPY_FR_s:
  case NPY_FR_ms:
  case NPY_FR_us:
  case NPY_FR_ns: {
    npy_int64 per_sec;
    if (base == NPY_FR_s) {
      per_sec = 1;
    } else if (base == NPY_FR_ms) {
      per_sec = 1000;
    } else if (base == NPY_FR_us) {
      per_sec = 1000000;
    } else {
      per_sec = 1000000000;
    }

    ...
```

Whereas previously the code was copy/pasted and slightly tweaked for every case statement.

There are probably some tricks we can try like using a static lookup table instead of branches:

```c
  case NPY_FR_s:
  case NPY_FR_ms:
  case NPY_FR_us:
  case NPY_FR_ns: {
    const npy_int64 sec_per_day = 86400;
    static npy_int64 per_secs[] = {
      1,         // NPY_FR_s
      1000,      // NPY_FR_ms
      1000000,   // NPY_FR_us
      1000000000 // NPY_FR_ns
    };
    const npy_int64 per_sec = per_secs[base - NPY_FR_s];
```

but I'm not really sure the tricks are worth it. I think the code de-duplication is probably what increased the timings here.
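For intuition, the decomposition that both variants feed into can be modeled in Python. This is only a sketch of the arithmetic, not the actual pandas C code; `timedelta_fields` is an invented name:

```python
# Python model of the unified case body: one code path that scales by a
# per-unit "ticks per second" factor instead of four copy/pasted bodies.
PER_SEC = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}

def timedelta_fields(value, base):
    """Decompose an integer timedelta in unit `base` into
    (days, seconds-within-day, subsecond ticks)."""
    per_sec = PER_SEC[base]          # lookup table instead of an if/else chain
    per_day = 86_400 * per_sec
    days, rem = divmod(value, per_day)
    seconds, ticks = divmod(rem, per_sec)
    # Note: Python's divmod floors toward negative infinity, while C integer
    # division truncates toward zero; that difference is one reason negative
    # inputs need extra handling in the C implementation.
    return days, seconds, ticks

# Example: 1 day, 1 second and 5 ns, expressed in nanoseconds
value = 86_400 * 1_000_000_000 + 1_000_000_000 + 5
assert timedelta_fields(value, "ns") == (1, 1, 5)
```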

@dontgoto (Contributor)

> There are probably some tricks we can try like using a static lookup table instead of branches:

I tried that, but it did not improve the situation on my machine.

I did some other refactoring that brought down the benchmark's runtime by 20% on my machine for s/us/ns.

There are also code paths that the benchmark does not really exercise (and where I saw a similar improvement from my changes): s/ms/us/ns values large enough to trigger the "day" parsing part of the code, and negative offsets. Neither gets checked, because the benchmark only runs with random values between 0 and 10.
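A widened input generator along these lines might look like the following. This is a hypothetical sketch using the stdlib `random` module (`make_inputs` is an invented name); the real asv benchmark would presumably build a numpy array instead:

```python
import random

NS_PER_DAY = 86_400 * 1_000_000_000

def make_inputs(size, seed=None):
    """Generate timedelta values (in ns) centered on -4 days with +/-100 s
    of jitter, so the day-parsing branch and the negative path are
    exercised, unlike random values in [0, 10)."""
    rng = random.Random(seed)
    return [-4 * NS_PER_DAY + rng.randint(-100, 100) * 1_000_000_000
            for _ in range(size)]

vals = make_inputs(1_000, seed=0)
assert all(v < 0 for v in vals)                    # negative path covered
assert all(abs(v) > 3 * NS_PER_DAY for v in vals)  # day-scale magnitudes
```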

@dontgoto (Contributor) commented Apr 3, 2024

I now have a version of the timedelta conversion that is correct, tested, and runs up to 2.5x faster than main and the version before the regression (at least on my machine, M2 ARM). Interestingly, I do not see large differences in runtime between the pre-regression commit and main.

I would suggest changing the benchmark inputs to negative values that are at least on the order of one day to make most of the logic in the function actually show up in the runtime of the benchmark. What do you think? Is there a procedure to follow to update the history of benchmark runtimes?

To back that up, here are some of my benchmark results for all three versions (I will update my PR soon):

| input range | main | pre-regression | fix |
| --- | --- | --- | --- |
| +30±10 ns | 4.9 | 4.8 | 3.9 |
| -30±10 ns | 7.9 | 7.2 | 5.2 |
| ±30 ns | 8.4 | 8.5 | 7.1 |
| +4 d ± 100 s | 11.6 | 11.4 | 5.2 |
| -4 d ± 100 s | 12.7 | 12.7 | 5.3 |
| ±4 d | 9.7 | 9.4 | 7.5 |

(Runtimes are in ms; each run uses 10^6 random values.)

The current versions of the function have branch-prediction and variable-dependency issues that only show up with larger input ranges and negative values. The previously mentioned checks against the base unit size are very cheap in practice, since the unit size does not change (in the benchmark, but also in the real world) and the branch is predicted without failure.

Inputs with randomly mixed signs still do not perform all that well in my version, but realistically timestamps would not have this property.

@WillAyd (Member) commented Apr 4, 2024

@dontgoto thanks for the great research. If you have improvements to the benchmark, I would just go ahead and submit them; we don't manage the history of changes to a benchmark itself that strictly.

@ba05 commented May 20, 2024

Possibly related?: #57035

@dontgoto (Contributor)

> Possibly related?: #57035

Good find, but the regression here should be unrelated to that plotting issue. Converting 1M timestamps takes some milliseconds, but the referenced regression is in the seconds range.
