Special-case deriving PartialOrd for enums with dataless variants #103659

Merged: 1 commit, Jan 29, 2023

clubby789
Contributor

I was able to get slightly better codegen by flipping the derived PartialOrd logic for two-variant enums. I also tried to document the implementation of the derive macro to make the special-case logic a little clearer.

#[derive(PartialEq, PartialOrd)]
pub enum A<T> {
    A,
    B(T)
}
impl<T: ::core::cmp::PartialOrd> ::core::cmp::PartialOrd for A<T> {
   #[inline]
   fn partial_cmp(
       &self,
       other: &A<T>,
   ) -> ::core::option::Option<::core::cmp::Ordering> {
       let __self_tag = ::core::intrinsics::discriminant_value(self);
       let __arg1_tag = ::core::intrinsics::discriminant_value(other);
-      match ::core::cmp::PartialOrd::partial_cmp(&__self_tag, &__arg1_tag) {
-          ::core::option::Option::Some(::core::cmp::Ordering::Equal) => {
-              match (self, other) {
-                  (A::B(__self_0), A::B(__arg1_0)) => {
-                      ::core::cmp::PartialOrd::partial_cmp(__self_0, __arg1_0)
-                  }
-                  _ => ::core::option::Option::Some(::core::cmp::Ordering::Equal),
-              }
+      match (self, other) {
+          (A::B(__self_0), A::B(__arg1_0)) => {
+              ::core::cmp::PartialOrd::partial_cmp(__self_0, __arg1_0)
           }
-          cmp => cmp,
+          _ => ::core::cmp::PartialOrd::partial_cmp(&__self_tag, &__arg1_tag),
       }
   }
}

Godbolt: Current, New
I'm not sure how common it is to compare two enums of this shape (such as Option), or whether the gain is worth the slowdown of adding a special case to the derive. If it causes overall regressions, it might be better to just implement this manually for Option.
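To make the reordering in the diff concrete, here is a minimal hand-written sketch of both schemes that compiles on stable Rust. It is not the macro's actual output: the `tag` helper is a hypothetical stand-in for the unstable `::core::intrinsics::discriminant_value`, and the function names `tag_then_match` and `match_then_tag` are illustrative only.

```rust
use std::cmp::Ordering;

#[derive(PartialEq, PartialOrd)]
pub enum A<T> {
    A,
    B(T),
}

impl<T> A<T> {
    // Hypothetical stand-in for `discriminant_value`:
    // the variant index in declaration order.
    fn tag(&self) -> u8 {
        match self {
            A::A => 0,
            A::B(_) => 1,
        }
    }
}

// Old scheme: compare the tags first, and only look at the fields
// when both values are the same variant.
pub fn tag_then_match<T: PartialOrd>(l: &A<T>, r: &A<T>) -> Option<Ordering> {
    match l.tag().partial_cmp(&r.tag()) {
        Some(Ordering::Equal) => match (l, r) {
            (A::B(x), A::B(y)) => x.partial_cmp(y),
            _ => Some(Ordering::Equal),
        },
        cmp => cmp,
    }
}

// New scheme: match on the dataful variant pair first, and fall back
// to comparing the tags for every other combination.
pub fn match_then_tag<T: PartialOrd>(l: &A<T>, r: &A<T>) -> Option<Ordering> {
    match (l, r) {
        (A::B(x), A::B(y)) => x.partial_cmp(y),
        _ => l.tag().partial_cmp(&r.tag()),
    }
}
```

Both functions should return the same result as the derived `partial_cmp` for every pair of inputs; only the order of the checks differs, which is what changes the generated code.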

@rustbot
Collaborator

rustbot commented Oct 27, 2022

r? @compiler-errors

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 27, 2022
@compiler-errors
Member

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 27, 2022
@bors
Collaborator

bors commented Oct 27, 2022

⌛ Trying commit 4440b9954cd728433ce6232af319c79393984e9a with merge 7d4ed6a1e86edcf57e09c298a239018705744284...

@bors
Collaborator

bors commented Oct 28, 2022

☀️ Try build successful - checks-actions
Build commit: 7d4ed6a1e86edcf57e09c298a239018705744284 (7d4ed6a1e86edcf57e09c298a239018705744284)

@rust-timer
Collaborator

Queued 7d4ed6a1e86edcf57e09c298a239018705744284 with parent 0da281b, future comparison URL.

@vacuus
Contributor

vacuus commented Oct 28, 2022

Can't this fairly easily (at the risk of uglier code) be generalized to enums with one dataful variant and an arbitrary number of dataless variants? The rest of the variants would be accounted for by comparing the tags just as you've done.

@rust-timer
Collaborator

Finished benchmarking commit (7d4ed6a1e86edcf57e09c298a239018705744284): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.6% | [-0.8%, -0.5%] | 3 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.6% | [-1.6%, -1.6%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

This benchmark run did not return any relevant results for this metric.

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 28, 2022
@clubby789
Contributor Author

clubby789 commented Oct 28, 2022

> Can't this fairly easily (at the risk of uglier code) be generalized to enums with one dataful variant and an arbitrary number of dataless variants? The rest of the variants would be accounted for by comparing the tags just as you've done.

On further testing I'm actually able to get better codegen by applying this to all enums with any dataless variant (Old, New - output of cargo expand)
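As a rough illustration of the generalized form, the sketch below applies the same "match then tag" shape to an enum that mixes several dataful and dataless variants: one arm per dataful variant, and a single fallback arm that compares tags. The `Shape` enum, the `tag` helper, and the `match_then_tag` method are hypothetical names chosen for this example, and `tag` again stands in for the unstable `discriminant_value` intrinsic.

```rust
use std::cmp::Ordering;

#[derive(PartialEq, PartialOrd)]
pub enum Shape {
    Empty,
    Circle(u32),
    Unknown,
    Square(u32),
}

impl Shape {
    // Hypothetical stand-in for `discriminant_value`:
    // the variant index in declaration order.
    fn tag(&self) -> u8 {
        match self {
            Shape::Empty => 0,
            Shape::Circle(_) => 1,
            Shape::Unknown => 2,
            Shape::Square(_) => 3,
        }
    }

    // One arm per dataful variant; every other pair (two dataless
    // variants, or two different variants) is ordered by tag alone.
    pub fn match_then_tag(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            (Shape::Circle(l), Shape::Circle(r)) => l.partial_cmp(r),
            (Shape::Square(l), Shape::Square(r)) => l.partial_cmp(r),
            _ => self.tag().partial_cmp(&other.tag()),
        }
    }
}
```

The fallback arm is correct for equal dataless variants because their tags compare as `Equal`, so no separate arm is needed for them.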

@thomcc
Member

thomcc commented Oct 28, 2022

Perf run requested in discord.

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 28, 2022
@bors
Collaborator

bors commented Oct 28, 2022

⌛ Trying commit 4c1bc43887286d919814c24ec5d7048351fcf78a with merge f58bb57191d1311a4ed7e7a0a77b057d32c27550...

@bors
Collaborator

bors commented Oct 28, 2022

☀️ Try build successful - checks-actions
Build commit: f58bb57191d1311a4ed7e7a0a77b057d32c27550 (f58bb57191d1311a4ed7e7a0a77b057d32c27550)

@rust-timer
Collaborator

Queued f58bb57191d1311a4ed7e7a0a77b057d32c27550 with parent a9ef100, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (f58bb57191d1311a4ed7e7a0a77b057d32c27550): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.6% | [-0.7%, -0.5%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 28, 2022
@clubby789 clubby789 changed the title Special-case deriving PartialOrd for two-variant enums Special-case deriving PartialOrd for enums with dataless variants Nov 1, 2022
@compiler-errors
Member

sorry for not getting around to reviewing this, gonna re-roll

r? compiler

@rustbot rustbot assigned nagisa and unassigned compiler-errors Nov 15, 2022
@nagisa
Member

nagisa commented Nov 20, 2022

Is the current benchmark as-in origin/master or as-in the previous revision of the PR? I guess latter, but worth clarifying.


Unfortunately, I don’t know what the ideal cut-over point may be, but the two approaches will always have different tradeoffs depending on what data is passed in. Are these just different variants? In that case the tag check first should ~always be faster. Are the data in the same variant different? A match may be faster then, since the tag check is always going to pass anyway. No padding? memcmp will have the most consistent results in telling if the values are definitely equal or definitely unequal. Different architectures? Different branch predictors (if any!), and cut-over points.

This is all going to be further complicated by LLVM having its own heuristics for what kind of algorithm is appropriate for a switch – a table, a jump tree or something else.

With that in mind, demonstration that the improvement is ~consistent over origin/master for all sorts of scenarios (especially including degenerate scenarios such as enums with, like, 500 variants) is the only way I can see such a performance improvement getting approved.

@clubby789
Contributor Author

clubby789 commented Nov 21, 2022

> Is the current benchmark as-in origin/master or as-in the previous revision of the PR?

current is origin/master - I have not benchmarked the original revision.

I will try some more comprehensive benchmarks so we can get some more data on what gives us the best speed

@clubby789
Contributor Author

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 21, 2022
@clubby789
Contributor Author

I wrote a script to generate some code and some fairly rough benchmarks, but the numbers look a bit more promising. I tested many permutations - for each one, comparing

  • Two equal, dataless variants
  • Two different dataless variants
  • Two equal dataful variants
  • Two different dataful variants

With this, the overall execution time decreases noticeably and consistently across all tested types of enum (except for single-variant enums). Dataless-only enums were not tested, as we know tag checking will be best there.

Times are in ms, and are given as 0 if the test was not applicable to the type of enum.

origin/master Derived

+─────────+─────────────────+───────────────────+────────────────+──────────────────+───────────+
| name    | equal_dataless  | inequal_dataless  | equal_dataful  | inequal_dataful  | total_ms  |
+─────────+─────────────────+───────────────────+────────────────+──────────────────+───────────+
| E       | 1335            | 0                 | 0              | 0                | 1335      |
| F       | 0               | 0                 | 971            | 0                | 971       |
| FE      | 1226            | 0                 | 949            | 0                | 2175      |
| FF      | 0               | 0                 | 993            | 1110             | 2103      |
| FEE     | 979             | 927               | 986            | 0                | 2892      |
| FEF     | 1034            | 0                 | 993            | 960              | 2987      |
| FFE     | 1020            | 0                 | 983            | 972              | 2975      |
| FFF     | 0               | 0                 | 985            | 953              | 1938      |
| FEEE    | 967             | 970               | 1016           | 0                | 2953      |
| FEEF    | 1047            | 923               | 977            | 951              | 3898      |
| FEFE    | 1029            | 980               | 1001           | 973              | 3983      |
| FEFF    | 1042            | 0                 | 979            | 969              | 2990      |
| FFEE    | 993             | 925               | 1010           | 944              | 3872      |
| FFEF    | 974             | 0                 | 968            | 966              | 2908      |
| FFFE    | 991             | 0                 | 995            | 954              | 2940      |
| FFFF    | 0               | 0                 | 1039           | 960              | 1999      |
| FEEEE   | 999             | 952               | 1007           | 0                | 2958      |
| FEEEF   | 984             | 957               | 990            | 962              | 3893      |
| FEEFE   | 986             | 962               | 996            | 959              | 3903      |
| FEEFF   | 984             | 950               | 1000           | 1000             | 3934      |
| FEFEE   | 952             | 909               | 986            | 934              | 3781      |
| FEFEF   | 1219            | 948               | 1015           | 954              | 4136      |
| FEFFE   | 957             | 924               | 961            | 955              | 3797      |
| FEFFF   | 996             | 0                 | 1019           | 980              | 2995      |
| FFEEE   | 984             | 930               | 976            | 962              | 3852      |
| FFEEF   | 1003            | 1022              | 995            | 960              | 3980      |
| FFEFE   | 972             | 923               | 987            | 941              | 3823      |
| FFEFF   | 1076            | 0                 | 976            | 996              | 3048      |
| FFFEE   | 973             | 959               | 971            | 956              | 3859      |
| FFFEF   | 1123            | 0                 | 988            | 953              | 3064      |
| FFFFE   | 1288            | 0                 | 1005           | 962              | 3255      |
| FFFFF   | 0               | 0                 | 1020           | 952              | 1972      |
| FEEEEE  | 977             | 931               | 1018           | 0                | 2926      |
| FEEEEF  | 990             | 963               | 958            | 966              | 3877      |
| FEEEFE  | 984             | 932               | 985            | 1015             | 3916      |
| FEEEFF  | 1046            | 930               | 1003           | 956              | 3935      |
| FEEFEE  | 975             | 930               | 976            | 952              | 3833      |
| FEEFEF  | 971             | 940               | 985            | 938              | 3834      |
| FEEFFE  | 983             | 934               | 978            | 936              | 3831      |
| FEEFFF  | 959             | 932               | 987            | 955              | 3833      |
| FEFEEE  | 973             | 950               | 981            | 948              | 3852      |
| FEFEEF  | 967             | 934               | 984            | 938              | 3823      |
| FEFEFE  | 971             | 926               | 985            | 952              | 3834      |
| FEFEFF  | 1055            | 943               | 973            | 941              | 3912      |
| FEFFEE  | 983             | 935               | 1001           | 937              | 3856      |
| FEFFEF  | 979             | 955               | 975            | 967              | 3876      |
| FEFFFE  | 980             | 930               | 1007           | 955              | 3872      |
| FEFFFF  | 1013            | 0                 | 1033           | 985              | 3031      |
| FFEEEE  | 985             | 947               | 996            | 944              | 3872      |
| FFEEEF  | 997             | 947               | 1002           | 954              | 3900      |
+─────────+─────────────────+───────────────────+────────────────+──────────────────+───────────+

Using 'match then tag' scheme

+─────────+─────────────────+───────────────────+────────────────+──────────────────+───────────+
| name    | equal_dataless  | inequal_dataless  | equal_dataful  | inequal_dataful  | total_ms  |
+─────────+─────────────────+───────────────────+────────────────+──────────────────+───────────+
| E       | 1203            | 0                 | 0              | 0                | 1203      |
| F       | 0               | 0                 | 985            | 0                | 985       |
| FE      | 1042            | 0                 | 975            | 0                | 2017      |
| FF      | 0               | 0                 | 993            | 968              | 1961      |
| FEE     | 1043            | 943               | 979            | 0                | 2965      |
| FEF     | 972             | 0                 | 974            | 947              | 2893      |
| FFE     | 969             | 0                 | 985            | 934              | 2888      |
| FFF     | 0               | 0                 | 1027           | 951              | 1978      |
| FEEE    | 992             | 1104              | 1027           | 0                | 3123      |
| FEEF    | 977             | 947               | 1029           | 932              | 3885      |
| FEFE    | 981             | 963               | 992            | 938              | 3874      |
| FEFF    | 1024            | 0                 | 992            | 964              | 2980      |
| FFEE    | 993             | 951               | 996            | 941              | 3881      |
| FFEF    | 983             | 0                 | 974            | 944              | 2901      |
| FFFE    | 962             | 0                 | 973            | 962              | 2897      |
| FFFF    | 0               | 0                 | 1063           | 938              | 2001      |
| FEEEE   | 977             | 948               | 995            | 0                | 2920      |
| FEEEF   | 997             | 931               | 1016           | 991              | 3935      |
| FEEFE   | 964             | 956               | 986            | 963              | 3869      |
| FEEFF   | 993             | 918               | 1001           | 941              | 3853      |
| FEFEE   | 976             | 939               | 975            | 958              | 3848      |
| FEFEF   | 991             | 930               | 974            | 937              | 3832      |
| FEFFE   | 981             | 975               | 984            | 961              | 3901      |
| FEFFF   | 985             | 0                 | 1031           | 951              | 2967      |
| FFEEE   | 1018            | 938               | 977            | 960              | 3893      |
| FFEEF   | 961             | 1170              | 998            | 980              | 4109      |
| FFEFE   | 1252            | 931               | 994            | 957              | 4134      |
| FFEFF   | 1061            | 0                 | 1000           | 998              | 3059      |
| FFFEE   | 978             | 982               | 992            | 946              | 3898      |
| FFFEF   | 1067            | 0                 | 1013           | 954              | 3034      |
| FFFFE   | 1014            | 0                 | 996            | 949              | 2959      |
| FFFFF   | 0               | 0                 | 1009           | 941              | 1950      |
| FEEEEE  | 1004            | 942               | 976            | 0                | 2922      |
| FEEEEF  | 968             | 953               | 974            | 1016             | 3911      |
| FEEEFE  | 978             | 934               | 993            | 970              | 3875      |
| FEEEFF  | 966             | 948               | 1009           | 956              | 3879      |
| FEEFEE  | 995             | 932               | 977            | 939              | 3843      |
| FEEFEF  | 977             | 937               | 985            | 943              | 3842      |
| FEEFFE  | 965             | 920               | 990            | 950              | 3825      |
| FEEFFF  | 986             | 930               | 975            | 962              | 3853      |
| FEFEEE  | 989             | 936               | 976            | 968              | 3869      |
| FEFEEF  | 978             | 936               | 1000           | 961              | 3875      |
| FEFEFE  | 958             | 941               | 1000           | 956              | 3855      |
| FEFEFF  | 982             | 936               | 1005           | 947              | 3870      |
| FEFFEE  | 984             | 957               | 993            | 936              | 3870      |
| FEFFEF  | 960             | 960               | 983            | 948              | 3851      |
| FEFFFE  | 973             | 926               | 996            | 978              | 3873      |
| FEFFFF  | 988             | 0                 | 1021           | 945              | 2954      |
| FFEEEE  | 981             | 941               | 1014           | 967              | 3903      |
| FFEEEF  | 966             | 947               | 987            | 931              | 3831      |
+─────────+─────────────────+───────────────────+────────────────+──────────────────+───────────+

Script used to generate code

import string

def make_name(pat):
    return ''.join(["F" if x else "E" for x in pat])

def make_enum(pat):
    name = make_name(pat)
    buf = "#[cfg_attr(derived, derive(PartialOrd))]\n"
    buf += "#[derive(PartialEq)]\n"
    buf += f"pub enum {name}" + " {\n"
    for var, x in zip(string.ascii_uppercase, pat):
        buf += '    ' + var
        if x:
            buf += "(usize)"
        buf += ",\n"
    buf += "}\n"
    buf += "#[cfg(not(derived))]\n"
    buf += f"impl PartialOrd for {name} {{\n"
    buf += "    #[inline]\n"
    buf += "    fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {\n"
    buf += "        let l_tag = ::core::intrinsics::discriminant_value(self);\n"
    buf += "        let r_tag = ::core::intrinsics::discriminant_value(other);\n"
    buf += "        match (self, other) {\n"
    for var, x in zip(string.ascii_uppercase, pat):
        if x:
            buf += f"           (Self::{var}(l), Self::{var}(r)) => PartialOrd::partial_cmp(l, r),\n"
    buf += "            _ => PartialOrd::partial_cmp(&l_tag, &r_tag),\n"
    buf += "        }\n"
    buf += "    }\n"
    buf += "}\n"
    return buf

out = """#![feature(core_intrinsics, bench_black_box)]
#![allow(non_snake_case, unreachable_patterns)]
use std::hint::black_box;
"""

# Test cases: two equal dataless variants
# Two different dataless variants
# Two equal dataful variants with the same data
# Two different dataful variants with the same data

def make_tests(pat):
    name = make_name(pat)
    buf = f"fn test_{name}() -> (u128, u128, u128, u128)" + " {\n"
    buf += "    let one = {"
    dataless = [name for name, x in zip(string.ascii_uppercase, pat) if not x]
    dataful = [name for name, x in zip(string.ascii_uppercase, pat) if x]
    if not dataless:
        buf += "0\n"
    else:
        variant = dataless[0]
        buf += f"let (l, r) = ({name}::{variant}, {name}::{variant});" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..50 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"
    
    buf += "    let two = {"
    if len(dataless) < 2:
        buf += "0\n"
    else:
        var_l, var_r = dataless[:2]
        buf += f"let (l, r) = ({name}::{var_l}, {name}::{var_r});" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..50 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"
    
    buf += "    let three = {"
    if not dataful:
        buf += "0\n"
    else:
        variant = dataful[0]
        buf += f"let (l, r) = ({name}::{variant}(10), {name}::{variant}(10));" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..50 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"
    
    buf += "    let four = {"
    if len(dataful) < 2:
        buf += "0\n"
    else:
        var_l, var_r = dataful[:2]
        buf += f"let (l, r) = ({name}::{var_l}(10), {name}::{var_r}(10));" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..50 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"
    
    buf += "    (one, two, three, four)\n"
    buf += "}\n"
    return buf

names = []
for i in range(50):
    pat = [int(x) for x in f"{i:b}"]
    names.append(make_name(pat))
    out += make_enum(pat) + '\n'
    out += make_tests(pat)
out += "fn main() {\n"
out += 'println!("name,equal_dataless,inequal_dataless,equal_dataful,inequal_dataful, total_ms");\n'
for name in names:
    out += f"let (a, b, c, d) = test_{name}();\n"
    out += f'println!("{name},{{a}},{{b}},{{c}},{{d}},{{}}", a+b+c+d);\n'
out += "}\n"
print(out)
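
The enum names in the tables come from the binary digits of the loop index: a `1` bit becomes a dataful variant (`F`) and a `0` bit a dataless one (`E`). A small Rust sketch of that encoding, mirroring the Python `make_name` and the variant loop in `make_enum`, might look like the following. The function names `make_name` and `variant_decls` are reused or invented here purely for illustration.

```rust
/// Maps an index to its enum name: each binary digit becomes
/// 'F' (dataful, bit = 1) or 'E' (dataless, bit = 0).
pub fn make_name(i: u32) -> String {
    format!("{:b}", i)
        .chars()
        .map(|bit| if bit == '1' { 'F' } else { 'E' })
        .collect()
}

/// Lists the variant declarations for an index, pairing each bit
/// with a letter A, B, C, ... the way the script zips the pattern
/// with `string.ascii_uppercase`.
pub fn variant_decls(i: u32) -> Vec<String> {
    format!("{:b}", i)
        .chars()
        .zip('A'..='Z')
        .map(|(bit, letter)| {
            if bit == '1' {
                format!("{}(usize)", letter)
            } else {
                letter.to_string()
            }
        })
        .collect()
}
```

For example, index 5 is `101` in binary, so it yields the name `FEF` with variants `A(usize)`, `B`, and `C(usize)`; index 49 (`110001`) yields `FFEEEF`, the last row of the tables.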

@nagisa
Member

nagisa commented Nov 26, 2022

Thanks for preparing those. A couple of caveats: the times are actually in nanoseconds, so running just 50 iterations of each is probably not going to be enough to gain any confidence in the measurement.

I went on to modify the generator a little: to increase the iteration count, and to add some ad-hoc test cases with a large number of variants at once.

Modified generator script
import string

variant_names = [f"V{i}" for i in range(1024)]

def make_name(pat):
    return ''.join(["F" if x else "E" for x in pat])

def make_enum(pat):
    name = make_name(pat)
    buf = "#[cfg_attr(derived, derive(PartialOrd))]\n"
    buf += "#[derive(PartialEq)]\n"
    buf += f"pub enum {name}" + " {\n"
    for var, x in zip(variant_names, pat):
        buf += '    ' + var
        if x:
            buf += "(usize)"
        buf += ",\n"
    buf += "}\n"
    buf += "#[cfg(not(derived))]\n"
    buf += f"impl PartialOrd for {name} {{\n"
    buf += "    #[inline]\n"
    buf += "    fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {\n"
    buf += "        let l_tag = ::core::intrinsics::discriminant_value(self);\n"
    buf += "        let r_tag = ::core::intrinsics::discriminant_value(other);\n"
    buf += "        match (self, other) {\n"
    for var, x in zip(variant_names, pat):
        if x:
            buf += f"           (Self::{var}(l), Self::{var}(r)) => PartialOrd::partial_cmp(l, r),\n"
    buf += "            _ => PartialOrd::partial_cmp(&l_tag, &r_tag),\n"
    buf += "        }\n"
    buf += "    }\n"
    buf += "}\n"
    return buf

out = """#![feature(core_intrinsics, bench_black_box)]
#![allow(non_snake_case, unreachable_patterns)]
use std::hint::black_box;
extern crate core;
"""

# Test cases: two equal dataless variants
# Two different dataless variants
# Two equal dataful variants with the same data
# Two different dataful variants with the same data

def make_tests(pat):
    name = make_name(pat)
    buf = f"fn test_{name}() -> (u128, u128, u128, u128)" + " {\n"
    buf += "    let one = {"
    dataless = [name for name, x in zip(variant_names, pat) if not x]
    dataful = [name for name, x in zip(variant_names, pat) if x]
    if not dataless:
        buf += "0\n"
    else:
        variant = dataless[0]
        buf += f"let (l, r) = ({name}::{variant}, {name}::{variant});" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..500000 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"

    buf += "    let two = {"
    if len(dataless) < 2:
        buf += "0\n"
    else:
        var_l, var_r = dataless[:2]
        buf += f"let (l, r) = ({name}::{var_l}, {name}::{var_r});" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..500000 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"

    buf += "    let three = {"
    if not dataful:
        buf += "0\n"
    else:
        variant = dataful[0]
        buf += f"let (l, r) = ({name}::{variant}(10), {name}::{variant}(10));" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..500000 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"

    buf += "    let four = {"
    if len(dataful) < 2:
        buf += "0\n"
    else:
        var_l, var_r = dataful[:2]
        buf += f"let (l, r) = ({name}::{var_l}(10), {name}::{var_r}(10));" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..500000 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"

    buf += "    (one, two, three, four)\n"
    buf += "}\n"
    return buf

names = []
for i in range(50):
    pat = [int(x) for x in f"{i:b}"]
    names.append(make_name(pat))
    out += make_enum(pat) + '\n'
    out += make_tests(pat)

for pat in [0b10101010101010101010101, 1<<250 | 1 << 100, (1<<500) - 1, ((1<<500)-1) ^ (1<<100) ^ (1<<200)]:
    pat = [int(x) for x in f"{pat:b}"]
    name = make_name(pat)
    names.append(name)
    out += make_enum(pat) + '\n'
    out += make_tests(pat)

out += "fn main() {\n"
out += 'println!("name,equal_dataless,inequal_dataless,equal_dataful,inequal_dataful,total_ns");\n'
for name in names:
    out += f"let (a, b, c, d) = test_{name}();\n"
    out += f'println!("{name},{{a}},{{b}},{{c}},{{d}},{{}}", a+b+c+d);\n'
out += "}\n"
print(out)
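
For reference, the timing loop that the generator emits for each case can be sketched by hand roughly as follows. This is an approximation under assumptions, not the generated code itself: `time_cmp` and `per_iter_ns` are hypothetical helpers, the per-iteration normalization is an addition that the scripts do not perform, and the sketch uses the stable `std::hint::black_box` rather than the `bench_black_box` feature gate the scripts enable.

```rust
use std::cmp::Ordering;
use std::hint::black_box;
use std::time::Instant;

// Same shape as the generated `FE` enum: one dataful, one dataless variant.
#[derive(PartialEq, PartialOrd)]
pub enum FE {
    V0(usize),
    V1,
}

/// Runs `iters` comparisons and returns the total elapsed nanoseconds
/// together with the last comparison result. `black_box` keeps the
/// optimizer from hoisting or removing the comparison.
pub fn time_cmp(l: &FE, r: &FE, iters: u32) -> (u128, Option<Ordering>) {
    let mut last = None;
    let now = Instant::now();
    for _ in 0..iters {
        last = black_box(PartialOrd::partial_cmp(black_box(l), black_box(r)));
    }
    (now.elapsed().as_nanos(), last)
}

/// Normalizes a total to nanoseconds per comparison.
pub fn per_iter_ns(total_ns: u128, iters: u32) -> f64 {
    total_ns as f64 / iters as f64
}
```

As a sense of scale, a total of about 1000 ns over 50 iterations works out to roughly 20 ns per call, which is small enough that timer overhead and noise can dominate; raising the count to 500000 makes each measured interval much longer.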

The results that I got on my somewhat noisy server can be seen in this spreadsheet (sorry; non-free software T_T). Looking at them, my conclusion is roughly that an actual difference is rare, at least on my machine, but where there is a difference, it is generally positive. (I also checked the results without optimizations, and they are similarly largely neutral.)

I’m still worried about the fact that these benchmarks are the best-case scenario for the branch predictor, but I don’t think it’s going to be easy to write a benchmark that exercises it in a meaningfully different way. With that in mind, I’m quite happy to merge the algorithm as benchmarked.

@clubby789 clubby789 force-pushed the improve-partialord-derive branch from 515ec74 to a8fb1e1 Compare January 5, 2023 22:33
@clubby789
Contributor Author

clubby789 commented Jan 5, 2023

Squashed and rebased onto master, and fixed the tests.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 5, 2023
Member

@nagisa nagisa left a comment

I had these queued since my last review but for some reason never got to submitting them…

@clubby789 clubby789 force-pushed the improve-partialord-derive branch from a8fb1e1 to 750ea4f Compare January 9, 2023 12:37
@clubby789
Contributor Author

Applied the formatting suggestions and added a link back to the benchmarks for future reference.

@clubby789 clubby789 force-pushed the improve-partialord-derive branch from 750ea4f to 2883148 Compare January 15, 2023 01:37
@nagisa
Member

nagisa commented Jan 28, 2023

@bors r+

Sorry for taking a long time to get back to this!

@bors
Collaborator

bors commented Jan 28, 2023

📌 Commit 2883148 has been approved by nagisa

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 28, 2023
@bors
Collaborator

bors commented Jan 28, 2023

⌛ Testing commit 2883148 with merge 9f82651...

@bors
Collaborator

bors commented Jan 29, 2023

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 9f82651 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 29, 2023
@bors bors merged commit 9f82651 into rust-lang:master Jan 29, 2023
@rustbot rustbot added this to the 1.69.0 milestone Jan 29, 2023
@rust-timer
Collaborator

Finished benchmarking commit (9f82651): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.0% | [2.0%, 2.0%] | 1 |
| Regressions ❌ (secondary) | 2.2% | [2.1%, 2.2%] | 2 |
| Improvements ✅ (primary) | -2.7% | [-2.7%, -2.7%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.3% | [-2.7%, 2.0%] | 2 |

Cycles

This benchmark run did not return any relevant results for this metric.

@nnethercote
Contributor

@clubby789 I see you've made multiple improvements to deriving code. I did a bunch of work last year on that code and would be happy to be CC'd on any future changes you make to that code. Thanks!

@clubby789 clubby789 deleted the improve-partialord-derive branch February 11, 2023 14:44