Special-case deriving PartialOrd for enums with dataless variants #103659
Conversation
(rustbot has picked a reviewer for you, use r? to override)

@bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit 4440b9954cd728433ce6232af319c79393984e9a with merge 7d4ed6a1e86edcf57e09c298a239018705744284...

☀️ Try build successful - checks-actions

Queued 7d4ed6a1e86edcf57e09c298a239018705744284 with parent 0da281b, future comparison URL.
Can't this fairly easily (at the risk of uglier code) be generalized to enums with one dataful variant and an arbitrary number of dataless variants? The rest of the variants would be accounted for by comparing the tags just as you've done.
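To make the suggestion concrete, here is a hedged sketch of what that generalized expansion might look like. The enum `E` and the `tag` helper are hypothetical illustrations, not code from this PR; the real derive would use the `discriminant_value` intrinsic rather than a hand-written tag function.

```rust
use std::cmp::Ordering;

// Hypothetical enum: two dataless variants plus one dataful variant.
#[derive(PartialEq)]
enum E {
    A,
    B,
    C(u32),
}

// Stand-in for the discriminant; the actual derive reads the tag
// directly via an intrinsic instead of matching.
fn tag(e: &E) -> u8 {
    match e {
        E::A => 0,
        E::B => 1,
        E::C(_) => 2,
    }
}

impl PartialOrd for E {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            // Only the single dataful variant needs a field comparison...
            (E::C(l), E::C(r)) => l.partial_cmp(r),
            // ...every other pairing is decided by the tags alone.
            _ => tag(self).partial_cmp(&tag(other)),
        }
    }
}
```

Because only one variant carries data, the match has exactly one field-comparison arm, and the catch-all tag comparison covers all dataless pairings at once.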
Finished benchmarking commit (7d4ed6a1e86edcf57e09c298a239018705744284): comparison URL. Overall result: ✅ improvements - no action needed.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
On further testing, I'm actually able to get better codegen by applying this to all enums with any dataless variant (Old, New - output of ...)
Perf run requested in Discord. @bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit 4c1bc43887286d919814c24ec5d7048351fcf78a with merge f58bb57191d1311a4ed7e7a0a77b057d32c27550...

☀️ Try build successful - checks-actions

Queued f58bb57191d1311a4ed7e7a0a77b057d32c27550 with parent a9ef100, future comparison URL.
Finished benchmarking commit (f58bb57191d1311a4ed7e7a0a77b057d32c27550): comparison URL. Overall result: ✅ improvements - no action needed.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This benchmark run did not return any relevant results for this metric.
Cycles: This benchmark run did not return any relevant results for this metric.
The title was changed from "Special-case deriving PartialOrd for two-variant enums" to "Special-case deriving PartialOrd for enums with dataless variants".
sorry for not getting around to reviewing this, gonna re-roll r? compiler
Unfortunately, I don't know what the ideal cut-over point may be, but the characteristics of the two approaches will always have different tradeoffs depending on what data is passed in. Are these just different variants? In that case the tag check first should ~always be faster. Are the data in the same variant different? A match may be faster then, since the tag check is always going to pass anyway. No padding? This is all going to be further complicated by LLVM having its own heuristics for what kind of algorithm is appropriate for a switch – a table, a jump tree or something else. With that in mind, a demonstration that the improvement is ~consistent over a broader set of benchmarks would help.
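The two shapes being weighed against each other can be sketched side by side. Everything below is a hedged illustration with a hypothetical `Pair` enum and a hand-written `tag` helper, not the derive's actual output:

```rust
use std::cmp::Ordering;

#[derive(PartialEq)]
enum Pair {
    A(u32),
    B(u32),
}

// Stand-in for the discriminant read.
fn tag(p: &Pair) -> u8 {
    match p {
        Pair::A(_) => 0,
        Pair::B(_) => 1,
    }
}

// Match-first: favorable when both sides usually hold the same variant,
// since control flows straight to the field comparison.
fn cmp_match(l: &Pair, r: &Pair) -> Option<Ordering> {
    match (l, r) {
        (Pair::A(a), Pair::A(b)) => a.partial_cmp(b),
        (Pair::B(a), Pair::B(b)) => a.partial_cmp(b),
        (Pair::A(_), Pair::B(_)) => Some(Ordering::Less),
        (Pair::B(_), Pair::A(_)) => Some(Ordering::Greater),
    }
}

// Tag-first: favorable when the variants usually differ, since the
// field comparison is skipped entirely.
fn cmp_tag(l: &Pair, r: &Pair) -> Option<Ordering> {
    match tag(l).cmp(&tag(r)) {
        Ordering::Equal => match (l, r) {
            (Pair::A(a), Pair::A(b)) => a.partial_cmp(b),
            (Pair::B(a), Pair::B(b)) => a.partial_cmp(b),
            _ => unreachable!("equal tags imply equal variants"),
        },
        ord => Some(ord),
    }
}
```

Both functions compute the same ordering; the question in this thread is purely which shape optimizes and branch-predicts better for typical inputs.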
I will try some more comprehensive benchmarks so we can get some more data on what gives us the best speed.

@rustbot author
I wrote a script to generate some code and some fairly rough benchmarks, but the numbers look a bit more promising. I tested many permutations, for each one comparing the derived implementation against the manual tag-first one.

Times are in ms, and are given as 0 if the test was not applicable to the type of enum.

(Results table omitted.)
Thanks for preparing those. A couple of caveats: the times are actually nanoseconds, and thus running just 50 iterations of each is probably not going to be enough to gain any confidence in the measurement. I went on to modify the generator a little: to increase the iteration count, and to add some ad-hoc test cases with a large number of variants at once.

Modified generator script:

```python
import string

variant_names = [f"V{i}" for i in range(1024)]

def make_name(pat):
    return ''.join(["F" if x else "E" for x in pat])

def make_enum(pat):
    name = make_name(pat)
    buf = "#[cfg_attr(derived, derive(PartialOrd))]\n"
    buf += "#[derive(PartialEq)]\n"
    buf += f"pub enum {name}" + " {\n"
    for var, x in zip(variant_names, pat):
        buf += '    ' + var
        if x:
            buf += "(usize)"
        buf += ",\n"
    buf += "}\n"
    buf += "#[cfg(not(derived))]\n"
    buf += f"impl PartialOrd for {name} {{\n"
    buf += "    #[inline]\n"
    buf += "    fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {\n"
    buf += "        let l_tag = ::core::intrinsics::discriminant_value(self);\n"
    buf += "        let r_tag = ::core::intrinsics::discriminant_value(other);\n"
    buf += "        match (self, other) {\n"
    for var, x in zip(variant_names, pat):
        if x:
            buf += f"            (Self::{var}(l), Self::{var}(r)) => PartialOrd::partial_cmp(l, r),\n"
    buf += "            _ => PartialOrd::partial_cmp(&l_tag, &r_tag),\n"
    buf += "        }\n"
    buf += "    }\n"
    buf += "}\n"
    return buf

out = """#![feature(core_intrinsics, bench_black_box)]
#![allow(non_snake_case, unreachable_patterns)]
use std::hint::black_box;
extern crate core;
"""

# Test cases: Two equal dataless
#             Two different dataless
#             Two equal dataful with same data
#             Two equal dataful with different data
def make_tests(pat):
    name = make_name(pat)
    buf = f"fn test_{name}() -> (u128, u128, u128, u128)" + " {\n"
    buf += "    let one = {"
    dataless = [name for name, x in zip(variant_names, pat) if not x]
    dataful = [name for name, x in zip(variant_names, pat) if x]
    if not dataless:
        buf += "0\n"
    else:
        variant = dataless[0]
        buf += f"let (l, r) = ({name}::{variant}, {name}::{variant});" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..500000 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"
    buf += "    let two = {"
    if len(dataless) < 2:
        buf += "0\n"
    else:
        var_l, var_r = dataless[:2]
        buf += f"let (l, r) = ({name}::{var_l}, {name}::{var_r});" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..500000 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"
    buf += "    let three = {"
    if not dataful:
        buf += "0\n"
    else:
        variant = dataful[0]
        buf += f"let (l, r) = ({name}::{variant}(10), {name}::{variant}(10));" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..500000 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"
    buf += "    let four = {"
    if len(dataful) < 2:
        buf += "0\n"
    else:
        var_l, var_r = dataful[:2]
        buf += f"let (l, r) = ({name}::{var_l}(10), {name}::{var_r}(10));" + '\n'
        buf += "let now = std::time::Instant::now();\n"
        buf += "for _ in 0..500000 {black_box(PartialOrd::partial_cmp(black_box(&l), black_box(&r)));}\n"
        buf += "now.elapsed().as_nanos()\n"
    buf += "    };\n"
    buf += "    (one, two, three, four)\n"
    buf += "}\n"
    return buf

names = []
for i in range(50):
    pat = [int(x) for x in f"{i:b}"]
    names.append(make_name(pat))
    out += make_enum(pat) + '\n'
    out += make_tests(pat)

for pat in [0b10101010101010101010101, 1 << 250 | 1 << 100, (1 << 500) - 1, ((1 << 500) - 1) ^ (1 << 100) ^ (1 << 200)]:
    pat = [int(x) for x in f"{pat:b}"]
    name = make_name(pat)
    names.append(name)
    out += make_enum(pat) + '\n'
    out += make_tests(pat)

out += "fn main() {\n"
out += 'println!("name,equal_dataless,inequal_dataless,equal_dataful,inequal_dataful,total_ns");\n'
for name in names:
    out += f"let (a, b, c, d) = test_{name}();\n"
    out += f'println!("{name},{{a}},{{b}},{{c}},{{d}},{{}}", a+b+c+d);\n'
out += "}\n"

print(out)
```

The results that I got on my somewhat noisy server can be seen in this spreadsheet (sorry; non-free software T_T).

Looking at that, my conclusion is roughly that an actual difference, at least on my machine, is rare, but where there is a difference, it is roughly positive. (I also checked the results without optimizations, and they are similarly largely neutral.) I'm still worried about the fact that these benchmarks are the best-case scenario for the branch predictor, but I don't think it's going to be easy to write a benchmark that exercises it in a meaningfully different way. With that in mind, I'm quite happy to merge the algorithm as benchmarked.
Force-pushed from 515ec74 to a8fb1e1 (Compare)
Squashed and rebased onto master, as well as fixing tests. @rustbot ready
I had these queued since my last review but for some reason never got to submitting them…
Force-pushed from a8fb1e1 to 750ea4f (Compare)
Applied the formatting suggestions, as well as adding a link back to the benchmarks for future reference.
Force-pushed from 750ea4f to 2883148 (Compare)
@bors r+

Sorry for taking a long time to get back to this!

☀️ Test successful - checks-actions

Finished benchmarking commit (9f82651): comparison URL. Overall result: ✅ improvements - no action needed. @rustbot label: -perf-regression

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
@clubby789 I see you've made multiple improvements to deriving code. I did a bunch of work last year on that code and would be happy to be CC'd on any future changes you make. Thanks!
I was able to get slightly better codegen by flipping the derived PartialOrd logic for two-variant enums. I also tried to document the implementation of the derive macro to make the special-case logic a little clearer.

Godbolt: Current, New

I'm not sure how common a case comparing two enums like this (such as Option) is, and whether it's worth the slowdown of adding a special case to the derive. If it causes overall regressions, it might be worth just manually implementing this for Option.
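Whatever shape the generated code takes, it has to preserve the derived ordering contract for Option-like enums: variants compare by declaration order, so the dataless variant sorts before the dataful one regardless of payload. A small sketch with a hypothetical `Maybe` type (not the PR's generated code) makes the contract explicit:

```rust
use std::cmp::Ordering;

// Hypothetical Option-like two-variant enum. The derived PartialOrd
// orders variants by declaration order, so Nothing < Just(_) always.
#[derive(PartialEq, PartialOrd)]
enum Maybe {
    Nothing,
    Just(i32),
}

// Any rewritten derive must agree with plain Option on the same shape.
fn check() {
    assert_eq!(
        Maybe::Nothing.partial_cmp(&Maybe::Just(0)),
        Some(Ordering::Less)
    );
    assert_eq!(None::<i32>.partial_cmp(&Some(0)), Some(Ordering::Less));
    assert_eq!(
        Maybe::Just(1).partial_cmp(&Maybe::Just(2)),
        Some(Ordering::Less)
    );
}
```

Note that `Nothing` sorts below `Just(n)` even for negative `n`: the discriminant decides first, and the payload is only consulted when both sides hold the dataful variant.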