Missed optimization on array comparison #62531

tesuji · 2019-07-09T13:38:56Z

The godbolt link: https://godbolt.org/z/dc9o3x

I think this snippet should just return true:

pub fn compare() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    bytes == if cfg!(target_endian = "big") {
        [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    }
}

The generated asm with opt-level=3:

example::compare:
        push    rax
        mov     al, 1
        pop     rcx
        ret

The text was updated successfully, but these errors were encountered:

mati865 · 2019-07-09T16:11:50Z

I think there are 2 issues:

to_ne_bytes is not const fn
if cfg! is not const fn

See this modified example: https://godbolt.org/z/86WRHF

tesuji · 2019-07-09T16:16:06Z

I may misunderstand but in the C version, does clang require similar conditions to optimize code?
@rustbot modify labels: A-codegen A-const-eval

mati865 · 2019-07-09T16:21:35Z

I may misunderstand but in the C version, does clang require similar condition to optimize code?

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return *(uint32_t *)gf_u.bytes == *(uint32_t *)big;
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return *(uint32_t *)gf_u.bytes == *(uint32_t *)lit;
#endif

Those are preprocessor directives and are handled at compile time so their Rust equivalent would be:
https://godbolt.org/z/Ou1bTH

#![feature(float_to_from_bytes)]
#![feature(stmt_expr_attributes)]
pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}

tesuji · 2019-07-09T16:27:20Z

https://doc.rust-lang.org/nightly/std/macro.cfg.html states that cfg! macro:

Evaluates boolean combinations of configuration flags at compile-time.

mati865 · 2019-07-09T16:54:11Z

Good point, those two are equal:
https://godbolt.org/z/V7Iuoo

#![feature(float_to_from_bytes)]
pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}

pub fn compare2() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    if cfg!(target_endian = "big") {
        bytes == [0x41, 0x48, 0x00, 0x00]
    } else {
        bytes == [0x00, 0x00, 0x48, 0x41]
    }
}

I don't know enough about const and I could be totally wrong but I think in the first case if cfg! expands to something like:

#![feature(float_to_from_bytes)]
pub fn compare() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    let arr = if cfg!(target_endian = "big") {
         [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    };

    bytes == arr
}

It cannot make arr constant because #49146 is not yet complete.

tesuji · 2019-07-10T04:50:42Z

Interesting, If I look at MIR output, I think rustc already have enough information:

MIR

// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn  compare() -> bool {
    let mut _0: bool;                    // return place in scope 0 at src/lib.rs:2:21: 2:25
    let _1: [u8; 4];                     // "bytes" in scope 0 at src/lib.rs:3:9: 3:14
    let mut _2: &[u8; 4];                // in scope 0 at src/lib.rs:4:5: 4:10
    let mut _3: &[u8; 4];                // in scope 0 at src/lib.rs:4:14: 8:6
    let _4: [u8; 4];                     // in scope 0 at src/lib.rs:4:14: 8:6
    let mut _5: bool;                    // in scope 0 at src/lib.rs:4:17: 4:44
    scope 1 {
    }

    bb0: {
        StorageLive(_1);                 // bb0[0]: scope 0 at src/lib.rs:3:9: 3:14
        _1 = const core::f32::<impl f32>::to_ne_bytes(const 12.5f32) -> bb1; // bb0[1]: scope 0 at src/lib.rs:3:17: 3:38
                                         // ty::Const
                                         // + ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}
                                         // + val: Scalar(<ZST>)
                                         // mir::Constant
                                         // + span: src/lib.rs:3:25: 3:36
                                         // + ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}
                                         // + literal: Const { ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}, val: Scalar(<ZST>) }
                                         // ty::Const
                                         // + ty: f32
                                         // + val: Scalar(0x41480000)
                                         // mir::Constant
                                         // + span: src/lib.rs:3:17: 3:24
                                         // + ty: f32
                                         // + literal: Const { ty: f32, val: Scalar(0x41480000) }
    }

    bb1: {
        StorageLive(_2);                 // bb1[0]: scope 1 at src/lib.rs:4:5: 4:10
        _2 = &_1;                        // bb1[1]: scope 1 at src/lib.rs:4:5: 4:10
        StorageLive(_3);                 // bb1[2]: scope 1 at src/lib.rs:4:14: 8:6
        StorageLive(_4);                 // bb1[3]: scope 1 at src/lib.rs:4:14: 8:6
        StorageLive(_5);                 // bb1[4]: scope 1 at src/lib.rs:4:17: 4:44
        _5 = const false;                // bb1[5]: scope 1 at src/lib.rs:4:17: 4:44
                                         // ty::Const
                                         // + ty: bool
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:4:17: 4:44
                                         // + ty: bool
                                         // + literal: Const { ty: bool, val: Scalar(0x00) }
        switchInt(_5) -> [false: bb2, otherwise: bb3]; // bb1[6]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb2: {
        _4 = [const 0u8, const 0u8, const 72u8, const 65u8]; // bb2[0]: scope 1 at src/lib.rs:7:9: 7:33
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:10: 7:14
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:16: 7:20
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x48)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:22: 7:26
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x48) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x41)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:28: 7:32
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x41) }
        goto -> bb4;                     // bb2[1]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb3: {
        _4 = [const 65u8, const 72u8, const 0u8, const 0u8]; // bb3[0]: scope 1 at src/lib.rs:5:9: 5:33
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x41)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:10: 5:14
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x41) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x48)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:16: 5:20
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x48) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:22: 5:26
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:28: 5:32
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
        goto -> bb4;                     // bb3[1]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb4: {
        _3 = &_4;                        // bb4[0]: scope 1 at src/lib.rs:4:14: 8:6
        _0 = const <[u8; 4] as std::cmp::PartialEq>::eq(move _2, move _3) -> bb5; // bb4[1]: scope 1 at src/lib.rs:4:5: 8:6
                                         // ty::Const
                                         // + ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}
                                         // + val: Scalar(<ZST>)
                                         // mir::Constant
                                         // + span: src/lib.rs:4:5: 8:6
                                         // + ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}
                                         // + literal: Const { ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}, val: Scalar(<ZST>) }
    }

    bb5: {
        StorageDead(_3);                 // bb5[0]: scope 1 at src/lib.rs:8:5: 8:6
        StorageDead(_2);                 // bb5[1]: scope 1 at src/lib.rs:8:5: 8:6
        StorageDead(_1);                 // bb5[2]: scope 0 at src/lib.rs:9:1: 9:2
        StorageDead(_5);                 // bb5[3]: scope 0 at src/lib.rs:9:1: 9:2
        StorageDead(_4);                 // bb5[4]: scope 0 at src/lib.rs:9:1: 9:2
        return;                          // bb5[5]: scope 0 at src/lib.rs:9:2: 9:2
    }
}

ecstatic-morse · 2019-07-10T16:36:29Z

@mati865 Whether a function is marked const at the source level has no effect on whether or not it can be optimized by LLVM. Even if to_ne_bytes was not const, inlining and const propagation could conceivably reduce this to a constant.

ecstatic-morse · 2019-10-23T02:48:52Z

Rust now generates the following (almost optimal) assembly for the OP's code:

example::compare:
  push rax
  mov al, 1
  pop rcx
  ret

I would guess that recent improvements to const propagation in the MIR are responsible for this (thanks @wesleywiser!), but it might also have been a change in LLVM. This seems like a decent candidate for a codegen regression test, although an equivalent one might already exist.

tesuji · 2019-10-23T02:58:12Z

It is better now. But are push rax and pop rcx instructions necessary?

ecstatic-morse · 2019-10-23T03:07:16Z

I assume it is due to a difference in calling convention, but it does seem kind of silly. You'll need to ask someone more knowledgeable unfortunately.

ecstatic-morse · 2019-10-23T03:24:11Z

@lzutao A quick google leads me to believe that the push is used to align the stack to a 16-byte boundary.

tesuji · 2019-10-23T03:39:49Z

So does it: https://stackoverflow.com/a/37774474/5456794. Looks like only regression tests are needed.

tesuji · 2019-10-23T05:08:53Z

This version doesn't need any stack alignment: https://godbolt.org/z/6donWz

pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}

ecstatic-morse · 2019-10-23T18:10:30Z

I tried to leave this alone since I'm not very knowledgeable in this area, but curiosity got the better of me. The following might be incorrect, but it's my best guess.

In the optimized LLVM IR for rust, a call to bcmp remains which has been optimized out in the C++ version. I believe stack alignment on x64 is required only prior to a call instruction, so LLVM sees that a non-trivial call instruction exists in the IR from rust and that compare only has an 8-byte return pointer on the stack, meaning the stack would be misaligned for that call. That's why it inserts the stack alignment code in the rust version only.

At some point during translation to x86 assembly, the bcmp gets removed entirely (I have no idea how), resulting in a function that has a push/pop to align the stack for a call but that doesn't actually call anything.

Perhaps someone more familiar with the various LLVM passes could say more.

BTW, a fairer C++ comparison is the following, but this gives the same results:

int compare() {
    uint8_t big[4] = {0x41, 0x48, 0x00, 0x00};
    uint8_t lit[4] = {0x00, 0x00, 0x48, 0x41};
    ieee_float_shape_type gf_u;
    gf_u.value = 12.5;

    if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) {
        return *(uint32_t *)gf_u.bytes == *(uint32_t *)big;
    } else {
        return *(uint32_t *)gf_u.bytes == *(uint32_t *)lit;
    }
}

ecstatic-morse · 2019-10-23T19:31:17Z

There's some discussion of this in Zulip. I was indeed wrong BTW 😄.

tesuji · 2021-02-24T00:45:32Z

This issue appears to be fixed on beta. Maybe we need a codegen test for it.

Edit: It seems that the llvm-ir generated by rustc are the same between beta and stable.
Only the optimized asm for x86 is different. Should I just write assembly codegen or just close this issue.

nagisa · 2021-05-09T19:19:03Z

The issue seems to have re-appeared in master :(

JOE1994 · 2024-02-19T23:14:23Z

Both rustc 1.76.0 & rustc nightly (as of Feb 19 2024) generates the following asm (with -C opt-level=3)

example::compare:
        mov     al, 1
        ret

Add codegen test for array comparision opt Fixed since rust 1.55 closes rust-lang#62531

rustbot added A-codegen Area: Code generation A-const-eval Area: constant evaluation (mir interpretation) labels Jul 10, 2019

nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 10, 2019

ordian mentioned this issue Feb 10, 2020

Missed optimization on fixed-size bytes comparison #69024

Open

jonas-schievink added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Mar 14, 2020

rustbot added the E-needs-test Call for participation: Writing correctness tests. label Feb 24, 2021

tesuji mentioned this issue Feb 24, 2021

Add codegen test for array comparision opt #82459

Closed

nagisa removed the E-needs-test Call for participation: Writing correctness tests. label May 9, 2021

workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023

nikic added the E-needs-test Call for participation: Writing correctness tests. label Feb 20, 2024

tesuji mentioned this issue May 19, 2024

Add codegen test for array comparision opt #125298

Merged

tesuji closed this as completed May 19, 2024

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue May 20, 2024

Rollup merge of rust-lang#125298 - tesuji:arrcmp, r=nikic

65dff32

Add codegen test for array comparision opt Fixed since rust 1.55 closes rust-lang#62531

bors added a commit to rust-lang-ci/rust that referenced this issue May 21, 2024

Auto merge of rust-lang#125298 - tesuji:arrcmp, r=nikic

1d0e4af

Add codegen test for array comparision opt Fixed since rust 1.55 closes rust-lang#62531

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missed optimization on array comparison #62531

Missed optimization on array comparison #62531

tesuji commented Jul 9, 2019 •

edited

mati865 commented Jul 9, 2019

tesuji commented Jul 9, 2019 •

edited

mati865 commented Jul 9, 2019 •

edited

tesuji commented Jul 9, 2019

mati865 commented Jul 9, 2019

tesuji commented Jul 10, 2019 •

edited

ecstatic-morse commented Jul 10, 2019

ecstatic-morse commented Oct 23, 2019 •

edited

tesuji commented Oct 23, 2019

ecstatic-morse commented Oct 23, 2019

ecstatic-morse commented Oct 23, 2019

tesuji commented Oct 23, 2019

tesuji commented Oct 23, 2019 •

edited

ecstatic-morse commented Oct 23, 2019 •

edited

ecstatic-morse commented Oct 23, 2019

tesuji commented Feb 24, 2021 •

edited

nagisa commented May 9, 2021

JOE1994 commented Feb 19, 2024

Missed optimization on array comparison #62531

Missed optimization on array comparison #62531

Comments

tesuji commented Jul 9, 2019 • edited

mati865 commented Jul 9, 2019

tesuji commented Jul 9, 2019 • edited

mati865 commented Jul 9, 2019 • edited

tesuji commented Jul 9, 2019

mati865 commented Jul 9, 2019

tesuji commented Jul 10, 2019 • edited

ecstatic-morse commented Jul 10, 2019

ecstatic-morse commented Oct 23, 2019 • edited

tesuji commented Oct 23, 2019

ecstatic-morse commented Oct 23, 2019

ecstatic-morse commented Oct 23, 2019

tesuji commented Oct 23, 2019

tesuji commented Oct 23, 2019 • edited

ecstatic-morse commented Oct 23, 2019 • edited

ecstatic-morse commented Oct 23, 2019

tesuji commented Feb 24, 2021 • edited

nagisa commented May 9, 2021

JOE1994 commented Feb 19, 2024

tesuji commented Jul 9, 2019 •

edited

tesuji commented Jul 9, 2019 •

edited

mati865 commented Jul 9, 2019 •

edited

tesuji commented Jul 10, 2019 •

edited

ecstatic-morse commented Oct 23, 2019 •

edited

tesuji commented Oct 23, 2019 •

edited

ecstatic-morse commented Oct 23, 2019 •

edited

tesuji commented Feb 24, 2021 •

edited