Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization on array comparison #62531

Closed
tesuji opened this issue Jul 9, 2019 · 18 comments · Fixed by #125298
Closed

Missed optimization on array comparison #62531

tesuji opened this issue Jul 9, 2019 · 18 comments · Fixed by #125298
Labels
A-codegen Area: Code generation A-const-eval Area: constant evaluation (mir interpretation) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-test Call for participation: Writing correctness tests. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@tesuji
Copy link
Contributor

tesuji commented Jul 9, 2019

The godbolt link: https://godbolt.org/z/dc9o3x

I think this snippet should just return true:

pub fn compare() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    bytes == if cfg!(target_endian = "big") {
        [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    }
}

The generated asm with opt-level=3:

example::compare:
        push    rax
        mov     al, 1
        pop     rcx
        ret
@mati865
Copy link
Contributor

mati865 commented Jul 9, 2019

I think there are 2 issues:

  • to_ne_bytes is not const fn
  • if cfg! is not const fn

See this modified example: https://godbolt.org/z/86WRHF

@tesuji
Copy link
Contributor Author

tesuji commented Jul 9, 2019

I may misunderstand but in the C version, does clang require similar conditions to optimize code?
@rustbot modify labels: A-codegen A-const-eval

@mati865
Copy link
Contributor

mati865 commented Jul 9, 2019

I may misunderstand but in the C version, does clang require similar condition to optimize code?

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return *(uint32_t *)gf_u.bytes == *(uint32_t *)big;
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return *(uint32_t *)gf_u.bytes == *(uint32_t *)lit;
#endif

Those are preprocessor directives and are handled at compile time so their Rust equivalent would be:
https://godbolt.org/z/Ou1bTH

#![feature(float_to_from_bytes)]
#![feature(stmt_expr_attributes)]
pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}

@tesuji
Copy link
Contributor Author

tesuji commented Jul 9, 2019

https://doc.rust-lang.org/nightly/std/macro.cfg.html states that cfg! macro:

Evaluates boolean combinations of configuration flags at compile-time.

@mati865
Copy link
Contributor

mati865 commented Jul 9, 2019

Good point, those two are equal:
https://godbolt.org/z/V7Iuoo

#![feature(float_to_from_bytes)]
pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}

pub fn compare2() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    if cfg!(target_endian = "big") {
        bytes == [0x41, 0x48, 0x00, 0x00]
    } else {
        bytes == [0x00, 0x00, 0x48, 0x41]
    }
}

I don't know enough about const and I could be totally wrong but I think in the first case if cfg! expands to something like:

#![feature(float_to_from_bytes)]
pub fn compare() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    let arr = if cfg!(target_endian = "big") {
         [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    };

    bytes == arr
}

It cannot make arr constant because #49146 is not yet complete.

@rustbot rustbot added A-codegen Area: Code generation A-const-eval Area: constant evaluation (mir interpretation) labels Jul 10, 2019
@tesuji
Copy link
Contributor Author

tesuji commented Jul 10, 2019

Interesting, If I look at MIR output, I think rustc already have enough information:

MIR
// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn  compare() -> bool {
    let mut _0: bool;                    // return place in scope 0 at src/lib.rs:2:21: 2:25
    let _1: [u8; 4];                     // "bytes" in scope 0 at src/lib.rs:3:9: 3:14
    let mut _2: &[u8; 4];                // in scope 0 at src/lib.rs:4:5: 4:10
    let mut _3: &[u8; 4];                // in scope 0 at src/lib.rs:4:14: 8:6
    let _4: [u8; 4];                     // in scope 0 at src/lib.rs:4:14: 8:6
    let mut _5: bool;                    // in scope 0 at src/lib.rs:4:17: 4:44
    scope 1 {
    }

    bb0: {
        StorageLive(_1);                 // bb0[0]: scope 0 at src/lib.rs:3:9: 3:14
        _1 = const core::f32::<impl f32>::to_ne_bytes(const 12.5f32) -> bb1; // bb0[1]: scope 0 at src/lib.rs:3:17: 3:38
                                         // ty::Const
                                         // + ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}
                                         // + val: Scalar(<ZST>)
                                         // mir::Constant
                                         // + span: src/lib.rs:3:25: 3:36
                                         // + ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}
                                         // + literal: Const { ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}, val: Scalar(<ZST>) }
                                         // ty::Const
                                         // + ty: f32
                                         // + val: Scalar(0x41480000)
                                         // mir::Constant
                                         // + span: src/lib.rs:3:17: 3:24
                                         // + ty: f32
                                         // + literal: Const { ty: f32, val: Scalar(0x41480000) }
    }

    bb1: {
        StorageLive(_2);                 // bb1[0]: scope 1 at src/lib.rs:4:5: 4:10
        _2 = &_1;                        // bb1[1]: scope 1 at src/lib.rs:4:5: 4:10
        StorageLive(_3);                 // bb1[2]: scope 1 at src/lib.rs:4:14: 8:6
        StorageLive(_4);                 // bb1[3]: scope 1 at src/lib.rs:4:14: 8:6
        StorageLive(_5);                 // bb1[4]: scope 1 at src/lib.rs:4:17: 4:44
        _5 = const false;                // bb1[5]: scope 1 at src/lib.rs:4:17: 4:44
                                         // ty::Const
                                         // + ty: bool
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:4:17: 4:44
                                         // + ty: bool
                                         // + literal: Const { ty: bool, val: Scalar(0x00) }
        switchInt(_5) -> [false: bb2, otherwise: bb3]; // bb1[6]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb2: {
        _4 = [const 0u8, const 0u8, const 72u8, const 65u8]; // bb2[0]: scope 1 at src/lib.rs:7:9: 7:33
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:10: 7:14
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:16: 7:20
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x48)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:22: 7:26
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x48) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x41)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:28: 7:32
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x41) }
        goto -> bb4;                     // bb2[1]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb3: {
        _4 = [const 65u8, const 72u8, const 0u8, const 0u8]; // bb3[0]: scope 1 at src/lib.rs:5:9: 5:33
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x41)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:10: 5:14
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x41) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x48)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:16: 5:20
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x48) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:22: 5:26
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:28: 5:32
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
        goto -> bb4;                     // bb3[1]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb4: {
        _3 = &_4;                        // bb4[0]: scope 1 at src/lib.rs:4:14: 8:6
        _0 = const <[u8; 4] as std::cmp::PartialEq>::eq(move _2, move _3) -> bb5; // bb4[1]: scope 1 at src/lib.rs:4:5: 8:6
                                         // ty::Const
                                         // + ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}
                                         // + val: Scalar(<ZST>)
                                         // mir::Constant
                                         // + span: src/lib.rs:4:5: 8:6
                                         // + ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}
                                         // + literal: Const { ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}, val: Scalar(<ZST>) }
    }

    bb5: {
        StorageDead(_3);                 // bb5[0]: scope 1 at src/lib.rs:8:5: 8:6
        StorageDead(_2);                 // bb5[1]: scope 1 at src/lib.rs:8:5: 8:6
        StorageDead(_1);                 // bb5[2]: scope 0 at src/lib.rs:9:1: 9:2
        StorageDead(_5);                 // bb5[3]: scope 0 at src/lib.rs:9:1: 9:2
        StorageDead(_4);                 // bb5[4]: scope 0 at src/lib.rs:9:1: 9:2
        return;                          // bb5[5]: scope 0 at src/lib.rs:9:2: 9:2
    }
}

@ecstatic-morse
Copy link
Contributor

@mati865 Whether a function is marked const at the source level has no effect on whether or not it can be optimized by LLVM. Even if to_ne_bytes was not const, inlining and const propagation could conceivably reduce this to a constant.

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 10, 2019
@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Oct 23, 2019

Rust now generates the following (almost optimal) assembly for the OP's code:

example::compare:
  push rax
  mov al, 1
  pop rcx
  ret

I would guess that recent improvements to const propagation in the MIR are responsible for this (thanks @wesleywiser!), but it might also have been a change in LLVM. This seems like a decent candidate for a codegen regression test, although an equivalent one might already exist.

@tesuji
Copy link
Contributor Author

tesuji commented Oct 23, 2019

It is better now. But are push rax and pop rcx instructions necessary?

@ecstatic-morse
Copy link
Contributor

I assume it is due to a difference in calling convention, but it does seem kind of silly. You'll need to ask someone more knowledgeable unfortunately.

@ecstatic-morse
Copy link
Contributor

@lzutao A quick google leads me to believe that the push is used to align the stack to a 16-byte boundary.

@tesuji
Copy link
Contributor Author

tesuji commented Oct 23, 2019

So does it: https://stackoverflow.com/a/37774474/5456794. Looks like only regression tests are needed.

@tesuji
Copy link
Contributor Author

tesuji commented Oct 23, 2019

This version doesn't need any stack alignment: https://godbolt.org/z/6donWz

pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Oct 23, 2019

I tried to leave this alone since I'm not very knowledgeable in this area, but curiosity got the better of me. The following might be incorrect, but it's my best guess.

In the optimized LLVM IR for rust, a call to bcmp remains which has been optimized out in the C++ version. I believe stack alignment on x64 is required only prior to a call instruction, so LLVM sees that a non-trivial call instruction exists in the IR from rust and that compare only has an 8-byte return pointer on the stack, meaning the stack would be misaligned for that call. That's why it inserts the stack alignment code in the rust version only.

At some point during translation to x86 assembly, the bcmp gets removed entirely (I have no idea how), resulting in a function that has a push/pop to align the stack for a call but that doesn't actually call anything.

Perhaps someone more familiar with the various LLVM passes could say more.

BTW, a fairer C++ comparison is the following, but this gives the same results:

int compare() {
    uint8_t big[4] = {0x41, 0x48, 0x00, 0x00};
    uint8_t lit[4] = {0x00, 0x00, 0x48, 0x41};
    ieee_float_shape_type gf_u;
    gf_u.value = 12.5;

    if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) {
        return *(uint32_t *)gf_u.bytes == *(uint32_t *)big;
    } else {
        return *(uint32_t *)gf_u.bytes == *(uint32_t *)lit;
    }
}

@ecstatic-morse
Copy link
Contributor

There's some discussion of this in Zulip. I was indeed wrong BTW 😄.

@jonas-schievink jonas-schievink added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Mar 14, 2020
@tesuji
Copy link
Contributor Author

tesuji commented Feb 24, 2021

This issue appears to be fixed on beta. Maybe we need a codegen test for it.

Edit: It seems that the llvm-ir generated by rustc are the same between beta and stable.
Only the optimized asm for x86 is different. Should I just write assembly codegen or just close this issue.

@rustbot rustbot added the E-needs-test Call for participation: Writing correctness tests. label Feb 24, 2021
@nagisa
Copy link
Member

nagisa commented May 9, 2021

The issue seems to have re-appeared in master :(

@nagisa nagisa removed the E-needs-test Call for participation: Writing correctness tests. label May 9, 2021
@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@JOE1994
Copy link
Contributor

JOE1994 commented Feb 19, 2024

Both rustc 1.76.0 & rustc nightly (as of Feb 19 2024) generates the following asm (with -C opt-level=3)

example::compare:
        mov     al, 1
        ret

@nikic nikic added the E-needs-test Call for participation: Writing correctness tests. label Feb 20, 2024
@tesuji tesuji closed this as completed May 19, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue May 20, 2024
Add codegen test for array comparision opt

Fixed since rust 1.55
closes rust-lang#62531
bors added a commit to rust-lang-ci/rust that referenced this issue May 21, 2024
Add codegen test for array comparision opt

Fixed since rust 1.55
closes rust-lang#62531
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-const-eval Area: constant evaluation (mir interpretation) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-test Call for participation: Writing correctness tests. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
9 participants