Skip to content

Conversation

@Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Dec 7, 2020

(created using eb --new-pr)

This fixes a bug when using GCC to compile code of the form (x > b) ? b : x on POWER which leads to GCC emitting a hardware instruction which does not correctly handle the NaN and signed zero cases, i.e. it would wrongly yield b when x is nan

@Flamefire
Copy link
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
taurusml15 - Linux RHEL 7.6, POWER, 8335-GTX, Python 2.7.5
See https://gist.github.com/433ec5a3cfba58f9f493f25cbece19d5 for a full test report.

@Flamefire
Copy link
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
taurusi6209.taurus.hrsk.tu-dresden.de - Linux RHEL 7.9, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, Python 2.7.5
See https://gist.github.com/bdb48cf6ec667422f416ad63d1950963 for a full test report.

@Micket Micket added the bug fix label Dec 7, 2020
Copy link
Contributor

@Micket Micket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Micket
Copy link
Contributor

Micket commented Dec 7, 2020

@boegelbot please test @ generoso

@boegelbot
Copy link
Collaborator

@Micket: Request for testing this PR well received on generoso

PR test command 'EB_PR=11837 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11837 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12238

Test results coming soon (I hope)...

Details

- notification for comment with ID 740027004 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 5 out of 6 (6 easyconfigs in total)
generoso-c1-s-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/8608eae8ba38da5686c725c96dcdf656 for a full test report.

@Flamefire
Copy link
Contributor Author

@Micket The log doesn't contain the information about what went wrong with GCC 8.1. I'd assume a flake somewhere but can't tell. This should not make it fail where it worked before.

@Micket
Copy link
Contributor

Micket commented Dec 8, 2020

Test report by @Micket
SUCCESS
Build succeeded for 4 out of 4 (1 easyconfigs in total)
hebbe-c1 - Linux centos linux 7.9.2009, x86_64, Intel Core Processor (Haswell, no TSX), Python 2.7.5
See https://gist.github.com/e624533351c3f7ee5745abfe8b3d5aa9 for a full test report.

@Micket
Copy link
Contributor

Micket commented Dec 8, 2020

(edit: ignore this failure, I probably messed it up)

@Micket
Copy link
Contributor

Micket commented Dec 8, 2020

Test report by @Micket
SUCCESS
Build succeeded for 3 out of 3 (2 easyconfigs in this PR)
vera-c1 - Linux centos linux 7.9.2009, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/19ce1344f7593f984bd0471570fefff3 for a full test report.

@Micket
Copy link
Contributor

Micket commented Dec 8, 2020

Test report by @Micket
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
hebbe-c1 - Linux centos linux 7.9.2009, x86_64, Intel Core Processor (Haswell, no TSX), Python 3.6.8
See https://gist.github.com/7dd6586c52026f69f728e6a67588a972 for a full test report.

@Micket Micket added this to the next release (4.3.3?) milestone Dec 11, 2020
@branfosj
Copy link
Member

branfosj commented Dec 13, 2020

The GCC 8.1 failure is:

../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc:160:10: fatal error: sys/ustat.h: No such file or directory

The glibc 2.28 release notes say:

  • The obsolete function ustat is no longer available to newly linked
    binaries; the headers <ustat.h> and <sys/ustat.h> have been removed. This
    function has been deprecated in favor of fstatfs and statfs.

So, the failure on CentOS 8 is not caused by the changes in this PR.

@branfosj
Copy link
Member

Test report by @branfosj
SUCCESS
Build succeeded for 5 out of 5 (5 easyconfigs in total)
bear-pg0306u19a - Linux RHEL 8.2, POWER, 8335-GTX (power9le), Python 3.6.8
See https://gist.github.com/6acbebb168b3776307d603c30bd37ec5 for a full test report.

@Flamefire
Copy link
Contributor Author

The CentOS failure is aweful, but the patch should be pretty easy: There is unsigned struct_ustat_sz = sizeof(struct ustat); in the sanitizer_platform_limits_posix.cc and one usage of that in sanitizer_common_syscalls.inc which can be removed. Or the following patch from upstream GCC 9:

diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
index 858bb218450..de18e56d11c 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
@@ -157,7 +157,6 @@ typedef struct user_fpregs elf_fpregset_t;
 # include <sys/procfs.h>
 #endif
 #include <sys/user.h>
-#include <sys/ustat.h>
 #include <linux/cyclades.h>
 #include <linux/if_eql.h>
 #include <linux/if_plip.h>
@@ -250,7 +249,19 @@ namespace __sanitizer {
 #endif // SANITIZER_LINUX || SANITIZER_FREEBSD
 
 #if SANITIZER_LINUX && !SANITIZER_ANDROID
-  unsigned struct_ustat_sz = sizeof(struct ustat);
+  // Use pre-computed size of struct ustat to avoid <sys/ustat.h> which
+  // has been removed from glibc 2.28.
+#if defined(__aarch64__) || defined(__s390x__) || defined (__mips64) \
+  || defined(__powerpc64__) || defined(__arch64__) || defined(__sparcv9) \
+  || defined(__x86_64__)
+#define SIZEOF_STRUCT_USTAT 32
+#elif defined(__arm__) || defined(__i386__) || defined(__mips__) \
+  || defined(__powerpc__) || defined(__s390__) || defined(__sparc__)
+#define SIZEOF_STRUCT_USTAT 20
+#else
+#error Unknown size of struct ustat
+#endif
+  unsigned struct_ustat_sz = SIZEOF_STRUCT_USTAT;
   unsigned struct_rlimit64_sz = sizeof(struct rlimit64);
   unsigned struct_statvfs64_sz = sizeof(struct statvfs64);
 #endif // SANITIZER_LINUX && !SANITIZER_ANDROID

See commit 71b55d45e4304f5e2e98ac30473c581f58fc486b from the github GCC mirror

@Micket
Copy link
Contributor

Micket commented Dec 16, 2020

Going in, thanks @Flamefire!

@Micket Micket merged commit 4066744 into easybuilders:develop Dec 16, 2020
@Micket
Copy link
Contributor

Micket commented Dec 16, 2020

making sure boegelbot has the patched versions built for its the toolchains so that it we can properly test it out with the rest of the PRs:
@boegelbot please test @ generoso

@boegelbot
Copy link
Collaborator

@Micket: Request for testing this PR well received on generoso

PR test command 'EB_PR=11837 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11837 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12308

Test results coming soon (I hope)...

Details

- notification for comment with ID 746764885 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Micket
Copy link
Contributor

Micket commented Dec 16, 2020

I expect 8.1.0 to fail here, but if anyone cares about it, then lets make a new PR for that fix.

@boegel boegel changed the title Fix miscompilation bug on POWER with GCC add patch to fix miscompilation bug on POWER for GCC 8.x and 9.x Dec 16, 2020
@boegel
Copy link
Member

boegel commented Dec 16, 2020

@boegelbot please test @ generoso
CORE_CNT=16

(@Micket I cancelled the previous job, was only using 4 cores...)

@boegelbot
Copy link
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11837 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11837 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12311

Test results coming soon (I hope)...

Details

- notification for comment with ID 746928428 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 4 out of 6 (6 easyconfigs in total)
generoso-x-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/8b7e9dee72ef51c364ee8f9a495d3586 for a full test report.

@Micket
Copy link
Contributor

Micket commented Dec 16, 2020

@boegel Lock file on 8.3.0 build present

@Flamefire Flamefire deleted the 20201207111848_new_pr_GCCcore810 branch December 17, 2020 07:22
@easybuilders easybuilders deleted a comment from boegelbot Dec 18, 2020
@boegel
Copy link
Member

boegel commented Dec 18, 2020

@boegelbot please test @ generoso
EB_ARGS="GCCcore-8.3.0.eb"
CORE_CNT=16

(lock file for GCC 8.3.0 removed)

@boegelbot
Copy link
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11837 EB_ARGS="CCcore-8.3.0.eb" /apps/slurm/default/bin/sbatch --job-name test_PR_11837 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12324

Test results coming soon (I hope)...

Details

- notification for comment with ID 747919475 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Copy link
Member

boegel commented Dec 18, 2020

Hmm, that boegelbot run failed with "ERROR: One or more files not found: CCcore-8.3.0.eb", something is eating that G...

@Flamefire
Copy link
Contributor Author

Flamefire commented Dec 18, 2020

Found your bug: value = item.strip(key + '=') does not remove the given prefix but all chars you passed as the argument.

See 'EB_ARGS=GCCcore-8.3.0.eb'.strip("EB_ARGS=") and https://github.com/boegel/boegelbot/blob/cc3ed0e18b2bc43d8ee7ea9bf1cef8969cec992e/boegelbot.py#L534
How about using itemkey, value = item.split('=', 2)

@Micket
Copy link
Contributor

Micket commented Dec 18, 2020

Should be just one max split itemkey, value = item.split('=', 1)

@boegel
Copy link
Member

boegel commented Dec 18, 2020

Ouch... Thanks, that was so easy to overlook.

@boegel
Copy link
Member

boegel commented Dec 18, 2020

@boegelbot please test @ generoso
EB_ARGS="GCCcore-8.3.0.eb"
CORE_CNT=16

@boegelbot
Copy link
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11837 EB_ARGS="GCCcore-8.3.0.eb" /apps/slurm/default/bin/sbatch --job-name test_PR_11837 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12325

Test results coming soon (I hope)...

Details

- notification for comment with ID 748020112 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
generoso-x-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/59222a93e75857ba5a6132fcc0e12de8 for a full test report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants