[llvmlinux] "make test" for x86_64 target just hung there, why?

Sedat Dilek sedat.dilek at gmail.com
Sun Sep 13 07:59:26 UTC 2015

On Sun, Sep 13, 2015 at 9:17 AM, PaX Team <pageexec at gmail.com> wrote:
> On 13 Sep 2015 at 4:24, Sedat Dilek wrote:
>> On Wed, Sep 9, 2015 at 10:24 PM, PaX Team <pageexec at gmail.com> wrote:
>> > are you sure it's not lib/hweight.o instead? under gcc it's compiled
>> > with special flags (CONFIG_ARCH_HWEIGHT_CFLAGS) which clang doesn't
>> > support and we used to patch that out but i have no idea about the
>> > current state of affairs.
>> >
>> Hi pipacs,
>> YES, you are right!
>> One of your patches for x86_64 got archived [1] but is still required
>> to build x86_64 arch correctly.
> interesting, Jan-Simon said that patch was still part of the llvmlinux
> tree (it is really mandatory for clang) so i'm not sure what's going on.

That was a single patch [1].
Now only [2] remains for x86_64, which was only the Kconfig part
(formerly the kernel-option CONFIG_ARCH_HWEIGHT_CFLAGS was commented

[1] http://git.linuxfoundation.org/?p=llvmlinux.git;a=blob_plain;f=arch/x86_64/patches/ARCHIVE/0029-Fix-ARCH_HWEIGHT-for-compilation-with-clang.patch;hb=HEAD
[2] http://git.linuxfoundation.org/?p=llvmlinux.git;a=blob;f=arch/x86_64/patches/hweight-x86.patch

>> Shouldn't your patch go upstream?
> well, certainly not in this form as it disables an otherwise useful
> optimization (popcnt vs. call to helper). the real/best solution would
> be if clang (or more like llvm in this case) added some support for
> changing the calling conventions via the fcall-saved-REG (and other
> similar) switches but IIRC there was a discussion (or bugzilla entry?)
> about it and the devs weren't in favour of this gcc feature as it
> punched a hole through too many abstraction layers for their taste.
> on the linux side the best (as in, probably upstreamable) fix would be to
> extend and make use of the thunking layer in arch/x86/entry/thunk_{32,64}.S
> as they properly save/restore the necessary registers (modulo eax/rax
> which are needed for the return value) that the above gcc switch would
> induce too.

I cannot say much about this statement and have attached the analysis by
user sanjoyd (see file attachment).
That analysis came from looking at the code - no object-dump or the
requested IR output :-).

I simply trust you(r explanations).

A patch is welcome for testing.

[ Famous last words :-) ? ]

It took me several hours last week to get my llvm-toolchain set up
and a llvmlinux-patched Linux v4.2 booting on "bare metal" (to quote
Bryce Lelback).

The big fight was on the llvmlinux side:
collecting patches, testing kernel-options (some are BROKEN and not
documented), provided kernel-configs that are too old, no new
checkpoints for x86_64, etc.

So, I am not complaining...
***I*** wanted to use my customized environment.
YES, you do NOT need to build an llvm-toolchain via the
llvmlinux-buildsystem (I use mine for building/testing the Linux
graphics driver stack).
***I*** always wanted to test against "stable" (released) versions of
llvm-toolchain (here: v3.7.0) and Linux-kernel (here: v4.2).
And YES it worked...

It was somewhat interesting to deal with the inline-optimizations.
As a conclusion, the llvm-toolchain or, more precisely, clang seems to
check more strictly in this area.
Good to know!

A checkpoint with the versions I tested on x86_64 would be a last good
gift (a "giveback").
( I cannot promise - I spent and lost so many hours... )


- Sedat -

P.S.: The pass lists for the optimization levels are attached. I
generated them as described in [3]; according to the llvm people this is
not the correct way for clang, because opt and clang are independent
(see the backlog below).

$ llvm-as < /dev/null | opt -O1 -disable-output -debug-pass=Arguments 2>&1 | tee ../clang-3-7-O-1.txt

$ llvm-as < /dev/null | opt -O2 -disable-output -debug-pass=Arguments 2>&1 | tee ../clang-3-7-O-2.txt

[3] http://stackoverflow.com/questions/15548023/clang-optimization-levels
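
The two saved lists can be compared pass by pass. What follows is only a
sketch, assuming the files written by the tee commands above; the helper
names passes and pass_diff are made up for illustration, and the result
describes opt's pipelines, which need not match what clang itself runs.

```shell
# Print the unique pass names found in an "opt -debug-pass=Arguments"
# dump, one per line, sorted in the C locale so comm can consume them.
passes() {
    grep '^Pass Arguments:' "$1" \
        | sed 's/^Pass Arguments: *//' \
        | tr -s ' ' '\n' \
        | grep . \
        | LC_ALL=C sort -u
}

# Show which passes occur in only one of the two dumps.
# Note: this compares sets of pass names, not their order or repetition.
pass_diff() {
    first=$(mktemp) || return 1
    second=$(mktemp) || return 1
    passes "$1" > "$first"
    passes "$2" > "$second"
    LC_ALL=C comm -23 "$first" "$second" | sed 's/^/only in first:  /'
    LC_ALL=C comm -13 "$first" "$second" | sed 's/^/only in second: /'
    rm -f "$first" "$second"
}
```

Usage: pass_diff ../clang-3-7-O-1.txt ../clang-3-7-O-2.txt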

[ From my backlog #llvm (OFTC) ]

2:39 PM <strcat> dileks_webchat: look at -mllvm -debug-pass=Arguments
or -mllvm -debug-pass=Structure with clang
3:31 PM <sanjoyd> nbjoerg: it seems t-- I also verified directly by
running 'clang -mllvm -print-after-all -O1' vs. -O2
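
sanjoyd's suggestion from the attached log, diffing the IR that clang
emits at -O1 and at -O2, could be scripted roughly like this. It is a
sketch under assumptions: ir_diff and CFLAGS_EXTRA are hypothetical
names, and a kernel source file would additionally need the include
paths and defines from the real kernel build (visible with make V=1).

```shell
# Emit textual LLVM IR for one source file at -O1 and at -O2, then diff.
#   $1 = source file, $2 = prefix for the generated .ll files (default: ir)
# CC defaults to clang; CFLAGS_EXTRA (hypothetical) holds extra flags and
# is deliberately left unquoted so that it splits into separate words.
ir_diff() {
    src=$1
    base=${2:-ir}
    ${CC:-clang} -O1 -S -emit-llvm ${CFLAGS_EXTRA:-} "$src" -o "$base-O1.ll" || return 1
    ${CC:-clang} -O2 -S -emit-llvm ${CFLAGS_EXTRA:-} "$src" -o "$base-O2.ll" || return 1
    # diff exits with status 1 when the two files differ.
    diff -u "$base-O1.ll" "$base-O2.ll"
}
```

Usage: ir_diff test.c test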
-------------- next part --------------
2:25 PM → dileks_webchat joined (~oftc-webi at
2:25 PM <dileks_webchat> hi
2:26 PM <dileks_webchat> I discovered an inline-optimization bug with clang v3.7 when compiling a llvmlinux-patched Linux v4.2 when the compiler's optimization-level is higher than -O2. level -O1 and -O0 are fine.
2:27 PM <dileks_webchat> how can I figure out which compiler-flags for inlining are set in the different optim-levels?
2:27 PM <dileks_webchat> -O2 is broken, too
2:29 PM <dileks_webchat> -Oz gives me a sane objdump but I cannot boot into such a Linux-kernel
2:33 PM <nbjoerg> why do you think it is a bug in clang?
2:35 PM <dileks_webchat> nbjoerg: you guess it could be workarounded in Linux :-)?
2:35 PM <dileks_webchat> OK, I found <http://stackoverflow.com/questions/15548023/clang-optimization-levels>
2:35 PM <nbjoerg> I wouldn't be surprised if the linux kernel depends on certain gcc behavior
2:36 PM <nbjoerg> heck, I explicitly expect it to do that
2:36 PM <dileks_webchat> yupp
2:36 PM <dileks_webchat> llvm-as < /dev/null | opt -O1 -disable-output -debug-pass=Arguments 2>&1 | tee ../clang-3-7-O-1.txt
2:36 PM <dileks_webchat> llvm-as < /dev/null | opt -O2 -disable-output -debug-pass=Arguments 2>&1 | tee ../clang-3-7-O-2.txt
2:36 PM <nbjoerg> there is no clang -O1
2:36 PM <nbjoerg> and opt and clang are completely different things
2:36 PM <dileks_webchat> so -O1 has -always inline and -O2 starts with -inline
2:37 PM <dileks_webchat> OK. I wanted to know what inline compiler-flags are set in -O1 and -O2
2:38 PM <nbjoerg> as usual with SO, that question is answered completely wrong
2:38 PM <nbjoerg> as I said, clang and opt are not related
2:38 PM <nbjoerg> most noticeable, clang -O2 and clang -O1 are the *same* thing
2:38 PM <dileks_webchat> nbjoerg: and how would you answer my question?
2:39 PM <dileks_webchat> inline compiler-flags for different -Ox
2:39 PM <strcat> dileks_webchat: look at -mllvm -debug-pass=Arguments or -mllvm -debug-pass=Structure with clang
2:43 PM <dileks_webchat> strcat: I did not get why above lines are not giving me the right hints
2:44 PM <strcat> dileks_webchat: opt is independent from clang and might (does?) use different passes
2:45 PM <dileks_webchat> can you give me a single line to check for example -O1?
2:51 PM <dileks_webchat> nbjoerg: no. with -O1 compiled Linux OK. with -O2 NOPE
2:56 PM <nbjoerg> dileks_webchat: that makes no sense. really, for clang -O1 and -O2 are exactly the same
2:58 PM <dileks_webchat> nbjoerg: I am struggling the whole day with inline noinline always_inline. you can trust me when I say a kernel boots here or not. cannot say it is on kernel-side or clang
3:00 PM <nbjoerg> I don't know what you are doing. but clang -O2 and clang -O1 are the same. if there is a difference, you are calling something else somewhere
3:02 PM <dileks_webchat> hmm, cflags are set in the main Makefile
3:14 PM <sanjoyd> nbjoerg: it looks like clang -O2 runs GVN and MergeLoadStoreMotion when clang -O1 does not?
3:15 PM <sanjoyd> They're predicated in PassManagerBuilder::populateModulePassManager under if (OptLevel > 1).
3:15 PM <sanjoyd> (At least on a recent clang)
3:16 PM <sanjoyd> dileks_webchat: do you have a specific file / function that you suspect is being miscompiled?
3:17 PM <sanjoyd> dileks_webchat: if yes, you could try diffing the IR you get after clang -O1 vs clang -O2, and see if you spot something suspicious.
3:21 PM <dileks_webchat> sanjoyd: if you give me clear instructions on how to do that - I can provide some material :-)
3:22 PM <dileks_webchat> lib/bitmap seems to be miscompiled - __bitmap_weight() causes a call-trace on early-boot
3:23 PM <sanjoyd> What's a call-trace?
3:24 PM <dileks_webchat> a regression
3:24 PM <dileks_webchat> sanjoyd: http://marc.info/?t=144156156700001&r=1&w=2
3:28 PM <sanjoyd> dileks_webchat: interesting, I assume the BUG: unable to handle kernel NULL pointer dereference is the issue?
3:28 PM <dileks_webchat> sanjoyd: yes
3:29 PM <dileks_webchat> I tried with noinline etc. the objdump looked sane but it did not boot
3:30 PM <dileks_webchat> now I forced always-inline inline optimization in include/linux/compiler-gcc.h where gcc behaviour is defined and is available for clang
3:30 PM <nbjoerg> sanjoyd: that would be a bug
3:30 PM <dileks_webchat> I forgot to add a -always-inline to my -O2 line
3:31 PM <nbjoerg> sanjoyd: if it makes a difference for clang, that is
3:31 PM <nbjoerg> sanjoyd: they are not supposed to be different
3:31 PM <sanjoyd> nbjoerg: it seems t-- I also verified directly by running 'clang -mllvm -print-after-all -O1' vs. -O2
3:31 PM <sanjoyd> And GVN shows up in O2 but not in O1
3:32 PM <nbjoerg> especially since -O => -O2
3:32 PM <sanjoyd> I could also be misreading the code / output; I'm not familiar with clang's pass ordering.
3:32 PM <dileks_webchat> it is different - sth. is optimized away
3:33 PM <dileks_webchat> or the linux code is buggy
3:33 PM <dileks_webchat> anyway I need a workaround
3:34 PM <sanjoyd> dileks_webchat: what is optimized away?
3:35 PM <sanjoyd> dileks_webchat: if you can show us the boots vs. does-not-boot IR, that'll be helpful.
3:35 PM <sanjoyd> Like, pastebin it or something.
3:35 PM <dileks_webchat> how do I do an "IR"?
3:36 PM <sanjoyd> The LLVM IR -- "it is different - sth. is optimized away" => I thought you had two snapshots of optimized IR that you were comparing.
3:38 PM <dileks_webchat> sanjoyd: I have objdumps sent here... http://marc.info/?l=linux-kernel&m=144179317507922&w=2
4:14 PM <sanjoyd> dileks_webchat: I think the bug is in __arch_hweight64
4:14 PM <sanjoyd> dileks_webchat: it does not specify that the called function can blow caller saved registers.
4:14 PM <sanjoyd> Like %rdx and %rdx; since you generate the call via a inline asm.
4:15 PM <sanjoyd> dileks_webchat: my guess is that the gcc version works purely by accident, since it keeps the "unsigned long *bitmap" in %rdx, and that just happens to be preserved by __arch_hweight64
4:16 PM <sanjoyd> dileks_webchat: an easy way to verify this will be to change "	asm (ALTERNATIVE("call __sw_hweight64", POPCNT64, X86_FEATURE_POPCNT)" to a normal C-level call to __sw_hweight64
4:16 PM <sanjoyd> dileks_webchat: and see if the kernel boots after that.
4:17 PM <dileks_webchat> sanjoyd: wow! hmm, I fell over an archived patch in llvmlinux
4:19 PM <dileks_webchat> http://git.linuxfoundation.org/?p=llvmlinux.git;a=blob_plain;f=arch/x86_64/patches/ARCHIVE/0029-Fix-ARCH_HWEIGHT-for-compilation-with-clang.patch;hb=HEAD
4:20 PM <dileks_webchat> sanjoyd: ^^
4:20 PM  → Philpax, +dexonsmith (voiced), Davidbrcz and dileks_webchat_ joined  ⇐ rendar, inolen and cp- quit  ↔ aditya_nandakumar popped in  
4:34 PM <dileks_webchat_> sanjoyd: lost my internet connection - umts/hspa here
4:36 PM ⇐ dileks_webchat quit (~oftc-webi at Ping timeout: 480 seconds
4:36 PM <dileks_webchat_> sanjoyd: is that channel logged - archived offline? I lost the backlog.
4:36 PM <sanjoyd> I don't think so.
4:36 PM <sanjoyd> dileks_webchat_: if there's something specific you want, I can PM it to you.
4:37 PM <dileks_webchat_> sanjoyd: oh yes
[02:12] <dileks_webchat_> nbjoerg: majnemer_ echristo_ sanjoyd I could boot into a llvmlinux-patched Linux v4.2 kernel - http://paste.ubuntu.com/12387430/ is required (not sure why this known issue and patch was archived).
[02:12] <dileks_webchat_> http://git.linuxfoundation.org/?p=llvmlinux.git;a=blob_plain;f=arch/x86_64/patches/ARCHIVE/0029-Fix-ARCH_HWEIGHT-for-compilation-with-clang.patch;hb=HEAD
[02:13] <dileks_webchat_> thanks for your help!
-------------- next part --------------
Pass Arguments:  -tti -no-aa -tbaa -scoped-noalias -assumption-cache-tracker -targetlibinfo -basicaa -verify -simplifycfg -domtree -sroa -early-cse -lower-expect
Pass Arguments:  -targetlibinfo -tti -no-aa -tbaa -scoped-noalias -assumption-cache-tracker -basicaa -ipsccp -globalopt -deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh -inline-cost -always-inline -functionattrs -sroa -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -domtree -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution -loop-simplify -lcssa -indvars -loop-idiom -loop-deletion -loop-unroll -memdep -memcpyopt -sccp -domtree -bdce -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -loops -loop-simplify -lcssa -licm -adce -simplifycfg -domtree -instcombine -barrier -float2int -domtree -loops -loop-simplify -lcssa -loop-rotate -branch-prob -block-freq -scalar-evolution -loop-accesses -loop-vectorize -instcombine -simplifycfg -domtree -instcombine -loops -loop-simplify -lcssa -scalar-evolution -loop-unroll -instcombine -loop-simplify -lcssa -licm -scalar-evolution -alignment-from-assumptions -strip-dead-prototypes -verify
-------------- next part --------------
Pass Arguments:  -tti -no-aa -tbaa -scoped-noalias -assumption-cache-tracker -targetlibinfo -basicaa -verify -simplifycfg -domtree -sroa -early-cse -lower-expect
Pass Arguments:  -targetlibinfo -tti -no-aa -tbaa -scoped-noalias -assumption-cache-tracker -basicaa -ipsccp -globalopt -deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh -inline-cost -inline -functionattrs -sroa -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -domtree -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution -loop-simplify -lcssa -indvars -loop-idiom -loop-deletion -loop-unroll -mldst-motion -domtree -memdep -gvn -memdep -memcpyopt -sccp -domtree -bdce -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -loops -loop-simplify -lcssa -licm -adce -simplifycfg -domtree -instcombine -barrier -float2int -domtree -loops -loop-simplify -lcssa -loop-rotate -branch-prob -block-freq -scalar-evolution -loop-accesses -loop-vectorize -instcombine -scalar-evolution -slp-vectorizer -simplifycfg -domtree -instcombine -loops -loop-simplify -lcssa -scalar-evolution -loop-unroll -instcombine -loop-simplify -lcssa -licm -scalar-evolution -alignment-from-assumptions -strip-dead-prototypes -elim-avail-extern -globaldce -constmerge -verify
