Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add building and testing of PGO+LTO LLVM and Julia #345

Open
wants to merge 57 commits into
base: main
Choose a base branch
from

Conversation

Zentrik
Copy link
Contributor

@Zentrik Zentrik commented Mar 26, 2024

I ended up just using the profile from Julia being built for PGO, locally it seems to work fine. If anyone has better ideas for what to profile, please do say.

I don't think we need to run the full test suite but it seems good to run at least some stuff to make sure nothing got messed up. The testing seems to finish a fair bit quicker than the gnuassertrr tests and not that far behind the other test pipelines, so this shouldn't slow down CI much.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

I don't think I have permissions to fix ERROR: Pipeline '.buildkite/pipelines/main/launch_upload_jobs.yml' fails treehash signature check! You may need to re-run cryptic/bin/sign_treehashes!.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

Build currently failing as the bump to patchelf JuliaLang/julia#53845 introduced a check that the compiler supports C++ 17. Not sure why that check is failing on CI as it does seem to be using clang 16.

We run /cache/build/builder-amdci4-3/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0//configure --prefix=/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage1.build/usr --build=x86_64-unknown-linux-gnu --libdir=/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage1.build/usr/lib --bindir=/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage1.build/usr/tools LDFLAGS="-fuse-ld=lld -flto=thin -fprofile-generate=/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/profiles -Wl,-rpath,'\$\$ORIGIN' -Wl,-z,origin -Wl,-rpath-link,/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage1.build/usr/lib -Wl,--enable-new-dtags " F77="gfortran -m64" CC="/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang " CXX="/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang++ " LD="/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/ld.lld" LDFLAGS="" CPPFLAGS="" for the PGO build and the failure is checking whether /cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang++ supports C++17 features with -std=c++17... no.
For the non PGO build we run /cache/build/tester-amdci5-9/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0//configure --prefix=/cache/build/tester-amdci5-9/julialang/julia-buildkite/usr --build=x86_64-linux-gnu --libdir=/cache/build/tester-amdci5-9/julialang/julia-buildkite/usr/lib --bindir=/cache/build/tester-amdci5-9/julialang/julia-buildkite/usr/tools LDFLAGS=" -Wl,-rpath,'\$\$ORIGIN' -Wl,-z,origin -Wl,-rpath-link,/cache/build/tester-amdci5-9/julialang/julia-buildkite/usr/lib -Wl,--enable-new-dtags " F77="gfortran -m64" CC="gcc -m64 " CXX="g++ -m64 " LD="ld" LDFLAGS="" CPPFLAGS="" and checking whether g++ -m64 supports C++17 features with -std=c++17... yes is the relevant check.

The line that runs the check is https://github.com/NixOS/patchelf/blob/99c24238981b7b1084313aca8f5c493bb46f302c/configure.ac#L35 and the macro defined in https://github.com/NixOS/patchelf/blob/master/m4/ax_cxx_compile_stdcxx.m4.

The only obvious difference I see is the -m64 but I really have no clue why this is failing on CI.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

Maybe clang is picking the wrong gcc?

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

Ok so the actual error is

configure:6320: /cache/build/builder-amdci4-6/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang++ -m64  -std=c++17 -c -fprofile-generate=/cache/build/builder-amdci4-6/julialang/julia-buildkite/contrib/pgo-lto/profiles -Xclang -mllvm -Xclang -vp-counters-per-site=6  conftest.cpp >&5
conftest.cpp:437:10: fatal error: 'initializer_list' file not found
#include <initializer_list>

JuliaLang/julia#45641 (comment) mentions something about libstdc++ detection, so maybe something to do with that.
Clang does seem to be loading /usr/lib/gcc/x86_64-linux-gnu/10 the only gcc installation it finds whereas under 'Print software versions' we see gcc version 9.

@haampie
Copy link

haampie commented Mar 26, 2024

Is that a partial gcc 10 install? No g++10 / libstdc++-10-dev?

You can force clang to use gcc 9 if necessary with --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/9

Notice that --gcc-toolchain is entirely useless when multiple GCCs are installed in the same prefix /usr.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

I'm quite out of my depth here but I'm not sure gcc is installed in /usr/lib/gcc/x86_64-linux-gnu/9 because this was printed which seems to suggest it's installed elsewhere

COLLECT_LTO_WRAPPER=/usr/local/bin/../libexec/gcc/x86_64-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: /workspace/srcdir/gcc-9.1.0/configure --prefix=/workspace/destdir --with-build-sysroot=/workspace/destdir/x86_64-linux-gnu --with-sysroot=/workspace/destdir/x86_64-linux-gnu --with-gxx-include-dir=/workspace/destdir/x86_64-linux-gnu/include/c++/9.1.0 --target=x86_64-linux-gnu --host=x86_64-linux-gnu --build=x86_64-linux-gnu --disable-multilib --disable-werror --disable-bootstrap --enable-threads=posix --program-prefix=x86_64-linux-gnu- --with-arch=x86-64 --enable-languages=c,c++

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

It does seem to find the correct gcc version now. Yes, it does seem to get through patchelf's configure phase but now fails at making it

/cache/build/builder-amdci4-0/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang++ -DPACKAGE_NAME=\"patchelf\" -DPACKAGE_TARNAME=\"patchelf\" -DPACKAGE_VERSION=\"0.18.0\" -DPACKAGE_STRING=\"patchelf\ 0.18.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"patchelf\" -DVERSION=\"0.18.0\" -DHAVE_CXX17=1 -I. -I/cache/build/builder-amdci4-0/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0/src    -Wall -Wextra -Wcast-qual -std=c++17 -D_FILE_OFFSET_BITS=64     -fprofile-generate=/cache/build/builder-amdci4-0/julialang/julia-buildkite/contrib/pgo-lto/profiles -Xclang -mllvm -Xclang -vp-counters-per-site=6 -MT patchelf.o -MD -MP -MF .deps/patchelf.Tpo -c -o patchelf.o /cache/build/builder-amdci4-0/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0/src/patchelf.cc
/cache/build/builder-amdci4-0/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0/src/patchelf.cc:19:10: fatal error: 'algorithm' file not found
#include <algorithm>

@haampie
Copy link

haampie commented Mar 26, 2024

The build environment likely contains just the C compiler part of GCC 10, and clang has poor detection & heuristics for the GCC install dir. If multiple are in the same prefix it picks the newest, and it will do so also when there are missing components like C++ stdlib header files. The --gcc-toolchain flag is useles cause it expects the prefix /usr, in which both GCC 9 and 10 live.

Use --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/9 or uninstall GCC 10.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

/cache/build/builder-amdci5-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang  -fprofile-generate=/cache/build/builder-amdci5-3/julialang/julia-buildkite/contrib/pgo-lto/profiles -Xclang -mllvm -Xclang -vp-counters-per-site=6 -O2 -m64 --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/9 -fPIC -std=c99 -Wsign-conversion -Wall -Wextra -pedantic -DUTF8PROC_EXPORTS  -c -o utf8proc.o utf8proc.c
clang-16: error: '/usr/lib/gcc/x86_64-linux-gnu/9' does not contain a GCC installation

On the previous commit I saw

Found candidate GCC installation: /usr/local/bin/../lib/gcc/x86_64-linux-gnu/9.1.0//../../../../lib/gcc/x86_64-linux-gnu/9.1.0
Selected GCC installation: /usr/local/bin/../lib/gcc/x86_64-linux-gnu/9.1.0//../../../../lib/gcc/x86_64-linux-gnu/9.1.0

So I presume the specified path was wrong.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

Thanks, it seems to be working much better now. I just need to figure out why there's a hash mismatch, I had the same locally but I just refreshed it.

@Zentrik Zentrik force-pushed the main branch 4 times, most recently from f2ffa42 to 5ebf7b5 Compare April 21, 2024 12:17
@Zentrik
Copy link
Contributor Author

Zentrik commented Apr 21, 2024

This looks good to me, testing was 15% faster. If we want -Werror I think we just need to merge or otherwise fix JuliaLang/julia#53897 and bump libwhich to vtjnash/libwhich@613b3f0.

Zentrik added a commit to Zentrik/julia-buildkite that referenced this pull request Jul 27, 2024
Somewhat hacky way to fix JuliaCI#314 and to get JuliaCI#345 working again.
staticfloat pushed a commit that referenced this pull request Jul 28, 2024
* Update CMake version used when building

Somewhat hacky way to fix #314 and to get #345 working again.

* Only update CMake with llvm_passes image

* Potential Fix

* Try print debugging

* Fix typo

* Remove debugging echos
@Zentrik Zentrik changed the title Add building and testing of PGO+LTO LLVM and Julia Add building and testing of PGO+LTO+BOLT LLVM and Julia Aug 17, 2024
@Zentrik Zentrik marked this pull request as draft August 19, 2024 06:32
@Zentrik Zentrik changed the title Add building and testing of PGO+LTO+BOLT LLVM and Julia Add building and testing of PGO+LTO LLVM and Julia Aug 22, 2024
@Zentrik Zentrik force-pushed the main branch 2 times, most recently from 94f7c00 to a0b0a3b Compare August 25, 2024 07:47
@Zentrik Zentrik marked this pull request as ready for review September 9, 2024 20:28
@Zentrik Zentrik marked this pull request as draft September 9, 2024 21:43
@Zentrik Zentrik marked this pull request as ready for review September 11, 2024 18:05
@giordano
Copy link
Member

I discussed this with @Zentrik a few days ago: I believe it'd be great to actually test the PGO+LTO pipeline, but I'm a bit concerned with the extra long job this introduces. One option could be to move the job as a scheduled one, with the counter effect few eyes will look at it and the risk this will bitrot at some point. What do people think?

@Zentrik
Copy link
Contributor Author

Zentrik commented Sep 21, 2024

In terms of speeding up the build job we could build the instrumented LLVM on yggy. Further, we could only profile periodically, e.g. every LLVM upgrade, allowing us to skip the slow profiling step and to distribute the optimised LLVM from yggy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants