Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate PHT & SHT at the end of the *.elf file #544

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Patryk27
Copy link
Member

@Patryk27 Patryk27 commented Feb 6, 2024

Closes #531.
Closes #482.
Closes #244.

Upstream-wise, affects NixOS/nixpkgs#226339.
(didn't want to write close so that merging this merge request doesn't close that issue at once)

Abstract

Patching an *.elf file might require extending its program header table, which can then cause it to run out of its originally allocated space (both in terms of file offset and virtual memory).

Currently patchelf solves this problem by finding which sections would overlap PHT and then by moving them to the end of *.elf file:

while( i < rdi(hdr()->e_shnum) && rdi(shdrs.at(i).sh_offset) <= pht_size ) {

As compared to similar logic for binaries:

if ((rdi(shdr.sh_type) == SHT_PROGBITS && sectionName != ".interp")

... the logic for libraries is missing a crucial check: it doesn't take into account whether that particular section can be shuffled around - in particular, sections with SHT_PROGBITS can't!

As luck would have it, libnode.so (e.g. shipped together with pcloud) does have .rodata (a section with SHT_PROGBITS active) right at the beginning of the file:

; readelf -S libnode.so

There are 29 section headers, starting at offset 0x152bf70:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .rodata           PROGBITS         0000000000000240  00000240
       00000000003796a8  0000000000000000 AMS       0     0     32
/* ... */

... which patchelf happily moves around, breaking RIP-relative addressing in the assembly (which, after patching, tries to access the ZZZZ-ed memory).

This commit fixes the issue by changing the logic from:

if there's no space for PHT, shuffle sections around

... to, perhaps a bit more wasteful in terms of storage:

just always allocate PHT at the end of the file

As far as I've checked, the reason why PHT was so strictly kept at the beginning was an old Linux bug:

/* Even though this file is of type ET_DYN, it could actually be

Bug in Linux kernel, in fs/binfmt_elf.c:

... which is not present anymore (not sure when precisely was it fixed, though - the original entry in the BUGS file is dated 2005).

Seizing the day, I'm also including another fix (for binaries), so that merging this pull request will solve all pcloud-related problems.

@Patryk27 Patryk27 changed the title Draft: Allocate PHT & SHT at the end of the *.elf file Allocate PHT & SHT at the end of the *.elf file Feb 8, 2024
@Patryk27
Copy link
Member Author

Patryk27 commented Feb 8, 2024

Btw, I've got another merge request in the works, one that fixes #75 and hopefully will be able to handle #520 as well, but I've ultimately decided to move it out of this pull request, since it's technically a separate changeset (and less important, at least from my point of view) -- I'll prepare it in the upcoming weeks.

@jvolkman
Copy link

jvolkman commented Mar 7, 2024

Another issue is that strip doesn't like binaries with with the PHT and SHT bits at the end, and will either fail or generate invalid binaries as output. One workaround is to just strip first, but I guess this is not possible in all workflows.

See #117 and #10.

Definitely seems like a strip/libbfd bug since, as far as can tell, nothing in the ELF spec requires these tables to be at the top. I spent a couple days stepping through bfd/elf.c back when I was working on this (which also places the updated headers at the end), but I was ultimately unable to decipher what was going on.

tests/short-first-segment.sh Outdated Show resolved Hide resolved
@Mic92 Mic92 force-pushed the fixes branch 5 times, most recently from 5f982f1 to 8dad294 Compare November 18, 2024 10:12
@Mic92
Copy link
Member

Mic92 commented Nov 18, 2024

@Patryk27 rebased CI and it looks like this causes several regressions on non-x86 platforms.

@Mic92
Copy link
Member

Mic92 commented Nov 18, 2024

It should be possible to reproduce this locally using boot.binfmt.emulatedSystems and than either with a cross compiler from nixpkgs or with the docker image used in the tests.

@Patryk27
Copy link
Member Author

Patryk27 commented Dec 3, 2024

Status: got patch v2.0 in the works - a bit different, simpler approach. Hopefully will submit it in a couple of hours.

@Patryk27
Copy link
Member Author

Patryk27 commented Dec 3, 2024

Nope, still fails on other architectures (I'm testing using binfmt) - currently hitting Inconsistency detected by ld.so: rtld.c: 1280: rtld_setup_main_map: Assertion GL(dl_rtld_map).l_libname failed! on some tests. Analyzing further 🔍

@Patryk27
Copy link
Member Author

Patryk27 commented Dec 5, 2024

Got it! -- but the situation is... messy.

It seems that my fix is actually correct for all of the platforms - the tests fail due to a bug in qemu-user that stems from a bug in Linux. You see, this bug mentioned in here, the reason why we try to keep PHT at the beginning:

https://github.com/NixOS/patchelf/blob/769337c227799aa60911562b6940530f4a86eb3c/BUGS

... got fixed relatively recent, just in 2022-04:

https://lore.kernel.org/lkml/[email protected]/

qemu-user has a similar code, but they haven't backported this fix yet:

... and so patchelf with my fix generates correct *.elfs for native (or fully emulated) platforms - those *.elfs just seem broken for qemu-user and kernels before 5.15.

I have confirmed this suspicion by running a fully-emulated arm64 Ubuntu - with my fix, patchelf's tests fail on qemu-user (aka binfmt), but they pass on the fully emulated machine.

I think the most sane way out of this mess is to alter the logic so that patchelf tries to keep PHT at the beginning of the file by relocating other sections, as it does right now, and only upon finding an unmovable section (one that would overlap PHT), it would reallocate PHT instead.

If this PHT-relocation happens, the output *.elf won't work in qemu-user or kernels before 5.15, but I think:

  • it's niche (out of all the applications in nixpkgs, only pCloud's NodeJS library seems to have this peculiar ELF layout that doesn't provide enough space to resize PHT in place),
  • there's no other way around it (if we have to resize PHT in place, but there's no place for it, patchelf could at most crash instead of generating a broken binary; it seems there's no way to generate a correct *.elf with relocated PHT that would satisfy the older kernels and qemu-user).

What do you think, @Mic92?

@Mic92
Copy link
Member

Mic92 commented Dec 6, 2024

If we can skip tests if they run in qemu emulation, that would be nice. Patchelf is full of platform assumption and we would degraded it very quickly without proper testing. I will try to add some native arm64 ci here, so we can test that part. We should cover at least amd64/arm64 with native builds in this case.

@Patryk27 Patryk27 force-pushed the fixes branch 2 times, most recently from f0963eb to f901e39 Compare December 17, 2024 17:45
@Patryk27
Copy link
Member Author

Alright, @Mic92 - v3.0 is ready!

I've adjusted the logic so that we keep PHDR at the beginning of the file for as long as it's possible - if we detect that the new PHDR would overflow another section, only then we relocate it at the end of the file.

This makes the ELFs generated by patchelf compatible with both the older and newer kernels (unless we have to launch the new logic, e.g. for the pCloud's NodeJS library - those ELFs would be incompatible with the older kernels; but there's no other solution here).

I still need to write an extra test, though - for now I've just manually checked that the pCloud launches when patchelf-ed with my approach here.

@Mic92
Copy link
Member

Mic92 commented Dec 17, 2024

Inconsistency detected by ld.so: dl-setup_hash.c: 36: _dl_setup_hash: Assertion `(bitmask_nwords & (bitmask_nwords - 1)) == 0' failed!

This from the test suite on x86_64.

@Patryk27
Copy link
Member Author

Huh, interesting - all tests pass on my machine. I’ll take a look.

@Mic92
Copy link
Member

Mic92 commented Dec 17, 2024

@Patryk27 And you say that cherry-picking the qemu patch, would fix the other tests? I could potentially cross compile patchelf in that case. I think we have now basic toolchain for all these targets.

@osandov
Copy link

osandov commented Dec 17, 2024

I've adjusted the logic so that we keep PHDR at the beginning of the file for as long as it's possible - if we detect that the new PHDR would overflow another section, only then we relocate it at the end of the file.

Drive by comment here, thanks for making this change! There's another important reason to try to keep the PHDRs at the beginning of the file: by default, Linux only saves the first page of an executable or shared library in core dumps, and debuggers need the PHDRs and build ID from a core dump to identify the exact binary that was in use. So if the PHDRs aren't in the first page, debuggers won't be able to find what they need.

That was the motivation for the binutils patch (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=bf6d7087de0a7351fd1dfd5f41522a7f4f576180) that caused #568 and NixOS/nixpkgs#362211, which I assume this also fixes?

@Patryk27
Copy link
Member Author

@Mic92

And you say that cherry-picking the qemu patch, would fix the other tests?

With my current approach the cherry-pick is not required, since we generate ELFs compatible with both kernels (and after I add a test covering the new behavior, I'll enable it just for x86_64, to avoid going through qemu-user).

@Mic92
Copy link
Member

Mic92 commented Dec 18, 2024

@Patryk27 Ok, but why do all emulated tests fail than a couple of tests each.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants