[LLD][ELF] -r --discard-locals corrupts DWARF relocations against local symbols #160789

@mysterymath

I'm forwarding this issue to LLVM upstream; @frobtech has already done an excellent analysis of the issue, which I've copied below. I've also included the reproducer and the original issue.

The TL;DR is that --discard-locals corrupts relocations within DWARF when local symbols get scrubbed out. I've spent less than 5 minutes looking into why, but it seems like maybe these relocs aren't getting marked as used? We're very likely running with GC on, and it seems like there's an early out in markUsedLocalSymbols in that case. If DWARF isn't considered GC-live, maybe that's preventing these from getting marked live later. Just a guess.

Begin @frobtech:


On the output file startup-trampoline.export.link.o.debug, llvm-dwarfdump --debug-info shows some mangled output and eventually crashes. (It's a separate bug that llvm-dwarfdump crashes on bad input like this.)

I think there may be several wrong relocations in the output section (and maybe others in the file), but I've found the first one, which explains the first wrong output. The first CU comes from the first input file (this is an assembly file, which is why its .debug_info contribution is so short and uses simpler DWARF encodings than the others). The second CU comes from the second input file .../start-compiler-abi.start-compiler-abi.cc.o. Just showing the top-level CU DIE for that, we start to see the problems.

0x000000fa: Compile Unit: length = 0x000098f9, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0021, addr_size = 0x08 (next unit at 0x000099f7)

0x00000106: DW_TAG_compile_unit
              DW_AT_producer	()
              DW_AT_language	(DW_LANG_C_plus_plus_14)
              DW_AT_name	()
              DW_AT_str_offsets_base	(0x00000000)
              DW_AT_stmt_list	(0x00000000)
              DW_AT_comp_dir	()
              DW_AT_low_pc	(0x0000000000000000)
              DW_AT_ranges	(indexed (0x7) rangelist = 0x00000000)
              DW_AT_addr_base	(0x00000000)
              DW_AT_rnglists_base	(0x00000000)
              DW_AT_loclists_base	(0x00000000)

The () outputs are cases where it should be a string but wrong relocations are making it resolve to the empty string (.debug_str offset zero). This is using some multi-indirect fancy DWARF encodings. So the root of the bad strings is actually the DW_AT_str_offsets_base value being wrong: it's zero and should be 8.
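To make the indirection concrete, here is a minimal sketch of how an indexed string (DW_FORM_strx) is looked up in DWARF32: the entry is read at DW_AT_str_offsets_base + 4 * index in .debug_str_offsets, and that entry is an offset into .debug_str. The base is 8 because a DWARF32 version 5 .debug_str_offsets contribution starts with an 8-byte header. The string table contents and the helper names below are made up for illustration, and the sketch does not try to reproduce exactly what llvm-dwarfdump prints for a bad lookup; it only shows that a base of 0 makes every lookup read header bytes instead of real entries.

```python
import struct

def build_str_offsets(offsets):
    """Build a DWARF32 v5 .debug_str_offsets contribution (hypothetical data)."""
    body = b"".join(struct.pack("<I", off) for off in offsets)
    # unit_length counts everything after the length field itself:
    # version (2 bytes) + padding (2 bytes) + the 4-byte entries.
    header = struct.pack("<IHH", 4 + len(body), 5, 0)
    return header + body

def resolve_strx(debug_str, str_offsets, base, index):
    """Resolve a string index: read the entry at base + 4 * index, then the string."""
    (str_off,) = struct.unpack_from("<I", str_offsets, base + 4 * index)
    if str_off >= len(debug_str):
        return None  # the entry points outside .debug_str
    end = debug_str.index(b"\0", str_off)
    return debug_str[str_off:end].decode()

# Made-up string table: an empty string at offset 0, then two names.
debug_str = b"\0clang\0start.cc\0"
str_offsets = build_str_offsets([1, 7])

# Correct base (8, just past the header) versus the corrupted base (0).
good = [resolve_strx(debug_str, str_offsets, 8, i) for i in range(2)]
bad = [resolve_strx(debug_str, str_offsets, 0, i) for i in range(2)]
print("base 8:", good)
print("base 0:", bad)
```

With base 8 the two lookups return the intended names; with base 0 they interpret unit_length and version/padding as string offsets, so every string attribute in the CU goes wrong at once.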

In the input file, we see this:

0x00000000: Compile Unit: length = 0x000098f9, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x000098fd)

0x0000000c: DW_TAG_compile_unit
              DW_AT_producer	("Fuchsia clang version 22.0.0git (https://llvm.googlesource.com/llvm-project 9d7449a82b83ee589b8af8d6f86525727788b3b9) ../../prebuilt/third_party/clang/linux-x64/bin/clang++ --driver-mode=g++ -MD -MF user.basic_riscv64-shared/obj/sdk/lib/c/startup/start-compiler-abi.start-compiler-abi.cc.o.d -o user.basic_riscv64-shared/obj/sdk/lib/c/startup/start-compiler-abi.start-compiler-abi.cc.o -D WITH_FRAME_POINTERS=1 -D _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D _LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS=1 -D TOOLCHAIN_VERSION=q0YKA_ep-wRB3fsgEMQHa6undH-AlTqoIHh-p6poHPkC -D ZX_ASSERT_LEVEL=2 -D _ALL_SOURCE -D LIBC_COPT_USE_C_ASSERT=1 -D LIBC_COPT_PUBLIC_PACKAGING=1 -D LIBC_NAMESPACE=__fuchsia_libc -D _XOPEN_SOURCE=700 -I ../../zircon/system/public -I ../../sdk/lib/zircon-assert -I ../../sdk/lib/c -I ../../sdk/lib/c/include-preempt -I ../../third_party/llvm-libc/src -I ../../zircon/third_party/ulib/musl/src/internal -I ../../zircon/third_party/ulib/musl/arch/riscv64 -I ../../zircon/system/ulib/runtime/include -I ../../src/zircon/lib/zircon/include -I fidling/gen/zircon/vdso/zx/zither/legacy_syscall_cdecl -I ../../zircon/system/ulib/zircon-internal/include -I ../../sdk/lib/stdcompat/include -I ../../sdk/lib/fit/include -I ../../zircon/system/ulib/zx/include -I ../../sdk/lib/ld/include -I ../../zircon/kernel/lib/arch/riscv64/include -I gen/zircon/kernel/lib/arch/gen-arm64-feature-asm.include -I gen/zircon/kernel/lib/arch/gen-arm64-system-asm.include -I gen/zircon/kernel/lib/arch/gen-riscv64-system-asm.include -I gen/zircon/kernel/lib/arch/gen-x86-msr-asm.include -I gen/zircon/kernel/lib/arch/gen-x86-cpuid-asm.include -I ../../zircon/kernel/lib/arch/include -I ../../zircon/system/ulib/hwreg/include -I ../../zircon/system/ulib/mmio-ptr/include -I ../../zircon/system/ulib/fbl/include -I ../../src/lib/elfldltl/include -Xclang -debug-info-kind=constructor -g3 -grecord-command-line -gdwarf-5 -gz=zstd -fno-omit-frame-pointer -momit-leaf-frame-pointer 
-fdata-sections -ffunction-sections -O2 -Wall -Wextra -Wconversion -Wextra-semi -Wimplicit-fallthrough -Wnewline-eof -Wstrict-prototypes -Wwrite-strings -Wno-sign-conversion -Wno-unused-parameter -Wnonportable-system-include-path -Wno-missing-field-initializers -Wno-extra-qualification -Wno-cast-function-type-mismatch -Wno-unknown-warning-option -Wno-missing-template-arg-list-after-template-kw -Wno-deprecated-pragma -Wno-nontrivial-memaccess -ftrivial-auto-var-init=pattern -ffile-compilation-dir=. -no-canonical-prefixes -fvisibility=hidden -Wthread-safety -Wno-unknown-warning-option -Wno-thread-safety-reference-return -Werror -Wa,--fatal-warnings -fno-common -fsized-deallocation --target=riscv64-fuchsia -march=rv64gcv_zihintpause_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt -mabi=lp64d -fcolor-diagnostics -fcrash-diagnostics-dir=clang-crashreports -fcrash-diagnostics=all -fstack-size-section -ffuchsia-api-level=4293918720 -fno-sanitize=safe-stack -fno-sanitize=shadow-call-stack -fno-stack-protector -Wmissing-prototypes -Wmissing-variable-declarations -Wno-sign-compare -Wno-implicit-fallthrough -fno-stack-protector -fno-sanitize=fuzzer -idirafter ../../sdk/lib/c/include -idirafter ../../zircon/third_party/ulib/musl/include -idirafter gen/third_party/llvm-libc/src/include -idirafter ../../third_party/llvm-libc/src/include -Wno-deprecated-this-capture -std=c++20 -fno-exceptions -fno-rtti -fvisibility-inlines-hidden -ftemplate-backtrace-limit=0 -c ../../sdk/lib/c/startup/start-compiler-abi.cc")
              DW_AT_language	(DW_LANG_C_plus_plus_14)
              DW_AT_name	("../../sdk/lib/c/startup/start-compiler-abi.cc")
              DW_AT_str_offsets_base	(0x00000008)
              DW_AT_stmt_list	(0x00000000)
              DW_AT_comp_dir	(".")
              DW_AT_low_pc	(0x0000000000000000)
              DW_AT_ranges	(indexed (0x7) rangelist = 0x00000083
                 [0x0000000000000000, 0x000000000000032e)
                 [0x0000000000000000, 0x000000000000003c))
              DW_AT_addr_base	(0x00000008)
              DW_AT_rnglists_base	(0x0000000c)
              DW_AT_loclists_base	(0x0000000c)

That's llvm-dwarfdump applying the relocations in both cases, since both are ET_REL files (input and output from a -r link). The real issue is in the relocations.

In the input file, we have:

Relocation section '.rela.debug_info' at offset 0x151f0 contains 48 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000008  000000aa00000001 R_RISCV_32             0000000000000000 .debug_abbrev + 0
0000000000000011  000000ad00000001 R_RISCV_32             0000000000000008 .L0  + 0
[...]

The relocation at offset 0x11 is the one that applies to the datum for DW_AT_str_offsets_base. The reloc points to symtab entry 0xad (173):

   173: 0000000000000008     0 NOTYPE  LOCAL  DEFAULT   20 .L0 

That's pointing to section 20:

  [20] .debug_str_offsets PROGBITS        0000000000000000 005093 00002f 00   C  0   0  8

So this particular .L0 (there are many local symbols with that name here!) is at (input) .debug_str_offsets+8.
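As a cross-check of the readelf columns above, the Info field of an ELF64 RELA entry packs the symbol index in the high 32 bits and the relocation type in the low 32 bits. A small sketch that decodes the input reloc quoted above, using only values from the dumps (the helper names are just illustrative):

```python
def elf64_r_sym(r_info):
    """Symbol table index: high 32 bits of an ELF64 r_info."""
    return r_info >> 32

def elf64_r_type(r_info):
    """Relocation type: low 32 bits of an ELF64 r_info."""
    return r_info & 0xFFFFFFFF

R_RISCV_32 = 1  # matches the type shown by readelf for these entries

# Input file: the reloc at r_offset 0x11 in .rela.debug_info.
r_info = 0x000000AD00000001
sym_index = elf64_r_sym(r_info)
rel_type = elf64_r_type(r_info)

# Symtab entry 173 is .L0 with st_value 8 in section 20 (.debug_str_offsets),
# and the reloc addend is 0, so the reference means .debug_str_offsets + 8.
sym_value = 0x8
addend = 0
target_offset = sym_value + addend

print(sym_index, rel_type == R_RISCV_32, hex(target_offset))
```

This gives symbol index 173, type R_RISCV_32, and a target of input .debug_str_offsets + 0x8.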

In the link map, we can see:

               0                0       7b     1 .debug_str_offsets
               0                0     1ac8     1         usr/local/google/home/mcgrathr/tq/fuchsia/out/minimal.riscv64/user.basic_riscv64-shared/obj/sdk/lib/c/startup/start-compiler-abi.start-compiler-abi.cc.o:(.debug_str_offsets)
[...]

That second input file is the first contribution to the output .debug_str_offsets (the first input file doesn't have one).

Now, both the first and second input files contributed to .debug_info. The first input file's contribution has size 0xfa (uncompressed size). So 0xfa+0x11 = 0x10b is the output offset that corresponds to that reloc above in the second input file (with r_offset=0x11).
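The offset arithmetic can be checked against the dumps above. This sketch uses only numbers quoted in this report: the second CU starts at the end of the first contribution, its "next unit" offset follows from the DWARF32 unit length, and each input r_offset is shifted by the same 0xfa.

```python
first_contribution_size = 0xFA   # size of the first input's .debug_info
cu_length = 0x98F9               # unit length of the second CU (same in input and output)

# The second CU starts right after the first contribution.
cu_start = first_contribution_size

# In DWARF32 the unit length does not count the 4-byte length field itself.
next_unit_in = 0x0 + 4 + cu_length
next_unit_out = cu_start + 4 + cu_length

# Input r_offsets of the first two relocs in the second file's .rela.debug_info.
abbrev_reloc_out = cu_start + 0x8         # .debug_abbrev reference
str_offsets_reloc_out = cu_start + 0x11   # DW_AT_str_offsets_base datum

print(hex(next_unit_in), hex(next_unit_out))
print(hex(abbrev_reloc_out), hex(str_offsets_reloc_out))
```

The results (0x98fd, 0x99f7, 0x102, 0x10b) match the "next unit at" values in the two CU headers and the 0x102 / 0x10b entries in the output relocation table.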

So here's the output file's first few relocs in .debug_info:

Relocation section '.rela.debug_info' at offset 0x21ca8 contains 619 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000008  0000000000000001 R_RISCV_32                                0
000000000000000d  0000000000000001 R_RISCV_32                                0
0000000000000011  0000001100000002 R_RISCV_64             0000000000000000 .text + 0
0000000000000019  0000000000000002 R_RISCV_64                                0
00000000000000f1  0000000000000002 R_RISCV_64                                0
0000000000000102  0000001400000001 R_RISCV_32             0000000000000000 .debug_abbrev + 21
000000000000010b  0000000000000001 R_RISCV_32                                0
[...]

The reloc at 0x10b (last one in the quoted block) has been rewritten to no symbol, no addend. So now it will resolve to an offset of 0 in the output .debug_str_offsets section when it should resolve to an offset of 8, as the original reloc for the original input section did.

I suspect that many other relocs in the -r output have been wrongly rewritten in similar fashion. It should be fine to drop the original local symbols and rewrite the relocs if you want to. But that would have to rewrite them relative to a replacement STT_SECTION symbol. It can't just drop the symbol association entirely. It can't even turn it into no symbol + addend 8, because the output .debug_str_offsets is still a relocatable input that can be combined with other sections and the .debug_info reference still needs to be relocated to something that amounts to offset 8 into the original input section.
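A rough sketch of the rewrite described above, under stated assumptions: this is not LLD's code, and the types and the rewrite_to_section_symbol helper are hypothetical. The idea is that a reloc against a discarded local symbol becomes a reloc against the output section's STT_SECTION symbol, with the symbol's value and the input section's offset within the output section folded into the addend.

```python
from dataclasses import dataclass

@dataclass
class LocalSymbol:          # hypothetical model of a local symtab entry
    name: str
    section: str            # name of the section the symbol is defined in
    value: int              # st_value: offset within its input section

@dataclass
class Reloc:                # hypothetical model of a RELA entry
    offset: int
    symbol: str             # symbol the reloc refers to
    addend: int

def rewrite_to_section_symbol(reloc, sym, input_offset_in_output):
    """Retarget a reloc from a local symbol to its section's STT_SECTION symbol.

    input_offset_in_output is where the symbol's input section was placed
    inside the output section of the -r link.
    """
    new_addend = input_offset_in_output + sym.value + reloc.addend
    return Reloc(offset=reloc.offset, symbol=sym.section, addend=new_addend)

# The case from this report: .L0 is at input .debug_str_offsets + 8, and that
# input section is the first contribution to the output section (offset 0).
l0 = LocalSymbol(name=".L0", section=".debug_str_offsets", value=0x8)
fixed = rewrite_to_section_symbol(Reloc(0x10B, ".L0", 0), l0, 0x0)
print(fixed)

# Hypothetical later contribution placed at a nonzero output offset.
later = rewrite_to_section_symbol(Reloc(0x9A08, ".L0", 0), l0, 0x40)
print(later)
```

For the reloc at output offset 0x10b this yields ".debug_str_offsets + 8" rather than "no symbol + 0", which keeps the reference meaningful when the -r output is itself linked again.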


It looks like --discard-locals is what makes it happen. So we can work around it by avoiding --discard-locals with -r. But the --discard-locals behavior should never break semantics like this. It should replace the discarded symbols with STT_SECTION symbols to preserve the reloc semantics.

I'm not sure if there's any issue with -r --discard-locals and optimal relaxation for code sections in the final link. We should look into that too. IMHO if there's any case where it would break semantics, we should either disallow/disable --discard-locals or (preferably) do it in a way that reduces the symbol table load while retaining fully correct semantics for all aspects of relocation and relaxation.
