debug_names incorrect parent due to collision between CU and TU

from https://github.com/llvm/llvm-project/pull/91808#issuecomment-2138480664

```
namespace A {
namespace B {
struct C { };
struct D { };
}  // namespace B
}  // namespace A
void f1(A::B::C, A::B::D) { }
```
```
$ clang++-tot names.cpp -g2 -O0 -gpubnames -c -fdebug-types-section && llvm-dwarfdump-tot -debug-names names.o | grep "Name \|Entry \|Tag: \|DW_IDX\|String: "
```
```
      String: 0x000000a0 "_Z2f1N1A1B1CENS0_1DE"
      Entry @ 0xe5 {
        Tag: DW_TAG_subprogram
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: <parent not indexed>
```
```
      String: 0x000000b5 "A"
      Entry @ 0xeb {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x00
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: <parent not indexed>
      Entry @ 0xf1 {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x01
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: <parent not indexed>
      Entry @ 0xf7 {
        Tag: DW_TAG_namespace
        DW_IDX_die_offset: 0x00000044
        DW_IDX_parent: <parent not indexed>
```
```
      String: 0x000000b7 "B"
      Entry @ 0x103 {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x00
        DW_IDX_die_offset: 0x00000025
        DW_IDX_parent: Entry @ 0xe5
      Entry @ 0x10d {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x01
        DW_IDX_die_offset: 0x00000025
        DW_IDX_parent: Entry @ 0xf1
      Entry @ 0x117 {
        Tag: DW_TAG_namespace
        DW_IDX_die_offset: 0x00000046
        DW_IDX_parent: Entry @ 0xf7
```

So the second and third "B" entries match up with the appropriate "A" entries, but the first points at Entry @ 0xe5, the DW_TAG_subprogram for f1: a different DIE, in the CU, that happens to have the same offset (0x23) as "A" in type unit 0x00.

This is because the UniqueID on a unit is only unique within that kind of unit (it's unique across CUs and, separately, unique across TUs). So the "DieOffsetAndUnitID" is not a globally unique identifier for an entry; it'd need the kind of unit in there too to completely uniquify it.
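To make the collision concrete, here is a minimal standalone sketch. It does not use the real LLVM types: `UnitKind`, `Entry`, and both key aliases are hypothetical stand-ins, and the sample values are taken from the dump above (f1 at offset 0x23 in the CU, "A" at offset 0x23 in type unit 0x00). It only illustrates that a set keyed on (offset, unit ID) merges the two, while a key that also carries the unit kind keeps them apart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the LLVM classes.
enum class UnitKind { Compile, Type };

struct Entry {
  UnitKind Kind;
  uint32_t UnitID;    // unique only among units of the same kind
  uint64_t DieOffset; // offset of the DIE inside its unit
};

// Key shaped like the (offset, unit ID) pair discussed above.
using OffsetAndUnitID = std::pair<uint64_t, uint32_t>;
// Assumed alternative: the same key with the unit kind added.
using OffsetKindAndUnitID = std::tuple<uint64_t, UnitKind, uint32_t>;

// The f1 subprogram in CU 0 and namespace "A" in TU 0, both at offset 0x23.
inline std::vector<Entry> sampleEntries() {
  return {{UnitKind::Compile, 0, 0x23}, {UnitKind::Type, 0, 0x23}};
}

// Number of distinct keys when the unit kind is ignored.
inline std::size_t countWithoutKind(const std::vector<Entry> &Entries) {
  std::set<OffsetAndUnitID> Seen;
  for (const Entry &E : Entries)
    Seen.insert({E.DieOffset, E.UnitID});
  return Seen.size();
}

// Number of distinct keys when the unit kind is part of the key.
inline std::size_t countWithKind(const std::vector<Entry> &Entries) {
  std::set<OffsetKindAndUnitID> Seen;
  for (const Entry &E : Entries)
    Seen.insert(std::make_tuple(E.DieOffset, E.Kind, E.UnitID));
  return Seen.size();
}
```

With the two sample entries, `countWithoutKind` sees one key and `countWithKind` sees two.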

Adding this patch is enough to expose the issue more directly:
```
diff --git a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
index 5b679fd3b9f9..cc8b8d1881ed 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
@@ -615,8 +615,10 @@ Dwarf5AccelTableWriter::Dwarf5AccelTableWriter(
 
   for (auto &Bucket : Contents.getBuckets())
     for (auto *Hash : Bucket)
-      for (auto *Value : Hash->getValues<DWARF5AccelTableData *>())
-        IndexedOffsets.insert(Value->getDieOffsetAndUnitID());
+      for (auto *Value : Hash->getValues<DWARF5AccelTableData *>()) {
+        auto Inserted = IndexedOffsets.insert(Value->getDieOffsetAndUnitID()).second;
+        assert(Inserted);
+      }
 
   populateAbbrevsMap();
 }
```
The two entries from different units (one CU, one TU) with the same unit ID and DIE offset end up colliding in the `IndexedOffsets` set, so only one ends up in there, and then the loop here:
```
for (OffsetAndUnitID Offset : IndexedOffsets)
  DIEOffsetToAccelEntryLabel.insert({Offset, Asm->createTempSymbol("")});
```
only creates one label for the two of them. Oh, and /this/ code:
```
if (EmittedAccelEntrySymbols.insert(EntrySymbol).second)
  Asm->OutStreamer->emitLabel(EntrySymbol);
```
silently skips emitting the label a second time, even though here the symbol matches more than one entry unintentionally. It has to tolerate repeats because a symbol can match more than one entry /intentionally/ (when there are multiple entries for exactly the same entity known by different names, like the mangled name and the unmangled name).
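The chain of events above can be modelled in a few lines. This is a simplified sketch, not the AccelTable.cpp code: `Entry`, `Key`, and `resolveParent` are hypothetical, labels are plain integers, and the sample entries mirror the dump (f1 in the CU at 0x23, then "A" in TU 0x00, TU 0x01, and the CU). It assigns one label per distinct key, emits each label only at the first entry that uses it, and reports which entry a parent reference ends up on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model for illustration; not the real LLVM data structures.
struct Entry {
  bool IsTypeUnit;
  uint32_t UnitID;
  uint64_t DieOffset;
};

// (DIE offset, unit ID) with no unit kind, as in the colliding key.
using Key = std::pair<uint64_t, uint32_t>;

// Entries in emission order: f1 (CU 0), "A" (TU 0), "A" (TU 1), "A" (CU 0).
inline std::vector<Entry> sampleEntries() {
  return {{false, 0, 0x23}, {true, 0, 0x23}, {true, 1, 0x23}, {false, 0, 0x44}};
}

// Returns the index of the entry that a reference to ParentKey resolves to.
inline std::size_t resolveParent(const std::vector<Entry> &Entries,
                                 Key ParentKey) {
  // Step 1: one label per distinct key, so colliding entries share a label.
  std::map<Key, int> KeyToLabel;
  int NextLabel = 0;
  for (const Entry &E : Entries) {
    Key K{E.DieOffset, E.UnitID};
    if (KeyToLabel.find(K) == KeyToLabel.end())
      KeyToLabel[K] = NextLabel++;
  }

  // Step 2: emit each label only the first time it is seen.
  std::set<int> Emitted;
  std::map<int, std::size_t> LabelToEntry;
  for (std::size_t I = 0; I < Entries.size(); ++I) {
    Key K{Entries[I].DieOffset, Entries[I].UnitID};
    int Label = KeyToLabel.at(K);
    if (Emitted.insert(Label).second)
      LabelToEntry[Label] = I; // the label now marks this entry
  }

  return LabelToEntry.at(KeyToLabel.at(ParentKey));
}
```

In this model, looking up the parent of "B" in TU 0x00 with the key (0x23, 0) lands on index 0, the f1 entry, rather than index 1, which is the same shape as the wrong `DW_IDX_parent: Entry @ 0xe5` in the dump.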

So, yeah.

Hmm, I guess at least the type unit number probably isn't gapless - there are cases where we create a type unit and then potentially throw it away (see the TypeUnitsUnderConstruction stuff) - but we don't reuse the type unit number. 

So, maybe it's possible to use a single numbering for CUs and TUs, gaps and all?

The only thing I was worried about was that some code might be using the numbering to index into an array of units at some point, but if that's not the case - great! Probably more intuitive that they be totally unique numbers.
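As a rough sketch of the single-numbering idea, assuming nothing indexes an array by unit number: one counter hands out IDs to CUs and TUs alike, a discarded unit leaves a gap, and lookups go through a map keyed by ID. `UnitNumbering`, `UnitTable`, and `buildExample` are hypothetical names for illustration, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical shared allocator: CUs and TUs draw from the same counter.
class UnitNumbering {
  uint32_t Next = 0;

public:
  uint32_t allocate() { return Next++; } // numbers are never reused
};

// Hypothetical table of live units, looked up by ID rather than array index.
struct UnitTable {
  UnitNumbering IDs;
  std::map<uint32_t, std::string> Live;

  uint32_t add(const std::string &Name) {
    uint32_t ID = IDs.allocate();
    Live[ID] = Name;
    return ID;
  }

  // Dropping a unit leaves a gap in the numbering.
  void discard(uint32_t ID) { Live.erase(ID); }
};

inline UnitTable buildExample() {
  UnitTable T;
  T.add("CU names.cpp");                            // gets 0
  uint32_t Tmp = T.add("TU that is thrown away");   // gets 1
  T.add("TU A::B::C");                              // gets 2
  T.discard(Tmp);                                   // 1 becomes a gap
  T.add("TU A::B::D");                              // gets 3, not 1
  return T;
}
```

Because every live unit has a distinct number regardless of kind, a key of (DIE offset, unit ID) would be unique in this scheme without adding the unit kind.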


debug_names incorrect parent due to collision between CU and TU #93886
