Issue 080930.1: Using COMDAT Sections to Reduce the Size of DWARF Debug Information

Author: Cary Coutant
Champion: Cary Coutant
Date submitted: 2008-09-30
Date revised:
Date closed:
Type: Enhancement
Status: Accepted
DWARF Version: 4

Background
----------

DWARF debugging information for a typical C++ application can consume
a large amount of disk space in both the relocatable object files and
the final executable or shared library. Depending on the application
and compilation options, the debug information can consume as much as
75% of the object file.

The bulk of the debug information is in the .debug_info section, the
bulk of that section contains type information, and the bulk of the
type information is made up of duplicate copies of types that are
emitted by the compiler in each compilation unit.

This proposal extends the DWARF format to allow the compiler to place
type information in separate .debug_types sections, such that the
linker can eliminate duplicates using the COMDAT mechanism, which is
now a fairly standard feature of modern linkers.

For complete details of the approach, see the following Wiki page:

http://wiki.dwarfstd.org/index.php?title=COMDAT_Type_Sections


Overview
--------

A new .debug_types section is defined, which will contain "type
units." These are similar to the compile units that can be found in
the .debug_info section, but each type unit will describe exactly one
type. The top-level DIE of a type unit has a new tag,
DW_TAG_type_unit.

Each type that is placed in a type unit will have a unique 64-bit
signature, generated by the compiler using a hash algorithm that
operates on the DWARF definition of the type.

References to types contained in type units are made using the 64-bit
signature rather than a direct reference to the DIE that describes the
type. These references are represented in the DWARF information using
a new form, DW_FORM_ref_sig8, which is a member of the reference
class, and can be used wherever a DW_FORM_ref4, for example, would be
used to refer to a type.


Proposed Changes to the DWARF Specification
-------------------------------------------

These changes are keyed to the June 24, 2008, working draft of the DWARF
Version 4 specification.

Section 2.1, The Debugging Information Entry

  In Figure 1, add DW_TAG_type_unit.

  Change the last paragraph of the section as follows:

    The debugging information entries are intended to exist
    in the .debug_info and .debug_types sections of an
    object file.

Section 2.2, Attribute Types

  In Figure 2, add the following row to the table:

    DW_AT_signature      Type signature

  In Figure 3, change the description of the reference class in the table
  as follows:
    Refers to one of the debugging information entries that
    describe the program. There are three types of
    reference. The first is an offset relative to the
    beginning of the compilation unit in which the reference
    occurs and must refer to an entry within that same
    compilation unit. The second type of reference is the
    offset of a debugging information entry in any
    compilation unit, including one different from the unit
    containing the reference. The third type of reference is
    an indirect reference to a type definition, based on the
    64-bit signature of that type.

Chapter 3, Program Scope Entries

  In the first paragraph, change the final sentence to:

    Except for separate type entries (see Section 3.2),
    these entries may be thought of as bounded by ranges
    of text addresses within the program.

  Add a new section after Section 3.1:

    3.2 Separate Type Entries

    An object file may contain any number of separate type
    unit entries, each representing a single complete type
    definition. Each type unit must be uniquely identified
    by a 64-bit signature, stored as part of the type unit,
    which can be used to reference the type definition from
    debugging entries in compilation units and other type
    units.

    A type unit is represented by a debugging information
    entry with the tag DW_TAG_type_unit. A type unit entry
    owns debugging information entries that represent the
    definition of a single type, plus additional debugging
    information entries that may be necessary to include as
    part of the definition of the type.

    A type unit entry may have a DW_AT_language attribute,
    whose constant value is an integer code indicating the
    source language used to define the type. The set of
    language names and their meanings are given in Figure 8.

    A type unit entry for a given type T may have three
    kinds of children:

    1. A tree representing the defining declaration of
       type T.

    2. A tree containing a declaration of type T, enclosed
       in any nested types and/or namespaces. The
       declaration of the type will have a DW_AT_declaration
       attribute, and the defining declaration will have a
       DW_AT_specification attribute that refers to the
       declaration entry. (Required only if the type is
       nested inside another type or namespace.)

    3. Additional trees as necessary, each containing a
       declaration of a type that is referenced by type T
       but has not been placed in its own separate type
       unit.

    Alternatively, for nested types, the defining
    declaration (1) may be placed directly within the
    declaration tree (2), avoiding the use of
    DW_AT_specification and DW_AT_declaration attributes.

    *Not all types are required to be placed in type units.
    In general, only structure, class, enumeration, and
    union types included from header files should be
    considered for separate type units. Base types and other
    small types are not usually worth the overhead of
    placing in separate type units. Types that are unlikely
    to be replicated, such as those defined in the main
    source file, are also better left in the main
    compilation unit.*

Chapter 5, Type Entries

Section 5.6.1, Structure, Union and Class Type Entries

  Add the following paragraph after the paragraph beginning "An
  incomplete structure...":

    If the complete declaration of a type has been placed in
    a separate type unit, an incomplete declaration of that
    type in the compilation unit may provide the unique
    64-bit signature of the type with a DW_AT_signature
    attribute.

Chapter 7, Data Representation

Section 7.4, 32-Bit and 64-Bit DWARF Formats

  In the table of section offset and section length fields in
  Item #2, add the following rows:

    .debug_types  debug_abbrev_offset  offset in .debug_abbrev
    .debug_types  type_offset          offset in .debug_types

Section 7.5, Format of Debugging Information

  Add a new section after Section 7.5.1:

    7.5.2 Type Unit Header

    The header for the series of debugging information
    entries contributing to the description of a type that
    has been placed in its own type unit consists of the
    following information:

    1. unit_length (initial length)

       [same as 7.5.1]

    2. version (uhalf)

       [same as 7.5.1]

    3. debug_abbrev_offset (section offset)

       [same as 7.5.1]

    4. address_size (ubyte)

       [same as 7.5.1]

    5. type_signature (8-byte unsigned integer)

       A 64-bit unique signature of the type described in
       this type unit.

    6. type_offset (section offset)

       A 4-byte or 8-byte unsigned offset relative to the
       beginning of the type unit header. This offset refers
       to the debugging information entry that describes the
       type. Because the type may be nested inside a
       namespace or other structures, and may contain
       references to other types that have not been placed
       in separate type units, it is not necessarily either
       the first or the only entry in the type unit.
       In the 32-bit DWARF format, this is a 4-byte unsigned
       offset; in the 64-bit DWARF format, this is an 8-byte
       unsigned offset (see Section 7.4).
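
As an illustration only, the following Python sketch decodes the six
header fields listed above from the raw bytes of a .debug_types
section. The function name parse_type_unit_header, the keys of the
returned dictionary, and the little-endian default are assumptions made
for this example and are not part of the proposal. The 0xffffffff
escape that selects the 64-bit format is the initial-length convention
of Section 7.4.

```python
import struct

def parse_type_unit_header(data, offset=0, little_endian=True):
    """Decode a type unit header that starts at `offset` in `data`.

    Returns a dict with the six header fields plus the header size.
    """
    endian = "<" if little_endian else ">"
    (unit_length,) = struct.unpack_from(endian + "I", data, offset)
    pos = offset + 4
    is_64bit = unit_length == 0xFFFFFFFF   # escape code for 64-bit DWARF
    if is_64bit:
        (unit_length,) = struct.unpack_from(endian + "Q", data, pos)
        pos += 8
    off = "Q" if is_64bit else "I"         # width of a section offset
    layout = endian + "H" + off + "B" + "Q" + off
    version, abbrev_offset, address_size, signature, type_offset = (
        struct.unpack_from(layout, data, pos))
    pos += struct.calcsize(layout)
    return {
        "unit_length": unit_length,
        "version": version,
        "debug_abbrev_offset": abbrev_offset,
        "address_size": address_size,
        "type_signature": signature,
        "type_offset": type_offset,      # relative to start of this header
        "header_size": pos - offset,
    }
```

Under these assumptions the header occupies 23 bytes in the 32-bit
DWARF format and 39 bytes in the 64-bit DWARF format. The sketch does
not validate the version or reject reserved initial-length values.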

    The type_signature is computed only by the DWARF
    producer; a DWARF consumer simply compares signatures to
    resolve type references to the type definitions that are
    contained in type units. The signature is formed from
    the MD5 hash of a flattened description of the type. The
    flattened description of the type is a byte sequence
    derived from the DWARF encoding of the type as follows:

    1. Start with an empty sequence S and a list V of
       visited types, where V is initialized to a list
       containing the starting type as its single element.
       Elements in V are indexed from 1, so that V[1] is the
       starting type.

    2. If the debug entry represents a type that is nested
       inside another type or a namespace, append to S the
       following for each surrounding type or namespace,
       beginning with the outermost such construct: the
       letter 'C', the DWARF tag of the construct, and the
       name (taken from the DW_AT_name attribute) of the
       type or namespace (including its trailing null byte).

    3. Append to S the letter 'D', followed by the DWARF tag
       of the debug entry.

    4. For each of the following attributes that are present
       in the debug entry, in the order listed below, append
       to S a marker letter (see below), the DWARF attribute
       code, and the attribute value:

          DW_AT_name
          DW_AT_accessibility
          DW_AT_address_class
          DW_AT_allocated
          DW_AT_artificial
          DW_AT_associated
          DW_AT_binary_scale
          DW_AT_bit_offset
          DW_AT_bit_size
          DW_AT_bit_stride
          DW_AT_byte_size
          DW_AT_byte_stride
          DW_AT_const_expr [pending approval of 090107.1]
          DW_AT_const_value
          DW_AT_containing_type
          DW_AT_count
          DW_AT_data_location
          DW_AT_data_member_location
          DW_AT_decimal_scale
          DW_AT_decimal_sign
          DW_AT_default_value
          DW_AT_digit_count
          DW_AT_discr
          DW_AT_discr_list
          DW_AT_discr_value
          DW_AT_encoding
          DW_AT_endianity
          DW_AT_explicit
          DW_AT_is_optional
          DW_AT_location
          DW_AT_lower_bound
          DW_AT_mutable
          DW_AT_ordering
          DW_AT_picture_string
          DW_AT_prototyped
          DW_AT_small
          DW_AT_segment
          DW_AT_string_length
          DW_AT_threads_scaled
          DW_AT_upper_bound
          DW_AT_use_location
          DW_AT_use_UTF8
          DW_AT_use_UTF16 [pending approval of 090109.1]
          DW_AT_use_UTF32 [pending approval of 090109.1]
          DW_AT_variable_parameter
          DW_AT_virtuality
          DW_AT_visibility
          DW_AT_vtable_elem_location

       If an implementation defines any vendor-specific
       attributes, any such attributes that are essential to
       the definition of the type should also be included in
       the above list at fixed positions defined by the
       vendor.

       An attribute that refers to another type entry T is
       processed as follows:  (a) If T is in the list V at
       some V[x], use the letter 'R' as the marker and use
       the unsigned LEB128 encoding of x as the attribute
       value; otherwise, (b) use the letter 'T' as the
       marker, process the type T recursively by performing
       Steps 2 through 7, using the result as the attribute
       value.

       Other attribute values use the letter 'A' as the
       marker, and the value consists of the form code
       (encoded as an unsigned LEB128 value) followed by the
       encoding of the value according to the form code. To
       ensure reproducibility of the signature, the set of
       forms used in the signature computation is limited to
       the following: DW_FORM_sdata, DW_FORM_flag,
       DW_FORM_string, and DW_FORM_block.

    5. If the tag is one of DW_TAG_pointer_type,
       DW_TAG_reference_type, DW_TAG_rvalue_reference_type
       [pending approval of 090106.1],
       DW_TAG_ptr_to_member_type, or DW_TAG_friend, and the
       referenced type (via the DW_AT_type or DW_AT_friend
       attribute) has a DW_AT_name attribute, append to S
       the letter 'N', the DWARF attribute code (DW_AT_type
       or DW_AT_friend), the context of the type (according
       to the method in Step 2), the letter 'E', and the
       name of the type. For DW_TAG_friend, if the
       referenced debug entry is a DW_TAG_subprogram, the
       context is omitted and the name to be used is the
       ABI-specific name of the subprogram (e.g., the
       mangled linker name).

    6. If the tag is not one of DW_TAG_pointer_type,
       DW_TAG_reference_type, DW_TAG_rvalue_reference_type
       [pending approval of 090106.1],
       DW_TAG_ptr_to_member_type, or DW_TAG_friend, but has
       a DW_AT_type attribute, or if the referenced type
       (via the DW_AT_type or DW_AT_friend attribute) does
       not have a DW_AT_name attribute, the attribute is
       processed according to the method in Step 4 for an
       attribute that refers to another type entry.

    7. Visit each child C of the debug entry as follows: If
       C is a nested type entry or a member function entry,
       and has a DW_AT_name attribute, append to S the
       letter 'S', the tag of C, and its name; otherwise,
       process C recursively by performing Steps 3 through
       7, appending the result to S. Following the last
       child (or if there are no children), append a zero
       byte.
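
The interaction between the visited list V (Step 4) and the child
traversal (Step 7) can be sketched as follows. This Python fragment is
a deliberately reduced illustration, not a full implementation: it
covers only Step 3, the 'R'/'T' treatment of a DW_AT_type reference,
and Step 7 without the 'S' case, and it skips Steps 2 and 5 and all
other attributes. The dict-based DIE model and the function name
flatten are hypothetical, and every tag code, attribute code, and list
index is assumed to be below 0x80 so that it fits in one LEB128 byte.

```python
DW_TAG_member = 0x0d
DW_TAG_structure_type = 0x13
DW_TAG_base_type = 0x24
DW_AT_type = 0x49

def flatten(die, visited):
    """Flatten one DIE (hypothetical dict model) and its children."""
    out = bytearray(b"D")                  # Step 3: 'D' plus the tag
    out.append(die["tag"])
    target = die.get("type")
    if target is not None:
        # Step 4: look for the referenced type in V (1-based index).
        position = next(
            (i for i, v in enumerate(visited, 1) if v is target), None)
        if position is not None:
            out += b"R" + bytes([DW_AT_type, position])   # case (a)
        else:
            visited.append(target)                        # case (b)
            out += b"T" + bytes([DW_AT_type]) + flatten(target, visited)
    for child in die.get("children", ()):  # Step 7: children in order
        out += flatten(child, visited)
    out.append(0)                          # Step 7: terminating zero byte
    return bytes(out)

# A structure with two members that share one base type.
int_type = {"tag": DW_TAG_base_type}
struct_c = {"tag": DW_TAG_structure_type, "children": [
    {"tag": DW_TAG_member, "type": int_type},
    {"tag": DW_TAG_member, "type": int_type},
]}
stream = flatten(struct_c, [struct_c])     # V[1] is the starting type
```

In this run the first member emits a 'T' record and expands the base
type in place, while the second member emits an 'R' record with index
2, because the base type became V[2] when it was first visited.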

    For the purposes of this algorithm, if a debug entry has
    a DW_AT_specification attribute that refers to another
    debug entry (which has a DW_AT_declaration attribute),
    then the two debug entries are processed as a single
    entry, with all the attributes and children of the
    specification combined into the declaration.

    DWARF tag and attribute codes are appended to the
    sequence as unsigned LEB128 values, using the values
    defined later in this chapter.
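
For readers unfamiliar with the LEB128 encodings, a minimal Python
sketch is given below. The function names uleb128 and sleb128 are
chosen for this example only. Tag and attribute codes use the unsigned
variant; DW_FORM_sdata values use the signed variant.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)    # more groups follow
        else:
            out.append(byte)
            return bytes(out)

def sleb128(value):
    """Encode a signed integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7                    # arithmetic shift keeps the sign
        sign_bit = bool(byte & 0x40)
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)
```

Every standard tag and attribute code used in this proposal is below
0x80 and therefore encodes as a single byte; codes in the vendor range
(0x2000 and above for attributes) take two or more bytes.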

    *An attribute that refers to another debug entry should
    be recursively processed or replaced with the name of
    the referent (in Step 5 or 6). If neither treatment
    applies to an attribute that references another debug
    entry, the entry that contains that attribute should not
    be considered for a separate type unit.

    If a debug entry contains an attribute from the list
    above that would require an unsupported form, that debug
    entry should not be considered for a separate type unit.

    A type should be considered for a separate type unit
    only if all of the debug entries that it contains or
    refers to in Steps 6 and 7 can themselves be considered
    for a separate type unit.*

    Where the DWARF producer may reasonably choose two or
    more different forms for a given attribute, it should
    choose the simplest possible form in computing the
    signature. (For example, a constant value should be
    preferred to a location expression when possible.)

    Once the string S has been formed from the DWARF
    encoding, an MD5 hash is computed for the string and the
    lower 64 bits are taken as the type signature.
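
The final hashing step might look like the following Python sketch.
The function name type_signature is hypothetical. The text above does
not say which eight bytes of the 16-byte digest are the "lower 64
bits"; this example assumes the digest is read as a big-endian 128-bit
number, so the last eight bytes are used. A producer may adopt a
different convention, provided it applies that convention consistently.

```python
import hashlib

def type_signature(flattened):
    """Return a 64-bit signature for the flattened byte sequence S.

    Assumption: "lower 64 bits" means the last 8 digest bytes,
    interpreted as a big-endian unsigned integer.
    """
    digest = hashlib.md5(flattened).digest()   # 16 bytes
    return int.from_bytes(digest[8:], "big")
```

Because MD5 is used here only to detect identical type descriptions,
its weakness as a cryptographic hash is not a concern for this purpose.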

    *The string S is intended to be a flattened
    representation of the type that uniquely identifies that
    type (i.e., a different type is unlikely to produce the
    same string).

    If the value of an attribute is a location expression,
    and the location expression contains a reference to
    another debug entry (e.g., a DW_OP_call_ref operator),
    it is unlikely that the debug entry will remain
    identical across compilation units, and it should not be
    placed in a separate type unit.

    If an attribute refers to a code location or a location
    list, the debug entry should not be placed in a separate
    type unit.

    If an attribute refers to another debug entry that does
    not represent a type, the debug entry should not be
    placed in a separate type unit.

    The DW_AT_declaration attribute is not included in the
    signature because it indicates that the debug entry
    represents an incomplete declaration, and incomplete
    declarations should not be placed in separate type
    units.

    The DW_AT_description attribute is not included because
    it does not provide any information unique to the
    defining declaration of the type.

    The DW_AT_decl_file, DW_AT_decl_line, and
    DW_AT_decl_column attributes are not included because
    they may vary from one source file to the next, and
    would prevent two otherwise identical type declarations
    from producing the same hash.

    The DW_AT_object_pointer attribute is not included
    because the information it provides is not necessary for
    the computation of a unique type signature.

    Nested types and some types referred to by a debug entry
    are encoded by name rather than by recursively encoding
    the type to allow for cases where a complete definition
    of the type might not be available in all compilation
    units.*

Section 7.5.4 [original numbering], Attribute Encodings

  Under the "reference" class, change "There are two types
  of reference" to "There are three types of reference".

  In the paragraph that begins "The second type of
  reference...", replace "can identify any debugging
  information entry within a program" with "can identify any
  debugging information entry in a .debug_info section".

  After the paragraph beginning "The second type of
  reference," add the following paragraph:

    The third type of reference can identify any debugging
    information type entry that has been placed in its own
    type unit. This type of reference (DW_FORM_ref_sig8) is
    the 64-bit type signature that was computed for the
    type.
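
On the consumer side, resolving a DW_FORM_ref_sig8 reference amounts to
a table lookup. The Python sketch below assumes the type unit headers
have already been decoded into dictionaries; the keys unit_offset,
type_signature, and type_offset and both function names are
hypothetical. It relies on type_offset being relative to the beginning
of the type unit header, as stated in Section 7.5.2.

```python
def build_signature_index(type_units):
    """Map each 64-bit signature to the section offset of its type DIE."""
    index = {}
    for unit in type_units:
        # type_offset is relative to the start of the unit header.
        index[unit["type_signature"]] = (
            unit["unit_offset"] + unit["type_offset"])
    return index

def resolve_ref_sig8(index, signature):
    """Return the .debug_types offset of the type DIE, or None."""
    return index.get(signature)
```

A missing signature is reported as None rather than as an error, since
a consumer may encounter a reference whose type unit is not present in
the file being read.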

  In Figure 18, add the following row:

    DW_TAG_type_unit   0x41

  In Figure 20, add the following row:

    DW_AT_signature    0x69

  In Figure 21, add the following row:

    DW_FORM_ref_sig8   0x20


Appendix A -- Attributes by Tag Value

  In Figure 42, add DW_AT_signature to the following rows:

    DW_TAG_class_type
    DW_TAG_enumeration_type
    DW_TAG_structure_type
    DW_TAG_union_type

  In Figure 42, add the following row:

    DW_TAG_type_unit      DW_AT_language

Appendix B -- Debug Section Relationships

  In Figure 43, add ".debug_types" to the circle currently
  labelled ".debug_info".

Appendix E -- DWARF Compression and Duplicate Elimination (informative)

I propose to restructure this appendix a bit. Currently there are four sections:

E.1 Overview
E.2 Naming and Usage Considerations
E.3 Examples
E.4 Summary of Compression Techniques

Sections E.1 through E.3 are really all about the per-header file
compression technique, as summarized in E.4.1. The new structure would
be:

E.1 One Compilation Unit per Header File
    E.1.1 Overview [Original E.1]
    E.1.2 Naming and Usage Considerations [Original E.2]
    E.1.3 Examples [Original E.3]
E.2 Using Type Units to Eliminate Duplicate Types
    [New content]
E.3 Summary of Compression Techniques [Original E.4]
    E.3.1 #include compression [Original E.4.1]
    E.3.2 Eliminating function duplication [Original E.4.2]
    E.3.3 Single-function-per-DWARF-compilation-unit [Original E.4.3]
    E.3.4 Inlining and out-of-line instances [Original E.4.4]
    E.3.5 Separate type units [New content]

The proposed contents of the new Sections E.2 and E.3.5 are given here:

E.2 Using Type Units to Eliminate Duplicate Types

A large portion of debug information is type information, and in a
typical compilation environment, many types are duplicated many times.
One method of controlling the amount of duplication is separating each
type into a separate .debug_types section and arranging for the linker
to recognize and eliminate duplicates at the individual type level.

Using this technique, each substantial type definition is placed in
its own individual section, while the remainder of the DWARF
information (non-type information, incomplete type declarations, and
definitions of trivial types) is placed in the usual debug information
section. In a typical implementation, the relocatable object file may
contain one of each of these debug sections:

  .debug_abbrev
  .debug_info
  .debug_line

and any number of these additional sections:

  .debug_types

As discussed in the previous section [Section E.1], many linkers today
support the concept of a COMDAT group or linkonce section. The general
idea is that a "key" can be attached to a section or a group of
sections, and the linker will include only one copy of a section group
(or individual section) for any given key. For .debug_types sections,
the key is the signature formed from the algorithm given in Section
7.5.2.
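
The effect of COMDAT processing on .debug_types sections can be
modelled in a few lines of Python. This is a toy model under stated
assumptions, not a description of any particular linker: each object
file is represented as a list of (signature, contents) pairs, the
function name link_type_sections is hypothetical, and the first copy
seen for a given key is the one that is kept.

```python
def link_type_sections(object_files):
    """Keep one .debug_types contribution per signature key."""
    kept = {}
    for sections in object_files:
        for signature, contents in sections:
            # Later copies with the same key are discarded.
            kept.setdefault(signature, contents)
    return kept
```

If two object files each contribute type units for the same two types,
the output contains two type units rather than four.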

As an example, consider a C++ header file containing the following
type definitions:

     1  namespace N {
     2  
     3  struct B;
     4  
     5  struct C {
     6    int x;
     7    int y;
     8  };
     9  
    10  class A {
    11   public:
    12    A(int v)
    13      : v_(v), next(NULL), bp(NULL), c()
    14    { }
    15    int v()
    16    { return v_; }
    17   private:
    18    int v_;
    19    struct A *next;
    20    struct B *bp;
    21    struct C c;
    22  };
    23  
    24  }

Let us first consider one possible representation of the DWARF
information that describes the type "struct C":

  DW_TAG_type_unit
      DW_AT_language: DW_LANG_C_plus_plus (4)
    DW_TAG_namespace
        DW_AT_name: "N"
L1:
      DW_TAG_structure_type
          DW_AT_name: "C"
          DW_AT_byte_size: 8
          DW_AT_decl_file: 1
          DW_AT_decl_line: 5
        DW_TAG_member
            DW_AT_name: "x"
            DW_AT_decl_file: 1
            DW_AT_decl_line: 6
            DW_AT_type: &L2
            DW_AT_data_member_location: 0
        DW_TAG_member
            DW_AT_name: "y"
            DW_AT_decl_file: 1
            DW_AT_decl_line: 7
            DW_AT_type: &L2
            DW_AT_data_member_location: 4
L2:
    DW_TAG_base_type
        DW_AT_byte_size: 4
        DW_AT_encoding: DW_ATE_signed
        DW_AT_name: "int"

In computing a signature for the type N::C, we will flatten the type
description into a byte stream according to the procedure outlined in
Section 7.5.2:

// Step 2: 'C' DW_TAG_namespace "N"
0x43 0x39 0x4e 0x00
// Step 3: 'D' DW_TAG_structure_type
0x44 0x13
// Step 4: 'A' DW_AT_name "C"
0x41 0x03 0x43 0x00
// Step 4: 'A' DW_AT_byte_size 8
0x41 0x0b 0x08
// Step 7: First child ("x")
    // Step 3: 'D' DW_TAG_member
    0x44 0x0d
    // Step 4: 'A' DW_AT_name "x"
    0x41 0x03 0x78 0x00
    // Step 4: 'A' DW_AT_data_member_location 0
    0x41 0x38 0x00
    // Step 6: 'T' (type #2)
    0x54
        // Step 3: 'D' DW_TAG_base_type
        0x44 0x24
        // Step 4: 'A' DW_AT_name "int"
        0x41 0x03 0x69 0x6e 0x74 0x00
        // Step 4: 'A' DW_AT_byte_size 4
        0x41 0x0b 0x04
        // Step 4: 'A' DW_AT_encoding DW_ATE_signed
        0x41 0x3e 0x05
        // Step 7: End of DW_TAG_base_type "int"
        0x00
    // Step 7: End of DW_TAG_member "x"
    0x00
// Step 7: Second child ("y")
    // Step 3: 'D' DW_TAG_member
    0x44 0x0d
    // Step 4: 'A' DW_AT_name "y"
    0x41 0x03 0x79 0x00
    // Step 4: 'A' DW_AT_data_member_location 4
    0x41 0x38 0x04
    // Step 6: 'R' DW_AT_type (type #2)
    0x52 0x49 0x02
    // Step 7: End of DW_TAG_member "y"
    0x00
// Step 7: End of DW_TAG_structure_type "C"
0x00

Running an MD5 hash over this byte stream, and taking the low-order 64
bits, yields the final signature: 0xb0dbb00a bd4cf18f.

Next, we consider a representation of the DWARF information that
describes the type "class A":

  DW_TAG_type_unit
      DW_AT_language: DW_LANG_C_plus_plus (4)
    DW_TAG_namespace
        DW_AT_name: "N"
L1:
      DW_TAG_class_type
          DW_AT_name: "A"
          DW_AT_byte_size: 20
          DW_AT_decl_file: 1
          DW_AT_decl_line: 10
        DW_TAG_member
            DW_AT_name: "v_"
            DW_AT_decl_file: 1
            DW_AT_decl_line: 18
            DW_AT_type: &L2
            DW_AT_data_member_location: 0
        DW_TAG_member
            DW_AT_name: "next"
            DW_AT_decl_file: 1
            DW_AT_decl_line: 19
            DW_AT_type: &L3
            DW_AT_data_member_location: 4
        DW_TAG_member
            DW_AT_name: "bp"
            DW_AT_decl_file: 1
            DW_AT_decl_line: 20
            DW_AT_type: &L4
            DW_AT_data_member_location: 8
        DW_TAG_member
            DW_AT_name: "c"
            DW_AT_decl_file: 1
            DW_AT_decl_line: 21
            DW_AT_type: 0xb0dbb00a bd4cf18f (signature for struct C)
            DW_AT_data_member_location: 12
        DW_TAG_subprogram
            DW_AT_external: 1
            DW_AT_name: "A"
            DW_AT_decl_file: 1
            DW_AT_decl_line: 12
            DW_AT_declaration: 1
          DW_TAG_formal_parameter
            DW_AT_type: &L3
            DW_AT_artificial: 1
          DW_TAG_formal_parameter
            DW_AT_type: &L2
        DW_TAG_subprogram
            DW_AT_external: 1
            DW_AT_name: "v"
            DW_AT_decl_file: 1
            DW_AT_decl_line: 15
            DW_AT_type: &L2
          DW_TAG_formal_parameter
            DW_AT_type: &L3
            DW_AT_artificial: 1
L2:
    DW_TAG_base_type
        DW_AT_byte_size: 4
        DW_AT_encoding: DW_ATE_signed
        DW_AT_name: "int"
L3:
    DW_TAG_pointer_type
        DW_AT_type: &L1
L4:
    DW_TAG_pointer_type
        DW_AT_type: &L5
    DW_TAG_namespace
        DW_AT_name: "N"
L5:
      DW_TAG_structure_type
          DW_AT_name: "B"
          DW_AT_declaration: 1

In this example, the types N::A and N::C have each been placed in
separate type units. For N::A, the actual definition of the type
begins at label L1. The definition involves references to the int
base type and to two pointer types. The information for each of these
referenced types is also included in this type unit, since base types
and pointer types are trivial types that are not worth the overhead of
a separate type unit. The last pointer type contains a reference to an
incomplete type N::B, which is also included here as a declaration,
since the complete type is unknown and its signature is therefore
unavailable. There is also a reference to N::C, using
DW_FORM_ref_sig8 to refer to the type signature for that type.

In computing a signature for the type N::A, we will flatten the type
description into a byte stream according to the procedure outlined in
Section 7.5.2:

// Step 2: 'C' DW_TAG_namespace "N"
0x43 0x39 0x4e 0x00
// Step 3: 'D' DW_TAG_class_type
0x44 0x02
// Step 4: 'A' DW_AT_name "A"
0x41 0x03 0x41 0x00
// Step 4: 'A' DW_AT_byte_size 20
0x41 0x0b 0x14
// Step 7: First child ("v_")
    // Step 3: 'D' DW_TAG_member
    0x44 0x0d
    // Step 4: 'A' DW_AT_name "v_"
    0x41 0x03 0x76 0x5f 0x00
    // Step 4: 'A' DW_AT_data_member_location 0
    0x41 0x38 0x00
    // Step 6: 'T' (type #2)
    0x54
        // Step 3: 'D' DW_TAG_base_type
        0x44 0x24
        // Step 4: 'A' DW_AT_name "int"
        0x41 0x03 0x69 0x6e 0x74 0x00
        // Step 4: 'A' DW_AT_byte_size 4
        0x41 0x0b 0x04
        // Step 4: 'A' DW_AT_encoding DW_ATE_signed
        0x41 0x3e 0x05
        // Step 7: End of DW_TAG_base_type "int"
        0x00
    // Step 7: End of DW_TAG_member "v_"
    0x00
// Step 7: Second child ("next")
    // Step 3: 'D' DW_TAG_member
    0x44 0x0d
    // Step 4: 'A' DW_AT_name "next"
    0x41 0x03 0x6e 0x65 0x78 0x74 0x00
    // Step 4: 'A' DW_AT_data_member_location 4
    0x41 0x38 0x04
    // Step 6: 'T' (type #3)
    0x54
        // Step 3: 'D' DW_TAG_pointer_type
        0x44 0x0f
        // Step 5: 'N' DW_AT_type
        0x4e 0x49
        // Step 5: 'C' DW_TAG_namespace "N"
        0x43 0x39 0x4e 0x00
        // Step 5: "A"
        0x41 0x00
        // Step 7: End of DW_TAG_pointer_type
        0x00
    // Step 7: End of DW_TAG_member "next"
    0x00
// Step 7: Third child ("bp")
    // Step 3: 'D' DW_TAG_member
    0x44 0x0d
    // Step 4: 'A' DW_AT_name "bp"
    0x41 0x03 0x62 0x70 0x00
    // Step 4: 'A' DW_AT_data_member_location 8
    0x41 0x38 0x08
    // Step 6: 'T' (type #4)
    0x54
        // Step 3: 'D' DW_TAG_pointer_type
        0x44 0x0f
        // Step 5: 'N' DW_AT_type
        0x4e 0x49
        // Step 5: 'C' DW_TAG_namespace "N"
        0x43 0x39 0x4e 0x00
        // Step 5: "B"
        0x42 0x00
        // Step 7: End of DW_TAG_pointer_type
        0x00
    // Step 7: End of DW_TAG_member "bp"
    0x00
// Step 7: Fourth child ("c")
    // Step 3: 'D' DW_TAG_member
    0x44 0x0d
    // Step 4: 'A' DW_AT_name "c"
    0x41 0x03 0x63 0x00
    // Step 4: 'A' DW_AT_data_member_location 12
    0x41 0x38 0x0c
    // Step 6: 'T' (type #5)
    0x54
        // Step 2: 'C' DW_TAG_namespace "N"
        0x43 0x39 0x4e 0x00
        // Step 3: 'D' DW_TAG_structure_type
        0x44 0x13
        // Step 4: 'A' DW_AT_name "C"
        0x41 0x03 0x43 0x00
        // Step 4: 'A' DW_AT_byte_size 8
        0x41 0x0b 0x08
        // Step 7: First child ("x")
            // Step 3: 'D' DW_TAG_member
            0x44 0x0d
            // Step 4: 'A' DW_AT_name "x"
            0x41 0x03 0x78 0x00
            // Step 4: 'A' DW_AT_data_member_location 0
            0x41 0x38 0x00
            // Step 6: 'R' DW_AT_type (type #2)
            0x52 0x49 0x02
            // Step 7: End of DW_TAG_member "x"
            0x00
        // Step 7: Second child ("y")
            // Step 3: 'D' DW_TAG_member
            0x44 0x0d
            // Step 4: 'A' DW_AT_name "y"
            0x41 0x03 0x79 0x00
            // Step 4: 'A' DW_AT_data_member_location 4
            0x41 0x38 0x04
            // Step 6: 'R' DW_AT_type (type #2)
            0x52 0x49 0x02
            // Step 7: End of DW_TAG_member "y"
            0x00
        // Step 7: End of DW_TAG_structure_type "C"
        0x00
    // Step 7: End of DW_TAG_member "c"
    0x00
// Step 7: Fifth child ("A")
    // Step 7: 'S' DW_TAG_subprogram "A"
    0x53 0x2e 0x41 0x00
// Step 7: Sixth child ("v")
    // Step 7: 'S' DW_TAG_subprogram "v"
    0x53 0x2e 0x76 0x00
// Step 7: End of DW_TAG_class_type "A"
0x00

Running an MD5 hash over this byte stream, and taking the low-order 64
bits, yields the final signature: 0xd681845c 21a14576.

A source file that includes this header file may declare a variable of
type N::A, and its DWARF information may look like the following:

  DW_TAG_compile_unit
    ...
    DW_TAG_subprogram
      ...
      DW_TAG_variable
        DW_AT_name: "a"
        DW_AT_type: (signature) 0xd681845c 21a14576
        DW_AT_location: ...
    ...


E.3.5  Separate type units

Each complete declaration of a globally-visible type can be placed in
its own separate type section, with a group key derived from the type
signature. The linker can then remove all duplicate type declarations
based on the key.


E.3.6  Grammar for COMDAT compression 

signature
 : opt-context debug-entry attributes children

opt-context                         # Step 2
 : 'C' tag-code string opt-context
 : empty

debug-entry                         # Step 3
 : 'D' tag-code

attributes                          # Steps 4, 5, 6
 : attribute attributes
 : empty

attribute
 : 'A' at-code form-encoded-value  # Normal attributes
 : 'N' at-code opt-context 'E' string  # Reference to type by name
 : 'R' at-code back-ref            # Back-reference to visited type
 : 'T' at-code signature           # Recursive type

children                            # Step 7
 : child children
 : '\0'

child
 : 'S' tag-code string
 : signature

tag-code
 : <ULEB128>

at-code
 : <ULEB128>

form-encoded-value
 : DW_FORM_sdata value
 : DW_FORM_flag value
 : DW_FORM_string string
 : DW_FORM_block block

DW_FORM_string
 : '\x08'

DW_FORM_block
 : '\x09'

DW_FORM_flag
 : '\x0c'

DW_FORM_sdata
 : '\x0d'

value
 : <LEB128>

block
 : <ULEB128> <fixed-length-block> # The ULEB128 gives the length of the block

back-ref
 : <ULEB128>

string
 : <null-terminated-string>

empty
 :
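
A few of the productions above can be sketched as small encoders. This
is a non-normative illustration that covers only the 'C', 'S', and 'A'
productions; all function names are hypothetical. It assumes that the
value following DW_FORM_sdata is a signed LEB128 number and that
strings are encoded as UTF-8.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def sleb128(n: int) -> bytes:
    """Encode an integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7  # Python's shift is arithmetic, so the sign is kept
        sign_bit = bool(byte & 0x40)
        done = (n == 0 and not sign_bit) or (n == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def cstring(s: str) -> bytes:
    """string: a null-terminated string (UTF-8 is an assumption)."""
    return s.encode("utf-8") + b"\x00"


def context(tag: int, name: str) -> bytes:
    """One level of opt-context: 'C' tag-code string."""
    return b"C" + uleb128(tag) + cstring(name)


def child_name(tag: int, name: str) -> bytes:
    """child: 'S' tag-code string."""
    return b"S" + uleb128(tag) + cstring(name)


def string_attribute(at: int, value: str) -> bytes:
    """attribute: 'A' at-code DW_FORM_string string."""
    return b"A" + uleb128(at) + b"\x08" + cstring(value)


def sdata_attribute(at: int, value: int) -> bytes:
    """attribute: 'A' at-code DW_FORM_sdata value."""
    return b"A" + uleb128(at) + b"\x0d" + sleb128(value)


DW_TAG_subprogram = 0x2e
# Matches the "Fifth child" bytes in the example: 0x53 0x2e 0x41 0x00
print(child_name(DW_TAG_subprogram, "A").hex(" "))
```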


Appendix F -- Version Numbers

  In Figure 80, add the following row:

    .debug_types    -    -    4

  Under "Notes", add a new bullet point:

    * The version number for the .debug_info section and
      the .debug_types section should always match.


Revision History
----------------

December 2, 2008

- Changed DW_FORM_sig8 to DW_FORM_ref_sig8.

- Added new material to first paragraph of Chapter 3.

- Added additional material to non-normative text in
  Section 3.2.

- Modified algorithm in Section 7.5.2: (a) include trailing
  null byte when checksumming strings; (b) use LEB128 format when
  checksumming integral values, tag codes, and attribute
  codes; (c) include context with the name when checksumming
  pointer, reference, and friend DIEs; (d) include the
  starting type as first element of the list of visited
  types; (e) visit children in order; (f) canonical encoding
  of FORM_flag; (g) mention vendor extensions.

- Added note to Appendix F about matching .debug_info and
  .debug_types version numbers.

December 8, 2008

- Added DW_TAG_pointer_to_member_type to Step 5 in Section 7.5.2.

- Added rule for block-valued attributes in Section 7.5.2.

- Minor editorial changes suggested by David Gross.

January 26, 2009

- Added description of the structure of a type unit to
  Section 3.2.

- Updated the description of the algorithm in Section 7.5.2:

  - So that it can be applied recursively to non-type debug
    entries.

  - Include trailing null byte when appending names in
    Step 2.

  - Added DW_AT_containing_type, DW_AT_explicit,
    DW_AT_mutable, DW_AT_virtuality, and
    DW_AT_vtable_elem_location to the list of hashable
    attributes in Step 4.

  - Removed DW_AT_start_scope from the list of hashable
    attributes in Step 4.

  - Include attributes provided indirectly via
    DW_AT_specification attribute in Step 4.

  - Clarified the treatment of vendor-specific attributes in
    Step 4.

  - Added case for subprogram friends in Step 5.

  - Corrected condition in Step 6 to be the logical negation
    of that of Step 5.

  - Add 1024 when appending references to previously-visited
    types in Step 6, to remove ambiguity with existing DWARF
    tags.

  - Modified Step 7 to append names of nested types.

- Added text to prefer simplest of alternative forms for
  purposes of forming the signature in Section 7.5.2.

- Added non-normative text in Section 7.5.2 listing cases
  where a type should not be placed in a separate type unit.

- Added non-normative text in Section 7.5.2 providing
  explanations for why certain attributes are not included
  in the signature.

- Added text for Appendix E.

February 27, 2009

- In Section 7.4, added .debug_types table to the list of
  sections that contain section offset and length fields.

- In Section 7.5.2, added additional attributes pending
  approval from other proposals.

- In Section 7.5.2, clarified the effect of DW_AT_specification
  attributes.

- Modified the algorithm in Section 7.5.2 to produce a more
  reversible signature string.

- For block-valued attributes, added the length of the block
  as part of the signature string.

- In Section 7.5.2, added non-normative text with additional
  rationale.

- In Section 7.5.4, clarified wording of the second type
  of reference.

- Minor editorial changes.


March 24, 2009

- In Section 7.5.2, made further improvements to the
  reversibility of the signature string. Added additional
  marker letters and added form codes.

- In Section 7.5.2, further clarified the effect of the
  DW_AT_specification attribute.

- In Section 7.5.2, added further clarification of the
  treatment of attributes that reference other debug entries.

- Minor editorial changes.

April 6, 2009

- In Section 7.5.2, restored previously deleted text about
  processing attributes that refer to other type entries.

- In Section 7.5.2, added DW_FORM_string as an allowed form.

- In Section 7.5.2, added non-normative text disqualifying
  debug entries from separate type units.

- Minor editorial changes.

- Revised Appendix E.

April 24, 2009

- In Section 7.5.2, changed the rules for processing
  references to other types in Step 4, added an additional
  marker letter in Step 5, and changed Step 6 to be
  consistent with the rule for references to other types
  in Step 4.

May 29, 2009

- Added grammar to Appendix E.

--

Accepted.