Issue 080930.1: Using COMDAT Sections to Reduce the Size of DWARF Debug Information
Background
----------
DWARF debugging information for a typical C++ application can consume
a large amount of disk space in both the relocatable object files and
the final executable or shared library. Depending on the application
and compilation options, the debug information can consume as much as
75% of the object file.
The bulk of the debug information is in the .debug_info section, the
bulk of that section contains type information, and the bulk of the
type information is made up of duplicate copies of types that are
emitted by the compiler in each compilation unit.
This proposal extends the DWARF format to allow the compiler to place
type information in separate .debug_types sections, such that the
linker can eliminate duplicates using the COMDAT mechanism, which is
now a fairly standard feature of modern linkers.
For complete details of the approach, see the following Wiki page:
http://wiki.dwarfstd.org/index.php?title=COMDAT_Type_Sections
Overview
--------
A new .debug_types section is defined, which will contain "type
units." These are similar to the compile units that can be found in
the .debug_info section, but each type unit will describe exactly one
type. The top-level DIE of a type unit has a new tag,
DW_TAG_type_unit.
Each type that is placed in a type unit will have a unique 64-bit
signature, generated by the compiler using a hash algorithm that
operates on the DWARF definition of the type.
References to types contained in type units are made using the 64-bit
signature rather than a direct reference to the DIE that describes the
type. These references are represented in the DWARF information using
a new form, DW_FORM_ref_sig8, which is a member of the reference
class, and can be used wherever a DW_FORM_ref4, for example, would be
used to refer to a type.
Proposed Changes to the DWARF Specification
-------------------------------------------
These changes are keyed to the June 24, 2008, working draft of the DWARF
Version 4 specification.
Section 2.1, The Debugging Information Entry
In Figure 1, add DW_TAG_type_unit.
Change the last paragraph of the section as follows:
The debugging information entries are intended to exist
in the .debug_info and .debug_types sections of an
object file.
Section 2.2, Attribute Types
In Figure 2, add the following row to the table:
DW_AT_signature Type signature
In Figure 3, change the description of the reference class in the table
as follows:
Refers to one of the debugging information entries that
describe the program. There are three types of
reference. The first is an offset relative to the
beginning of the compilation unit in which the reference
occurs and must refer to an entry within that same
compilation unit. The second type of reference is the
offset of a debugging information entry in any
compilation unit, including one different from the unit
containing the reference. The third type of reference is
an indirect reference to a type definition, based on the
64-bit signature of that type.
Chapter 3, Program Scope Entries
In the first paragraph, change the final sentence to:
Except for separate type entries (see Section 3.2),
these entries may be thought of as bounded by ranges
of text addresses within the program.
Add a new section after Section 3.1:
3.2 Separate Type Entries
An object file may contain any number of separate type
unit entries, each representing a single complete type
definition. Each type unit must be uniquely identified
by a 64-bit signature, stored as part of the type unit,
which can be used to reference the type definition from
debugging entries in compilation units and other type
units.
A type unit is represented by a debugging information
entry with the tag DW_TAG_type_unit. A type unit entry
owns debugging information entries that represent the
definition of a single type, plus additional debugging
information entries that may be necessary to include as
part of the definition of the type.
A type unit entry may have a DW_AT_language attribute,
whose constant value is an integer code indicating the
source language used to define the type. The set of
language names and their meanings are given in Figure 8.
A type unit entry for a given type T may have three
kinds of children:
1. A tree representing the defining declaration of
type T.
2. A tree containing a declaration of type T, enclosed
in any nested types and/or namespaces. The
declaration of the type will have a DW_AT_declaration
attribute, and the defining declaration will have a
DW_AT_specification attribute that refers to the
declaration entry. (Required only if the type is
nested inside another type or namespace.)
3. Additional trees as necessary, each containing a
declaration of a type that is referenced by type T
but has not been placed in its own separate type
unit.
Alternatively, for nested types, the defining
declaration (1) may be placed directly within the
declaration tree (2), avoiding the use of
DW_AT_specification and DW_AT_declaration attributes.
*Not all types are required to be placed in type units.
In general, only structure, class, enumeration, and
union types included from header files should be
considered for separate type units. Base types and other
small types are not usually worth the overhead of
placing in separate type units. Types that are unlikely
to be replicated, such as those defined in the main
source file, are also better left in the main
compilation unit.*
Chapter 5, Type Entries
Section 5.6.1, Structure, Union and Class Type Entries
Add the following paragraph after the paragraph beginning "An
incomplete structure...":
If the complete declaration of a type has been placed in
a separate type unit, an incomplete declaration of that
type in the compilation unit may provide the unique
64-bit signature of the type with a DW_AT_signature
attribute.
Chapter 7, Data Representation
Section 7.4, 32-Bit and 64-Bit DWARF Formats
In the table of section offset and section length fields in
Item #2, add the following rows:
.debug_types    debug_abbrev_offset    offset in .debug_abbrev
.debug_types    type_offset            offset in .debug_types
Section 7.5, Format of Debugging Information
Add a new section after Section 7.5.1:
7.5.2 Type Unit Header
The header for the series of debugging information
entries contributing to the description of a type that
has been placed in its own type unit consists of the
following information:
1. unit_length (initial length)
[same as 7.5.1]
2. version (uhalf)
[same as 7.5.1]
3. debug_abbrev_offset (section offset)
[same as 7.5.1]
4. address_size (ubyte)
[same as 7.5.1]
5. type_signature (8-byte unsigned integer)
A 64-bit unique signature of the type described in
this type unit.
6. type_offset (section offset)
A 4-byte or 8-byte unsigned offset relative to the
beginning of the type unit header. This offset refers
to the debugging information entry that describes the
type. Because the type may be nested inside a
namespace or other structures, and may contain
references to other types that have not been placed
in separate type units, it is not necessarily either
the first or the only entry in the type unit.
In the 32-bit DWARF format, this is a 4-byte unsigned
offset; in the 64-bit DWARF format, this is an 8-byte
unsigned offset (see Section 7.4).
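As an illustration only, the header fields listed above might be decoded as in the following Python sketch. The names TypeUnitHeader and parse_type_unit_header are hypothetical, little-endian byte order is an assumed default, and the 0xffffffff escape that selects the 64-bit format is the initial-length convention of Section 7.4. Reading type_signature as an integer in the file's byte order is also an assumption of this sketch.

```python
import struct
from typing import NamedTuple


class TypeUnitHeader(NamedTuple):
    unit_length: int
    version: int
    debug_abbrev_offset: int
    address_size: int
    type_signature: int
    type_offset: int      # relative to the start of the type unit header
    is_64bit: bool
    header_size: int      # bytes consumed by the header itself


def parse_type_unit_header(data: bytes, little_endian: bool = True) -> TypeUnitHeader:
    """Decode the six header fields from the start of a type unit."""
    e = "<" if little_endian else ">"
    (initial,) = struct.unpack_from(e + "I", data, 0)
    if initial == 0xFFFFFFFF:
        # 64-bit DWARF: an 8-byte length follows the escape value.
        (unit_length,) = struct.unpack_from(e + "Q", data, 4)
        pos, off, is_64bit = 12, "Q", True
    else:
        unit_length, pos, off, is_64bit = initial, 4, "I", False
    # version (uhalf), debug_abbrev_offset, address_size (ubyte),
    # type_signature (8 bytes), type_offset
    fmt = e + "H" + off + "B" + "Q" + off
    version, abbrev, addr_size, sig, type_off = struct.unpack_from(fmt, data, pos)
    return TypeUnitHeader(unit_length, version, abbrev, addr_size, sig,
                          type_off, is_64bit, pos + struct.calcsize(fmt))
```

The debugging information entry that describes the type would then be found at the position of the header plus type_offset.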
The type_signature is computed only by the DWARF producer;
a DWARF consumer needs only to resolve type references to
the type definitions that are contained in type units,
based on the signature. The signature is formed from the
MD5 hash of a flattened description of the type. The
flattened description of the type is a byte sequence
derived from the DWARF encoding of the type as follows:
1. Start with an empty sequence S and a list V of
visited types, where V is initialized to a list
containing the starting type as its single element.
Elements in V are indexed from 1, so that V[1] is the
starting type.
2. If the debug entry represents a type that is nested
inside another type or a namespace, append to S the
following for each surrounding type or namespace,
beginning with the outermost such construct: the
letter 'C', the DWARF tag of the construct, and the
name (taken from the DW_AT_name attribute) of the
type or namespace (including its trailing null byte).
3. Append to S the letter 'D', followed by the DWARF tag
of the debug entry.
4. For each of the following attributes that are present
in the debug entry, in the order listed below, append
to S a marker letter (see below), the DWARF attribute
code, and the attribute value:
DW_AT_name
DW_AT_accessibility
DW_AT_address_class
DW_AT_allocated
DW_AT_artificial
DW_AT_associated
DW_AT_binary_scale
DW_AT_bit_offset
DW_AT_bit_size
DW_AT_bit_stride
DW_AT_byte_size
DW_AT_byte_stride
DW_AT_const_expr [pending approval of 090107.1]
DW_AT_const_value
DW_AT_containing_type
DW_AT_count
DW_AT_data_location
DW_AT_data_member_location
DW_AT_decimal_scale
DW_AT_decimal_sign
DW_AT_default_value
DW_AT_digit_count
DW_AT_discr
DW_AT_discr_list
DW_AT_discr_value
DW_AT_encoding
DW_AT_endianity
DW_AT_explicit
DW_AT_is_optional
DW_AT_location
DW_AT_lower_bound
DW_AT_mutable
DW_AT_ordering
DW_AT_picture_string
DW_AT_prototyped
DW_AT_small
DW_AT_segment
DW_AT_string_length
DW_AT_threads_scaled
DW_AT_upper_bound
DW_AT_use_location
DW_AT_use_UTF8
DW_AT_use_UTF16 [pending approval of 090109.1]
DW_AT_use_UTF32 [pending approval of 090109.1]
DW_AT_variable_parameter
DW_AT_virtuality
DW_AT_visibility
DW_AT_vtable_elem_location
If an implementation defines any vendor-specific
attributes, any such attributes that are essential to
the definition of the type should also be included in
the above list at fixed positions defined by the
vendor.
An attribute that refers to another type entry T is
processed as follows: (a) If T is in the list V at
some V[x], use the letter 'R' as the marker and use
the unsigned LEB128 encoding of x as the attribute
value; otherwise, (b) use the letter 'T' as the
marker, process the type T recursively by performing
Steps 2 through 7, using the result as the attribute
value.
Other attribute values use the letter 'A' as the
marker, and the value consists of the form code
(encoded as an unsigned LEB128 value) followed by the
encoding of the value according to the form code. To
ensure reproducibility of the signature, the set of
forms used in the signature computation is limited to
the following: DW_FORM_sdata, DW_FORM_flag,
DW_FORM_string, and DW_FORM_block.
5. If the tag is one of DW_TAG_pointer_type,
DW_TAG_reference_type, DW_TAG_rvalue_reference_type
[pending approval of 090106.1],
DW_TAG_ptr_to_member_type, or DW_TAG_friend, and the
referenced type (via the DW_AT_type or DW_AT_friend
attribute) has a DW_AT_name attribute, append to S
the letter 'N', the DWARF attribute code (DW_AT_type
or DW_AT_friend), the context of the type (according
to the method in Step 2), the letter 'E', and the
name of the type. For DW_TAG_friend, if the
referenced debug entry is a DW_TAG_subprogram, the
context is omitted and the name to be used is the
ABI-specific name of the subprogram (e.g., the
mangled linker name).
6. If the tag is not one of DW_TAG_pointer_type,
DW_TAG_reference_type, DW_TAG_rvalue_reference_type
[pending approval of 090106.1],
DW_TAG_ptr_to_member_type, or DW_TAG_friend, but has
a DW_AT_type attribute, or if the referenced type
(via the DW_AT_type or DW_AT_friend attribute) does
not have a DW_AT_name attribute, the attribute is
processed according to the method in Step 4 for an
attribute that refers to another type entry.
7. Visit each child C of the debug entry as follows: If
C is a nested type entry or a member function entry,
and has a DW_AT_name attribute, append to S the
letter 'S', the tag of C, and its name; otherwise,
process C recursively by performing Steps 3 through
7, appending the result to S. Following the last
child (or if there are no children), append a zero
byte.
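The steps above can be sketched in Python as follows. This is a simplified sketch under stated assumptions, not a complete producer: the Die class and the flatten function are hypothetical names, the caller is assumed to supply attributes already sorted in the Step 4 order, only DW_FORM_string and DW_FORM_sdata values are handled, and the by-name cases (the 'N' marker of Step 5 and the 'S' marker of Step 7) are omitted. The sketch also assumes that a type is appended to V when it is first visited.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# Codes from the DWARF encoding tables (only those used below).
DW_TAG_member, DW_TAG_structure_type = 0x0D, 0x13
DW_TAG_base_type, DW_TAG_namespace = 0x24, 0x39
DW_AT_name, DW_AT_byte_size = 0x03, 0x0B
DW_AT_data_member_location, DW_AT_encoding, DW_AT_type = 0x38, 0x3E, 0x49
DW_FORM_string, DW_FORM_sdata = 0x08, 0x0D
DW_ATE_signed = 0x05


def uleb128(n: int) -> bytes:
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def sleb128(n: int) -> bytes:
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        done = (n == 0 and not byte & 0x40) or (n == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


@dataclass(eq=False)
class Die:  # hypothetical in-memory model of a debugging information entry
    tag: int
    attrs: List[Tuple[int, Union[int, str]]] = field(default_factory=list)
    type_ref: Optional["Die"] = None            # target of DW_AT_type, if any
    children: List["Die"] = field(default_factory=list)
    context: List[Tuple[int, str]] = field(default_factory=list)  # outermost first


def flatten(die: Die, visited: List[Die], with_context: bool = True) -> bytes:
    s = bytearray()
    if with_context:                                    # Step 2
        for tag, name in die.context:
            s += b"C" + uleb128(tag) + name.encode() + b"\0"
    s += b"D" + uleb128(die.tag)                        # Step 3
    for at, value in die.attrs:                         # Step 4, marker 'A'
        s += b"A" + uleb128(at)
        if isinstance(value, str):
            s += uleb128(DW_FORM_string) + value.encode() + b"\0"
        else:
            s += uleb128(DW_FORM_sdata) + sleb128(value)
    if die.type_ref is not None:                        # Step 6
        t = die.type_ref
        x = next((i for i, v in enumerate(visited, 1) if v is t), None)
        if x is not None:                               # already in V: 'R'
            s += b"R" + uleb128(DW_AT_type) + uleb128(x)
        else:                                           # first visit: 'T'
            visited.append(t)   # assumption: V grows on first visit
            s += b"T" + uleb128(DW_AT_type) + flatten(t, visited, True)
    for child in die.children:                          # Step 7 (recursive case)
        s += flatten(child, visited, False)
    return bytes(s + b"\0")


# Usage: a structure N::C with two int members.
int_t = Die(DW_TAG_base_type, [(DW_AT_name, "int"), (DW_AT_byte_size, 4),
                               (DW_AT_encoding, DW_ATE_signed)])
x = Die(DW_TAG_member, [(DW_AT_name, "x"), (DW_AT_data_member_location, 0)],
        type_ref=int_t)
y = Die(DW_TAG_member, [(DW_AT_name, "y"), (DW_AT_data_member_location, 4)],
        type_ref=int_t)
c = Die(DW_TAG_structure_type, [(DW_AT_name, "C"), (DW_AT_byte_size, 8)],
        children=[x, y], context=[(DW_TAG_namespace, "N")])
visited = [c]                                           # Step 1: V[1] is the start
flattened = flatten(c, visited)
```

In this usage, the first member reaches the int type through the 'T' marker and the second member reaches it through an 'R' back-reference to V[2].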
For the purposes of this algorithm, if a debug entry has
a DW_AT_specification attribute that refers to another
debug entry (which has a DW_AT_declaration attribute),
then the two debug entries are processed as a single
entry, with all the attributes and children of the
specification combined into the declaration.
DWARF tag and attribute codes are appended to the
sequence as unsigned LEB128 values, using the values
defined later in this chapter.
*An attribute that refers to another debug entry should
be recursively processed or replaced with the name of
the referent (in Step 5 or 6). If neither treatment
applies to an attribute that references another debug
entry, the entry that contains that attribute should not
be considered for a separate type unit.
If a debug entry contains an attribute from the list
above that would require an unsupported form, that debug
entry should not be considered for a separate type unit.
A type should be considered for a separate type unit
only if all of the debug entries that it contains or
refers to in Steps 6 and 7 can themselves be considered
for a separate type unit.*
Where the DWARF producer may reasonably choose two or
more different forms for a given attribute, it should
choose the simplest possible form in computing the
signature. (For example, a constant value should be
preferred to a location expression when possible.)
Once the string S has been formed from the DWARF
encoding, an MD5 hash is computed for the string and the
lower 64 bits are taken as the type signature.
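A minimal Python sketch of this final step follows. The function name type_signature is hypothetical. The text does not say which eight bytes of the 16-byte MD5 digest are the "lower 64 bits"; this sketch assumes the last eight bytes of the digest and returns them as raw bytes, leaving any integer byte order to the caller.

```python
import hashlib


def type_signature(flattened: bytes) -> bytes:
    """Return an 8-byte signature for the flattened description S."""
    digest = hashlib.md5(flattened).digest()   # always 16 bytes
    # Assumption: "lower 64 bits" means the trailing 8 bytes of the digest.
    return digest[-8:]
```

Because the signature depends only on the byte string S, two producers that form the same S for a type will arrive at the same signature.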
*The string S is intended to be a flattened
representation of the type that uniquely identifies that
type (i.e., a different type is unlikely to produce the
same string).
If the value of an attribute is a location expression,
and the location expression contains a reference to
another debug entry (e.g., a DW_OP_call_ref operator),
it is unlikely that the debug entry will remain
identical across compilation units, and it should not be
placed in a separate type unit.
If an attribute refers to a code location or a location
list, the debug entry should not be placed in a separate
type unit.
If an attribute refers to another debug entry that does
not represent a type, the debug entry should not be
placed in a separate type unit.
The DW_AT_declaration attribute is not included in the
signature because it indicates that the debug entry
represents an incomplete declaration, and incomplete
declarations should not be placed in separate type
units.
The DW_AT_description attribute is not included because
it does not provide any information unique to the
defining declaration of the type.
The DW_AT_decl_file, DW_AT_decl_line, and
DW_AT_decl_column attributes are not included because
they may vary from one source file to the next, and
would prevent two otherwise identical type declarations
from producing the same hash.
The DW_AT_object_pointer attribute is not included
because the information it provides is not necessary for
the computation of a unique type signature.
Nested types and some types referred to by a debug entry
are encoded by name rather than by recursively encoding
the type to allow for cases where a complete definition
of the type might not be available in all compilation
units.*
Section 7.5.4 [original numbering], Attribute Encodings
Under the "reference" class, change "There are two types
of reference" to "There are three types of reference".
In the paragraph that begins "The second type of
reference...", replace "can identify any debugging
information entry within a program" with "can identify any
debugging information entry in a .debug_info section".
After the paragraph beginning "The second type of
reference," add the following paragraph:
The third type of reference can identify any debugging
information type entry that has been placed in its own
type unit. This type of reference (DW_FORM_ref_sig8) is
the 64-bit type signature that was computed for the
type.
In Figure 18, add the following row:
DW_TAG_type_unit 0x41
In Figure 20, add the following row:
DW_AT_signature 0x69
In Figure 21, add the following row:
DW_FORM_ref_sig8 0x20
Appendix A -- Attributes by Tag Value
In Figure 42, add DW_AT_signature to the following rows:
DW_TAG_class_type
DW_TAG_enumeration_type
DW_TAG_structure_type
DW_TAG_union_type
In Figure 42, add the following row:
DW_TAG_type_unit DW_AT_language
Appendix B -- Debug Section Relationships
In Figure 43, add ".debug_types" to the circle currently
labelled ".debug_info".
Appendix E -- DWARF Compression and Duplicate Elimination (informative)
I propose to restructure this appendix a bit. Currently there are four sections:
E.1 Overview
E.2 Naming and Usage Considerations
E.3 Examples
E.4 Summary of Compression Techniques
Sections E.1 through E.3 are really all about the per-header file
compression technique, as summarized in E.4.1. The new structure would
be:
E.1 One Compilation Unit per Header File
E.1.1 Overview [Original E.1]
E.1.2 Naming and Usage Considerations [Original E.2]
E.1.3 Examples [Original E.3]
E.2 Using Type Units to Eliminate Duplicate Types
[New content]
E.3 Summary of Compression Techniques [Original E.4]
E.3.1 #include compression [Original E.4.1]
E.3.2 Eliminating function duplication [Original E.4.2]
E.3.3 Single-function-per-DWARF-compilation-unit [Original E.4.3]
E.3.4 Inlining and out-of-line instances [Original E.4.4]
E.3.5 Separate type units [New content]
The proposed contents of the new Sections E.2 and E.3.5 are given here:
E.2 Using Type Units to Eliminate Duplicate Types
A large portion of debug information is type information, and in a
typical compilation environment, many types are duplicated many times.
One method of controlling the amount of duplication is separating each
type into a separate .debug_types section and arranging for the linker
to recognize and eliminate duplicates at the individual type level.
Using this technique, each substantial type definition is placed in
its own individual section, while the remainder of the DWARF
information (non-type information, incomplete type declarations, and
definitions of trivial types) is placed in the usual debug information
section. In a typical implementation, the relocatable object file may
contain one of each of these debug sections:
.debug_abbrev
.debug_info
.debug_line
and any number of these additional sections:
.debug_types
As discussed in the previous section [Section E.1], many linkers today
support the concept of a COMDAT group or linkonce section. The general
idea is that a "key" can be attached to a section or a group of
sections, and the linker will include only one copy of a section group
(or individual section) for any given key. For .debug_types sections,
the key is the signature formed from the algorithm given in Section
7.5.2.
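The effect of the key can be sketched with a few lines of Python. This is a toy model rather than a description of any particular linker: each object file is assumed to be a list of (key, section contents) pairs, the function name merge_type_sections is hypothetical, and the first copy seen for a key is the one retained.

```python
from typing import Dict, Iterable, List, Tuple


def merge_type_sections(
    object_files: Iterable[List[Tuple[int, bytes]]]
) -> List[Tuple[int, bytes]]:
    """Keep one .debug_types section per signature key."""
    kept: Dict[int, bytes] = {}
    for sections in object_files:
        for key, contents in sections:
            # Later copies with an already-seen key are discarded.
            kept.setdefault(key, contents)
    return list(kept.items())


# Usage: two object files that both carry the type with key 0x1111.
a_o = [(0x1111, b"type unit for T1"), (0x2222, b"type unit for T2")]
b_o = [(0x1111, b"type unit for T1"), (0x3333, b"type unit for T3")]
merged = merge_type_sections([a_o, b_o])
```

In this usage, the merged output holds three sections even though the inputs held four.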
As an example, consider a C++ header file containing the following
type definitions:
 1  namespace N {
 2
 3    struct B;
 4
 5    struct C {
 6      int x;
 7      int y;
 8    };
 9
10    class A {
11    public:
12      A(int v)
13        : v_(v), next(NULL), bp(NULL), c()
14      { }
15      int v()
16      { return v_; }
17    private:
18      int v_;
19      struct A *next;
20      struct B *bp;
21      struct C c;
22    };
23
24  }
Let us first consider one possible representation of the DWARF
information that describes the type "struct C":
DW_TAG_type_unit
  DW_AT_language: DW_LANG_C_plus_plus (4)
  DW_TAG_namespace
    DW_AT_name: "N"
L1:
    DW_TAG_structure_type
      DW_AT_name: "C"
      DW_AT_byte_size: 8
      DW_AT_decl_file: 1
      DW_AT_decl_line: 5
      DW_TAG_member
        DW_AT_name: "x"
        DW_AT_decl_file: 1
        DW_AT_decl_line: 6
        DW_AT_type: &L2
        DW_AT_data_member_location: 0
      DW_TAG_member
        DW_AT_name: "y"
        DW_AT_decl_file: 1
        DW_AT_decl_line: 7
        DW_AT_type: &L2
        DW_AT_data_member_location: 4
L2:
  DW_TAG_base_type
    DW_AT_byte_size: 4
    DW_AT_encoding: DW_ATE_signed
    DW_AT_name: "int"
In computing a signature for the type N::C, we will flatten the type
description into a byte stream according to the procedure outlined in
Section 7.5.2:
// Step 2: 'C' DW_TAG_namespace "N"
0x43 0x39 0x4e 0x00
// Step 3: 'D' DW_TAG_structure_type
0x44 0x13
// Step 4: 'A' DW_AT_name "C"
0x41 0x03 0x43 0x00
// Step 4: 'A' DW_AT_byte_size 8
0x41 0x0b 0x08
// Step 7: First child ("x")
// Step 3: 'D' DW_TAG_member
0x44 0x0d
// Step 4: 'A' DW_AT_name "x"
0x41 0x03 0x78 0x00
// Step 4: 'A' DW_AT_data_member_location 0
0x41 0x38 0x00
// Step 6: 'T' (type #2)
0x54
// Step 3: 'D' DW_TAG_base_type
0x44 0x24
// Step 4: 'A' DW_AT_name "int"
0x41 0x03 0x69 0x6e 0x74 0x00
// Step 4: 'A' DW_AT_byte_size 4
0x41 0x0b 0x04
// Step 4: 'A' DW_AT_encoding DW_ATE_signed
0x41 0x3e 0x05
// Step 7: End of DW_TAG_base_type "int"
0x00
// Step 7: End of DW_TAG_member "x"
0x00
// Step 7: Second child ("y")
// Step 3: 'D' DW_TAG_member
0x44 0x0d
// Step 4: 'A' DW_AT_name "y"
0x41 0x03 0x79 0x00
// Step 4: 'A' DW_AT_data_member_location 4
0x41 0x38 0x04
// Step 6: 'R' DW_AT_type (type #2)
0x52 0x49 0x02
// Step 7: End of DW_TAG_member "y"
0x00
// Step 7: End of DW_TAG_structure_type "C"
0x00
Running an MD5 hash over this byte stream, and taking the low-order 64
bits, yields the final signature: 0xb0dbb00a bd4cf18f.
Next, we consider a representation of the DWARF information that
describes the type "class A":
DW_TAG_type_unit
  DW_AT_language: DW_LANG_C_plus_plus (4)
  DW_TAG_namespace
    DW_AT_name: "N"
L1:
    DW_TAG_class_type
      DW_AT_name: "A"
      DW_AT_byte_size: 20
      DW_AT_decl_file: 1
      DW_AT_decl_line: 10
      DW_TAG_member
        DW_AT_name: "v_"
        DW_AT_decl_file: 1
        DW_AT_decl_line: 18
        DW_AT_type: &L2
        DW_AT_data_member_location: 0
      DW_TAG_member
        DW_AT_name: "next"
        DW_AT_decl_file: 1
        DW_AT_decl_line: 19
        DW_AT_type: &L3
        DW_AT_data_member_location: 4
      DW_TAG_member
        DW_AT_name: "bp"
        DW_AT_decl_file: 1
        DW_AT_decl_line: 20
        DW_AT_type: &L4
        DW_AT_data_member_location: 8
      DW_TAG_member
        DW_AT_name: "c"
        DW_AT_decl_file: 1
        DW_AT_decl_line: 21
        DW_AT_type: 0xb0dbb00a bd4cf18f (signature for struct C)
        DW_AT_data_member_location: 12
      DW_TAG_subprogram
        DW_AT_external: 1
        DW_AT_name: "A"
        DW_AT_decl_file: 1
        DW_AT_decl_line: 12
        DW_AT_declaration: 1
        DW_TAG_formal_parameter
          DW_AT_type: &L3
          DW_AT_artificial: 1
        DW_TAG_formal_parameter
          DW_AT_type: &L2
      DW_TAG_subprogram
        DW_AT_external: 1
        DW_AT_name: "v"
        DW_AT_decl_file: 1
        DW_AT_decl_line: 15
        DW_AT_type: &L2
        DW_AT_declaration: 1
        DW_TAG_formal_parameter
          DW_AT_type: &L3
          DW_AT_artificial: 1
L2:
  DW_TAG_base_type
    DW_AT_byte_size: 4
    DW_AT_encoding: DW_ATE_signed
    DW_AT_name: "int"
L3:
  DW_TAG_pointer_type
    DW_AT_type: &L1
L4:
  DW_TAG_pointer_type
    DW_AT_type: &L5
  DW_TAG_namespace
    DW_AT_name: "N"
L5:
    DW_TAG_structure_type
      DW_AT_name: "B"
      DW_AT_declaration: 1
In this example, the structure types N::A and N::C have each been
placed in separate type units. For N::A, the actual definition of the
type begins at label L1. The definition involves references to the int
base type and to two pointer types. The information for each of these
referenced types is also included in this type unit, since base types
and pointer types are trivial types that are not worth the overhead of
a separate type unit. The last pointer type contains a reference to an
incomplete type N::B, which is also included here as a declaration,
since the complete type is unknown and its signature is therefore
unavailable. There is also a reference to N::C, using DW_FORM_ref_sig8
to refer to the type signature for that type.
In computing a signature for the type N::A, we will flatten the type
description into a byte stream according to the procedure outlined in
Section 7.5.2:
// Step 2: 'C' DW_TAG_namespace "N"
0x43 0x39 0x4e 0x00
// Step 3: 'D' DW_TAG_class_type
0x44 0x02
// Step 4: 'A' DW_AT_name "A"
0x41 0x03 0x41 0x00
// Step 4: 'A' DW_AT_byte_size 20
0x41 0x0b 0x14
// Step 7: First child ("v_")
// Step 3: 'D' DW_TAG_member
0x44 0x0d
// Step 4: 'A' DW_AT_name "v_"
0x41 0x03 0x76 0x5f 0x00
// Step 4: 'A' DW_AT_data_member_location 0
0x41 0x38 0x00
// Step 6: 'T' (type #2)
0x54
// Step 3: 'D' DW_TAG_base_type
0x44 0x24
// Step 4: 'A' DW_AT_name "int"
0x41 0x03 0x69 0x6e 0x74 0x00
// Step 4: 'A' DW_AT_byte_size 4
0x41 0x0b 0x04
// Step 4: 'A' DW_AT_encoding DW_ATE_signed
0x41 0x3e 0x05
// Step 7: End of DW_TAG_base_type "int"
0x00
// Step 7: End of DW_TAG_member "v_"
0x00
// Step 7: Second child ("next")
// Step 3: 'D' DW_TAG_member
0x44 0x0d
// Step 4: 'A' DW_AT_name "next"
0x41 0x03 0x6e 0x65 0x78 0x74 0x00
// Step 4: 'A' DW_AT_data_member_location 4
0x41 0x38 0x04
// Step 6: 'T' (type #3)
0x54
// Step 3: 'D' DW_TAG_pointer_type
0x44 0x0f
// Step 5: 'N' DW_AT_type
0x4e 0x49
// Step 5: 'C' DW_TAG_namespace "N"
0x43 0x39 0x4e 0x00
// Step 5: "A"
0x41 0x00
// Step 7: End of DW_TAG_pointer_type
0x00
// Step 7: End of DW_TAG_member "next"
0x00
// Step 7: Third child ("bp")
// Step 3: 'D' DW_TAG_member
0x44 0x0d
// Step 4: 'A' DW_AT_name "bp"
0x41 0x03 0x62 0x70 0x00
// Step 4: 'A' DW_AT_data_member_location 8
0x41 0x38 0x08
// Step 6: 'T' (type #4)
0x54
// Step 3: 'D' DW_TAG_pointer_type
0x44 0x0f
// Step 5: 'N' DW_AT_type
0x4e 0x49
// Step 5: 'C' DW_TAG_namespace "N"
0x43 0x39 0x4e 0x00
// Step 5: "B"
0x42 0x00
// Step 7: End of DW_TAG_pointer_type
0x00
// Step 7: End of DW_TAG_member "bp"
0x00
// Step 7: Fourth child ("c")
// Step 3: 'D' DW_TAG_member
0x44 0x0d
// Step 4: 'A' DW_AT_name "c"
0x41 0x03 0x63 0x00
// Step 4: 'A' DW_AT_data_member_location 12
0x41 0x38 0x0c
// Step 6: 'T' (type #5)
0x54
// Step 2: 'C' DW_TAG_namespace "N"
0x43 0x39 0x4e 0x00
// Step 3: 'D' DW_TAG_structure_type
0x44 0x13
// Step 4: 'A' DW_AT_name "C"
0x41 0x03 0x43 0x00
// Step 4: 'A' DW_AT_byte_size 8
0x41 0x0b 0x08
// Step 7: First child ("x")
// Step 3: 'D' DW_TAG_member
0x44 0x0d
// Step 4: 'A' DW_AT_name "x"
0x41 0x03 0x78 0x00
// Step 4: 'A' DW_AT_data_member_location 0
0x41 0x38 0x00
// Step 6: 'R' DW_AT_type (type #2)
0x52 0x49 0x02
// Step 7: End of DW_TAG_member "x"
0x00
// Step 7: Second child ("y")
// Step 3: 'D' DW_TAG_member
0x44 0x0d
// Step 4: 'A' DW_AT_name "y"
0x41 0x03 0x79 0x00
// Step 4: 'A' DW_AT_data_member_location 4
0x41 0x38 0x04
// Step 6: 'R' DW_AT_type (type #2)
0x52 0x49 0x02
// Step 7: End of DW_TAG_member "y"
0x00
// Step 7: End of DW_TAG_structure_type "C"
0x00
// Step 7: End of DW_TAG_member "c"
0x00
// Step 7: Fifth child ("A")
// Step 3: 'S' DW_TAG_subprogram "A"
0x53 0x2e 0x41 0x00
// Step 7: Sixth child ("v")
// Step 3: 'S' DW_TAG_subprogram "v"
0x53 0x2e 0x76 0x00
// Step 7: End of DW_TAG_class_type "A"
0x00
Running an MD5 hash over this byte stream, and taking the low-order 64
bits, yields the final signature: 0xd681845c 21a14576.
A source file that includes this header file may declare a variable of
type N::A, and its DWARF information may look like the following:
DW_TAG_compile_unit
  ...
  DW_TAG_subprogram
    ...
    DW_TAG_variable
      DW_AT_name: "a"
      DW_AT_type: (signature) 0xd681845c 21a14576
      DW_AT_location: ...
    ...
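A consumer that meets such a signature reference might resolve it by first indexing the type units it has read. The Python sketch below is illustrative only: the dictionary layout of a type unit and the names index_type_units and resolve_type are hypothetical, and DW_FORM_ref_sig8 uses the value 0x20 proposed for Figure 21.

```python
from typing import Dict, Iterable, Optional

DW_FORM_ref_sig8 = 0x20


def index_type_units(type_units: Iterable[dict]) -> Dict[int, dict]:
    """Map each 64-bit type signature to the type unit that carries it."""
    index: Dict[int, dict] = {}
    for unit in type_units:
        index.setdefault(unit["type_signature"], unit)
    return index


def resolve_type(form: int, value: int, index: Dict[int, dict]) -> Optional[dict]:
    """Resolve a DW_AT_type value of form DW_FORM_ref_sig8, if possible."""
    if form != DW_FORM_ref_sig8:
        raise ValueError("not a signature reference")
    # None means no type unit with this signature was found.
    return index.get(value)


# Usage with the signature shown for N::A above.
units = [{"type_signature": 0xD681845C21A14576, "name": "N::A"}]
index = index_type_units(units)
target = resolve_type(DW_FORM_ref_sig8, 0xD681845C21A14576, index)
```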
E.3.5 Separate type units
Each complete declaration of a globally-visible type can be placed in
its own separate type section, with a group key derived from the type
signature. The linker can then remove all duplicate type declarations
based on the key.
E.3.6 Grammar for COMDAT compression
signature
: opt-context debug-entry attributes children
opt-context # Step 2
: 'C' tag-code string opt-context
: empty
debug-entry # Step 3
: 'D' tag-code
attributes # Steps 4, 5, 6
: attribute attributes
: empty
attribute
: 'A' at-code form-encoded-value # Normal attributes
: 'N' at-code opt-context 'E' string # Reference to type by name
: 'R' at-code back-ref # Back-reference to visited type
: 'T' at-code signature # Recursive type
children # Step 7
: child children
: '\0'
child
: 'S' tag-code string
: signature
tag-code
: <ULEB128>
at-code
: <ULEB128>
form-encoded-value
: DW_FORM_sdata value
: DW_FORM_flag value
: DW_FORM_string string
: DW_FORM_block block
DW_FORM_string
: '\x08'
DW_FORM_block
: '\x09'
DW_FORM_flag
: '\x0c'
DW_FORM_sdata
: '\x0d'
value
: <LEB128>
block
: <ULEB128> <fixed-length-block> # The ULEB128 gives the length of the block
back-ref
: <ULEB128>
string
: <null-terminated-string>
empty
:
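As a small illustration of the 'A' production and of form-encoded-value, the following Python sketch emits one ordinary attribute. The function name encode_attribute and the mapping from Python types to forms (bool to DW_FORM_flag, int to DW_FORM_sdata, str to DW_FORM_string, bytes to DW_FORM_block) are assumptions made for the example, as is the encoding of a true flag as the value 1.

```python
from typing import Union

DW_FORM_string, DW_FORM_block = 0x08, 0x09
DW_FORM_flag, DW_FORM_sdata = 0x0C, 0x0D


def uleb128(n: int) -> bytes:
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def sleb128(n: int) -> bytes:
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        done = (n == 0 and not byte & 0x40) or (n == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def encode_attribute(at_code: int, value: Union[bool, int, str, bytes]) -> bytes:
    """Emit 'A' at-code form-encoded-value for one attribute."""
    out = b"A" + uleb128(at_code)
    if isinstance(value, bool):          # must be tested before int
        return out + uleb128(DW_FORM_flag) + sleb128(1 if value else 0)
    if isinstance(value, int):
        return out + uleb128(DW_FORM_sdata) + sleb128(value)
    if isinstance(value, str):
        return out + uleb128(DW_FORM_string) + value.encode() + b"\0"
    if isinstance(value, bytes):         # length-prefixed block
        return out + uleb128(DW_FORM_block) + uleb128(len(value)) + value
    raise TypeError("unsupported attribute value")
```

For example, encode_attribute(0x03, "C") produces the marker 'A', the code for DW_AT_name, the code for DW_FORM_string, and the null-terminated name.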
Appendix F -- Version Numbers
In Figure 80, add the following row:
.debug_types - - 4
Under "Notes", add a new bullet point:
* The version number for the .debug_info section and
the .debug_types section should always match.
Revision History
----------------
December 2, 2008
- Changed DW_FORM_sig8 to DW_FORM_ref_sig8.
- Added new material to first paragraph of Chapter 3.
- Added additional material to non-normative text in
Section 3.2.
- Modified algorithm in Section 7.5.2: (a) include trailing
NULL when checksumming strings; (b) use LEB128 format when
checksumming integral values, tag codes, and attribute
codes; (c) include context with the name when checksumming
pointer, reference, and friend DIEs; (d) include the
starting type as first element of the list of visited
types; (e) visit children in order; (f) canonical encoding
of FORM_flag; (g) mention vendor extensions.
- Added note to Appendix F about matching .debug_info and
.debug_types version numbers.
December 8, 2008
- Added DW_TAG_ptr_to_member_type to Step 5 in Section 7.5.2.
- Added rule for block-valued attributes in Section 7.5.2.
- Minor editorial changes suggested by David Gross.
January 26, 2009
- Added description of the structure of a type unit to
Section 3.2.
- Updated the description of the algorithm in Section 7.5.2:
- So that it can be applied recursively to non-type debug
entries.
- Include trailing null byte when appending names in
Step 2.
- Added DW_AT_containing_type, DW_AT_explicit,
DW_AT_mutable, DW_AT_virtuality, and
DW_AT_vtable_elem_location to the list of hashable
attributes in Step 4.
- Removed DW_AT_start_scope from the list of hashable
attributes in Step 4.
- Include attributes provided indirectly via
DW_AT_specification attribute in Step 4.
- Clarified the treatment of vendor-specific attributes in
Step 4.
- Added case for subprogram friends in Step 5.
- Corrected condition in Step 6 to be the logical negation
of that of Step 5.
- Add 1024 when appending references to previously-visited
types in Step 6, to remove ambiguity with existing DWARF
tags.
- Modified Step 7 to append names of nested types.
- Added text to prefer simplest of alternative forms for
purposes of forming the signature in Section 7.5.2.
- Added non-normative text in Section 7.5.2 listing cases
where a type should not be placed in a separate type unit.
- Added non-normative text in Section 7.5.2 providing
explanations for why certain attributes are not included
in the signature.
- Added text for Appendix E.
February 27, 2009
- In Section 7.4, added .debug_types table to the list of
sections that contain section offset and length fields.
- In Section 7.5.2, added additional attributes pending
approval from other proposals.
- In Section 7.5.2, clarified the effect of DW_AT_specification
attributes.
- Modified the algorithm in Section 7.5.2 to produce a more
reversible signature string.
- For block-valued attributes, added the length of the block
as part of the signature string.
- In Section 7.5.2, added non-normative text with additional
rationale.
- In Section 7.5.4, clarified wording of the second type
of reference.
- Minor editorial changes.
March 24, 2009
- In Section 7.5.2, made further improvements to the
reversibility of the signature string. Added additional
marker letters and added form codes.
- In Section 7.5.2, further clarified the effect of the
DW_AT_specification attribute.
- In Section 7.5.2, added further clarification of the
treatment of attributes that reference other debug entries.
- Minor editorial changes.
April 6, 2009
- In Section 7.5.2, restored previously deleted text about
processing attributes that refer to other type entries.
- In Section 7.5.2, added DW_FORM_string as an allowed form.
- In Section 7.5.2, added non-normative text disqualifying
debug entries from separate type units.
- Minor editorial changes.
- Revised Appendix E
April 24, 2009
- In Section 7.5.2, changed the rules for processing
references to other types in Step 4, added an additional
marker letter in Step 5, and changed Step 6 to be
consistent with the rule for references to other types
in Step 4.
May 29, 2009
- Add grammar to Appendix E
--
Accepted.