Issue 130313.2: Indirect Address Table (Split DWARF, part 2/5)
Section , pg
Introduction
------------
With the current DWARF format, the debug information is designed
with the expectation that it will be processed by the linker to
produce an output binary with complete debug information, and
with fully-resolved references to locations within the
application. For very large applications, however, this approach
can result in excessively large link times and excessively large
output files. Several vendors have independently developed
proprietary approaches that allow the debug information to remain
in the relocatable object files, so that the linker does not have
to process the debug information or copy it to the output file.
These approaches have all required that additional information be
made available to the debug information consumer, and that the
consumer perform some minimal amount of relocation in order to
interpret the debug info correctly. The additional information
required, in the form of load maps or symbol tables, and the
details of the relocation are not covered by the DWARF
specification, and vary with each vendor's implementation.
For more background information, see the GCC wiki page at:
http://gcc.gnu.org/wiki/DebugFission
This is the second in a series of proposals to extend the DWARF
specification to support the use of unlinked and unrelocated
debug information.
* The first proposal introduces a mechanism for referring to
strings indirectly, collecting the string offsets into a new
section, and introducing a new form code for this purpose.
* This second proposal introduces a similar indirect mechanism
for referring to relocatable addresses in the loadable
segments of the program, collecting all such addresses in a
single section.
* The third proposal introduces a mechanism for referring to
range table entries in the .debug_ranges section with a
single relocation per compilation unit rather than one for
each reference.
* The fourth proposal introduces a mechanism for splitting the
DWARF debugging information into two sets of sections. One
set remains in the relocatable object (.o) files, and is
linked into the final executable; the other set is written to
a non-relocatable DWARF object (.dwo) file, and remains in
the build tree as an independent file where the debugger can
reference it as needed.
* The fifth proposal introduces a package file format for
collecting the DWARF object (.dwo) files so that the
debugging information can be easily saved and transported for
debugging applications outside of the build tree.
Overview
--------
In order to reduce the quantity of relocations that reference
code and data in the program, and to remove those relocations
from the .debug_info and .debug_types sections, we propose a new
.debug_addr section that contains all such addresses collected
into one section. We add a new form and two new location
expression operators that reference a location by its index in
the .debug_addr section. The DW_FORM_addrx form code and the
DW_OP_addrx and DW_OP_constx operators each take an unsigned
LEB128 value to specify the index.
When there are multiple references from the debug information to
one address, this allows those references to be coalesced into a
single entry in the .debug_addr section, with a single
relocation, greatly reducing the number of relocations required
for the debug information.
Changes to the DWARF Specification
----------------------------------
Section 2.2 ("Attribute Types")
In Figure 2, add the following rows:
Attribute name Identifies or Specifies
-------------- -----------------------
DW_AT_addr_base Base offset for address table
Secton 2.5.1.1 ("Literal Encodings")
Add the following items to the numbered list:
7. DW_OP_addrx
The DW_OP_addrx operation has a single operand that
encodes an unsigned LEB128 value, which is a zero-based
index into the .debug_addr section, where a machine
address is stored.
8. DW_OP_constx
The DW_OP_constx operation has a single operand that
encodes an unsigned LEB128 value, which is a zero-based
index into the .debug_addr section, where a constant, the
size of a machine address, is stored.
<non-normative>
The DW_OP_constx operation is provided for constants that
require link-time relocation but should not be
interpreted by the consumer as a relocatable address
(e.g., offsets to thread-local storage).
</non-normative>
Section 3.1.1 ("Normal and Partial Compilation Unit Entries")
In the numbered list introduced by "Compilation unit entries may
have the following attributes:", add the following item:
13. A DW_AT_addr_base attribute, whose value is a reference.
This attribute points to the beginning of the compilation
unit's contribution to the .debug_addr section. Indirect
references (using DW_FORM_addrx, DW_OP_addrx, or
DW_OP_constx) within the compilation unit must be
interpreted as indexes relative to this base.
Section 7.5.4 ("Attribute Encodings")
Replace the paragraph describing the "address" class with the
following:
Represented as an object of appropriate size to hold an
address on the target machine (DW_FORM_addr), or as an
indirect index into a table of such addresses in the
.debug_addr section (DW_FORM_addrx). The size of an address
is encoded in the compilation unit header (see Section
7.5.1.1). This address is relocatable in a relocatable object
file and is relocated in an executable file or shared object.
The representation of a DW_FORM_addrx value is an unsigned
LEB128 value, which is interpreted as a zero-based index into
an array of addresses in the .debug_addr section.
In Figure 20, add the following row to the table:
Attribute name Value Classes
-------------- ----- -------
DW_AT_addr_base [TBA] address
In Figure 21, add the following row to the table:
Form name Value Class
--------- ----- -----
DW_FORM_addrx [TBA] address
Section 7.7.1 ("DWARF Expressions")
In Figure 24, add the following row to the table:
Operation Code No. of Operands Notes
 --------- ---- --------------- -----
DW_OP_addrx [TBA] 1 indirect address
DW_OP_constx [TBA] 1 indirect constant
In Chapter 7, add a new section after Section 7.24:
7.xx Address Table
Each set of entries in the address table contained in
the .debug_addr section begins with a header containing:
1. unit_length (initial length)
A 4-byte or 12-byte length containing the length of
the set of entries for this compilation unit, not
including the length field itself. In the 32-bit
DWARF format, this is a 4-byte unsigned integer
(which must be less than 0xfffffff0); in the 64-bit
DWARF format, this consists of the 4-byte value
0xffffffff followed by an 8-byte unsigned integer
that gives the actual length (see Section 7.4).
2. version (uhalf)
A 2-byte version identifier containing the value 1
(see Appendix F).
3. address_size (ubyte)
A 1-byte unsigned integer containing the size in
bytes of an address (or the offset portion of an
address for segmented addressing) on the target
system.
4. segment_size (ubyte)
A 1-byte unsigned integer containing the size in
bytes of a segment selector on the target system.
This header is followed by a series of segment/address pairs.
The segment size is given by the segment_size field of the
header, and the address size is given by the address_size
field of the header. If the segment_size field in the header
is zero, the entries consist only of an address.
The DW_AT_addr_base attribute must point to the first entry
following the header. The entries are indexed sequentially
from this base entry, starting from 0.
Section 7.25 ("Dependencies and Constraints")
Replace the first sentence with the following:
The debugging information in this format is intended to exist
in the .debug_abbrev, .debug_aranges, .debug_frame,
.debug_info, .debug_line, .debug_loc, .debug_macinfo,
.debug_pubnames, .debug_pubtypes, .debug_ranges, .debug_str,
.debug_str_offsets, .debug_types, and .debug_addr sections of
an object file, or equivalent separate file or database.
Appendix A ("Attributes by Tag Value")
In Figure 42, in the table entries for DW_TAG_compile_unit and
DW_TAG_partial_unit, add DW_AT_addr_base.
Appendix B ("Debug Section Relationships")
In Figure 43, add an arrow from .debug_info and .debug_types to a
new box for DW_FORM_addrx, and an arrow from that box to a new
section .debug_addr. Add an arrow from .debug_loc to a new box
for DW_OP_addrx and DW_OP_constx, and an arrow from that box to
.debug_addr. The note for both new boxes should read as follows:
.debug_addr
The value of the DW_AT_addr_base attribute in the
DW_TAG_compile_unit or DW_TAG_partial_unit DIE is the
offset in the .debug_addr section of the machine
addresses for that compilation unit or type unit.
Appendix F ("DWARF Section Version Numbers")
In Figure 97, add the following row:
Section Name V2 V3 V4 V5
------------ -- -- -- --
.debug_addr x x x 5
---
Revised March 25. 2013
Revised May 26, 2013: Change .debug_addr version to 5.
Accepted with modifications April 23, 2013.