||Introduce DW_FORM_addr_offset paired form
Section 7.5.3, pg 207
ELF relocations take a significant amount of object file space -
especially/even when using Split DWARF and ELF's section compression
(SHF_COMPRESSED) (since the relocations themselves are not compressed -
so they have an outsized impact).
Various change in the past have helped reduce the number of relocations -
including the ability to use low_pc-relative offsets in high_pc, the
DWARFv5 rnglists and loclists encodings that allow sharing
addresses/relocations with .debug_info via .debug_addr (and reusing
addresses between entries in those loc/rnglists (& rather than using
two relocatable addresses for each address range in the old
.debug_ranges/.debug_loc encoding)). But there are a few outstanding
cases of sharing that could still be helpful:
1) low_pcs within a range - eg: two functions in the same section
or a scope within a function still end up using separate relocations
when one could be described relative to the other. This can be
addressed by using a range encoding (& then using addrx forms
in the rnglist) even in the case of a contiguous range (Clang
implements this under -mllvm -minimize-addr-in-v5=Ranges, so
we have data on the cost/benefit of this choice)
2) Addresses used in non-range-encoding parts of DWARF, such as
DW_AT_label or DW_TAG_call_site. These can't use ranges to share
One considered solution was to use a DWARF expression for these
cases (it'd be more compact for (1) even compared to using ranges,
and allow relocation sharing in (2) where it isn't otherwise possible).
Though the full generality of DWARF expressions in this novel location
is probably overkill - no consumers, to my knowledge, currently
expect/support a DWARF expression in these address attributes.
The other option, proposed here, is to add a special-case form,
DW_FORM_addr_offset. This form would be special (like DW_FORM_indirect
and DW_FORM_const_value) in that in the abbreviation table it would
include two forms, not one. One form encodes an address (presumably
an addrx address form, but that's not a requirement) and the other
would encode an offset (for Clang, at least, currently we always
produce offsets as data4, for instance).
The reason for the custom form is that otherwise we get into the
territory of defining forms for every combination of encodings
for address and offset - especially around the offset, do we have
separate forms for addrx+data4 and addrx+udata (uleb encoded size)?
This novel two-piece form avoids that problem and allows implementation
flexibility about the encodings.
Data suggests that this custom form has a bit better space cost/savings
than the expression, so I'm proposing that here.
The data I have is a clang self-host build, with Split DWARF and
compression enabled in .o/.dwo files, but not exe/dwp files:
o dwo o+dwo exe dwp exe+dwp
expressions -9.21% -2.90% -4.33% -4.45% -0.45% -0.83%
form -9.21% -3.42% -4.73% -4.45% -3.47% -3.56%
These improvements are relative to the "ranges" feature described
above, that already helps reduce relocation costs significantly.
Other build configurations - different uses of compression, non-split
DWARF, etc, etc, will have somewhat different results, but at least
the above data was enough to motivate my work on this feature.
7.5.3, page 207:
After the "DW_FORM_implicit_const" paragraph, add a new paragraph:
"The attribute form DW_FORM_addr_offset is another special case. For
attributes with this form, the attribute specification contains not one,
but two unsigned LEB128 numbers each representing a form. The first
form must be of class address and the second of class constant. Values
using this form in the .debug_info section contain a value for the first
form followed by a value for the second form. The total value of the
DW_FORM_addr_offset is then computed by adding those two values together
(if the first value is an indirect address, that is resolved first
before adding it to the second value)."
7.6, Page 221:
A new row:
DW_FORM_addr_offset | 0x2d | address