Issue 220802.1: Introduce DW_FORM_addr_offset paired form

Author: David Blaikie
Champion: David Blaikie
Date submitted: 2022-08-02
Date revised:
Date closed: 2023-01-23
Type: Enhancement
Status: Accepted with modifications
DWARF Version: 6
Section 7.5.3, pg 207
Problem:
ELF relocations take a significant amount of object file space - 
especially/even when using Split DWARF and ELF's section compression 
(SHF_COMPRESSED) (since the relocations themselves are not compressed - 
so they have an outsized impact).

Various changes in the past have helped reduce the number of relocations - 
including the ability to use low_pc-relative offsets in high_pc, and the 
DWARFv5 rnglists and loclists encodings that allow sharing 
addresses/relocations with .debug_info via .debug_addr (and reusing 
addresses between entries in those loc/rnglists, rather than using 
two relocatable addresses for each address range as in the old 
.debug_ranges/.debug_loc encoding). But there are a few outstanding 
cases of sharing that could still be helpful:

1) low_pcs within a range - eg: two functions in the same section 
or a scope within a function still end up using separate relocations 
when one could be described relative to the other. This can be 
addressed by using a range encoding (& then using addrx forms 
in the rnglist) even in the case of a contiguous range (Clang 
implements this under -mllvm -minimize-addr-in-v5=Ranges, so 
we have data on the cost/benefit of this choice)

2) Addresses used in non-range-encoding parts of DWARF, such as 
the addresses of DW_TAG_label or DW_TAG_call_site entries. These 
can't use ranges to share the address.


Solutions:
One considered solution was to use a DWARF expression for these 
cases (it'd be more compact for (1) even compared to using ranges, 
and allow relocation sharing in (2) where it isn't otherwise possible). 
Though the full generality of DWARF expressions in this novel location 
is probably overkill - no consumers, to my knowledge, currently 
expect/support a DWARF expression in these address attributes.

The other option, proposed here, is to add a special-case form, 
DW_FORM_addr_offset. This form would be special (like DW_FORM_indirect 
and DW_FORM_implicit_const) in that in the abbreviation table it would 
include two forms, not one. One form encodes an address (presumably 
an addrx address form, but that's not a requirement) and the other 
encodes an offset (Clang, for instance, currently always produces 
offsets as data4).

The reason for the custom form is that otherwise we get into the 
territory of defining forms for every combination of encodings 
for address and offset - especially around the offset, do we have 
separate forms for addrx+data4 and addrx+udata (uleb encoded size)? 
This novel two-piece form avoids that problem and allows implementation 
flexibility about the encodings.
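
As a rough illustration of the two-piece form, the sketch below builds 
the attribute specification bytes that an abbreviation declaration might 
carry for a DW_AT_low_pc using this form. It is only a sketch under 
assumptions: the helper names (uleb128, attr_spec) are hypothetical, the 
form code 0x2d is the value proposed in this issue, and the name used is 
the one from the proposal text rather than the name finally accepted.

```python
# Existing DWARF 5 codes, plus the form code proposed in this issue.
DW_AT_low_pc = 0x11
DW_FORM_data4 = 0x06
DW_FORM_udata = 0x0F
DW_FORM_addrx = 0x1B
DW_FORM_addr_offset = 0x2D  # proposed value, see the wording changes


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def attr_spec(attr, form, addr_form=None, offset_form=None):
    """Hypothetical helper: encode one attribute specification.

    For DW_FORM_addr_offset the specification carries two extra
    ULEB128 form codes: the address form, then the offset form.
    """
    spec = uleb128(attr) + uleb128(form)
    if form == DW_FORM_addr_offset:
        spec += uleb128(addr_form) + uleb128(offset_form)
    return spec


# The same outer form covers both offset encodings, so no separate
# addrx+data4 and addrx+udata forms are needed.
spec_data4 = attr_spec(DW_AT_low_pc, DW_FORM_addr_offset,
                       DW_FORM_addrx, DW_FORM_data4)
spec_udata = attr_spec(DW_AT_low_pc, DW_FORM_addr_offset,
                       DW_FORM_addrx, DW_FORM_udata)
```

The two specifications differ only in their final byte, which is the 
implementation flexibility the two-piece form is meant to provide.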

Data suggests that this custom form has a bit better space cost/savings 
than the expression, so I'm proposing that here.

The data I have is a clang self-host build, with Split DWARF and 
compression enabled in .o/.dwo files, but not exe/dwp files:

                 o      dwo    o+dwo    exe     dwp   exe+dwp
expressions   -9.21%  -2.90%  -4.33%  -4.45%  -0.45%  -0.83%
form          -9.21%  -3.42%  -4.73%  -4.45%  -3.47%  -3.56%

These improvements are relative to the "ranges" feature described 
above, that already helps reduce relocation costs significantly.

Other build configurations - different uses of compression, non-split 
DWARF, etc, etc, will have somewhat different results, but at least 
the above data was enough to motivate my work on this feature.


Wording changes:
7.5.3, page 207:
After the "DW_FORM_implicit_const" paragraph, add a new paragraph:

"The attribute form DW_FORM_addr_offset is another special case. For 
attributes with this form, the attribute specification contains not one, 
but two unsigned LEB128 numbers each representing a form. The first 
form must be of class address and the second of class constant. Values 
using this form in the .debug_info section contain a value for the first 
form followed by a value for the second form. The total value of the 
DW_FORM_addr_offset is then computed by adding those two values together 
(if the first value is an indirect address, that is resolved first 
before adding it to the second value)."
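
The computation described in this paragraph could be sketched from the 
consumer side roughly as follows. This is not a definitive 
implementation: it assumes little-endian data, 8-byte target addresses, 
and a hypothetical addr_table list standing in for the unit's already 
parsed .debug_addr entries; read_uleb128 and read_addr_offset are 
illustrative names, and only a few address and constant forms are 
handled.

```python
import struct

# Existing DWARF 5 form codes.
DW_FORM_addr = 0x01
DW_FORM_data4 = 0x06
DW_FORM_udata = 0x0F
DW_FORM_addrx = 0x1B


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def read_addr_offset(data, pos, addr_form, offset_form, addr_table):
    """Read a two-piece value: address form first, then offset form."""
    # First value: class address. An indirect (addrx) address is
    # resolved through the address table before the addition.
    if addr_form == DW_FORM_addrx:
        index, pos = read_uleb128(data, pos)
        address = addr_table[index]
    elif addr_form == DW_FORM_addr:
        (address,) = struct.unpack_from("<Q", data, pos)  # assumes 8-byte addresses
        pos += 8
    else:
        raise ValueError("address form not handled in this sketch")

    # Second value: class constant.
    if offset_form == DW_FORM_data4:
        (offset,) = struct.unpack_from("<I", data, pos)
        pos += 4
    elif offset_form == DW_FORM_udata:
        offset, pos = read_uleb128(data, pos)
    else:
        raise ValueError("offset form not handled in this sketch")

    return address + offset, pos


# Example: address index 1, followed by a data4 offset of 0x24.
addr_table = [0x401000, 0x402000]  # hypothetical .debug_addr contents
die_bytes = bytes([0x01]) + struct.pack("<I", 0x24)
value, end = read_addr_offset(die_bytes, 0, DW_FORM_addrx,
                              DW_FORM_data4, addr_table)
```

In this example only the .debug_addr entry needs a relocation; the 
bytes in .debug_info are a plain index and a constant offset.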

7.6, Page 221:
A new row:
DW_FORM_addr_offset | 0x2d | address

--
2023-01-23: Accepted with modification: replace DW_FORM_addr_offset with
  DW_FORM_addrx_offset.