Issue 070516.1: Section offset FORMs

Author: Ron Brender
Champion: Ron Brender
Date submitted: 2007-05-16
Date revised:
Date closed:
Type: Extension
Status: Accepted
DWARF Version: 4
BACKGROUND

Since V2, DWARF has multiplexed the DW_FORM_data4 form as both a
constant (usually an integer) and an offset into another section
(especially for location lists). This multiplexing was implicit and
not explained at all well.

V3 made this practice much more explicit. It also extended the
practice to DW_FORM_data8 (for the 64-bit format) and added new
cases. Being more explicit also exposed the inherent complexity much
more clearly. (It was always there in V2, V3 just made it obvious.)

It would be good to clean up this practice and reduce the complexity
as part of our current work.

There is already at least one case where it would be appropriate to
use the FORM to distinguish an integer vs offset in section rather
than introduce a new attribute (generalize DW_AT_start_scope rather
than add the proposed DW_AT_scope_ranges, see proposal 030626.1).
It can be anticipated that additional cases will emerge over time.


HIGH LEVEL PROPOSAL

Before preparing a detailed proposal, it would be helpful to reach
consensus on the key strategy questions that need to be resolved:

1) What FORM(s) to use for a general offset in section construct

2) Whether to "convert" existing multiplexing to use the new FORM(s)

These issues are almost independent, so let me address them here
separately.


NEW FORM(s)

There are two main choices for choice of FORM:

  1A) Introduce a new FORM code (or codes)

  1B) Generalize one of the existing FORMs

Introducing a new form is perhaps conceptually simplest and cleanest.
And rather than adding just one new form, it might be desirable to
add several new forms, one for each target section that can be pointed
into.

However, one or more new forms is definitely not forward compatible.
It would be absolutely necessary to increment the .debug_info version
number, because V3 consumers would not have builtin knowledge about
how to read and ignore the new forms.

A less drastic approach is to generalize one of the existing forms,
namely DW_FORM_strp or DW_FORM_ref_addr. Each of these forms is
currently defined as a section offset into a particular kind of section,
namely .debug_string or .debug_info respectively. The proposal would
be to re-interpret one of these forms as a general "offset into some
section" pointer. The existing code would be given a new name to reflect
the broader usage, such as DW_FORM_sec_offset. The section pointed to
would be a function of the attribute in which it was used (for example,
debug_loc for DW_AT_location, .debug_macinfo for DW_AT_macro_info,
and so on). This attribute contextual dependency is already part of
the V3 conventions for forms _data4 and _data8 when used as section
offsets, so adds no new complexity.

The choice between _strp and _ref_addr is a bit of a toss up. One
arguement is that _strp should remain specific to the .debug_string
section because that allows dumpers to freely interpret such forms as
string pointers without considering further context (as some do now).
Also the _ref_addr form is not much used so modifying its interpretation
is much less likely to be disruptive.

MY PROPOSAL 1: Rename the DW_FORM_ref_addr form code to
DW_FORM_sec_offset and generalize its meaning to be an offset into a
contextually determined section.

     Minor question: Should DW_FORM_sec_offset be allowed to refer to
     the .debug_string section? It seems harmless and simplest to
     allow it but silly to actually use it that way when _strp is
     available! I am inclined to allow it...


TO MULTIPLEX OR NOT

Assuming Proposal 1 is adopted, it becomes feasible to redefine the
classes lineptr, loclistptr, macptr, and rangelistptr to use form
DW_FORM_sec_offset instead of DW_FORM_data4/_data8. It seems desirable
to retain these class names to reflect the target section.

Making such a change would greatly simplify and cleanup the section
offset conventions in DWARF. However, it is not upward compatible
and would require a version change in the .debug_info section. (And
if a version change is adopted, then maybe a new FORM or FORMs is
more desirable after all!)

A compromise is to define xxxptr classes to include both the _sec_offset
form AND the _data4/_data8 forms. That is, allow the new practice but
continue to support the old.

Unfortunately, this compromise does not work well for the DW_AT_start_scope
attribute. It would have to be defined to allow the _sec_offset form
for a pointer into the .debug_ranges section, but _data4/_data8 would
continue to be interpreted as integer constants. Presumably a new
class name should be invented for this new case (just the _sec_offset
form). Ugh!

It is worth noting that DWARF producers probably gain little benefit
from the compromise--any given producer would probably continue to
use the old conventions or use the new conventions, not mix and match.
DWARF consumers on the other hand get the worst of both worlds--they
have to support both conventions, even intermixed.

The more I ponder this the more I conclude that given the opportunity
to clean up (eliminate!) the multiplexing, it would be a shame to
squander it by adopting some kind of compromise. And, it would be a
shame not to do it all. So...

MY PROPOSAL 2: Redefine the xxxptr classes to consist of just the
DW_FORM_sec_offset form adopted in Proposal 1. This requires bumping
the .debug_info version number. Forms DW_FORM_data4 and _data8
"revert" to always be members of the constant class in all cases.

The attributes that would be affected by this proposal can be determined
using Figure 20, pages 133-138. If I scanned correctly, they are these:

    DW_AT_data_member_location
         _frame_base
         _location
         _macro_info
         _ranges
         _return_addr
         _segment
         _static_link
         _stmt_list
         _string_length
         _use_location
         _vtable_elem_location


FINAL NOTE: If we adopt Proposal 2 and increment the version number,
it then matters much less whether we generalize DW_FORM_ref_addr as
proposed above or introduce a new form DW_FORM_sec_offset and code.
I still lean toward generalizing _ref_addr on the grounds that a new
form/code doesn't really add anything, but I wouldn't argue strongly
either way...

-----------------------------------

Revised proposal:

1) In Section 7.4, p122, bullet 3, add the following in the table
following the line for DW_FORM_ref_addr (alphabetical order):

    DW_FORM_sec_offset  offset in a section other than .debug_info
                        or .debug_str

2) Replace Section 7.5.4 (Attribute Encodings) (in its entirety) 
with the following:  <a href="http://dwarfstd.org/doc/070516-1.pdf">Section 7.5.4</a>.

3) In Figure 21, p138, delete ", lineptr, loclistptr, macptr, rangelistptr" 
   from the class alternatives for DW_FORM_data4 and DW_FORM_data8. Only 
   class constant remains.

4) In Figure 21, p139, add a new entry at the end a la:

   DW_FORM_sec_offset  0x17    lineptr, loclistptr, macptr,
                               rangelistptr


-----------------------------------

Accepted.  Remove commentary about DWARF 2.