Issue 211101.1: Allow 64-bit string offsets in DWARF-32

Author: Paul Robinson
Champion: Paul Robinson
Date submitted: 2021-11-01
Date revised: 2022-05-21
Date closed: 2022-05-16
Type: Enhancement
Status: Accepted
DWARF Version: 6
Section Many, pg Many

Background:

Issue 181026.2 was originally about allowing 64-bit string offsets into
the supplementary object file, even in DWARF-32.  After the committee 
meeting on 1 November 2021, it appears that we want to allow 64-bit
string offsets in general, even in DWARF-32.  Experience reports are
that the string section might well be the first one to exceed 4GB, 
especially for C++ compilation units with lots of templates, making this
enhancement worthwhile.

String references occur in many places within DWARF.  According to Figure
B.1 (p.274), within one object file these include:
- from .debug_info to .debug_str (DW_FORM_strp)
- from .debug_str_offsets to .debug_str
- from .debug_macro to .debug_str (DW_MACRO_define, DW_MACRO_undef)

References from a main object file to a supplementary object file also
occur:
- from main .debug_info to supplementary .debug_str (DW_FORM_strp_sup)
- from main .debug_macro to supplementary .debug_str (DW_MACRO_define_sup,
  DW_MACRO_undef_sup)

According to Figure B.2 (p.278) there are no direct string references
between a skeleton and the corresponding .dwo sections.

For completeness, .debug_line can reference .debug_line_str in the same
object file, but we don't consider .debug_line_str to have the same size
problems that .debug_str does, so we will not consider it further.

Recall that the macro information header (section 6.3.1, p.166) has an
offset_size_flag, which determines whether the offsets within the macro
section are 4-byte or 8-byte.  That leaves us an "out" for allowing all
string references to be 8-byte, unless the macro header gains a length
and therefore no longer deserves a separate offset_size_flag (see issue 
200716.1).  This proposal won't consider macro information any further.

The problem, then, can appear when we use any of the following:
- DW_FORM_strp
- DW_FORM_strp_sup
- the .debug_str_offsets section


Solution:

1) Forms

We retain the existing forms DW_FORM_strp, DW_FORM_strp_sup with their
existing interpretations, i.e., the size of the reference is determined
by whether the form is used in a DWARF-32 or DWARF-64 format unit.  We
add DW_FORM_strp8 and DW_FORM_strp_sup8 which always are 64-bit offsets
but otherwise mean the same as DW_FORM_strp and DW_FORM_strp_sup,
respectively.

2) .debug_str_offsets

We retire DW_AT_str_offsets_base and replace it with DW_AT_str_offsets,
which points to the string offsets table header, rather than to the table 
itself as the old attribute did.  (This is to avoid redefining the
semantics of an existing attribute.)  We permit an exception to the rule
that all contributions for a given compilation unit must use the same
format (32-bit or 64-bit); this allows a 32-bit DWARF .debug_info section
to use one of the DW_FORM_strx* forms to index into the .debug_str_offsets
table, which will contain 32-bit or 64-bit entries depending on the table
format rather than the .debug_info section format.

In principle this allows a linker to rewrite the content of the
.debug_str_offsets section as desired, either to widen the table from
32-bit to 64-bit entries if any references require a 64-bit offset, or
conversely to shrink a table from 64-bit to 32-bit entries if all
references are within the lower 4GB of the .debug_str section.


While some committee members did want all of these changes to be
proposed in a single issue, as a unified proposal, it is also true
that the textual changes are entirely orthogonal and are likely to be
easier to understand if presented separately.  So, I've split the list
of proposed changes into two parts, corresponding to the two-part
solution described above.


Proposed specification changes for part 1, Forms:
=================================================
Section 6.2.4.1 Standard Content Descriptions, p.158
----------------------------------------------------
In item 1 DW_LNCT_path, first paragraph, replace the sentence:

  If the form code is DW_FORM_line_strp, DW_FORM_strp, or DW_FORM_strp_sup,
  then the string is included in the `.debug_line_str`, `.debug_str` or
  supplementary string section, respectively, and its offset occurs
  immediately in the containing `directories` or `file_names` field.

With this:

  If the form code is DW_FORM_line_strp, then the string is included in
  the `.debug_line_str` section; if the form code is DW_FORM_strp or
  DW_FORM_strp8, then the string is included in the `.debug_str` section;
  if the form code is DW_FORM_strp_sup or DW_FORM_strp_sup8, then the
  string is included in the supplementary string section. In all cases
  other than DW_FORM_string, the string's offset occurs immediately in 
  the containing `directories` or `file_names field.


Section 6.2.4.2 Vendor-defined Content Descriptions, p.159
----------------------------------------------------------
Add DW_FORM_strp8 and DW_FORM_strp_sup8 to the list.


Section 6.3.1 Macro Information Header, p.167
---------------------------------------------
Add DW_FORM_strp8 and DW_FORM_strp_sup8 to the list (around line 10).


Section 7.3.1 Relocatable Object Files, p.186
---------------------------------------------
In the bullet mentioning DW_FORM_strp, add "or DW_FORM_strp8" (line 11).


Section 7.3.2.1 First Partition (with Skeleton Unit) p.187
----------------------------------------------------------
In the last bullet, add DW_FORM_strp8.


Section 7.3.6 DWARF Supplementary Object Files, p.195
-----------------------------------------------------
On line 22, after "DW_FORM_strp_sup" add "and DW_FORM_strp_sup8".

On line 25, replace "or DW_FORM_strp_sup`" with "DW_FORM_strp_sup or
DW_FORM_strp_sup8".


Section 7.5.5 Classes and Forms, p.218
--------------------------------------
In the bullet item for class string, second sub-bullet (lines 20-30):

- on line 21, in the parenthentical (DW_FORM_strp) add "or DW_FORM_strp8"
- on line 24, in the parenthetical (DW_FORM_strp_sup) add "or DW_FORM_strp_sup8".
- the next sentence ("DW_FORM_strp_sup offsets from...") should not be present
  and I've filed a separate issue to remove it (so it doesn't require modification).
- on line 30, add the following sentence:

    The representation of a DW_FORM_strp8 or DW_FORM_strp_sup8 value is an
    8-byte unsigned offset.


Section 7.5.5 Classes and Forms, p.219
--------------------------------------
On lines 3-4, replace the wording:
    have the same representation as DW_FORM_strp values.
with:
    are described in Section 7.26, String Offsets Table.
So the entire sentence reads:
    The offset entries in the .debug_str_offsets section
    are described in Section 7.26, String Offsets Table.

Section 7.5.6, Form Encodings, Table 7.6
----------------------------------------
Add entries for DW_FORM_strp8 and DW_FORM_strp_sup8, both class string.


Appendix B, Figure B.1, p.274
-----------------------------
- The box with "DW_FORM_strp (d)" becomes "DW_FORM_strp[8] (d)".
- Note (d) on p.275 add "or DW_FORM_strp8" in the obvious place.


Appendix B, Notes on Figure B.2, p.279
--------------------------------------
- Note (d) add "or DW_FORM_strp8" in the obvious place.
- Note (do) likewise.


Section F.1 Overview, p.393
---------------------------
- Line 1, add "or DW_FORM_strp8" in the obvious place.
- Line 34, add DW_FORM_strp8 to the list.


Proposed specification changes for part 2, .debug_str_offsets:
==============================================================
Section 2.2 Attribute Types

- Remove DW_AT_str_offsets_base.
- Add DW_AT_str_offsets, with the description "String offsets information for unit"


Section 3.1.1 Full and Partial Compilation Unit Entries
-------------------------------------------------------
For item 13, DW_AT_str_offsets_base,
- line 1, replace "DW_AT_str_offsets_base" with "DW_AT_str_offsets"
- line 2, replace "first string offset" with "header"
- line 4, replace "relative to this base" with "into the array of offsets
  following that header."
This causes the paragraph to read:

  13. A DW_AT_str_offsets attribute, whose value is of class stroffsetsptr.
      This attribute points to the header of the compilation unit's contribution
      to the .debug_str_offsets (or .debug_str_offsets.dwo) section.  Indirect
      references (...) within the compilation unit are interpreted as indices
      into the array of offsets following that header.


Section 3.1.4 Type Unit Entries
-------------------------------
For item 4, DW_AT_str_offsets_base, make the same changes as item 13 above.


Section 7.4 32-Bit and 64-Bit DWARF Formats, p.198
--------------------------------------------------
At the end of the paragraph on lines 14-15, add an exception, so that
the paragraph reads as follows.

  The 32-bit and 64-bit DWARF format conventions must _not_ be intermixed
  within a single compilation unit, except for contributions to the
  .debug_str_offsets (or .debug_str_offsets.dwo) section.

Following this paragraph, insert the following non-normative paragraph
(preceding the paragraph at line 16).

  *The exception for the .debug_str_offsets section enables an executable
  program with a mixture of 32-bit and 64-bit DWARF compilation units to
  refer to any string in the merged .debug_str section, even if that section
  exceeds 4GB in size.*


Section 7.5.4 Attribute Encodings
---------------------------------
- Remove DW_AT_str_offsets_base
- Add DW_AT_str_offsets, class stroffsetsptr


Section 7.5.5 Classes and Forms, p.219
--------------------------------------
On line 21 (bullet for stroffsetsptr) change "beginning" to "header" so that
the sentence reads:

  It consists of an offset from the beginning of the of the .debug_str_offsets
  section to the header of the string offsets information for the referencing
  entity.


Section 7.26 String Offsets Table, p.240
----------------------------------------
At the end of item 1 "unit_length" add the following sentence:

  The DWARF format used for the string offsets table is not required to match
  the format used by other sections describing the same compilation unit.


Section 7.26 String Offsets Table, p.241
----------------------------------------
Lines 6-7, replace the paragraph:

  The DW_AT_str_offsets_base attribute points to the first entry following
  the header. The entries are indexed sequentially from this base entry,
  starting from 0.

with this:

  The DW_AT_str_offsets attribute points to the header.  The entries
  are indexed sequentially, starting from 0.


Globally
--------
After all the above changes are made, replace all remaining occurrences
of DW_AT_str_offsets_base with DW_AT_str_offsets.

--
2022-05-16:  Accepted, pending update to 7.5.5.
2022-05-21:  Revised:  Update text for Section 7.5.5.