Issue 110722.1: More compact macro information - .debug_macro section

Author: Jakub Jelinek
Champion: Jakub Jelinek
Date submitted: 2011-07-22
Date revised:
Date closed:
Type: Enhancement
Status: Accepted
DWARF Version: 5

BACKGROUND

The current .debug_macinfo section does not allow any merging of redundant information. 
For example, when our compiler is built with an option that requests .debug_macinfo generation, 
roughly 100MB of code/data plus all DWARF sections other than .debug_macinfo is accompanied by 
a ~350MB .debug_macinfo section.

OVERVIEW

The following proposal replaces that ~350MB .debug_macinfo section with a ~1MB .debug_macro section 
and ~1.5MB of new content in the .debug_str section.

It has been implemented in GCC (starting with version 4.7) as a GNU extension.  There, the
.debug_macro section is referenced through the vendor attribute DW_AT_GNU_macros instead of
DW_AT_macros and has version 4; presumably DWARF 5 will start at version 5.

This proposal adds a new operand encoding for the define/undef opcodes,
in which the second operand is a reference into the .debug_str section.  This is
useful for strings longer than the size of the reference, because it allows
the strings to be shared between different compilation units, or even within one
compilation unit.
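
As a rough illustration of why the indirect encoding only pays off for strings
longer than the reference, the sketch below compares the bytes spent on the string
operand under each encoding.  The function names and the cost model are assumptions
made for this note, not part of the proposal; the opcode and line-number bytes are
ignored because they are the same in both encodings.

```python
def inline_cost(text, uses):
    """Bytes spent on the string operand when every entry embeds the string."""
    return uses * (len(text.encode("utf-8")) + 1)  # +1 for the NUL terminator


def indirect_cost(text, uses, offset_size=4):
    """Bytes spent when every entry holds an offset and .debug_str holds one copy."""
    per_entry = uses * offset_size              # one reference per entry
    shared = len(text.encode("utf-8")) + 1      # single NUL-terminated copy
    return per_entry + shared
```

Under this model a 3-character string such as "A 1" gains nothing from a 4-byte
reference, while a longer definition repeated across many compilation units shrinks
to roughly the size of its references.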

Further, though comparatively smaller, savings are achieved by adding
an opcode that references a sequence of macinfo entries; the consumer
is expected to substitute that sequence of entries for the referencing entry.
This again allows the entries to be shared between different compilation units
or even within the same compilation unit.

The proposal adds a new section, .debug_macro, that replaces .debug_macinfo, 
rather than adding new opcodes to the .debug_macinfo section.  The new section 
has short headers that ease parsing of the section and allow making incompatible 
changes.  There is also a mechanism for describing the operands of extension opcodes, 
so that consumers can skip opcodes they do not understand when parsing the section.
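
The skipping mechanism can be sketched as follows, assuming the consumer has
already read the operand table from the header into a dictionary that maps an
opcode to its list of form codes.  The helper names, the dictionary layout and
the little-endian default are assumptions of this sketch.  The DW_FORM values are
those of the DWARF standard; DW_FORM_strx uses the value assigned in the published
DWARF 5 standard.

```python
# DW_FORM codes, limited to the forms this proposal allows in the table.
FIXED_SIZE = {0x0b: 1, 0x05: 2, 0x06: 4, 0x07: 8, 0x0c: 1}  # data1/2/4/8, flag
OFFSET_FORMS = {0x0e, 0x17}                    # strp, sec_offset
LEB128_FORMS = {0x0d, 0x0f, 0x1a}              # sdata, udata, strx
BLOCK_LEN_BYTES = {0x0a: 1, 0x03: 2, 0x04: 4}  # block1, block2, block4
DW_FORM_string = 0x08
DW_FORM_block = 0x09


def skip_leb128(data, pos):
    """Return the position just past one LEB128 value."""
    while data[pos] & 0x80:
        pos += 1
    return pos + 1


def skip_operand(data, pos, form, offset_size, byteorder="little"):
    """Return the position just past one operand encoded with `form`."""
    if form in FIXED_SIZE:
        return pos + FIXED_SIZE[form]
    if form in OFFSET_FORMS:
        return pos + offset_size
    if form in LEB128_FORMS:
        return skip_leb128(data, pos)
    if form == DW_FORM_string:
        return data.index(0, pos) + 1          # NUL-terminated inline string
    if form in BLOCK_LEN_BYTES:
        n = BLOCK_LEN_BYTES[form]
        length = int.from_bytes(data[pos:pos + n], byteorder)
        return pos + n + length
    if form == DW_FORM_block:
        length = shift = 0
        while True:                            # ULEB128 length prefix
            byte = data[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                return pos + length
    raise ValueError("form 0x%02x is not allowed in the operand table" % form)


def skip_entry(data, pos, operand_forms, offset_size):
    """Skip one entry whose opcode is described in `operand_forms`."""
    opcode = data[pos]
    pos += 1
    for form in operand_forms[opcode]:
        pos = skip_operand(data, pos, form, offset_size)
    return pos
```

A consumer that meets an opcode in the vendor range can call skip_entry with the
table from the header and continue with the next entry without knowing what the
opcode means.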


PROPOSAL 

Here are preliminary DWARF edits for the proposal:

2.2
Change "Macro information (#define, #undef)" to "Legacy macro information (#define, #undef)".
Add
  "DW_AT_macros        Macro information (#define, #undef)" 
  row to figure 2.

3.1.1
5. Replace "DW_AT_macro_info" with "DW_AT_macros".
   Replace ".debug_macinfo" with ".debug_macro".  
   Add at the end of the paragraph
   "Alternatively, the DW_AT_macro_info attribute may refer to the .debug_macinfo
    section as defined in DWARF version 4."

6.3
  Replace ".debug_macinfo" with ".debug_macro".
  Replace:
    "The macro information for each compilation unit is represented as a series
    of 'macinfo' entries. Each macinfo entry consists of a 'type code'
    and up to two additional operands."
  with:
    "The macro information for each compilation unit starts with a section
    header followed by a series of 'macinfo' entries. Each macinfo entry
    consists of a 'type code' and zero or more operands."
  Add at the end:
    "The section header starts with a 2-byte version number, followed by
    a 1-byte flags value.  If the least significant bit (bit 0) of the flags is
    clear, the header is for a 32-bit DWARF format macro section and offsets are 4 bytes long;
    if it is set, the header is for a 64-bit DWARF format macro section and offsets are 8 bytes long.
    If the second least significant bit (bit 1) of the flags is set,
    the flags byte is followed by the offset in the .debug_line section of the
    beginning of the line number information, encoded as a 4-byte offset for a
    32-bit DWARF format macro section and an 8-byte offset for a 64-bit DWARF format
    macro section.  If the third least significant bit (bit 2) of the flags is set,
    the .debug_line offset (if present) or the flags byte (otherwise) is followed by a table
    describing the arguments of the macinfo entry types.
    The macinfo entry types defined in this standard may, but need not, be
    described in the table; other macinfo entry types used in the section
    should be described there.  Vendor extension macinfo entry types should be
    allocated in the range from DW_MACRO_lo_user to DW_MACRO_hi_user; other
    unassigned codes are reserved for future DWARF standards.
    The table starts with a 1-byte count of the defined opcodes, followed by
    an entry for each of those opcodes.  Each entry starts with a 1-byte
    opcode number, followed by the number of arguments encoded as an unsigned
    LEB128 value and then, for each argument, a single byte describing the form in which
    the argument is encoded.  The allowed values are DW_FORM_data1,
    DW_FORM_data2, DW_FORM_data4, DW_FORM_data8, DW_FORM_sdata, DW_FORM_udata,
    DW_FORM_block, DW_FORM_block1, DW_FORM_block2, DW_FORM_block4, DW_FORM_flag,
    DW_FORM_string, DW_FORM_strp, DW_FORM_sec_offset and DW_FORM_strx.
    The table allows a consumer to skip over unknown macinfo entry types.

    *The DWARF 4 and earlier standards used a different section for the
    macinfo entries, the .debug_macinfo section, which did not contain any
    section headers and did not allow indirect string
    encodings or transparent includes.  This section may be used instead
    of or alongside the DWARF 5 .debug_macro section, as long as each
    compilation unit refers to at most one of these, using the DW_AT_macros
    attribute to refer to the .debug_macro section, or using the DW_AT_macro_info
    attribute to refer to the old .debug_macinfo section.  Consumers should
    be prepared to handle both of these formats.  For a description of the
    .debug_macinfo section please refer to the DWARF 4 standard.*"
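
The header layout described in the 6.3 text above can be sketched as a small
parser.  The function names, the returned dictionary and the little-endian default
are assumptions made for illustration; only the field order and the meaning of the
flag bits come from the proposed wording.

```python
import struct


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_macro_header(data, pos=0, little_endian=True):
    """Parse one .debug_macro header; return (header_dict, next_pos)."""
    end = "<" if little_endian else ">"
    version, flags = struct.unpack_from(end + "HB", data, pos)
    pos += 3
    offset_size = 8 if flags & 0x1 else 4      # bit 0: 64-bit DWARF format
    line_offset = None
    if flags & 0x2:                            # bit 1: .debug_line offset present
        fmt = end + ("Q" if offset_size == 8 else "I")
        (line_offset,) = struct.unpack_from(fmt, data, pos)
        pos += offset_size
    operand_forms = {}
    if flags & 0x4:                            # bit 2: operand table present
        count = data[pos]
        pos += 1
        for _ in range(count):
            opcode = data[pos]
            pos += 1
            nargs, pos = read_uleb128(data, pos)
            operand_forms[opcode] = list(data[pos:pos + nargs])  # one form byte each
            pos += nargs
    header = {
        "version": version,
        "offset_size": offset_size,
        "line_offset": line_offset,
        "operand_forms": operand_forms,
    }
    return header, pos
```

The returned position is where the first macinfo entry of the unit starts, and the
operand_forms dictionary is what a consumer would consult to skip entry types it
does not recognize.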

6.3.1
  Replace:
    "DW_MACINFO_define       A macro definition.
    DW_MACINFO_undef        A macro undefinition.
    DW_MACINFO_start_file   The start of a new source file inclusion.
    DW_MACINFO_end_file     The end of the current source file inclusion.
    DW_MACINFO_vendor_ext   Vendor specific macro information directives."
  with
    "DW_MACRO_define                A macro definition.
    DW_MACRO_undef                 A macro undefinition.
    DW_MACRO_start_file            The start of a new source file inclusion.
    DW_MACRO_end_file              The end of the current source file inclusion.
    DW_MACRO_define_indirect       A macro definition.
    DW_MACRO_undef_indirect        A macro undefinition.
    DW_MACRO_transparent_include   Include a sequence of entries from given offset.
    DW_MACRO_define_indirectx      A macro definition.
    DW_MACRO_undef_indirectx       A macro undefinition."

6.3.1.1
  Replace all "DW_MACINFO_define" occurrences with "DW_MACRO_define" and
    all "DW_MACINFO_undef" with "DW_MACRO_undef".  
  Append:
    "All DW_MACRO_define_indirect and DW_MACRO_undef_indirect entries have
    two operands.  The first operand encodes the line number of the source line
    on which the relevant defining or undefining macro directive appeared.
    The second operand consists of an offset into a string table contained in
    the .debug_str section of the object file.  The size of the operand is
    the offset size given by the section header.  Apart from the
    encoding of the operands these entries are equivalent to DW_MACRO_define
    and DW_MACRO_undef.
    All DW_MACRO_define_indirectx and DW_MACRO_undef_indirectx entries have
    two operands.  The first operand encodes the line number of the source line
    on which the relevant defining or undefining macro directive appeared.
    The second operand is represented as an unsigned LEB128 encoded value,
    which is interpreted as a zero-based index into an array of offsets in the
    .debug_str_offsets section.  Apart from the
    encoding of the operands these entries are equivalent to DW_MACRO_define
    and DW_MACRO_undef."
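
The two indirect encodings can be sketched as follows.  The function names and
the str_offsets_base parameter (the byte position at which the array of offsets
starts inside .debug_str_offsets) are assumptions of this sketch; the proposal
itself only speaks of a zero-based index into an array of offsets.  The opcode
values are the ones proposed for figure 39 in this document.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_cstring(section, offset):
    """Return the NUL-terminated string stored at `offset` in `section`."""
    end = section.index(0, offset)
    return section[offset:end].decode("utf-8")


def decode_indirect(data, pos, opcode, offset_size, debug_str,
                    debug_str_offsets=b"", str_offsets_base=0,
                    byteorder="little"):
    """Decode the operands of an *_indirect or *_indirectx entry.

    `pos` points just past the opcode byte.  Returns (line, text, next_pos).
    """
    line, pos = read_uleb128(data, pos)
    if opcode in (0x05, 0x06):        # define_indirect / undef_indirect
        str_off = int.from_bytes(data[pos:pos + offset_size], byteorder)
        pos += offset_size
    elif opcode in (0x0B, 0x0C):      # define_indirectx / undef_indirectx
        index, pos = read_uleb128(data, pos)
        slot = str_offsets_base + index * offset_size
        str_off = int.from_bytes(
            debug_str_offsets[slot:slot + offset_size], byteorder)
    else:
        raise ValueError("not an indirect define/undef opcode")
    return line, read_cstring(debug_str, str_off), pos
```

Both paths end at the same string in .debug_str; the indirectx form only adds one
level of lookup through .debug_str_offsets.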

6.3.1.2
  Replace all "DW_MACINFO_start_file" occurrences with "DW_MACRO_start_file".

  Append: 
    "If any DW_MACRO_start_file entries are present, the header should
    contain a reference to the .debug_line section."

6.3.1.3
  Replace all "DW_MACINFO_end_file" occurrences with "DW_MACRO_end_file".

6.3.1.4
  Replace the whole section with:
    "6.3.1.4  Transparent inclusion of a sequence of entries

    A DW_MACRO_transparent_include entry has one operand, an offset into
    another part of the .debug_macro section.  The size of the operand
    is the offset size given by the section header.  The
    DW_MACRO_transparent_include macinfo entry instructs the consumer to replace
    it with the sequence of macinfo entries found
    after the section header at the given .debug_macro offset, up to, but excluding,
    the terminating entry with type code 0.  The
    DW_MACRO_transparent_include entry type is aimed at sharing duplicate
    sequences of macinfo entries between the macro information of different compilation units."
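
The substitution rule above can be sketched as a generator that walks one unit and
splices in the included sequence whenever it meets a transparent include.  This is
a simplified illustration under stated assumptions: the function names are made
up, the opcode values are the ones proposed for figure 39 in this document, the
operand table (flag bit 2) and the indirectx opcodes are not handled, the included
unit's own header determines the offset size of its entries, and there is no
protection against include cycles.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def macro_entries(section, offset, byteorder="little"):
    """Yield (opcode, operands) for the unit whose header is at `offset`,
    with every transparent include replaced by the entries it refers to."""
    flags = section[offset + 2]
    size = 8 if flags & 0x1 else 4
    pos = offset + 3
    if flags & 0x2:
        pos += size                       # skip the .debug_line offset
    if flags & 0x4:
        raise NotImplementedError("operand table not handled in this sketch")
    while section[pos] != 0:              # type code 0 terminates the sequence
        op = section[pos]
        pos += 1
        if op in (0x01, 0x02):            # define / undef: line, inline string
            line, pos = read_uleb128(section, pos)
            end = section.index(0, pos)
            yield op, (line, section[pos:end].decode("utf-8"))
            pos = end + 1
        elif op == 0x03:                  # start_file: line, file index
            line, pos = read_uleb128(section, pos)
            file_index, pos = read_uleb128(section, pos)
            yield op, (line, file_index)
        elif op == 0x04:                  # end_file: no operands
            yield op, ()
        elif op in (0x05, 0x06):          # *_indirect: line, .debug_str offset
            line, pos = read_uleb128(section, pos)
            str_off = int.from_bytes(section[pos:pos + size], byteorder)
            pos += size
            yield op, (line, str_off)
        elif op == 0x07:                  # transparent_include: section offset
            target = int.from_bytes(section[pos:pos + size], byteorder)
            pos += size
            yield from macro_entries(section, target, byteorder)
        else:
            raise ValueError("unhandled opcode 0x%02x" % op)
```

The consumer never sees the include entry itself; it only sees the entries of the
shared sequence in the place where the include appeared.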

6.3.2
  Replace "DW_MACINFO_start_file" with "DW_MACRO_start_file" and
    "DW_MACINFO_end_file" with "DW_MACRO_end_file".

6.3.3
  Replace "DW_MACINFO_define and DW_MACINFO_undef" with
    "DW_MACRO_define, DW_MACRO_define_indirect, DW_MACRO_define_indirectx,
    DW_MACRO_undef, DW_MACRO_undef_indirect and DW_MACRO_undef_indirectx"
    and replace
    "DW_MACINFO_define or DW_MACINFO_undef" with
    "DW_MACRO_define, DW_MACRO_define_indirect, DW_MACRO_define_indirectx,
    DW_MACRO_undef, DW_MACRO_undef_indirect or DW_MACRO_undef_indirectx".
  Replace "DW_MACINFO_start_file" with "DW_MACRO_start_file".

6.3.4
  Replace ".debug_macinfo" with ".debug_macro".

7.5.4
  Replace ".debug_macinfo section to the first byte" with
    ".debug_macro section to the header".

  Add:
    "DW_AT_macros  0xXX    macptr"
     to figure 20.

7.22
  Remove:
    "as are the constants in a DW_MACINFO_vendor_ext entry".

  Replace figure 39 with:
    "Macinfo Type Name               Value
    DW_MACRO_define                 0x01
    DW_MACRO_undef                  0x02
    DW_MACRO_start_file             0x03
    DW_MACRO_end_file               0x04
    DW_MACRO_define_indirect        0x05
    DW_MACRO_undef_indirect         0x06
    DW_MACRO_transparent_include    0x07
    DW_MACRO_define_indirectx       0x0b
    DW_MACRO_undef_indirectx        0x0c
    DW_MACRO_lo_user                0xe0
    DW_MACRO_hi_user                0xff

Figure 39. Macinfo Type Encodings"

Appendix A
  Add "DW_AT_macros" to allowable attributes for "DW_TAG_compile_unit"
  and "DW_TAG_partial_unit".

Appendix B
  Replace "DW_AT_macro_info" with "DW_AT_macros".
  Replace ".debug_macinfo" with ".debug_macro".  
  Add
    ( .debug_macro ) -> [ DW_MACRO_transparent_include (n) ] -> ( .debug_macro ) and
    ( .debug_macro ) -> [ DW_MACRO_define_indirect, DW_MACRO_undef_indirect (o) ] -> ( .debug_str )
    ( .debug_macro ) -> [ DW_MACRO_start_file (p) ] -> ( .debug_line )
    ( .debug_macro ) -> [ DW_MACRO_define_indirectx, DW_MACRO_undef_indirectx (q) ] -> ( .debug_str_offsets )
    to the picture.
  In (i) replace ".debug_macinfo" with ".debug_macro" and "DW_AT_macro_info"
  with "DW_AT_macros".
  Add:
    "(n) .debug_macro A macinfo operand of the form DW_FORM_sec_offset is an
         offset into another part of the .debug_macro section,
         to the section header before the sequence to be
         transparently included.
    (o) .debug_macro A macinfo operand of the form DW_FORM_strp is an offset
         into the .debug_str section of the corresponding string.
    (p) .debug_macro DW_MACRO_start_file second operand refers to a file entry
         in the .debug_line section, with the .debug_macro header
         containing an offset to the start of the referenced
         .debug_line section.
    (q) .debug_macro A macinfo operand of the form DW_FORM_strx is an index
         into the string offset table in the .debug_str_offsets section."

Appendix F
  Change
    ".debug_macinfo    -   -   -"
  to
    ".debug_macinfo    -   -   -   x"
  Add
    ".debug_macro  x   x   x   5"
    row to the table.

--
3/19/2014 -- Revised.
3/19/2014 -- Accepted.