Issue 211108.2: Allow Non-Uniform Record Formats in the File Name Table

Author: Cary Coutant
Champion: Cary Coutant
Date submitted: 2021-11-08
Date revised: 2022-07-05
Date closed: 2022-07-25
Type: Enhancement
Status: Accepted with modifications
DWARF Version: 6
Section 6.2.4, pg 156
Proposal to Allow Non-Uniform Record Formats in the File Name Table


Background
----------

In Issue 180201.1 ("DWARF and source text embedding"), a mechanism was
proposed for embedding source text in the line table, and for making the
MD5 component optional, so that a line table may have a mixture of file
name entries with and without MD5 checksums. The committee decided to
split the second part of that proposal into a separate issue, noting
that the use of a separate boolean field (DW_LNCT_is_MD5) was perhaps
not the best approach to the problem.

Alternatives suggested were:

(1) Reserve the value 0 for an absent checksum, so that no separate flag
is required. The odds of a file having a 0 checksum are no greater than
that of a file's checksum colliding with that of another. We could
simply designate that a file whose checksum is 0 would be written with a
checksum of 1.

(2) Generalize the problem so that each file name entry in the line
number program header could have a custom format rather than impose a
uniform format on all file name entries.

This proposal describes the second alternative.


New Directory and File Name Entry Format
----------------------------------------

The line number program header contains two format tables, which
describe the record formats used by the directory and file name tables.

Each format table contains one or more format descriptors. The number of
format descriptors in each table is given by a separate field in the
program header.

Each format descriptor consists of a sequence of pairs, each consisting
of a content type code and a form, both represented as unsigned LEB128
values, terminated by a (0,0) pair.

Each directory and file name entry begins with a format code, which is
the index of a format descriptor in the corresponding format table. The
format code is followed by a sequence of fields as described by the
indicated format descriptor.

The content type codes, and the form codes that may be paired with each
content type code, are as given in Section 6.2.4.1 ("Standard Content
Descriptions").


Proposed Changes to the DWARF Specification
-------------------------------------------

The line number program header contains the following fields, which
replace items 13 through 20 in Section 6.2.4 of the DWARF 5
specification:

13. directory_format_count (ubyte)

    A 1-byte unsigned integer containing the number of format
    descriptors in the directory format table.

14. directory_format_table (sequence of record format descriptors)

    A sequence of record format descriptors. Each descriptor consists
    of the following:

    * A sequence of field descriptors. Each field descriptor consists
      of a pair of unsigned LEB128 values: (a) a content type code (see
      Sections 6.2.4.1 and 6.2.4.2), and (b) a form code (using the
      attribute form codes).
    * A pair of zero bytes to terminate the declaration.

    The line number program numbers the record format descriptors
    sequentially, beginning with 0.

    The format declarations describe the layout of the entries
    in the directories table, below.

15. directories_count (ULEB128)

    A count of the number of directory entries in the directory table.

16. directories (sequence of directory entries)

    A sequence of directory entries. Each entry consists of:

    * A format code (ubyte), which selects a record format descriptor
      from the directory format table, above, by its index.
    * A sequence of fields as described by the selected record format
      descriptor.

    Each directory entry describes a path that was searched for included
    source files in this compilation, including the compilation
    directory of the compilation. (The paths include those directories
    specified by the user for the compiler to search and those the
    compiler searches without explicit direction.)

    The first directory entry is the current directory of the
    compilation. Each additional path entry is either a full path name
    or is relative to the current directory of the compilation.

    The line number program assigns a number (index) to each of the
    directory entries in order, beginning with 0.

    [Keep the non-normative text from the DWARF 5 document.]

17. file_name_format_count (ubyte)

    A 1-byte unsigned integer containing the number of format
    descriptors in the file name format table.

18. file_name_format_table (sequence of record format descriptors)

    A sequence of record format descriptors. Each descriptor consists
    of the following:

    * A sequence of field descriptors. Each field descriptor consists
      of a pair of unsigned LEB128 values: (a) a content type code (see
      Sections 6.2.4.1 and 6.2.4.2), and (b) a form code (using the
      attribute form codes).
    * A pair of zero bytes to terminate the declaration.

    The line number program numbers the record format descriptors
    sequentially, beginning with 0.

    The format declarations describe the layout of the entries
    in the file names table, below.

19. file_names_count (ULEB128)

    A count of the number of file name entries in the file name table.

20. file_names (sequence of file name entries)

    A sequence of file name entries. Each entry consists of:

    * A format code (ubyte), which selects a record format descriptor
      from the file name format table, above, by its index.
    * A sequence of fields as described by the selected record format
      descriptor.

    Each file name entry describes a source file that contributes to the
    line number information for this compilation unit, or is used in
    other contexts, such as in a declaration coordinate or a macro file
    inclusion.

    The first file name entry is the primary source file, whose file
    name exactly matches that given in the DW_AT_name attribute in the
    compilation unit debugging information entry.

    The line number program references file names in this sequence
    beginning with 0, and uses those numbers instead of file names in
    the line number program that follows.

    [Keep the non-normative text from the DWARF 5 document.]

*If the directory format table or the file name format table contains
exactly one item, and all the field descriptors use form codes with a
fixed-length representation, a consumer may be able to optimize its
access to the directory table or file name table, respectively, by
computing the entry offset from the entry index.*

--

In Section 7.22, on page 236, change "The version number in the line
number program header is 5" to "... 6".

In Appendix G, Table G.1, on page 416, enter "6" for ".debug_line", in
the new column for DWARF V6.

--
2022-07-05:  Revised: Previous version: http://dwarfstd.org/issues/211108.2-1.html
2022-07-25:  Accepted with modifications:
    Change ubyte to ULEB in 13, 16, 17, and 20.
    Remove non-normative paragraph after 20.