Issue 240626.1: Add DW_LNS_indirect_line - update `line` to absolute value stored indirectly
| Author: | Matthew Lugg |
|---|---|
| Champion: | David Blaikie |
| Date submitted: | 2024-06-26 |
| Date revised: | |
| Date closed: | 2024-09-30 |
| Type: | Enhancement |
| Status: | Rejected |
| DWARF version: | 6 |
Background
In many source languages, features such as templates and generics allow many program-counter addresses, arbitrarily far apart, to correspond to the same source line. When an incremental compiler handles an edit that moves source lines within a file, it must update the line number program accordingly. It would be desirable that moving a source line which corresponds to a large number of distinct program-counter addresses requires updating only one line number value in the DWARF information. For this to hold, each region of the line number program corresponding to such an address must obtain the line number of the source construct not directly, but through an indirect reference. This allows one line number value stored in the binary to be shared across arbitrarily many entries in the line number matrix.
This is not currently possible: all modifications to the `line` register are given by relative offsets, and all of these offsets are directly included in the instruction (or implicit in the case of a special opcode).
Overview
Introduce two new fields to the line number program header: `indirect_lines_length` (ULEB128) and `indirect_lines` (opaque block of bytes containing ULEB128 values). The `indirect_lines_length` field is the length in bytes of the `indirect_lines` field, rather than the number of elements.
Introduce a new standard opcode to the line number program, `DW_LNS_indirect_line`. This opcode takes a single ULEB128 operand, which represents a byte offset into the `indirect_lines` field stored in the header. The effect of this instruction is to set the `line` register to the ULEB128 value stored at the given byte offset into `indirect_lines`. Note that `indirect_lines` is not itself validated to be a valid sequence of ULEB128 values; decoding only occurs when `DW_LNS_indirect_line` is used.
This allows an incremental compiler to pre-allocate a large amount of padding space in `indirect_lines` to fill in later as needed.
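The consumer-side behaviour described in this overview can be sketched roughly as follows. This is only an illustration, not part of the proposal text: the function names `read_uleb128`, `parse_indirect_lines` and `exec_indirect_line`, the dictionary used for the state machine, and the example bytes are all hypothetical. The opcode value 0x0d is the one proposed below.

```python
from typing import Dict, Tuple

DW_LNS_indirect_line = 0x0D  # opcode value proposed in this issue


def read_uleb128(buf: bytes, offset: int) -> Tuple[int, int]:
    """Decode one ULEB128 value at `offset`; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("ULEB128 value runs past the end of the buffer")
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


def parse_indirect_lines(header: bytes, offset: int) -> Tuple[bytes, int]:
    """Read indirect_lines_length, then slice out that many opaque bytes.

    The block is deliberately not decoded or validated here.
    """
    length, offset = read_uleb128(header, offset)
    end = offset + length
    if end > len(header):
        raise ValueError("indirect_lines extends past the end of the header")
    return header[offset:end], end


def exec_indirect_line(program: bytes, pc: int, indirect_lines: bytes,
                       state: Dict[str, int]) -> int:
    """Execute the DW_LNS_indirect_line at program[pc]; return the next pc."""
    if program[pc] != DW_LNS_indirect_line:
        raise ValueError("not a DW_LNS_indirect_line opcode")
    byte_offset, pc = read_uleb128(program, pc + 1)
    # Decoding happens only now, at the referenced byte offset.
    state["line"], _ = read_uleb128(indirect_lines, byte_offset)
    return pc


# Hypothetical header fragment: length 6, two never-referenced padding bytes,
# then the values 42 (at offset 2) and 624485 (at offset 3).
header = bytes([0x06, 0xFF, 0xFF, 0x2A, 0xE5, 0x8E, 0x26])
table, header_end = parse_indirect_lines(header, 0)

# Two instructions: set line from offset 2, then from offset 3.
program = bytes([0x0D, 0x02, 0x0D, 0x03])
state = {"line": 1}
pc = exec_indirect_line(program, 0, table, state)
first_line = state["line"]
pc = exec_indirect_line(program, pc, table, state)
second_line = state["line"]
```

Because the padding bytes at offsets 0 and 1 are never referenced, the sketch never attempts to decode them, which matches the rule that `indirect_lines` is only interpreted on demand.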
Note that an incremental compiler would not necessarily wish to use variable-length integers to represent this information, since certain changes of line numbers could cause a line number which was previously encoded using 1 byte to now require 2. However, since the stored values need not be densely packed, an implementation is free to reserve as much space as is necessary for each entry. For instance, the downstream Zig compiler (which is the original motivator for this proposal) may choose to reserve 4 or 5 bytes for each line number, as line numbers in Zig source files cannot exceed 1<<32. The use of ULEB128 allows the compiler to make an appropriate decision here instead of codifying such a restriction into the DWARF specification.
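One way a producer might reserve a fixed amount of space per entry is to emit non-minimal ("padded") ULEB128 values, where every byte except the last carries the continuation bit even if its payload is zero. The sketch below assumes that consumers accept such padded encodings, and uses hypothetical names (`encode_uleb128_padded`, `patch_line`, `SLOT_WIDTH`); it is not taken from the Zig compiler or from the proposal itself.

```python
SLOT_WIDTH = 5  # assumption: 5 bytes hold any value below 1 << 35


def encode_uleb128_padded(value: int, width: int) -> bytes:
    """Encode `value` as a ULEB128 occupying exactly `width` bytes."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the reserved width")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i != width - 1:
            byte |= 0x80  # continuation bit on every byte but the last
        out.append(byte)
    return bytes(out)


def decode_uleb128(buf: bytes, offset: int = 0) -> int:
    """Decode one ULEB128 value starting at `offset`."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result
        shift += 7


def patch_line(table: bytearray, slot: int, new_line: int) -> None:
    """Overwrite one fixed-width slot in place; the table size never changes."""
    start = slot * SLOT_WIDTH
    table[start:start + SLOT_WIDTH] = encode_uleb128_padded(new_line, SLOT_WIDTH)


# Two slots holding lines 10 and 200; slot 0 later moves to line 70000.
table = bytearray(encode_uleb128_padded(10, SLOT_WIDTH)
                  + encode_uleb128_padded(200, SLOT_WIDTH))
patch_line(table, 0, 70000)
```

With this layout, every `DW_LNS_indirect_line` operand that refers to slot 0 (byte offset 0) keeps working after the patch, since neither the offset nor the size of any entry has changed.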
Proposed Changes
Pages and line numbers are given for the 2024-06-16 working draft of DWARF Version 6, which is the latest draft at the time of writing.
6.2.4 (pg 163; line 27)
21. `indirect_lines_length` (ULEB128)
    The length in bytes of the data stored in the `indirect_lines` field.
22. `indirect_lines` (block containing ULEB128 entries)
    A collection of line numbers, each stored as a ULEB128 integer. These values are referenced by `DW_LNS_indirect_line` instructions to modify the state of the line number information state machine.
    The data stored in this field is not checked to be a valid sequence of ULEB128 entries. The contained data may include padding bytes or otherwise invalid data. As such, it is expected that bytes of this field be accessed only when a `DW_LNS_indirect_line` instruction references them.
6.2.5.2 (pg 170; line 23)
14. `DW_LNS_indirect_line`
    The `DW_LNS_indirect_line` opcode takes a single unsigned LEB128 operand. This operand is interpreted as a byte offset into the `indirect_lines` field of the line number program header. An unsigned LEB128 value is read from `indirect_lines` at the given offset, and this value is stored into the state machine's `line` register.
7.22 (pg 246; table 7.25)
| Opcode name | Value |
|---|---|
| `DW_LNS_indirect_line` | 0x0d |
2024-09-30: Rejected.
The committee felt that the cost of updating line numbers without the proposed indirection was not clearly shown to be unreasonable. We would reconsider this proposal at a later date if implementation experience shows it to be worthwhile.
This could be prototyped as a producer extension by using extended line table opcodes to build the `indirect_lines` table (in the same way that `DW_LNE_define_file` could be used in DWARF 4 to build the `file_names` table), or by splitting the indirect table into a separate section. In any case, it was suggested that `DW_LNS_indirect_line` should be an extended opcode, as its use is not expected to be common enough to warrant a standard opcode.