Issue 240626.1: Add DW_LNS_indirect_line - update `line` to absolute value stored indirectly

Author: Matthew Lugg
Champion: David Blaikie
Date submitted: 2024-06-26
Date revised:
Date closed: 2024-09-30
Type: Enhancement
Status: Rejected
DWARF Version: 6

Background

In many source languages, features such as templates and generics allow many program-counter addresses, arbitrarily far apart, to correspond to the same source line. In an incremental compiler, the line number program must be updated whenever lines within a source file move. Ideally, moving a source line that corresponds to a large number of distinct program-counter addresses would require updating only one line number value in the DWARF information. For this to hold, the regions of the line number program corresponding to each such address must obtain the line number of the source construct through an indirect reference rather than encoding it directly. A single line number value stored in the binary can then be shared across arbitrarily many rows of the line number matrix.

This is not currently possible: all modifications to the line register are given by relative offsets, and all of these offsets are directly included in the instruction (or implicit in the case of a special opcode).

Overview

Introduce two new fields in the line number program header: indirect_lines_length (ULEB128) and indirect_lines (an opaque block of bytes containing ULEB128 values). The indirect_lines_length field gives the length in bytes of the indirect_lines field, not the number of entries it contains.
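
As a rough illustration of how a consumer might read these two header fields, the Python sketch below treats indirect_lines as a length-prefixed opaque block. The function names (read_uleb128, parse_indirect_lines) are hypothetical and not part of the proposal, and the sketch assumes the two fields are stored back to back at a known header offset.

```python
def read_uleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode one ULEB128 value at `offset`; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # continuation bit clear: last byte
            return result, offset
        shift += 7


def parse_indirect_lines(header: bytes, offset: int) -> tuple[bytes, int]:
    """Read indirect_lines_length, then slice out indirect_lines.

    The block is returned as raw bytes. Its contents are deliberately
    not decoded or validated here, since they may contain padding.
    """
    length, offset = read_uleb128(header, offset)
    end = offset + length  # length counts bytes, not entries
    if end > len(header):
        raise ValueError("indirect_lines extends past the end of the header")
    return header[offset:end], end
```

Because the length counts bytes, the consumer can skip over the block without knowing how many line numbers it holds or how wide each one is.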

Introduce a new standard opcode to the line number program, DW_LNS_indirect_line. This opcode takes a single ULEB128 operand, which represents a byte offset into the indirect_lines stored in the header. The effect of this instruction is to set the line register to the ULEB128 value stored at the given byte offset into indirect_lines. Note that indirect_lines is not itself validated to be a valid sequence of ULEB128 values; decoding only occurs when DW_LNS_indirect_line is used. This allows an incremental compiler to pre-allocate a large amount of padding space in indirect_lines to fill in later as needed.
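
The effect of the opcode can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state machine is modelled as a plain dict, and the names read_uleb128 and exec_indirect_line are hypothetical rather than taken from any existing consumer.

```python
def read_uleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode one ULEB128 value at `offset`; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # continuation bit clear: last byte
            return result, offset
        shift += 7


def exec_indirect_line(state: dict, indirect_lines: bytes, operand: int) -> None:
    """Set the line register to the ULEB128 stored at byte offset `operand`.

    Only the bytes reachable from `operand` are decoded; the rest of
    indirect_lines is never inspected.
    """
    state["line"], _ = read_uleb128(indirect_lines, operand)


# Bytes 0-1 stand in for unused padding; offset 2 holds 42, offset 3 holds 300.
indirect_lines = bytes([0xFF, 0xFF, 0x2A, 0xAC, 0x02])
state = {"line": 1}
exec_indirect_line(state, indirect_lines, 2)
```

Any number of DW_LNS_indirect_line instructions may carry the same operand, so rewriting the bytes at one offset changes the line reported for every row that refers to it.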

Note that an incremental compiler would not necessarily wish to use variable-length integers to represent this information, since certain changes of line numbers could cause a line number which was previously encoded using 1 byte to now require 2. However, since the stored values need not be densely packed, an implementation is free to reserve as much space as is necessary for each entry. For instance, the downstream Zig compiler (which is the original motivator for this proposal) may choose to reserve 4 or 5 bytes for each line number, as line numbers in Zig source files cannot exceed 1<<32. The use of ULEB128 allows the compiler to make an appropriate decision here instead of codifying such a restriction into the DWARF specification.
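
One way a producer could reserve a fixed amount of space per entry is to emit a padded (non-minimal) ULEB128 encoding, which remains decodable by an ordinary ULEB128 reader. The sketch below assumes a 5-byte slot; SLOT_WIDTH, encode_uleb128_padded and patch_line are hypothetical names chosen for illustration, not part of the proposal or of any compiler.

```python
SLOT_WIDTH = 5  # assumed slot size; 5 ULEB128 bytes carry 35 bits of payload


def encode_uleb128_padded(value: int, width: int) -> bytes:
    """Encode `value` as a ULEB128 occupying exactly `width` bytes.

    Every byte except the last has its continuation bit set, so short
    values are padded with zero-valued continuation bytes.
    """
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    for i in range(width):
        byte = value & 0x7F
        value >>= 7
        if i < width - 1:
            byte |= 0x80  # more bytes follow
        out.append(byte)
    if value:
        raise ValueError("value does not fit in the reserved width")
    return bytes(out)


def patch_line(indirect_lines: bytearray, offset: int, line: int) -> None:
    """Overwrite one slot in place; the block's length never changes."""
    indirect_lines[offset:offset + SLOT_WIDTH] = encode_uleb128_padded(line, SLOT_WIDTH)
```

With this layout, a line number that grows from a one-byte value to a two-byte value is rewritten in its existing slot, and no offsets used by DW_LNS_indirect_line operands need to move.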

Proposed Changes

Pages and line numbers are given for the 2024-06-16 working draft of DWARF Version 6, which is the latest draft at the time of writing.

6.2.4 (pg 163; line 27)

21. indirect_lines_length (ULEB128)

The length in bytes of the data stored in the indirect_lines field.

22. indirect_lines (block containing ULEB128 entries)

A collection of line numbers, each stored as a ULEB128 integer. These values are referenced by DW_LNS_indirect_line instructions to modify the state of the line number information state machine.

The data stored in this field is not checked to be a valid sequence of ULEB128 entries. The contained data may include padding bytes or otherwise invalid data. As such, it is expected that bytes of this field be accessed only when a DW_LNS_indirect_line instruction references them.

6.2.5.2 (pg 170; line 23)

14. DW_LNS_indirect_line

The DW_LNS_indirect_line opcode takes a single unsigned LEB128 operand. This operand is interpreted as a byte offset into the indirect_lines field of the line number program header. An unsigned LEB128 value is read from indirect_lines at the given offset, and this value is stored into the state machine's line register.

7.22 (pg 246; table 7.25)

Opcode name            Value
DW_LNS_indirect_line   0x0d

2024-09-30: Rejected.

The committee felt that the cost of updating line numbers without the proposed indirection was not clearly shown to be unreasonable. We would reconsider this proposal at a later date if implementation experience shows it to be worthwhile.

This could be prototyped as a producer extension by using extended line table opcodes to build the indirect_lines table (in the same way that DW_LNE_define_file could be used in DWARF 4 to build the file_names table), or by splitting the indirect table into a separate section.

In any case, it was suggested that DW_LNS_indirect_line should be an extended opcode, as its use is not expected to be common enough to warrant a standard opcode.