Issue 181223.1: Add source URL (was Add Microsoft SourceLink support)

Author: Jordan Frost
Champion: Tony Tye
Date submitted: 2018-12-23
Date revised: 2021-08-06
Date closed: 2022-08-22
Type: Enhancement
Status: Accepted with modifications
DWARF Version: 6
Section , pg 

Microsoft's PDB format supports a feature, SourceLink. It allows embedding a 
JSON blob inside the PDB file that their debuggers can use to find source.

This feature is sorely missing from code compiled via GCC that has debug 
information.

The format does not have not be SourceLink at all, but just any kind of blob 
sitting in the debug file.


DISCUSSION

It is unclear if the request is seeking a standard way to put arbitrary blobs
into DWARF information, or a standard way to provide functionality similar 
to SourceLink.

If the former, existing code object formats already provide such ability. For
example, ELF already provides the ability to add arbitrary sections or note
records. It would then require agreement on the name, format, and meaning of
such sections or note records so that producers and consumers could communicate.

If the latter, then a first step is to look at the SourceLink definition 
which can be found at:

  https://github.com/dotnet/designs/blob/main/accepted/2020/diagnostics/source-link.md

It is intended to provide a means to specify a URL that can be used to locate
the source files from an online server. It uses a JSON textual representation. 
The representation supports limited wildcards to allow a single entry to
describe the location of many files.

DWARF has a file table that provides information about the source files. So one
approach is to extend it to allow the specification of a URL. This would provide
a standard place that produces and consumers could use to communicate the URL
that could be used to find the source online. A consumer could use this if the
source is not locally available using the file path. The file table already has
a means to add fields to each file entry, so an additional one could be defined.

A compiler/linker producer could accept a SourceLink JSON file and use it while
generating the DWARF file table. Standard linking would be able to preserve 
the information as it is just part of the existing DWARF file table.

The proposal below describes this alternative. It uses URL for the name of the
field, but should it use URI to be more general?

ALTERNATIVES

The above approach does not benefit from the compaction offered by the
SourceLink JSON use of wildcards. However, is this considered an issue
in DWARF size?

If it is then a more elaborate DWARF approach could be devised. For example,
a per CU “url table” could be added with entries for a file root and a URL
root (essentially what is in the JSON file). File entries could either
reference entries in the url table, or the url table could simply be an
independent table that a consumer could consult to determine the URL of a
file using the same rules defined by SourceLink.

Another approach to address this issue that does not involve any DWARF changes 
is the use of debuginfo files
(see https://fedoraproject.org/wiki/Releases/FeatureBuildId). An executable that 
contains DWARF debug information can be split into a stripped ELF executable that 
contains no debug information and a debuginfo file that is an ELF file that only 
contains the DWARF debug information. The stripped executable has an ELF note record
added that has a Build-ID 
(see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/compiling-build-id)
which is a hash of the stable parts of the stripped executable that is also
saved in the debuginfo file. The stripped executable is included in the distribute
package. In addition a debug package can be made that includes the debuginfo file
together with the sources. 

The debugedit (http://manpages.ubuntu.com/manpages/bionic/man8/debugedit.8.html) 
tool can be used to split an executable, compute and insert the Build-ID, and
change the prefix of all the source file paths in the DWARF to use a fixed
location as opposed to the one used by the build system.

To debug the executable the debuginfo file and sources can then be installed using
the debug packages (see dwarf-workgroup@lists.dwarfstd.org). The debuginfod server
(https://developers.redhat.com/blog/2019/10/14/introducing-debuginfod-the-elfutils-debuginfo-server)
can be used to retrieve the debuginfo files in an automated fashion using a URL.
A debugger can use the Build-ID in the executable to request the debuginfo and 
sources from the debuginfod server and cache them locally
(see https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html).

PROPOSED RESOLUTION

This augments DWARF Version 5 section 6.2.4.1.

6.  DW_LNCT_URL
 
    The component is a null-terminated UTF-8 URL text string. This content code
    is paired with the same forms as DW_LNCT_path. It can be used for file
    name entries.  If DW_LNCT_URL content kind is not present, or is present 
    with an empty null-terminated string value, no URL is available.  Otherwise,
    the null-terminated string value provides a URL for a location that can be
    used to access the source text.

    *When the URL is available, consumers can use it to obtain the source
    text if the file path provided by the DW_LNCT_path field is
    not accessible, or if the timestamp, size or hash of the contents of that
    file does not match that provided by the DW_LNCT_timestamp, 
    DW_LNCT_size or DW_LNCT_MD5 fields respectively.*

This augments DWARF Version 5 section 7.22 and Table 7.27.

The following table gives the encoding of the additional line number header
entry format.

  Table 7.27: Line number header entry format encodings

  ====================================  ====================
  Line number header entry format name  Value
  ====================================  ====================
  DW_LNCT_is_URL #                      0x6
  ====================================  ====================

--
2021-08-06:  Revised: Added discussion, alternatives, proposal.
2022-08-22:  Accepted with modifications: Remove discussion about empty URL string.
             Remove non-normative text.  Revise title.