DWARF and source text embedding
Section 6.2, 7.27, pg 159, 237
Programming models such as OpenCL can often have source generated at runtime,
which is compiled online, with its output not written to file. This raises an
issue for the compiler: in the generated DWARF, what should it put as the
file name of the compile unit and associated line table information?
Common solutions to this problem include generating some temporary source
file name and having a contract with the debugger to get the source somehow
and write it out to that file. Since OpenCL and friends generally have quite
small source files, it's quite reasonable to embed the entire source in the
binary, then have the debugger look in a known section or address to extract
the source. If there were a way to express this in DWARF, then runtime-generated
source files could work without an additional contract between the
compiler and debugger. This is particularly important when dealing with
platforms where the filesystem is not writable, which is a common situation
in mobile computing.
To support flexibility, this proposal allows all properties of a file to be
optional for each of the files of a compilation unit. This includes the source
text, file size, time stamp, and MD5 hash. This accommodates producers whose
output mixes files with different sets of properties within one compilation
unit. For example, link time optimization (LTO) may merge code from other
translation units into a compilation unit.
The proposal embeds the source text directly in the line number program header.
This ensures the source text is available even if the rest of the debug
information is stripped, which is common practice.
It is common practice in applications that use OpenCL to construct the
source text of the OpenCL program on the fly as part of the program
execution at “run time”. For example, the application computes what kind of
kernel is needed to solve the problem and conditionally pastes together that
kernel’s source text. An example is the OpenCL backend of the Gromacs
(https://www.gromacs.org/) molecular modeling application.
This source text is passed to an OpenCL runtime API such as clCompileProgram.
The act of compiling creates a code object that contains both the executable code
and the DWARF debug information. This executable can then be linked, loaded and
executed. This is typically all done without using a disk file system.
Notice how clCompileProgram takes an optional set of “header files” specified
by their names and contents. #include directives can be used to specify one of
these “header file names”. The debugger would presumably want to present the
source positions in terms of the “header file names” specified in the original
compilation request, so the notion of the “files” having names still exists
even when the contents of these “source files” do not exist on disk.
The CUDA and HIP languages have similar run time compilation capabilities.
Issue 161018.1 was the initial proposal to support embedding source text into
DWARF to support languages that support online compilation. It was based on the
DWARF 4 standard, which did not have a line table entry for the main file and so
used a DW_AT_source attribute on the compilation unit. It also used a more
complex method to support mixed file properties.
The original proposal was revised by issue 180201.1. This was based on DWARF 5
and so no longer needed the DW_AT_source compilation unit attribute. It did not
completely address the mixed file properties as the MD5 property cannot be
This proposal is a revision of 180201.1 that simplifies supporting optional
source text by defining the empty null terminated string as a sentinel value
indicating the source text is not available. It also defines a DW_LNCT_is_MD5
property to allow the MD5 hash to be optional.
This augments DWARF Version 5 section 6.2.4.1.
DW_LNCT_is_MD5 indicates whether the DW_LNCT_MD5 content kind, if
present, is valid: when 0 it is not valid and when 1 it is valid. If the
DW_LNCT_is_MD5 content kind is not present, and the DW_LNCT_MD5
content kind is present, then the MD5 checksum is valid.
DW_LNCT_is_MD5 is always paired with the DW_FORM_udata form.
This allows a compilation unit to have a mixture of files with and
without MD5 checksums. This can happen when link time optimization
(LTO) generates code for a translation unit that includes contributions
from other translation units that have different information about
the source files.
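The validity rule above can be sketched as a small consumer-side check. This is only an illustration: the function name md5_is_valid and the convention of passing None for a field that is not present are assumptions of this sketch, not part of the proposal. Values of DW_LNCT_is_MD5 other than 0 and 1 are not defined by the text, so the sketch treats them as not valid.

```python
from typing import Optional


def md5_is_valid(md5: Optional[bytes], is_md5: Optional[int]) -> bool:
    """Decide whether a file entry carries a usable MD5 checksum.

    md5    -- the DW_LNCT_MD5 value, or None if that field is not present
    is_md5 -- the DW_LNCT_is_MD5 value, or None if that field is not present
    (Using None for "not present" is an assumption of this sketch.)
    """
    if md5 is None:
        # No DW_LNCT_MD5 field at all, so there is nothing to validate.
        return False
    if is_md5 is None:
        # DW_LNCT_is_MD5 absent but DW_LNCT_MD5 present: checksum is valid.
        return True
    # 1 means valid, 0 means not valid; other values are treated as
    # not valid here because the text does not define them.
    return is_md5 == 1
```

A consumer handling an LTO-merged compilation unit could call this once per file entry and skip checksum comparison for entries where it returns False.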
The component is a null-terminated UTF-8 source text string with
"\n" line endings. This content code is paired with the same forms as
DW_LNCT_path. It can be used for file name entries.
The value is an empty null-terminated string if no source is available. If
the source is available but is an empty file then the value is a
null-terminated single "\n".
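As an illustration of the value rules above, a producer might build the bytes of the source field roughly as follows. The function name encode_source_field and the use of None to mean "no source available" are assumptions of this sketch. It also assumes the source text contains no embedded NUL characters, since those would end the null-terminated string early.

```python
from typing import Optional


def encode_source_field(text: Optional[str]) -> bytes:
    """Build the null-terminated bytes for the embedded source field.

    text is the source text, "" for an empty file, or None when no
    source is available (the None convention is an assumption here).
    """
    if text is None:
        # Sentinel: an empty null-terminated string means "no source".
        return b"\x00"
    if text == "":
        # An available but empty file is encoded as a single "\n".
        return b"\n\x00"
    # Normalize CRLF and bare CR line endings to "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.encode("utf-8") + b"\x00"
```

Encoding the empty file as "\n" keeps it distinguishable from the sentinel, which is the point of the rule.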
When the source field is present, consumers can use the embedded source
instead of attempting to discover the source on disk using the file path
provided by the DW_LNCT_path field. When the source field is absent,
consumers can access the file to get the source text.
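A consumer following this rule might choose between the embedded text and the file on disk along these lines. This is a minimal sketch: the function name load_source, the representation of the field as bytes without the terminating null, and the decision to return None when neither source is reachable are all assumptions made for illustration.

```python
from typing import Optional


def load_source(path: str, embedded: Optional[bytes]) -> Optional[str]:
    """Return the source text for a file entry, or None if unavailable.

    path     -- the file path from the DW_LNCT_path field
    embedded -- the source field bytes without the terminating null,
                or None if the entry has no source field (an assumed
                representation for this sketch)
    """
    if embedded:
        # Non-empty embedded text: use it and do not touch the file system.
        return embedded.decode("utf-8")
    # Field absent, or the empty-string sentinel: fall back to the path.
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        # The file may never have existed, e.g. runtime-generated source.
        return None
```

Preferring the embedded text means a debugger can show runtime-generated kernels even when the path in the line table names a file that was never written.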
This is particularly useful for programming languages that support runtime
compilation and runtime generation of source text. In these cases, the
source text does not reside in any permanent file. For example, the OpenCL
language supports online compilation.
This augments DWARF Version 5 section 7.22 and Table 7.27.
The following table gives the encoding of the additional line number header
entry formats.
Table 7.27: Line number header entry format encodings
Line number header entry format name Value
2021-03-05: Revised. Previous version: http://dwarfstd.org/ShowIssue.php?issue=180201.1-1
2021-07-25: Revised non-normative text for DW_LNCT_is_MD5.
2021-08-10: Added motivation section