||DWARF and source text embedding
||Accepted with modifications
Section 6.2, 7.27, pg 159, 237
Programming models such as OpenCL can often have source generated at runtime,
which is compiled online, with its output not written to file. This raises an
issue for the compiler: in the generated DWARF, what should it put as the
file name of the compile unit and associated line table information?
Common solutions to this problem include generating some temporary source
file name and having a contract with the debugger to get the source somehow
and write it out to that file. Since OpenCL and friends generally have quite
small source files, it's quite reasonable to embed the entire source in the
binary, then have the debugger look in a known section or address to extract
the source. If there was a way to express this in DWARF, then runtime-
generated source files could work without an additional contract between the
compiler and debugger. This is particularly important when dealing with
platforms where the filesystem is not writable, which is a common situation
in mobile computing.
The proposal allows the source text to be optional for each of the files of a
compilation unit. This supports producers that may result in mixed properties
for the files of a compilation unit. An example of this can happen for link time
optimizations (LTO) that may result in code from other translation units being
mixed into another compilation unit.
The proposal embeds the source text directly in the line number program header.
This ensures the source text is available even if the rest of the debug
information is stripped which is a common practice.
It is common practice in applications that use OpenCL to construct the
source text of the OpenCL program on the fly as part of the program
execution at “run time”. For example, the application computes what kind of
kernel is needed to solve the problem and conditionally pastes together that
kernel’s source text. An example is the OpenCL backend of the Gromacs
(https://www.gromacs.org/) molecular modeling application.
This source text is passed to an OpenCL runtime API such as clCompileProgram
The act of compiling creates a code object that contains both the executable code
and the DWARF debug information. This executable can then be linked, loaded and
executed. This is typically all done without using a disk file system.
Notice how clCompileProgram takes an optional set of “header files” specified
by their names and contents. #include directives can be used to specify one of
these “header file names”. The debugger would presumably want to present the
source positions in terms of the “header file names” specified in the original
compilation request, so the notion of the “files” having names still exists
even when the contents of these “source files” do not exist on disk.
The CUDA and HIP languages have similar run time compilation capabilities.
The Clang compiler has the -gembed-source
option which puts the contents of the “file” into the DWARF regardless of
whether the source comes from a disk file or an in memory virtual “file system”.
It uses the “file name” that was specified in the source input (for example the
#include names) when generating DWARF even if it embeds the source text.
Issue 161018.1 was the initial proposal to support embedding source text into
DWARF to support languages that support online compilation. It was based on the
DWARF 4 standard which did not have a line table entry for the main file and so
used a DW_AT_source attribute on the compilation unit. It also used a more
complex method to support mixed file properties.
The original proposal was revised by issue 180201.1. This was based on DWARF 5
and so no longer needed the DW_AT_source compilation unit attribute. It did not
completely address the mixed file properties as the MD5 property cannot be
This proposal is a revision of 180201.1 that simplifies supporting optional
source text by defining the empty null terminated string as a sentinel value
indicating the source text is not available.
This augments DWARF Version 5 section 18.104.22.168.
The component is a null-terminated UTF-8 source text string with
"\n" line endings. This content code is paired with the same forms as
DW_LNCT_path. It can be used for entries in the file_names field to
provide the source text contents of the file.
The value is an empty null-terminated string if no source is available. If
the source is available but is an empty file then the value is a
null-terminated single "\n".
When the source field is present, consumers can use the embedded source
instead of attempting to discover the source on disk using the file path
provided by the DW_LNCT_path field. When the source field is absent,
consumers can access the file to get the source text.
This is particularly useful for programming languages that support runtime
compilation and runtime generation of source text. In these cases, the
source text does not reside in any permanent file. For example, the OpenCL
language supports online compilation.
This augments DWARF Version 5 section 7.22 and Table 7.27.
The following table gives the encoding of the additional line number header
Table 7.27: Line number header entry format encodings
Line number header entry format name Value
2021-03-05: Revised. Previous version: http://dwarfstd.org/issues/180201.1-1.html
2021-07-25: Revised non-normative text for DW_LNCT_is_MD5.
2021-08-10: Added motivation section
2022-03-05: Remove MD5 proposal. See http://dwarfstd.org/issues/220304.1.html
2022-03-08: Add history, remove alternatives.
Previous version: http://dwarfstd.org/issues/180201.1-3.html
2022-07-11: Accepted with modifications: Remove "with "\n" line endings" in
6. DW_LNCT_source; remove second paragraph under 6. DW_LNCT_source.