Issue 180201.1: DWARF and source text embedding
|Accepted with modifications
Section 6.2, 7.27, pg 159, 237
Programming models such as OpenCL can often have source generated at runtime, which is compiled online, with its output not written to file. This raises an issue for the compiler: in the generated DWARF, what should it put as the file name of the compile unit and associated line table information?
Common solutions to this problem include generating some temporary source file name and having a contract with the debugger to get the source somehow and write it out to that file. Since OpenCL and friends generally have quite small source files, it's quite reasonable to embed the entire source in the binary, then have the debugger look in a known section or address to extract the source. If there was a way to express this in DWARF, then runtime- generated source files could work without an additional contract between the compiler and debugger. This is particularly important when dealing with platforms where the filesystem is not writable, which is a common situation in mobile computing.
The proposal allows the source text to be optional for each of the files of a compilation unit. This supports producers that may result in mixed properties for the files of a compilation unit. An example of this can happen for link time optimizations (LTO) that may result in code from other translation units being mixed into another compilation unit.
The proposal embeds the source text directly in the line number program header. This ensures the source text is available even if the rest of the debug information is stripped which is a common practice.
It is common practice in applications that use OpenCL to construct the source text of the OpenCL program on the fly as part of the program execution at “run time”. For example, the application computes what kind of kernel is needed to solve the problem and conditionally pastes together that kernel’s source text. An example is the OpenCL backend of the Gromacs molecular modeling application.
This source text is passed to an OpenCL runtime API such as clCompileProgram. The act of compiling creates a code object that contains both the executable code and the DWARF debug information. This executable can then be linked, loaded and executed. This is typically all done without using a disk file system.
Notice how clCompileProgram takes an optional set of “header files” specified by their names and contents. #include directives can be used to specify one of these “header file names”. The debugger would presumably want to present the source positions in terms of the “header file names” specified in the original compilation request, so the notion of the “files” having names still exists even when the contents of these “source files” do not exist on disk.
The CUDA and HIP languages have similar run time compilation capabilities.
The Clang compiler has the
option which puts the contents of the “file” into the DWARF regardless of
whether the source comes from a disk file or an in memory virtual “file system”.
It uses the “file name” that was specified in the source input (for example
the #include names) when generating DWARF even if it embeds the source text.
Issue 161018.1 was the initial proposal to support embedding source text into DWARF to support languages that support online compilation. It was based on the DWARF 4 standard which did not have a line table entry for the main file and so used a DW_AT_source attribute on the compilation unit. It also used a more complex method to support mixed file properties.
The original proposal was revised by issue 180201.1. This was based on DWARF 5 and so no longer needed the DW_AT_source compilation unit attribute. It did not completely address the mixed file properties as the MD5 property cannot be optional.
This proposal is a revision of 180201.1 that simplifies supporting optional source text by defining the empty null terminated string as a sentinel value indicating the source text is not available.
This augments DWARF Version 5 section 22.214.171.124.
The component is a null-terminated UTF-8 source text string with
"\n" line endings. This content code is paired with the same forms as
DW_LNCT_path. It can be used for entries in the file_names field to
provide the source text contents of the file.
The value is an empty null-terminated string if no source is available. If
the source is available but is an empty file then the value is a
null-terminated single "\n".
*When the source field is present, consumers can use the embedded source
instead of attempting to discover the source on disk using the file path
provided by the DW_LNCT_path field. When the source field is absent,
consumers can access the file to get the source text.*
*This is particularly useful for programming languages that support runtime
compilation and runtime generation of source text. In these cases, the
source text does not reside in any permanent file. For example, the OpenCL
language supports online compilation.*
This augments DWARF Version 5 section 7.22 and Table 7.27.
The following table gives the encoding of the additional line number header entry formats.
Table 7.27: Line number header entry format encodings
Line number header entry format name Value
2021-03-05: Revised. Previous version: http://dwarfstd.org/issues/180201.1-1.html
2021-07-25: Revised non-normative text for DW_LNCT_is_MD5.
2021-08-10: Added motivation section
2022-03-05: Remove MD5 proposal. See http://dwarfstd.org/issues/220304.1.html
2022-03-08: Add history, remove alternatives.
Previous version: http://dwarfstd.org/issues/180201.1-3.html
2022-07-11: Accepted with modifications: Remove "with "\n" line endings" in 6. DW_LNCT_source; remove second paragraph under 6. DW_LNCT_source.