Frequently Asked Questions

History and Process

What is DWARF?: DWARF is a debugging format used to describe programs in many procedural programming languages. It is most widely associated with the ELF object format but it has been used with other object file formats.
Why is it called DWARF? And why isn’t it spelled “Dwarf”?: It’s a pun, since it was developed along with ELF, the Executable and Linking Format (nee Extensible Linking Format). Brian Russell, the original developer of DWARF, christened it the “Debugging With Attributed Record Formats”.
Where did DWARF come from?: DWARF was originally developed by Bell Labs for use with the System V debugger named sdb. This format was standardized as DWARF v. 1.0 by the PLSIG (Programming Languages Special Interest Group) of Unix International.
How can I submit a proposal to change or extend the DWARF specification?: Please see Submitting A Comment.
What advantages does DWARF have over STABS?: DWARF is a block-structured and extensible description of a program’s source and how it is translated into executable code. It’s easy to add new descriptions or extend the descriptions in DWARF. STABS is much more restricted in its expressive abilities. It depends on predefined symbol and type definitions and is not easily modified or extended. Additionally, DWARF has facilities for describing a more complex execution environment, such as discontiguous scopes, stack structures, and stack unwinding, which STABS cannot describe.
Is there an archive for the previous mailing list hosted by SGI?: Yes, you can find it here.
Is DWARF associated with XCOFF object format?: It is reasonably likely someone has used DWARF with XCOFF but a specific implementation is not known. References to an implementation would be welcome.
What is the Software Licensing Agreement for DWARF?: The DWARF specifications, from Version 3 onward, are licensed under the GNU Free Documentation License, V1.3, November 2008.
You write that the text of the DWARF standard is under GNU FDL, but if you read the GNU FDL, you should have noticed that when something is released under GNU FDL it should be written in something which is widespread or at least readable as plain text.: The latest versions of the standard are written in LaTeX, and are freely available from a git repository (see the downloads page).

DWARF Format

How many DW_TAG_compile_unit entries are allowed per compilation unit header?

Each compilation unit header should be followed by exactly one DW_TAG_compile_unit or one DW_TAG_partial_unit, and the children of the DW_TAG_compile_unit or DW_TAG_partial_unit contain Debugging Information Entries for the unit. A DW_TAG_compile_unit or DW_TAG_partial_unit has no sibling entries.

Why doesn’t the line table ‘basic block’ register have a reset operation?

It doesn’t need one.

The table is based on creating row entries, conceptually a row entry for every pc value in the executable text. The per-row flags basic_block, prologue_end, and epilogue_begin are reset by the creation of a new row in the table (see the individual opcodes that create table rows to see this). The is_stmt register keeps its value until DW_LNS_negate_stmt toggles it, and end_sequence is cleared when the registers are reinitialized after DW_LNE_end_sequence. Each row in the line table is defined by a sequence of one or more line table opcodes and the opcodes precisely define the value of every column of every row.
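As a rough illustration of why no explicit reset opcode is needed for basic_block, here is a minimal Python sketch. The LineState class and the emit_row helper are hypothetical names invented for this example. The sketch models only the clearing of basic_block, prologue_end, and epilogue_begin after a row is appended, following the description of DW_LNS_copy in the DWARF V5 specification, and leaves the other registers alone.

```python
from dataclasses import dataclass, replace


@dataclass
class LineState:
    """Simplified subset of the line-number state machine registers."""
    address: int = 0
    line: int = 1
    is_stmt: bool = True          # not touched by row creation in this sketch
    basic_block: bool = False
    prologue_end: bool = False
    epilogue_begin: bool = False


def emit_row(state, rows):
    """Append a snapshot of the registers, then clear the per-row flags."""
    rows.append(replace(state))   # copy, so later changes do not alter the row
    state.basic_block = False
    state.prologue_end = False
    state.epilogue_begin = False


state = LineState()
rows = []

state.basic_block = True          # effect of DW_LNS_set_basic_block
emit_row(state, rows)             # effect of DW_LNS_copy

state.address += 4                # advance to the next instruction
state.line += 1
emit_row(state, rows)             # basic_block was already cleared
```

The first row records basic_block as true and the second as false, even though no opcode ever cleared the flag explicitly.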

How big is a DW_FORM_ref_addr?

Starting with DWARF V3, DW_FORM_ref_addr is clearly defined as being an offset into the .debug_info section, so the reference value is the size of an offset. In DWARF V2, DW_FORM_ref_addr was (confusingly) defined as being the size of an address on the target machine. The DWARF V2 definition never made any sense and was a mistake in the DWARF V2 specification: the field DW_FORM_ref_addr defines is an offset, not an address.

If you are producing DWARF V2, please use the DWARF V3 definition of DW_FORM_ref_addr.
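The rule can be summarized in a small Python sketch. The function name ref_addr_size and the strict_v2 flag are hypothetical and exist only for illustration. The offset size is assumed to be 4 bytes in the 32-bit DWARF format and 8 bytes in the 64-bit DWARF format.

```python
def ref_addr_size(version, offset_size, address_size, strict_v2=False):
    """Return the size in bytes of a DW_FORM_ref_addr value.

    version      -- DWARF version from the compilation unit header
    offset_size  -- 4 for 32-bit DWARF, 8 for 64-bit DWARF
    address_size -- size of a target address, from the unit header
    strict_v2    -- follow the literal DWARF V2 wording (address size)
    """
    if version == 2 and strict_v2:
        return address_size
    # DWARF V3 and later, and the recommended reading for V2 producers.
    return offset_size
```

On a target with 4-byte addresses using 32-bit DWARF the two readings agree. They differ only when the address size and the offset size differ, for example 8-byte addresses with 32-bit DWARF.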

What is a state machine which is used to decode the byte stream of line and file debug information?

A state machine is a form of virtual special-purpose computer. The intent is to make the line table as compact (on disk) as possible while still allowing very detailed line positions to be recorded. The state machine “executes” line table instructions and constructs a line table in a form readily usable by an application (such as a debugger).

What is the basic logic behind the extended, standard and special opcodes?

The goal is maximum density. The instructions, the opcodes, take as little space as possible yet faithfully represent much detail about the source lines (and how they relate to the object code). Most opcodes are special opcodes. These encode (in a single byte) both the opcode and a machine address and (effectively) a range of source lines. Standard opcodes take a bit more space and represent special information. Extended opcodes take even more space and encode a variable-length instruction.

This design is effectively a fourth-generation line table, all generations having been designed by one person (with help of course, and over several years). Earlier generations were originally used by MIPS COFF (generation 1) and Borland (generations 2 and 3?).
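The way a special opcode packs an address advance and a line advance into one byte can be sketched in Python as follows. The function name is hypothetical, and the header values used in the example (opcode_base 13, line_base -5, line_range 14) are illustrative assumptions, since each line table header supplies its own. The sketch also assumes maximum_operations_per_instruction is 1, so it ignores the op_index register used for VLIW targets.

```python
def decode_special_opcode(opcode, opcode_base, line_base, line_range,
                          min_inst_length=1):
    """Split a special opcode into (address_advance, line_advance).

    opcode_base, line_base, line_range and min_inst_length all come
    from the line number program header.
    """
    adjusted = opcode - opcode_base
    address_advance = (adjusted // line_range) * min_inst_length
    line_advance = line_base + (adjusted % line_range)
    return address_advance, line_advance


# Opcode 75 with the assumed header values: adjusted = 62,
# 62 // 14 = 4 and 62 % 14 = 6, so the result is (4, 1).
advance = decode_special_opcode(75, opcode_base=13, line_base=-5, line_range=14)
```

After computing the two advances, the state machine adds them to the address and line registers and appends a row, all as the effect of that single byte.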

Code and Technology

Is there downloadable code to parse a DWARF file?

There is lots of information in DWARF and no tool presently does precisely this. Yet there are tools that make it fairly straightforward to get this information. Because C++ class information is complicated by its nature, this is not a simple task. All the open-source code mentioned below has license terms; be sure to understand and obey those terms if you use any of the code or applications mentioned.

readelf

Readelf is a GNU binutils application that can do many things, but one of those things is print DWARF DIEs and attributes as text. A script or program could read this text and find and interpret the desired information. If (instead of just running readelf) you borrow code from readelf you must obey readelf’s license terms, of course.
gdb

The GNU gdb debugger reads DWARF directly from object files. That code could be adapted. Or gdb could be used itself as a ‘backend’. See the gdb MI interface documentation for examples of one way to use gdb as a ‘backend’.
dwarfdump

Dwarfdump is an application (packaged with libdwarf) that can print DWARF DIEs and attributes as text. A script or program could read this text and find and interpret the desired information.
libdwarf

Libdwarf is a C library API for reading DWARF information (packaged with dwarfdump).
llvm-dwarfdump

In addition to pretty-printing DWARF it can also be used to query the debug information, print debug info quality metrics, and verify the structural integrity of DWARF debug information. llvm-dwarfdump is part of the LLVM project.
pyelftools

pyelftools is a pure Python library for parsing and analysing ELF files and DWARF debugging information.
dyninst

dyninst is a high level library which makes the DWARF information more easily available. In particular the symtabAPI is handy for combining ELF and DWARF into a relatively easy to use higher level abstraction.
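As a sketch of the "read the text output with a script" approach mentioned for readelf and dwarfdump, the Python below parses a small hand-written sample that resembles DIE listings printed by readelf. The sample text, the regular expressions, and the parse_dies function are assumptions made for illustration. The exact layout differs between tools and versions, so a real script would need to be adjusted to the output it actually receives.

```python
import re

# Hand-written sample resembling a textual DIE dump (an assumption,
# not captured from a real run).
SAMPLE = """\
 <0><c>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <d>   DW_AT_name        : example.c
 <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
    <2e>   DW_AT_byte_size   : 4
    <30>   DW_AT_name        : int
 <1><34>: Abbrev Number: 3 (DW_TAG_variable)
    <35>   DW_AT_name        : counter
    <3d>   DW_AT_type        : <0x2d>
"""

DIE_RE = re.compile(r"^\s*<(\d+)><([0-9a-f]+)>: Abbrev Number: \d+ \((DW_TAG_\w+)\)")
ATTR_RE = re.compile(r"^\s*<[0-9a-f]+>\s+(DW_AT_\w+)\s*:\s*(.*)$")


def parse_dies(text):
    """Return a list of dicts with keys: depth, offset, tag, attrs."""
    dies = []
    for line in text.splitlines():
        die = DIE_RE.match(line)
        if die:
            dies.append({
                "depth": int(die.group(1)),
                "offset": int(die.group(2), 16),
                "tag": die.group(3),
                "attrs": {},
            })
            continue
        attr = ATTR_RE.match(line)
        if attr and dies:
            # Attribute lines belong to the most recently seen DIE.
            dies[-1]["attrs"][attr.group(1)] = attr.group(2).strip()
    return dies


dies = parse_dies(SAMPLE)
variables = [d["attrs"]["DW_AT_name"] for d in dies if d["tag"] == "DW_TAG_variable"]
```

For anything beyond quick scripts, a library such as libdwarf or pyelftools avoids depending on a text layout that may change.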

Is there any software that can read DWARF data and output the size and offset of struct fields (and class data members)?

There are a couple of options:

The pahole utility, part of the "7 dwarves" utilities from Red Hat, is used to find alignment holes in structures, and can pretty-print a C struct annotated with offsets and sizes.
The abidw utility from libabigail can print an XML representation of a structure or class declaration.
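To illustrate the kind of calculation involved in finding alignment holes, here is a minimal Python sketch. It is not how pahole is implemented. The find_holes function and the example layout are hypothetical, and the member offsets and sizes are assumed to have already been read from DWARF (for instance from DW_AT_data_member_location and DW_AT_byte_size). Bit fields and unions are not handled.

```python
def find_holes(members, struct_size):
    """Return (offset, length) pairs for unused bytes in a structure.

    members     -- iterable of (name, offset, size) tuples
    struct_size -- total size of the structure in bytes
    """
    holes = []
    end = 0  # first byte not yet covered by any member
    for _name, offset, size in sorted(members, key=lambda m: m[1]):
        if offset > end:
            holes.append((end, offset - end))
        end = max(end, offset + size)
    if struct_size > end:
        holes.append((end, struct_size - end))  # trailing padding
    return holes


# Assumed layout of: struct { char c; int i; char d; }
# on an ABI with 4-byte, 4-aligned int.
layout = [("c", 0, 1), ("i", 4, 4), ("d", 8, 1)]
holes = find_holes(layout, struct_size=12)
```

With the assumed layout this reports a 3-byte hole after c and 3 bytes of trailing padding after d.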

Where can I find examples in C and other languages of using exception handling and other features? It would be great for programmers to be able to look at working examples.

Please see Appendix D of the latest DWARF specification.

Are there any tools available for editing, compacting, or selectively removing DWARF symbols from object files?

The dwz utility, a standalone DWARF optimization and duplicate-removal tool distributed separately from GNU binutils, can do some amount of compacting.

The split DWARF feature, introduced in DWARF V5, can be used to reduce the size of relocatable objects and executable files, while retaining debug information.

The ELF object file format allows debug sections to be compressed.

The strip and objcopy utilities in GNU binutils can be used to selectively remove parts of the DWARF debugging information.
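For the compressed-section option above, the following Python sketch shows how a consumer might expand a zlib-compressed section. It assumes a little-endian ELF64 file and the Elf64_Chdr layout from the ELF gABI (ch_type, ch_reserved, ch_size, ch_addralign), with ELFCOMPRESS_ZLIB equal to 1. The decompress_section function is a hypothetical helper, and the section bytes are synthesized in the example rather than read from a real object file.

```python
import struct
import zlib

ELFCOMPRESS_ZLIB = 1
# Elf64_Chdr, little-endian: ch_type, ch_reserved, ch_size, ch_addralign
CHDR64 = struct.Struct("<IIQQ")


def decompress_section(data):
    """Expand the contents of a section flagged SHF_COMPRESSED."""
    ch_type, _reserved, ch_size, _addralign = CHDR64.unpack_from(data, 0)
    if ch_type != ELFCOMPRESS_ZLIB:
        raise ValueError(f"unsupported compression type {ch_type}")
    raw = zlib.decompress(data[CHDR64.size:])
    if len(raw) != ch_size:
        raise ValueError("decompressed size does not match ch_size")
    return raw


# Build a synthetic compressed section to exercise the helper.
original = b"\x00" * 64 + b"example .debug_info bytes"
section = CHDR64.pack(ELFCOMPRESS_ZLIB, 0, len(original), 1) + zlib.compress(original)
restored = decompress_section(section)
```

A real reader would first check the section header flags for SHF_COMPRESSED and pick the 32-bit or 64-bit header layout and byte order from the ELF file header.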

Why does my debugger quit as soon as I start debugging?

Please contact your debugger supplier.

Where can I find a reference of the DWARF debugging symbols produced by a GCC compiler on various platforms?

For information about how GCC or any other compiler implements DWARF, please contact the developer or distributor for that compiler.

Does MS Visual Studio support the DWARF debugging standard?

No, Visual Studio uses a proprietary debug format based on the COFF object file format.

Is it possible to access local or global variables (i.e., getting their stored values) of a running program without stopping its execution using its debugging information?

This question is really about operating systems and debuggers and compilers, not so much about DWARF.

A short answer is that it is possible to access global variables in a running program from some debuggers running on some operating systems against applications compiled by some compilers. Whether one can find object information on disk (such as the DWARF information) for a running application also depends on the operating system. In most situations it makes no sense to think about accessing local variables as it’s hard to tell at any point when any given local variable is still live: by the time one has finally determined a variable is live it may have vanished or moved.

Aside from debugging, what else can DWARF be used for?

Some examples:

Static analysis; e.g., dyninst.
Understanding and optimizing code; e.g., the "7 dwarves" utilities.
Performance tuning; e.g., hpctoolkit and the Linux kernel's perf command.