DWARF Standard


HOME
SPECIFICATIONS
FAQ
ISSUES



120727.1 Paul Robinson UTF-8 for all sections with strings Clarification Accepted with modifications Paul Robinson


Section various, pg many
Places where the standard specifies or recommends UTF-8:

For the .debug_info section, the compilation-unit header may have a DW_AT_use_UTF8 attribute; and
there is a strong recommendation to use UTF-8 regardless.  By implication, this covers the .debug_str
section as well, as it contains strings that would otherwise be in the .debug_info section.

For the .debug_frame section, the "augmentation" string is explicitly specified as UTF-8.

For the .debug_pubnames section, there is no explicit statement, but its header contains a reference to
a compilation-unit header in .debug_info, making relatively easy to find any relevant DW_AT_use_UTF8
attribute. Also, it specifies that names are (in effect) copies of the strings from DW_AT_name
attributes, and so would use the same encoding as the corresponding .debug_info compilation unit.

However, other sections also contain strings, and there is no specification or recommendation for the
encoding of those strings.  

For the .debug_line section, there is no explicit statement AND no easy way to get back to the
corresponding compilation-unit entry in the .debug_info section.

It is true that any given unit within the .debug_line section should be referenced from a compilation-unit
entry in the .debug_info section, and the DW_AT_use_UTF8 attribute could be implied to apply to the
referenced .debug_line section.  However this isn't useful for all consumers, e.g. an object-file dumper
asked to dump just the .debug_line section shouldn't have to dig up the corresponding compilation-unit
entry from .debug_info just to be able to interpret the strings in the .debug_line section.

For the .debug_macinfo section, there is no explicit statement AND no easy way to get back to the
corresponding compilation-unit entry in the .debug_info section. Unlike .debug_line, however, you
can't interpret all of the information without starting from the corresponding compilation-unit entry,
because .debug_macinfo refers to entries in the .debug_line section for the same compilation unit.
So, it is not too unreasonable to say that .debug_macinfo string encoding follows from the .debug_info
compilation-unit entry.


Proposal:
- Clarify that .debug_pubnames uses the same encoding as the referenced compilation-unit in .debug_info.
- Specify that .debug_macinfo uses the same encoding as the corresponding compilation-unit in .debug_info.
- Strongly recommend UTF-8 for strings in the .debug_line section.
  OR, specify that .debug_line uses the same encoding as the corresponding compilation-unit in
  .debug_info, and don't worry about the object-file-dumper case too much.

---

1/15/2013 -- Accepted with modification.  DW_AT_utf8 in a comp unit applies to all the other tables as well. 



All logos and trademarks in this site are property of their respective owner.
The comments are property of their posters, all the rest © 2007-2017 by DWARF Standards Committee.