Issue 120727.1: UTF-8 for all sections with strings

Author: Paul Robinson
Champion: Paul Robinson
Date submitted: 2012-07-27
Date revised:
Date closed:
Type: Clarification
Status: Accepted with modifications
DWARF Version: 5
Section various, pg many
Places where the standard specifies or recommends UTF-8:

For the .debug_info section, the compilation-unit header may have a DW_AT_use_UTF8 attribute; and
there is a strong recommendation to use UTF-8 regardless.  By implication, this covers the .debug_str
section as well, as it contains strings that would otherwise be in the .debug_info section.

For the .debug_frame section, the "augmentation" string is explicitly specified as UTF-8.

For the .debug_pubnames section, there is no explicit statement, but its header contains a reference to
a compilation-unit header in .debug_info, making relatively easy to find any relevant DW_AT_use_UTF8
attribute. Also, it specifies that names are (in effect) copies of the strings from DW_AT_name
attributes, and so would use the same encoding as the corresponding .debug_info compilation unit.

However, other sections also contain strings, and there is no specification or recommendation for the
encoding of those strings.  

For the .debug_line section, there is no explicit statement AND no easy way to get back to the
corresponding compilation-unit entry in the .debug_info section.

It is true that any given unit within the .debug_line section should be referenced from a compilation-unit
entry in the .debug_info section, and the DW_AT_use_UTF8 attribute could be implied to apply to the
referenced .debug_line section.  However this isn't useful for all consumers, e.g. an object-file dumper
asked to dump just the .debug_line section shouldn't have to dig up the corresponding compilation-unit
entry from .debug_info just to be able to interpret the strings in the .debug_line section.

For the .debug_macinfo section, there is no explicit statement AND no easy way to get back to the
corresponding compilation-unit entry in the .debug_info section. Unlike .debug_line, however, you
can't interpret all of the information without starting from the corresponding compilation-unit entry,
because .debug_macinfo refers to entries in the .debug_line section for the same compilation unit.
So, it is not too unreasonable to say that .debug_macinfo string encoding follows from the .debug_info
compilation-unit entry.


Proposal:
- Clarify that .debug_pubnames uses the same encoding as the referenced compilation-unit in .debug_info.
- Specify that .debug_macinfo uses the same encoding as the corresponding compilation-unit in .debug_info.
- Strongly recommend UTF-8 for strings in the .debug_line section.
  OR, specify that .debug_line uses the same encoding as the corresponding compilation-unit in
  .debug_info, and don't worry about the object-file-dumper case too much.

---

1/15/2013 -- Accepted with modification.  DW_AT_utf8 in a comp unit applies to all the other tables as well.