Issue 120727.1: UTF-8 for all sections with strings
Section various, pg many
Places where the standard specifies or recommends UTF-8:
For the .debug_info section, the compilation-unit header may have a DW_AT_use_UTF8 attribute; and
there is a strong recommendation to use UTF-8 regardless. By implication, this covers the .debug_str
section as well, as it contains strings that would otherwise be in the .debug_info section.
For the .debug_frame section, the "augmentation" string is explicitly specified as UTF-8.
For the .debug_pubnames section, there is no explicit statement, but its header contains a reference to
a compilation-unit header in .debug_info, making relatively easy to find any relevant DW_AT_use_UTF8
attribute. Also, it specifies that names are (in effect) copies of the strings from DW_AT_name
attributes, and so would use the same encoding as the corresponding .debug_info compilation unit.
However, other sections also contain strings, and there is no specification or recommendation for the
encoding of those strings.
For the .debug_line section, there is no explicit statement AND no easy way to get back to the
corresponding compilation-unit entry in the .debug_info section.
It is true that any given unit within the .debug_line section should be referenced from a compilation-unit
entry in the .debug_info section, and the DW_AT_use_UTF8 attribute could be implied to apply to the
referenced .debug_line section. However this isn't useful for all consumers, e.g. an object-file dumper
asked to dump just the .debug_line section shouldn't have to dig up the corresponding compilation-unit
entry from .debug_info just to be able to interpret the strings in the .debug_line section.
For the .debug_macinfo section, there is no explicit statement AND no easy way to get back to the
corresponding compilation-unit entry in the .debug_info section. Unlike .debug_line, however, you
can't interpret all of the information without starting from the corresponding compilation-unit entry,
because .debug_macinfo refers to entries in the .debug_line section for the same compilation unit.
So, it is not too unreasonable to say that .debug_macinfo string encoding follows from the .debug_info
compilation-unit entry.
Proposal:
- Clarify that .debug_pubnames uses the same encoding as the referenced compilation-unit in .debug_info.
- Specify that .debug_macinfo uses the same encoding as the corresponding compilation-unit in .debug_info.
- Strongly recommend UTF-8 for strings in the .debug_line section.
OR, specify that .debug_line uses the same encoding as the corresponding compilation-unit in
.debug_info, and don't worry about the object-file-dumper case too much.
---
1/15/2013 -- Accepted with modification. DW_AT_utf8 in a comp unit applies to all the other tables as well.