Issue 121221.1: Allow DW_AT_type with DW_TAG_string_type

Author: Kendrick Wong
Champion: Kendrick Wong
Date submitted: 2012-12-21
Date revised:
Date closed:
Type: Enhancement
Status: Accepted
DWARF Version: 5
Section 5.9, pg 98
The purpose of this write up is to replace the existing proposal 120213.1 by Tobias Burnus: 
"Allow DW_AT_type with DW_TAG_string_type"

Section 5.9 String Type Entries, pg 98

The DWARF spec currently does not allow to specify the type (DW_AT_type) for strings (DW_TAG_string_type).

That's a problem for Fortran, where different string types can exists. For instance, GCC's gfortran [since 
4.4] has a kind=1 (one byte) and a kind=4 (four bytes, for UCS-4) character type. The NAG compiler (nagfor 
5.3) supports besides the one-byte type also UCS-2, UCS-4, and Shift-JIS.

Note: UCS-4 is an (optional) feature of the Fortran 2003 standard (ISO/IEC 1539-1:2004; see also Fortran 
2008, ISO/IEC 1539-1:2010).

[Information from John Bishop]
From the document with this title page:

   WD 1539-1 J3/10-007r1 (F2008 Working Document) 24th November 2010 16:43

Section 4.4.3 Character type; subsection 4.4.3.1 Character sets, paragraphy 3:

[quote]
The character set specified in ISO/IEC 646:1991 (International Reference Version) is referred to as the 
ASCII character set and its corresponding representation method is ASCII character kind.  The character 
set UCS-4 as specified in ISO/IEC 10646 is referred to as the ISO 10646 character set and its corresponding
method is the ISO 10646 character kind.
[end quote]

Section 4.4.3.2 says that there is no standard list of character "kinds" but every kind provided by the 
"processor" (the combined hardware/software system from a vendor) must have a blank character.

Section 13.7.145 ("selected_char_kind(name)") says that the defined kinds are DEFAULT, ASCII and ISO_10646 
and that processors may support other kinds.
[end Information from John Bishop]

Proposal:

A string type entry may have a DW_AT_type attribute describing how each character is encoded and is to be 
interepreted.  The value of this attribute is a base type entry represented by a debugging information entry 
with the tag DW_TAG_base_type.  If the attribute is absent, then the character is encoded using the system 
default.

Add new encoding value:
DW_ATE_UCS      Unicode character (fixed size per character)
DW_ATE_ascii    ASCII character

The DW_ATE_UCS encoding is intended for Unicode character encoding (see ISO/IEC 10646).  For example, UCS-4
(ISO 10646 character set) is represented by a base type entry with an encoding attribute whose value is 
DW_ATE_UCS and a byte size attribute whose value is 4.

The DW_ATE_ascii encoding is intended for single byte ASCII character encoding (see ISO/IEC 646:1991, ASCII 
character set).

Fortran 2003 example:

              program character_kind
                use iso_fortran_env
                implicit none
                integer, parameter :: ascii = selected_char_kind ("ascii")
                integer, parameter :: ucs4  = selected_char_kind ('ISO_10646')
              
                character(kind=ascii, len=26) :: alphabet
                character(kind=ucs4,  len=30) :: hello_world
              
                alphabet = ascii_"abcdefghijklmnopqrstuvwxyz"
                hello_world = ucs4_'Hello World and Ni Hao -- ' &
                              // char (int (z'4F60'), ucs4)     &
                              // char (int (z'597D'), ucs4)
              
                write (*,*) alphabet
              
                open (output_unit, encoding='UTF-8')
                write (*,*) trim (hello_world)
              end program character_kind

Proposed DWARF output:
$1: DW_TAG_base_type
      DW_AT_encoding (DW_ATE_ascii)
      
$2: DW_TAG_base_type
      DW_AT_encoding (DW_ATE_UCS)
      DW_AT_byte_size (4)
      
$3: DW_TAG_string_type
      DW_AT_type ($1)
      DW_AT_string_length ( ... )
      DW_AT_string_length_byte_size ( ... )
      DW_AT_data_location ( ... )
      
$4: DW_TAG_string_type
      DW_AT_type ($2)
      DW_AT_string_length ( ... )
      DW_AT_string_length_byte_size ( ... )
      DW_AT_data_location ( ... )

$5: DW_TAG_variable
      DW_AT_name (alphabet)
      DW_AT_type ($3)
      DW_AT_location ( ... )
      
$6: DW_TAG_variable
      DW_AT_name (hello_world)
      DW_AT_type ($4)
      DW_AT_location ( ... )