Issue 210419.1: Split DW_AT_language into DW_AT_language and DW_AT_language_version

Author: Jakub Jelinek
Champion: Jakub Jelinek
Date submitted: 2021-04-19
Date revised: 2022-05-16
Date closed: 2022-06-13
Type: Enhancement
Status: Accepted
DWARF Version: 6
Section 3.1.1, pg 61-63
PROBLEM DESCRIPTION

Many of the DW_LANG_* codes embed a version number (typically in
a form of a year).  For debugging information consumers, whenever
new DW_LANG_* codes are added to the standard, most of the consumers
need to be changed, because typically consumers want to ask either
is the TU's implementation language C++ (or some other standard)
regardless of version, or somewhat less often is the TU's implementation
language C++11 or later.

RESOLUTION

In Table 2.2 on pages 20/21, remove
DW_AT_language  Programming language
row and add
DW_AT_language_name Programming language name
DW_AT_language_version  Programming language version
rows.

On page 61, change DW_AT_language to DW_AT_language_name.

Replace Table 3.1 on pages 62-63 with a new table:
Language name   Meaning Versioning scheme
DW_LNAME_Ada †   ISO Ada YYYY
DW_LNAME_BLISS  BLISS
DW_LNAME_C  C   YYYYMM
DW_LNAME_C_plus_plus    ISO C++ YYYYMM
DW_LNAME_Cobol † ISO COBOL   YYYY
DW_LNAME_Crystal
DW_LNAME_D  D
DW_LNAME_Dylan  Dylan
DW_LNAME_Fortran    ISO Fortran YYYY
DW_LNAME_Go †    Go
DW_LNAME_Haskell †   Haskell
DW_LNAME_Java   Java
DW_LNAME_Julia  Julia
DW_LNAME_Kotlin Kotlin
DW_LNAME_Modula2    ISO Modula-2
DW_LNAME_Modula3 †   Modula-3
DW_LNAME_ObjC   Objective C YYYYMM
DW_LNAME_ObjC_plus_plus Objective C++   YYYYMM
DW_LNAME_OCaml  OCaml
DW_LNAME_OpenCL †    OpenCL
DW_LNAME_Pascal ISO Pascal
DW_LNAME_PLI †   ANSI PL
DW_LNAME_Python †    Python  VVMM
DW_LNAME_RenderScript   RenderScript Kernel Language
DW_LNAME_Rust   Rust
DW_LNAME_Swift  Swift
DW_LNAME_UPC    UPC (Unified Parallel C)
DW_LNAME_Zig    Zig

In Section 3.1.1 on page 63, add as new paragraph
A DW_AT_language_version attribute may be specified whose constant value
is an integer code indicating the version of the source language.
Depending on the Versioning scheme column in Table 3.1 this can be:
1. YYYY 4 digit decimal year when the standard has been ratified.
2. YYYYMM 4 digit decimal year multiplied by 100 plus decimal month
when the standard has been ratified, e.g. 201703 for a standard ratified
in March 2017.
3. VVMM 2 digit decimal major version multipled by 100 plus minor version,
e.g. 306 for version 3.6 of the language.
When the attribute is omitted, default value is 0, which stands for
unspecified version of the source language.

In Section 3.1.3 on page 68, replace DW_AT_language with DW_AT_language_name 
and add a DW_AT_language_version attribute.

In Section 3.1.4 on page 69, replace DW_AT_language with DW_AT_language_name
and add
"A DW_AT_language_version attribute, whose constant value is an integer code
indicating source language version as described on page 63."

In Section 5.5 on page 111 replace DW_AT_language with DW_AT_language_name.

In Section 7.5.4 in Table 7.5, replace
DW_AT_language  0x13    constant
with
Reserved    0x13    not applicable
and add
DW_AT_language_name 0x8d    constant
DW_AT_language_version  0x8e    constant
entries.

In Section 7.12 replace DW_AT_language with DW_AT_language_name and
replace Table 7.17 with:
DW_LNAME_Ada    0x0001  1
DW_LNAME_BLISS  0x0002  0
DW_LNAME_C  0x0003  0
DW_LNAME_C_plus_plus    0x0004  0
DW_LNAME_Cobol  0x0005  1
DW_LNAME_Crystal    0x0006  0
DW_LNAME_D  0x0007  0
DW_LNAME_Dylan  0x0008  0
DW_LNAME_Fortran    0x0009  1
DW_LNAME_Go 0x000a  0
DW_LNAME_Haskell    0x000b  0
DW_LNAME_Java   0x000c  0
DW_LNAME_Julia  0x000d  1
DW_LNAME_Kotlin 0x000e  0
DW_LNAME_Modula2    0x000f  1
DW_LNAME_Modula3    0x0010  1
DW_LNAME_ObjC   0x0011  0
DW_LNAME_ObjC_plus_plus 0x0012  0
DW_LNAME_OCaml  0x0013  0
DW_LNAME_OpenCL 0x0014  0
DW_LNAME_Pascal 0x0015  1
DW_LNAME_PLI    0x0016  1
DW_LNAME_Python 0x0017  0
DW_LNAME_RenderScript   0x0018  0
DW_LNAME_Rust   0x0019  0
DW_LNAME_Swift  0x001a  0
DW_LNAME_UPC    0x001b  0
DW_LNAME_Zig    0x001c  0

In Appendix A in Table A.1, rename DW_AT_language to DW_AT_language_name
and add DW_AT_language_version to DW_TAG_compile_unit, DW_TAG_partial_unit,
DW_TAG_type_unit.

In Appendixes D, E and F, adjust examples to use DW_AT_language_name and
DW_AT_language_version attributes instead of DW_AT_language perhaps with
DW_FORM_implicit_const, rename DW_LANG_* codes to DW_LNAME_* with
appropriate tweaks in the names.

In Appendix F in Table F.1, rename DW_AT_language with DW_AT_language_name
and add
DW_AT_language_version  Y   Y   -   Y   Y
row.

Now, for the exact YYYY and YYYYMM values, I wonder if it is enough to
keep those to https://dwarfstd.org/Languages.html table outside of the
standard, which could have
6 columns
Language name   Value   Default Lower Bound Versioning scheme   ""    Version
and contain say:
DW_LNAME_C_plus_plus    0x0004  0   YYYYMM  C++98   199711
                        C++11   201103
                        C++14   201402
                        C++17   201703
                        C++20   202002
Or do we want that also directly in the standard at least for the already
released versions of the standard?

As for Modula, my understanding is that Modula, Modula-2 and Modula-3 are
largely incompatible languages and so probably shouldn't pretend to be one
language with different versions.

--
2022-05-16: Revised.  Previous version: http://dwarfstd.org/issues/210419.1-1.html
2022-06-13: Accepted.