Issue 020702.1: FORM_data signed or unsigned.

Author: James Cownie
Champion: James Cownie
Date submitted: 2002-07-02
Date revised:
Date closed:
Type: Clarification
Status: Accepted
DWARF Version: 3
I'm unable to find anywhere in the document which specifies whether
the value of DW_AT_upper_bound should be interpreted as a signed
or unsigned value.

GCC 3.1 is emitting an upper bound for something like this

int foo[199];

like this :-

 <2><215>: Abbrev Number: 10 (DW_TAG_subrange_type)
     DW_AT_type        : <21c>    
     DW_AT_upper_bound : 199    

Where abbrev 10 is this :-

   10      DW_TAG_subrange_type    [no children]
    DW_AT_type         DW_FORM_ref4
    DW_AT_upper_bound  DW_FORM_data1

so, it's assuming that the encoding is unsigned (as if you read 199 as
a signed byte you get -57 :-( ).

This seems wrong to me, since when we're dealing with Fortran upper
bounds can reasonably be negative. (integer, dimension (-50:-20) :: foo)

Before I report this as a gcc bug, though, I'd like to be able to
reference the chapter and verse in the standard which says that
DW_AT_upper_bound should be interpreted as a signed value and I can't
find it :-(

So, what do you think, 

1) can somone point me at somewhere in the standard which asserts this
or 
2) do we need to add something to assert it
or
3) do we need to add something which says it is langugage dependent
   (yeuch).

Petr Sorfa

It definitely must be signed as several languages, e.g. FORTRAN, allow
negative lower and upper bounds.

(All references are to the Draft 7 version DWARF Debugging Format
document.)

Page 130 lists DW_AT_upper_bound as being either a block, constant or
reference. On page 122 a "constant" is defined to be (amongst other
forms) either DW_FORM_sdata or DW_FORM_udata (signed or unsigned.) A
"block" on page 121 is unsigned, and the description of "reference" is
on page 123.

Keith.Walker

Ah!!!   I now see the problem.    You are quite correct.

This does appear to be an ommision from the DWARF3 Specification that I had
not previously noticed.    I had always believed that the DW_FORM_data<n>
forms were unsigned and that if you needed a signed value then you would
use the DW_FORM_sdata.      However as you point out this is currently not
actually what the specification states;  however I believe that it is was
is the most probable intention.

I believe that this means that the definitions of a constant should be
ammended to read something like ....

    "constant
    There are size forms of constant.  There are fixed
    length unsigned constant data forms for one, two ........."

Does anyone know of an implementation what has used the DW_FORM_data<n>
forms when a signed value was intended?

David Weatherford

The signedness of an upper bound is the same as that of the subtype.
Your example shows a subrange type used as array bounds.  What is the
type refereced (at <21c>)?  The signedness of that type (usually, signed
integer) is the signedness of the upper bound.  See Section 5.11,
"Subrange Type Entries."

I believe that the intent was that FORM_data<n> encode unsigned
integers, but that has little to do with the type of the subrange
itself.  If a particular (signed) value happes to be representable
as an unsigned byte, feel free to use FORM_data1.


James Cownie


> The signedness of an upper bound is the same as that of the subtype.
> Your example shows a subrange type used as array bounds.  What is
> the type refereced (at <21c>)?  The signedness of that type
> (usually, signed integer) is the signedness of the upper bound.  See
> Section 5.11, "Subrange Type Entries."

Which, in this case, is potentially extremely unpleasant for the
consumer of the DWARF since the base type of the subrange (the type at
<21c>) is a forward reference.

> I believe that the intent was that FORM_data<n> encode unsigned
> integers, 

Then the standard should say that, at present it merely says "the
value" which is subject to interpretation.

If FORM_data<n> _is_ defined to be unsigned I don't have a problem. 

If FORM_data<n> can be either signed or unsigned depending on the
context everything is very unpleasant, since in this case I don't know
how to interpret the FORM_data<n> until I have read some other type
which could be anywhere.

> but that has little to do with the type of the subrange itself.  If
> a particular (signed) value happes to be representable as an
> unsigned byte, feel free to use FORM_data1.

Hmm, I have no choice in the matter. I just have to (try to) read
whatever each of you compiler folks chooses to generate :-(



David Weatherford

Glancing through the sources for libdwarf, it appears that FORM_data<n>
can be used for both signed and unsigned data.  libdwarf contains the
two functions

    int dwarf_formudata(Dwarf_Attribute attr,
        Dwarf_Unsignod *retval, Dwarf_Error *err);
    int dwarf_formsdata(Dwarf_Attribute attr,
        Dwarf_Signed *retval, Dwarf_Error *err);

Both accept DW_FORM_data[1,2,4,8] for the source attribute.

Your example seems to demonstrate that FORM_data<n> has been used to
encode signed values by at least one compiler, and libdwarf supports
that usage, so I retract my statement that the intent of FORM_data<n>
was to encode unsigned values (only).  I'm afraid you have to know
what you're looking for to correctly sign-extend the value of an
attribute with FORM_data<n>.  The signedness depends upon context.

todd.allen


[ ...]  Our various dwarf writers & readers all
assume that DW_FORM_data[1248] implies that the value is unsigned.

James Cownie


> Your example seems to demonstrate that FORM_data<n> has been used to
> encode signed values by at least one compiler

No, it doesn't. In my example a FORM_data1 is being used by gcc 3.1 to
encode an unsigned value (greater than 128).

(My problem was that I had assumed that an upper_bound was a signed
quantity and therefore that I should interpret the FORM_data1 as
signed).



David Anderson

As I recall, the lack of a statement of signedness on FORM_data<n>
was intentional and the reason was 'you were supposed to 
be able to tell signedness from context'.
Which is why libdwarf is as it is.

Yes, it can cause nasty lookahead problems.
The requirement for context makes it harder (impossible, I think) for a 
simple dwarf dumper to always print such constants in the 
most appropriate human-readable form.


PROPOSAL (with some trepidation)

Section 7.5.4 Attribute Encodings

To the paragraph explaining 'constant', add the following:

"The data in DW_FORM_data1, DW_FORM_data2, DW_FORM_data4, 
DW_FORM_data8  can be anything.  Depending on context,
it may be an offset to a DIE, a signed integer, an
unsigned integer, a floating-point constant, or anything else.
One must have context to know how to interpret the bits, which
if they are target-machine data (such as an integer or floating
point constant) will be in target-machine byte-order."

----------------------------------------------------------

Revised proposal, May 27, 2005:

1) On page 121 in the section 7.5.4 discussion of constants,
   add at the end of the page.

   The data in DW_FORM_data1, DW_FORM_data2, DW_FORM_data4,
   DW_FORM_data8 can be anything.  Depending on context, it
   may be an offset to a DIE, a signed integer, an unsigned
   integer, a floating-point constant, or anything else.  The
   consumer must use context to know how to interpret the bits,
   which if they are target-machine data (such as an integer
   or floating point constant) will be in target-machine
   byte-order.

2) Add non-normative text after this

   If DW_FORM_data is used to represent a signed or unsigned
   integer it can be extremely hard for a consumer to discover
   the context necessary to determine which interpretation is
   intended. Producers are therefore strongly recommended to
   use DW_FORM_sdata or DW_FORM_udata for signed and unsigned
   integers respectively, rather than DW_FORM_data. 

Accepted as revised.