Issue 081130.1: Packed unaligned bit fields

Author: Ron Brender
Champion: Ron Brender
Date submitted: 2008-11-30
Date revised:
Date closed:
Type: Enhancement
Status: Accepted
DWARF Version: 4
Revised Mar 1, 2009

Background
----------

Several problems have been identified by Brender, Robinson and
Allen regarding the description of packed bit fields. Briefly,
the problems appear to be of two kinds:

  1) There is a big-endian bias in the way that bit fields are
     described in DWARF that works for some simpler cases on little-
     endian architectures but fails to work for more complicated
     packed compositions and arrays

  2) The current formulation improperly assumes that the containing
     entity of a bit field is aligned on a storage unit boundary.
     This makes it impossible to reasonably handle types with packed
     fields that are nested in other packed arrays or records.

For a more complete description of these problems, see the original
emails:

    Brender email of 2008-05-17
    Robinson email of 2008-05-19
    Allen email of 2008-07-07

Proposal
--------

This proposal has several parts:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 1) In Section 5.6.6, replace the existing paragraphs, beginning with
    "If a data member is defined" (paragraph 5 of the section, bottom
    of page 74 in V3) and continuing through the end of the section
    with the following:

<begin proposal text part 1>

The beginning of a data member is described relative to the beginning
of the object in which it is immediately contained. In general, the
beginning is characterized by both an address and a bit offset within
the byte at that address. When the storage for an entity includes all
of the bits in the beginning byte, the beginning bit offset is defined
to be 0.

Bit offsets in DWARF use the bit ordering and direction conventions
that are appropriate to the current language on the target system.

The member entry corresponding to a data member that is defined in a
structure, union or class has either a DW_AT_data_member_location
attribute or a DW_AT_data_bit_offset attribute. If the beginning of the
data member is the same as the beginning of the containing entity then
neither attribute is required.

For a DW_AT_data_member_location attribute there are two cases.

 1) If the value is an integer constant, it is the offset in bytes
    from the beginning of the containing entity. If the beginning of
    the containing entity has a non-zero bit offset then the beginning
    of the member entry has that same bit offset as well.

 2) Otherwise, the value must be a location description. In this case,
    the beginning of the containing entity must be byte aligned. The
    beginning address is pushed on the DWARF stack before the location
    description is evaluated; the result of the evaluation is the base
    address of the member entry.

    *A DW_AT_data_member_location attribute is not valid for a data
    member contained in an entity that is not byte aligned because
    DWARF operations do not allow for manipulating or computing bit
    offsets.*

For a DW_AT_data_bit_offset attribute, the value is an integer
constant (see Section 2.19) that specifies the number of bits from
the beginning of the containing entity to the beginning of the data
member. This value must be greater than or equal to zero, but is not
limited to less than the number of bits per byte.

If the size of a data member is not the same as the size of the
type given for the data member, the data member has either a 
DW_AT_byte_size or a DW_AT_bit_size attribute whose integer constant 
value (see Section 2.19) is the amount of storage needed to hold 
the value of the data member.

*C and C++ bit fields typically require the use of the
DW_AT_data_bit_offset and DW_AT_bit_size attributes.*

*This Standard uses the following bit ordering and direction
conventions in examples.

  . For big-endian architectures, bit offsets are counted from
    high-order to low-order bits within a byte (or larger storage unit);
    in this case, the bit offset identifies the high-order bit of the
    object.

  . For little-endian architectures, bit offsets are counted from
    low-order to high-order bits within a byte (or larger storage unit);
    in this case, the bit offset identifies the low-order bit of the
    object.

In either case, the bit so identified is defined as the "beginning"
of the object.*

*For example, take one possible representation of the following
C structure definition in both big- and little-endian byte orders:

    struct S {
        int j:5;
        int k:6;
        int m:5;
        int n:8;
    };*

*The following diagrams show the structure layout and data bit
offsets for example big- and little-endian architectures, respectively.
Both diagrams show a structure that begins at address A and whose
size is four bytes. Also, high order bits are to the left and low
order bits are to the right.

Big-Endian Data Bit Offsets:
    j:0
    k:5
    m:11
    n:16

    Addresses increase ->
    |       A       |     A + 1     |    A + 2      |    A + 3      |

    Data bit offsets increase ->
    +---------------+---------------+---------------+---------------+
    |0       4|5       10|11      15|16           23|24           31|
    |    j    |    k     |     m    |       n       |     <pad>     |
    |         |          |          |               |               |
    +---------------------------------------------------------------+

Little-Endian Data Bit Offsets:
    j:0
    k:5
    m:11
    n:16

                                                <- Addresses increase
    |     A + 3     |     A + 2     |    A + 1      |       A       |

                                         <- Data bit offsets increase
    +---------------+---------------+---------------+---------------+
    |31           24|23           16|15     11|10        5|4       0|
    |     <pad>     |       n       |    m    |     k     |    j    |
    |               |               |         |           |         |
    +---------------------------------------------------------------+*

*Note that data member bit offsets in this example are the same for
both big- and little-endian architectures even though the fields are
allocated in different "directions" (high-order to low-order versus
low-order to high-order); the bit naming conventions for memory and/or
registers of the target architecture may or may not make this seem
natural.*

*For a more extensize example showing nested and packed records
and arrays, see Appendix D.2.3.*

*Attribute DW_AT_data_bit_offset is new in DWARF Version 4 and is
also used for base types (see Section 5.1). It replaces the attributes
DW_AT_bit_offset and DW_AT_byte_size when used to identify the beginning 
of "bit field" data members as defined in DWARF V3 and earlier. The 
earlier attributes are defined in a manner suitable for bit field 
members on big-endian architectures but which is either awkward or 
incomplete for use on little-endian architectures. (DW_AT_byte_size 
also has other uses that are not affected by this change.)*

*The DW_AT_byte_size, DW_AT_bit_size and DW_AT_bit_offset attribute 
combination is deprecated for data members in DWARF Version 4, but 
implementations may continue to support this use for backward compatibility.*

*The DWARF Version 3 definitions of these attributes are as follows.  

  If the data member entry describes a bit field, then that entry has
  the following attributes:

  * A DW_AT_byte_size attribute whose value (see Section 2.19) is the
  number of bytes that contain an instance of the bit field and any
  padding bits.

  The byte size attribute may be omitted if the size of the object
  containing the bit field can be inferred from the type attribute of
  the data member containing the bit field.

  * A DW_AT_bit_offset attribute whose value (see Section 2.19) is the
  number of bits to the left of the leftmost (most significant) bit of
  the bit field value.
  
  * A DW_AT_bit_size attribute whose value (see Section2.19) is the
  number of bits occupied by the bit field value.
  
  The location description for a bit field calculates the address of an
  anonymous object containing the bit field. The address is relative to
  the structure, union, or class that most closely encloses the bit field
  declaration. The number of bytes in this anonymous object is the value
  of the byte size attribute of the bit field. The offset (in bits) from
  the most significant bit of the anonymous object to the most significant
  bit of the bit field is the value of the bit offset attribute.*

*Diagrams similar to the above that show the use of the DW_AT_byte_size,
DW_AT_bit_size and DW_AT_bit_offset attribute combination may be found 
in the DWARF Version 3 Standard.*

*In comparing DWARF versions 3 and 4, note that DWARF V4 defines
the following combinations of attributes:

    . either DW_AT_data_member_location or DW_AT_data_bit_offset
      (to specify the beginning of the data member)

optionally together with

    . either DW_AT_byte_size or DW_AT_bit_size (to specify the size
      of the data member)

DWARF V3 defines the following combinations

    . DW_AT_data_member_location (to specify the beginning of the
      data member, except this specification is only partial in the
      case of a "bit field")

optionally together with

    . DW_AT_byte_size, DW_AT_bit_size and DW_AT_bit_offset (to
      further specify the beginning of a "bit field" data member
      as well as specify the size of the data member)*

<end proposal part 1>

####################################################################

 2) In Section 5.1, replace the two paragraphs regarding byte size and
    bit fields that begin "A base type entry has" and "If the value
    ... does not fully occupy" (last two paragraphs on page 63 in V3)
    with the following:

<begin proposal text part 2>

A base type entry has either a DW_AT_byte_size attibute or a
DW_AT_bit_size attribute whose integer constant value (see Section
2.19) is the amount of storage needed to hold values of the type.

If the value of an object of the given type does not fully occupy the
storage described by a byte size attribute, the base type entry may 
also have a DW_AT_bit_size and a DW_AT_data_bit_offset attribute,
both of whose values are integer constant values (see Section 2.19).
The bit size attribute describes the actual size in bits used to
represent values of the given type. The data bit offset attribute
is the offset in bits from the beginning of the containing storage to
the beginning of the value. Bits that are part of the offset are padding.
The data bit offset uses the bit ordering and direction conventions
that are appropriate to the current language on the target system. If
this attribute is omitted a default data bit offset of 0 is assumed.

*For additional discussion of data bit offsets, see Section 5.6.6.*

*Attribute DW_AT_data_bit_offset is new in DWARF Version 4 and is also
used for bit field members (see Section 5.6.6). It replaces the attribute
DW_AT_bit_offset when used for base types as defined in DWARF V3 and
earlier. The earlier attribute is defined in a manner suitable for bit
field members on big-endian architectures but which is wasteful for use
on little-endian architectures.*

*The attribute DW_AT_bit_offset is deprecated in DWARF Version 4
for use in base types, but implementations may continue to support its
use for backward compatibility.*

*The DWARF Version 3 definition of these attributes is as follows.*

*A base type entry has a DW_AT_byte_size attribute, whose value (see
  Section 2.19) is the size in bytes of the storage unit used to represent
  an object of the given type.

  If the value of an object of the given type does not fully occupy the
  storage unit described by the byte size attribute, the base type entry
  may have a DW_AT_bit_size attribute and a DW_AT_bit_offset attribute,
  both of whose values (see Section 2.19) are integers. The bit size
  attribute describes the actual size in bits used to represent a value
  of the given type. The bit offset attribute describes the offset in
  bits of the high order bit of a value of the given type from the high
  order bit of the storage unit used to contain that value.*

*In comparing DWARF Versions 3 and 4, note that DWARF V4 defines
the following combinations of attributes:

  . DW_AT_byte_size
  . DW_AT_bit_size
  . DW_AT_byte_size, DW_AT_bit_size and optionally DW_AT_data_bit_offset

DWARF V3 defines the following combinations:

  . DW_AT_byte_size
  . DW_AT_byte_size, DW_AT_bit_size and DW_AT_bit_offset*

<end proposal text part 2>

====================================================================

  3) In Appendix D, insert new Section D.2.3 as follows:

<begin proposal text part 3>

D.2.3 Pascal Example

The Pascal example in Figure 54 is used to illustrate the representation
of packed unaligned bit fields.

    TYPE T : PACKED RECORD      ! bit size is 2
             F5 : BOOLEAN;      ! bit offset is 0
             F6 : BOOLEAN;      ! bit offset is 1
             END;
    VAR V  : PACKED RECORD
             F1 : BOOLEAN;      ! bit offset is 0
             F2 : PACKED RECORD ! bit offset is 1
                  F5 : INTEGER; ! bit offset is 0 in F2, 1 in V
                  END;
             F4 : PACKED ARRAY [0..1] OF T;  ! bit offset is 33
             F7 : T;            ! bit offset is 37
             END;

          Figure 54: Packed record example: source fragment


The DWARF representation in Figure 55 is appropriate. Note that this
same representation is appropriate for both typical big- and little-
endian architectures using the conventions described in Section 5.6.6.

10$:    DW_TAG_base_type
            DW_AT_name("BOOLEAN")
            ...
11$:    DW_TAF_base_type
            DW_AT_name("INTEGER")
            ...

20$:    DW_TAG_structure_type
            DW_AT_name("T")
            DW_AT_bit_size(2)
            DW_TAG_member
                DW_AT_name("F5")
                DW_AT_type(reference to 10$)
                DW_AT_data_bit_offset(0)        ! may be omitted
                DW_AT_bit_size(1)
            DW_TAG_member
                DW_AT_name("F6")
                DW_AT_type(reference to 10$)
                DW_AT_data_bit_offset(1)
                DW_AT_bit_size(1)

21$:    DW_TAG_structure_type           ! anonymous type for F2
            DW_TAG_member
                DW_AT_name("F5")
                DW_AT_type(reference to 11$)

22$:    DW_TAG_array_type               ! anonymous type for F4
            DW_AT_type(reference to 20$)
            DW_TAG_subrange_type
                DW_AT_type(reference to 11$)
                DW_AT_lower_bound(0)
                DW_AT_upper_bound(1)
            DW_AT_bit_stride(2)
            DW_AT_bit_size(4)

23$:    DW_TAG_structure_type           ! anonymous type for V
            DW_AT_bit_size(39)
            DW_TAG_member
                DW_AT_name("F1")
                DW_AT_type(reference to 10$)
                DW_AT_data_bit_offset(0)! may be omitted
                DW_AT_bit_size(1)       ! may be omitted
            DW_AT_member
                DW_AT_name("F2")
                DW_AT_type(reference to 21$)
                DW_AT_data_bit_offset(1)
                DW_AT_bit_size(32)      ! may be omitted
            DW_AT_member
                DW_AT_name("F4")
                DW_AT_type(reference to 22$)
                DW_AT_data_bit_offset(33)
                DW_AT_bit_size(4)       ! may be omitted
            DW_AT_member
                DW_AT_name("F7")
                DW_AT_type(reference to 20$)    ! type T
                DW_AT_data_bit_offset(37)
                DW_AT_bit_size(2)       ! may be omitted

        DW_TAG_variable
            DW_AT_name("V")
            DW_AT_type(reference to 23$)
            DW_AT_location(...)
            ...

        Figure 55: Packed record example: DWARF description


<end proposal text part 3>

--------------------------------------------------------------------

4) In Table 2...

5) In Appendix A, Table...

6) Other places where DW_AT_data_bit_offset needs to be added...


Discussion
----------

This proposal adopts the endian-neutral formulation already used for
bit pieces in DWARF (Section 2.6.3 in V3, 2.6.2 in V4) and applies
it to bit fields in members and base types. This allows producers and
consumers to follow whatever bit field allocation and description
conventions are used by the target language and architecture.
The respective typical and usually most natural big-endian and little-
endian conventions are described in non-normative text, but neither is
mandated.

I believe that all of the little-endian problems described by Robinson
and Allen are addressed simply and naturally using the applicable little-
endian conventions. This is demonstrated by the example taken from
Robinson's email that is presented in the new Section D.2.3.

A limitation is that a data member may not have a non-constant DWARF
location expression if the containing entity is not byte aligned. This
was implicit in V3 but is now made explicit.

In an earlier discussion it was agreed that any proposed solution
to these problems must be able to coexist with the existing attributes
so that implementations can provide backward compatible support if
warranted. The proposal does this.

Finally, the change for basic types is included to keep base types
and bit field members conceptually and structurally similar (as they
are in V3). It also changes the interpretation of bit size to be a
free standing alternative to byte size (rather than used only in
combination).

Also, the new formulation allows a space savings: For both big-
and little-endian architectures the DW_AT_data_bit_offset can usually
be omitted thereby defaulting to an offset of 0. In V3, a little-endian
architure can rarely omit the attribute in practice. [Recall, this
savings applies to base types not members.]


Summary of Changes wrt to the Draft of 2009-01-01
-------------------------------------------------

For data members:
. Allow DW_AT_byte_size as an alterative to DW_AT_bit_size (to be
  consistent with other parts of DWARF V4)
. Add a non-normative comparison of DWARF V4 vs V3 combinations of
  location and size attributes

For basic types:
. Allow DW_AT_byte_size and DW_AT_bit_size to be used together as
  in V3, optionally in combination with DW_AT_data_bit_offset
. Drop the use of DW_AT_data_bit_offset with either DW_AT_byte_size
  or DW_AT_bit_size separately

Summary of Changes wrt to the Draft of 2009-12-08
-------------------------------------------------

For data members:
. Make DW_AT_data_bit_offset and DW_AT_data_member location either/or
. Recast the diagrams and add more description
. Drop claim that example bit ordering conventions are "typical"
. Drop use of the term "bit field"
. Make bit size optional
. Correct D.2.3 example (again)
. Allow DW_AT_byte_size as an alternative to DW_AT_bit_size and
  treat them as orthoginal to whether the member is a bit field or not
. Define DW_OP_bit_size to be optional if the member size is the
  same as that of the member's type

For basic types:
. Make DW_AT_byte_size and DW_AT_bit_size either/or
. Clarify that data offset bits are padding


Summary of Changes wrt to the Draft of 2008-11-30
-------------------------------------------------

. Clarify italics close notation
. In part 1, paragraph 2: "corresponding entry" -> "corresponding member
  entry"
. Replace "storage unit" with "byte"
. Add italics comment to emphaize that a member that is not itself
  a bit field is not necessarily byte aligned
. Add note that Figure 55 applies to both big- & little-ending
  architectures
. Correct array bounds in Figures 54 & 55 

--

Accepted March 3, 2009.