DWARF Standard





211206.1 Markus Metzger SIMD location expressions Enhancement Open Markus Metzger


Section 2.5.1.3, pg 29ff

Implicitly vectorized code executes multiple instances of a source code
loop or of a source code kernel function simultaneously in a single
sequence of instructions operating on a vector of data elements (cf. SIMD:
Single Instruction Multiple Data).  The size of this vector shall be
referred to as SIMD width and an individual element as SIMD lane in the
remainder of this text.

The user has written the source code from the point of view of a single
SIMD lane.  The compiler vectorized it to execute multiple SIMD lanes
simultaneously in one instruction.  This is very prominent in languages
like OpenCL or SYCL, where the user writes code for a single work item
that is then executed over a 1-, 2-, or 3-dimensional index space.

We also see this with OpenMP's SIMD construct and we may encounter this in
other languages when using an optimizing compiler.  The scalar loop in
this C function, for example,
 
void foo(char dst[], char src[], int len) {
  for (int i = 0; i < len; ++i)
    dst[i] += src[i];
}

can be translated into one or more vectorized loops of varying width.  For
example, `gcc -O -ftree-vectorize` on IA generates a 16-wide loop.

The vectorized loop packs 16 elements into an IA vector register and
processes all 16 elements in one instruction.  The code would load 16
adjacent elements each from dst and src, add them, and store the result
back to dst.

When debugging the above loop, the user would like to be able to inspect
the variable i and the array elements dst[i] and src[i].  Since the code
has been vectorized, multiple instances of the code are executed in
parallel.  To map such vectorized machine code back to scalar source code,
debuggers may allow users to focus on a single SIMD lane at a time.  Debug
information must hence be capable of describing the location of a given
variable with respect to a given SIMD lane.

Since the trip count is not known at compile time, the compiler also
generates a 1-wide instance of the loop.  This version processes one
element at a time.  Control flows from the vectorized loop to this scalar
loop when the remaining trip count falls below a compiler-determined
threshold.
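
As a rough illustration of the two loop versions described above, the
following portable C sketch models a 16-wide main loop followed by a 1-wide
remainder loop.  The names foo_widened and SIMD_WIDTH are hypothetical, and
the inner lane loop merely stands in for a single vector instruction; actual
compiler output will differ.

```c
enum { SIMD_WIDTH = 16 };  /* assumed width, matching the 16-wide example */

/* Models code a vectorizer might emit for foo(); not actual compiler output. */
void foo_widened(char dst[], const char src[], int len) {
  int i = 0;

  /* Widened loop: each iteration handles SIMD_WIDTH adjacent elements.
     The lane loop stands in for one vector load/add/store sequence;
     SIMD lane L corresponds to source-level iteration i + L. */
  for (; i + SIMD_WIDTH <= len; i += SIMD_WIDTH)
    for (int lane = 0; lane < SIMD_WIDTH; ++lane)
      dst[i + lane] += src[i + lane];

  /* 1-wide remainder loop: one element at a time. */
  for (; i < len; ++i)
    dst[i] += src[i];
}
```

In this model, a debugger stopped inside the widened loop would have to
report a different value of the source variable i for each SIMD lane,
whereas inside the remainder loop there is only a single lane.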

The same scalar source code may hence be executed using different SIMD
widths at different locations in the same function.  Debuggers would want
to show the SIMD width for the current code and only allow the user to
select a SIMD lane within those boundaries.

To be able to describe this, we propose a new operator
    DW_OP_push_simd_lane
to describe the location of a variable as a function of the SIMD lane and a
new line table register
    simd_width
to describe the SIMD width of vectorized code regions.

---

Section 2.5.1.3, pg. 29ff.

Add
    16. DW_OP_push_simd_lane
        The DW_OP_push_simd_lane operation pushes the SIMD lane for which
        the expression shall be evaluated.

        Non-normative: Producers that widen scalar source into vectorized
        machine code may use this operation to describe the location of a
        source variable as a function of a single SIMD lane in the widened
        machine code.  Consumers will supply the SIMD lane argument to
        obtain the location of the instance of that source variable that
        corresponds to the provided SIMD lane argument.
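
To illustrate how a consumer might supply the SIMD lane argument, the
following C sketch evaluates a tiny stack expression.  It is only a model
under stated assumptions: the opcode names in enum op_kind, the struct op
layout, and eval_location are hypothetical stand-ins and do not use real
DWARF encodings.  Only OP_PUSH_SIMD_LANE corresponds to the operation
proposed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini opcode set; these are not real DWARF encodings. */
enum op_kind {
  OP_PUSH_REG,       /* push contents of register `operand` */
  OP_CONST,          /* push the constant `operand` */
  OP_MUL,            /* pop two entries, push their product */
  OP_PLUS,           /* pop two entries, push their sum */
  OP_PUSH_SIMD_LANE  /* models the proposed DW_OP_push_simd_lane */
};

struct op { enum op_kind kind; uint64_t operand; };

/* Evaluate `ops` for the SIMD lane chosen by the consumer and return
   the value left on top of the stack. */
uint64_t eval_location(const struct op *ops, size_t n_ops,
                       const uint64_t *regs, uint64_t simd_lane) {
  uint64_t stack[16];
  size_t sp = 0;

  for (size_t k = 0; k < n_ops; ++k) {
    switch (ops[k].kind) {
    case OP_PUSH_REG:
      assert(sp < 16);
      stack[sp++] = regs[ops[k].operand];
      break;
    case OP_CONST:
      assert(sp < 16);
      stack[sp++] = ops[k].operand;
      break;
    case OP_PUSH_SIMD_LANE:
      assert(sp < 16);
      stack[sp++] = simd_lane;  /* the lane is an input of the evaluation */
      break;
    case OP_MUL:
      assert(sp >= 2);
      stack[sp - 2] *= stack[sp - 1];
      --sp;
      break;
    case OP_PLUS:
      assert(sp >= 2);
      stack[sp - 2] += stack[sp - 1];
      --sp;
      break;
    }
  }
  assert(sp >= 1);
  return stack[sp - 1];
}
```

For example, if register 3 is assumed to hold the address of the lane-0
element and elements are 4 bytes wide, the expression "push reg 3, push
SIMD lane, const 4, mul, plus" yields the register value plus 4 times the
lane, so the same expression describes a different address for each lane
the consumer selects.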

Section 6.2.2, p.150ff.

Add
    simd_width    | An unsigned integer whose value encodes the width of
                    implicitly vectorized code.

                    A value of one means that either the code is not
                    vectorized or that the source has already been
                    vectorized.

                    If the compiler implicitly vectorized already
                    vectorized source code, e.g. by widening an 8-wide
                    vectorized source into 16-wide machine code, this
                    value gives the implicit widening factor, 2 in the
                    above example.

                    This value applies not only to vector instructions.
                    If a loop has been widened, the entire loop body shall
                    be annotated with the widening factor.

                    The value zero is reserved.
to Table 6.3.

Section 6.2.3, p.153.

Add
    simd_width    | 1
to Table 6.4.

Section 6.2.5.2, p.164.

Add
     13. DW_LNS_set_simd_width

         The DW_LNS_set_simd_width opcode takes a single unsigned LEB128
         operand and stores that value in the simd_width register of the
         state machine.
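
The line table additions above can be sketched in C as follows.  This is a
minimal model, not a full line number program interpreter: struct
line_state, line_state_init, apply_set_simd_width, and lane_is_valid are
hypothetical names, and the opcode value 0x0d is an assumption, since the
proposal lists the opcode as item 13 but does not state its encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding; the proposal does not assign an opcode value. */
enum { DW_LNS_set_simd_width = 0x0d };

/* Reduced state machine: only the registers needed for this sketch. */
struct line_state {
  uint64_t address;
  uint64_t line;
  uint64_t simd_width;  /* proposed register */
};

void line_state_init(struct line_state *s) {
  s->address = 0;
  s->line = 1;
  s->simd_width = 1;  /* proposed initial value: code is not widened */
}

/* Decode one unsigned LEB128 value at p; returns the bytes consumed. */
size_t read_uleb128(const uint8_t *p, uint64_t *value) {
  uint64_t result = 0;
  unsigned shift = 0;
  size_t n = 0;
  uint8_t byte;

  do {
    byte = p[n++];
    if (shift < 64)
      result |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);

  *value = result;
  return n;
}

/* If p points at DW_LNS_set_simd_width, store its operand in the
   simd_width register and return the bytes consumed; otherwise 0. */
size_t apply_set_simd_width(struct line_state *s, const uint8_t *p) {
  uint64_t width;
  size_t n;

  if (p[0] != DW_LNS_set_simd_width)
    return 0;
  n = read_uleb128(p + 1, &width);
  s->simd_width = width;
  return 1 + n;
}

/* A debugger might accept a lane selection only below the current width. */
int lane_is_valid(const struct line_state *s, uint64_t lane) {
  return lane < s->simd_width;
}
```

With the byte sequence 0x0d 0x10, this model sets simd_width to 16, so
lanes 0 through 15 would be selectable for rows emitted afterwards, while
rows emitted before the opcode keep the initial width of 1.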


