Issue 050726.1: Meanings of "basic_block" and "end_sequence"
I've been working with our local compiler guys on a bug, and it turns out that we
had read two different meanings into the descriptions of the "basic_block" and
"end_sequence" flags in the line table. I therefore suggest re-wording them and
adding italicized sections as follows.
-John
----
basic_block
A boolean indication that the current instruction is the beginning of a
basic block.
*This is used to indicate the beginning of a set of instructions which has
a single execution entry point. It can be used to indicate the beginning
of a block in the source code set off by explicit delimiters, but should
also be used on basic blocks without explicit delimiters.
For example, in the C code below, there are three basic blocks set off by
explicit brackets. Each should have "basic_block" set:
line 17: {int i; a(i); {int i; b(i);}; {int i; c(i);}}
In the C code below, the instruction sets implementing the two branches of
the "if" ("a()" and "b()") each constitute a basic block. Each should have
the "basic_block" flag set.
line 15: if (cond)
line 16: a(); else b();*
end_sequence
A boolean indicating that the current address is that of an instruction after
the end of a sequence of target machine instructions. It means that execution
may transfer to the current address without necessarily executing the previous
sequence of instructions.
*This flag is used to represent breaks in execution flow. It should be set on
any address which is the target of a jump, branch or call. In the C code example
below, "end_sequence" should be set on whichever of the "then" or the "else"
branches is the one which is not the fall-through from the conditional evaluation:
line 15: if (cond)
line 16: {int j; j = a(); b();} else {c(); d();}
In this example, "end_sequence" should be set on the code for "x()" as it is
outside the loop and thus not part of the loop-body sequence:
line 19: while (cond) {cond = foo(); bar();} x();*
==================================================
This proposal is rejected.
Explanation:
basic_block is a flag used to identify the start of a sequence of machine
instructions which does not contain any branches. A basic block is an
attribute of the generated code, not of the source code.
The lexical structure of a program is described by the debug information
entries which have an inherent tree structure. Given that a compiler may
reorder, combine, or eliminate instructions, especially when optimizing,
there may be no clear relationship between the source code and the generated
code. In the example above, there is no defined relationship between the
*lexical blocks* described and the generated code. Without review of
the generated code, it cannot be determined which (if any) of the lexical
blocks described would be contained in a sequence of instructions which
represents a basic block.
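To illustrate why basic blocks can only be found by examining the generated
code, here is a minimal sketch of the classic "leader" rule a producer might
use to decide where to set basic_block. The struct insn record and the
mark_basic_blocks function are hypothetical and simplified for illustration;
nothing in the line table description prescribes this method. Note that the
input is a list of machine instructions, with no reference to lexical blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one generated instruction. */
struct insn {
    bool is_branch;  /* true if the instruction may transfer control */
    long target;     /* index of the branch target, or -1 if none/unknown */
};

/*
 * Set basic_block[i] to true for every instruction that begins a basic
 * block: the first instruction, every branch target, and every instruction
 * that immediately follows a branch.
 */
void mark_basic_blocks(const struct insn *code, size_t n, bool *basic_block)
{
    assert(code != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        basic_block[i] = false;
    if (n == 0)
        return;

    basic_block[0] = true;  /* entry point of the sequence */
    for (size_t i = 0; i < n; i++) {
        if (!code[i].is_branch)
            continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            basic_block[code[i].target] = true;  /* branch target */
        if (i + 1 < n)
            basic_block[i + 1] = true;           /* instruction after a branch */
    }
}
```

Under these assumptions, an optimizer that merges or removes the branches for
a source-level "if" changes the set of flagged instructions even though the
braces in the source are unchanged.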
end_sequence describes the end of a sequence of instructions. It indicates
that the address identified with this flag may not contain an instruction and
may not be addressable. It *does not* indicate that the location is
a valid instruction to which the program may branch. As with basic_block,
end_sequence is an artifact of the generated code and does not have a defined
relationship with the source code or its lexical structure.
end_sequence may be used to identify tables (such as might be generated inline
for a C switch statement) or addresses (such as subroutine addresses) which
should not be interpreted as instructions.
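As a rough sketch of how a consumer might honor this reading, the lookup below
treats a row flagged end_sequence only as the exclusive upper bound of the
preceding row, never as the start of an address range. The struct line_row
layout and the line_for_pc function are hypothetical simplifications, and
returning 0 for "no line" is a convention of this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified line table row. */
struct line_row {
    uint64_t address;
    unsigned line;
    bool end_sequence;
};

/*
 * Return the source line covering pc, or 0 if no row covers it.
 * Rows are assumed to be in table order, with each sequence terminated
 * by a row that has end_sequence set.
 */
unsigned line_for_pc(const struct line_row *rows, size_t n, uint64_t pc)
{
    assert(rows != NULL || n == 0);

    for (size_t i = 0; i + 1 < n; i++) {
        /* An end_sequence row closes a range; it never opens one. */
        if (rows[i].end_sequence)
            continue;
        if (pc >= rows[i].address && pc < rows[i + 1].address)
            return rows[i].line;
    }
    return 0;
}
```

With this approach, an address that falls between the end of one sequence and
the start of the next (an inline jump table, for instance) is not attributed
to any source line.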