Issue 050726.1: Meanings of "basic_block" and "end_sequence"

Author: John Bishop
Champion: John Bishop
Date submitted: 2005-07-26
Date revised:
Date closed:
Type: Improvement
Status: Rejected with explanation
DWARF Version: 3
I've been working with our local compiler guys on a bug, and it turns out that we
had read two different meanings into the descriptions of the "basic_block" and 
"end_sequence" flags in the line table.  I therefore suggest re-wording them and
adding italicized sections as follows.

    -John

----
basic_block
   A boolean indication that the current instruction is the beginning of a 
   basic block.

*This is used to indicate the beginning of a set of instructions which has
   a single execution entry point.  It can be used to indicate the beginning
   of a block in the source code set off by explicit delimiters, but should 
   also be used on basic blocks without explicit delimiters.

   For example, in the C code below, there are three basic blocks set off by
   explicit brackets.  Each should have "basic_block" set:

   line 17: {int i; a(i); {int i; b(i);}; {int i; c(i);}}

   In the C code below, the instruction sets implementing the two branches of
   the "if" ("a()" and "b()") each constitute a basic block.  Each should have
   the "basic_block" flag set.

   line 15:  if (cond)
   line 16:     a(); else b();*


end_sequence
   A boolean indicating that the current address is that of an instruction after
   the end of a sequence of target machine instructions.  It means that execution
   may transfer to the current address without necessarily executing the previous
   sequence of instructions.

*This flag is used to represent breaks in execution flow.  It should be set on
   any address which is the target of a jump, branch or call.  In the C code example
   below, "end_sequence" should be set on whichever of the "then" or the "else"
   branches is the one which is not the fall-through from the conditional evaluation:
 
   line 15:  if (cond)
   line 16:     {int j; j = a(); b();} else {c(); d();}

   In this example, "end_sequence" should be set on the code for "x()" as it is
   outside the loop and thus not part of the loop-body sequence:

   line 19:  while (cond) {cond = foo(); bar();} x();*

==================================================

This proposal is rejected.

Explanation:
  basic_block is a flag used to identify the start of a sequence of machine
  instructions which does not contain any branches.  A basic block is an
  attribute of the generated code, not of the source code.

  The lexical structure of a program is described by the debug information
  entries, which have an inherent tree structure.  Given that a compiler may
  reorder, combine, or eliminate instructions, especially when optimizing,
  there may be no clear relationship between the source code and the generated
  code.  In the example above, there is no defined relationship between the
  *lexical blocks* described and the generated code.  Without review of
  the generated code, it cannot be determined which (if any) of the lexical
  blocks described would be contained in a sequence of instructions which
  represents a basic block.
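
  As an illustration only, the following C sketch derives basic_block flags
  from a list of generated instructions rather than from source braces.  The
  Insn type and the mark_basic_blocks function are hypothetical and are not
  part of DWARF or of any real toolchain; the sketch also assumes that calls
  do not end a basic block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified machine instruction (not a real ISA or DWARF type). */
typedef struct {
    bool is_branch;  /* true for jumps and conditional branches; calls are
                        treated as non-branches in this sketch (assumption) */
    int  target;     /* index of the branch target, or -1 if none or unknown */
} Insn;

/* Sets basic_block[i] to true where instruction i begins a basic block:
   the first instruction, every branch target, and every instruction that
   immediately follows a branch. */
void mark_basic_blocks(const Insn *code, size_t n, bool *basic_block)
{
    assert(n == 0 || (code != NULL && basic_block != NULL));

    for (size_t i = 0; i < n; i++)
        basic_block[i] = false;
    if (n == 0)
        return;

    basic_block[0] = true;  /* entry point of the sequence */
    for (size_t i = 0; i < n; i++) {
        if (!code[i].is_branch)
            continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            basic_block[code[i].target] = true;  /* branch target */
        if (i + 1 < n)
            basic_block[i + 1] = true;           /* instruction after a branch */
    }
}
```

  The flags depend entirely on the instruction list passed in.  If the
  compiler lowers the same source differently (for example, by removing a
  branch during optimization), the flags change even though the source
  braces do not.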

  end_sequence describes the end of a sequence of instructions.  It indicates
  that the address identified with this flag may not contain an instruction and
  may not be addressable.  It *does not* indicate that the location is
  a valid instruction to which the program may branch.  As with basic_block, 
  end_sequence is an artifact of the generated code and does not have a defined 
  relationship with the source code or its lexical structure.

  end_sequence may be used to identify tables (such as might be generated inline
  for a C switch statement) or addresses (such as subroutine addresses) which 
  should not be interpreted as instructions.
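
  The following C sketch shows one way a consumer might honor this meaning
  when mapping an address to a line-table row.  The Row type and the find_row
  function are hypothetical simplifications (a real line-table row carries
  more state), and the sketch assumes that rows within each sequence are in
  ascending address order.  The row carrying end_sequence only bounds the
  preceding row; it never matches an address itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified line-table row. */
typedef struct {
    uint64_t address;
    unsigned line;
    bool     end_sequence;  /* address is just past the end of a sequence */
} Row;

/* Returns the row whose instructions cover pc, or NULL if no sequence
   covers it.  Each non-terminating row covers the half-open range from its
   own address up to the address of the next row. */
const Row *find_row(const Row *rows, size_t n, uint64_t pc)
{
    assert(n == 0 || rows != NULL);

    for (size_t i = 0; i + 1 < n; i++) {
        if (rows[i].end_sequence)
            continue;  /* terminator row: describes no instruction */
        if (rows[i].address <= pc && pc < rows[i + 1].address)
            return &rows[i];
    }
    return NULL;
}
```

  With this lookup, an address equal to that of an end_sequence row, or one
  falling in a gap between two sequences (such as an inline table), yields
  no row rather than being attributed to a source line.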