TMS470 C/C++ CODE GENERATION TOOLS
Release 2.54


================================================================================
Table of Contents
================================================================================

1.  CODE_STATE Pragma
2.  Default DWARF2 Debug Support
3.  Integer Division With Constant Divisor
4.  The MUST_ITERATE Pragma
5.  The UNROLL Pragma
6.  New linker command file operator, palign
7.  64-bit Integer Support
8.  Static Stack Depth Analysis
9.  Improved -mn Switch
10. Branch Chaining
11. Static Template Instantiation
12. New --verbose switch *(New in release 2.53)
13. New --default_order linker switch *(New in release 2.53)
14. Added --align_structs switch *(New in release 2.54)


================================================================================
1. CODE_STATE Pragma
================================================================================

A pragma to override the compilation state of a file, at the function level,
has existed since 2.20 but has not been documented.  For example, if a file
is compiled in thumb mode, but it is desired that a function in that file
be compiled in 32-bit mode, add the following pragma:

#pragma CODE_STATE(function,32);

For the reverse situation add:

#pragma CODE_STATE(function,16);

With C++, the function name is not used and the pragma must precede the 
function definition:

#pragma CODE_STATE(16)
void function()
{
}


================================================================================
2. Default DWARF2 Debug Support
================================================================================

The TMS470 C/C++ Code Generation Tools support the generation of DWARF
symbolic debug information in the output object code.  The DWARF debug output
contains detailed type information about objects and functions used in an
application.  This is the default debug information generated with the -g
shell switch.  The compiler will normally generate some amount of DWARF debug
information, even without the -g switch.  This may include information on
functions, files, and global variables.  This information does not hinder
any optimizations.

It is possible to disable the generation of all symbolic debugging with the
use of the -gn shell switch.

Previous releases of the compiler generated STABS debug information by
default.  The compiler can still generate STABS if necessary.  This
is available with the -gt shell switch.

Debug type merging and type checking are performed by default in the linker.
Any type inconsistencies in the uses of symbols from different object files
are now reported.  This feature can be turned off by passing the -b switch to
the linker.


================================================================================
3. Integer Division With Constant Divisor
================================================================================

The optimizer will attempt to rewrite integer divide operations with constant
divisors.  The integer divides are rewritten as a multiply with the reciprocal
of the divisor.  This occurs at level -o2 and higher.  It is also necessary to 
use the compile-for-speed shell switch: -mf.
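
As an illustration of the rewrite, the sketch below divides an unsigned value
by the constant 10 using a multiply and a shift.  It assumes a 32-bit unsigned
int and a 64-bit unsigned long long.  The helper name div10_by_reciprocal and
the choice of divisor are hypothetical, and the instruction sequence cl470
actually emits may differ.

```c
#include <assert.h>

/* Hypothetical helper: divide by the constant 10 without a divide call.
   0xCCCCCCCD is ceil(2^35 / 10), so multiplying by it and shifting right
   by 35 gives the exact quotient for every 32-bit unsigned input. */
unsigned int div10_by_reciprocal(unsigned int x)
{
    unsigned long long product = (unsigned long long)x * 0xCCCCCCCDu;
    unsigned int q = (unsigned int)(product >> 35);

    /* Debug-only cross-check against the ordinary divide;
       compiled out when NDEBUG is defined. */
    assert(q == x / 10u);
    return q;
}
```

The multiply-and-shift form trades one library divide call for a widening
multiply, which is why the rewrite is tied to the compile-for-speed switch.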


================================================================================
4. The MUST_ITERATE Pragma
================================================================================

Two pragmas have been added in this release that aid the user in unrolling
loops.  It is necessary to use optimization level -o1 or higher for these
pragmas.

The MUST_ITERATE pragma specifies to the compiler certain properties of
a loop. Through the use of the MUST_ITERATE pragma, you can guarantee
that a loop executes a certain number of times. The pragma can help the
compiler eliminate unnecessary code.

Any time the UNROLL pragma is applied to a loop, MUST_ITERATE should
be applied to the same loop. In this case, the MUST_ITERATE pragma's third
argument, multiple, should always be specified.

No statements are allowed between the MUST_ITERATE pragma and the for,
while, or do-while loop to which it applies. However, other pragmas, such as
UNROLL, can appear between the MUST_ITERATE pragma and the loop.

The syntax of the pragma for C and C++ is:

#pragma MUST_ITERATE ( min, max, multiple) [;]

The arguments min and max are programmer-guaranteed minimum and maximum
trip counts. The trip count is the number of times a loop iterates. The trip
count of the loop must be evenly divisible by multiple. All arguments are
optional. For example, if the trip count could be 5 or greater, you can specify
the argument list as follows:

#pragma MUST_ITERATE(5);

However, if the trip count could be any nonzero multiple of 5, the pragma would
look like this:

#pragma MUST_ITERATE(5, , 5); /* A blank field for max*/

You sometimes need to provide min and multiple for the compiler to perform
unrolling. This is especially true when the compiler cannot easily determine
how many iterations the loop will perform (for example, when the loop has a
complex exit condition).

When specifying a multiple via the MUST_ITERATE pragma, results of the
program are undefined if the trip count is not evenly divisible by multiple. 
Also, results of the program are undefined if the trip count is less than the 
minimum or greater than the maximum specified.

If no min is specified, zero is used. If no max is specified, the largest 
possible number is used. If multiple MUST_ITERATE pragmas are specified for the
same loop, the smallest max and largest min are used.


================================================================================
Using MUST_ITERATE to Expand Compiler Knowledge of Loops
================================================================================

Through the use of the MUST_ITERATE pragma, you can guarantee that a
loop executes a certain number of times. The example below tells the compiler
that the loop is guaranteed to run exactly 10 times:

#pragma MUST_ITERATE(10,10);
for(i = 0; i < trip_count; i++) { ...

If the MUST_ITERATE pragma is not specified for a loop such as this, the 
compiler generates code to bypass the loop, to account for the possibility of 0 
iterations. With the pragma specification, the compiler knows that the loop 
iterates at least once and can eliminate the loop-bypassing code.

MUST_ITERATE can specify a range for the trip count as well as a factor of
the trip count. For example:

#pragma MUST_ITERATE(8,48,8);
for(i = 0; i < trip_count; i++) { ...

This example tells the compiler that the loop executes between 8 and 48 times
and that the trip_count variable is a multiple of 8 (8, 16, 24, 32, 40, 48). The
multiple argument allows the compiler to unroll the loop.

You should also consider using MUST_ITERATE for loops with complicated
bounds. In the following example:

for(i2 = ipos[2]; i2 < 40; i2 += 5) { ...

the compiler would have to generate a divide function call to determine, at 
run-time, the exact number of iterations performed. The compiler will not do 
this.  In this case, using MUST_ITERATE to specify that the loop always 
executes 8 times allows the compiler to generate a hardware loop:

#pragma MUST_ITERATE(8,8);
for(i2 = ipos[2]; i2 < 40; i2 += 5) { ...


================================================================================
5. The UNROLL Pragma
================================================================================

The UNROLL pragma specifies to the compiler how many times a loop should
be unrolled. The optimizer must be invoked (use -o1, -o2, or -o3) in order for
pragma-specified loop unrolling to take place. The compiler has the option of
ignoring this pragma.

No statements are allowed between the UNROLL pragma and the for, while,
or do-while loop to which it applies. However, other pragmas, such as
MUST_ITERATE, can appear between the UNROLL pragma and the loop.

The syntax of the pragma for C and C++ is:

#pragma UNROLL ( n) [;]

If possible, the compiler unrolls the loop so there are n copies of the original
loop. The compiler only unrolls if it can determine that unrolling by a factor 
of n is safe. In order to increase the chances the loop is unrolled, the 
compiler needs to know certain properties:

   * The loop iterates a multiple of n times. This information can be specified
     to the compiler via the multiple argument in the MUST_ITERATE pragma.
   * The smallest possible number of iterations of the loop.
   * The largest possible number of iterations of the loop.

The compiler can sometimes obtain this information itself by analyzing the
code. However, the compiler can be overly conservative in its assumptions
and may generate more code than is necessary when unrolling. This can also
lead to not unrolling at all.

Furthermore, if the mechanism that determines when the loop should exit is
complex, the compiler may not be able to determine these properties of the
loop. In these cases, you must tell the compiler the properties of the loop by
using the MUST_ITERATE pragma.

The following pragma specification:

#pragma UNROLL(1);

asks that the loop not be unrolled. Automatic loop unrolling also is not
performed in this case.

If multiple UNROLL pragmas are specified for the same loop, it is undefined
which UNROLL pragma is used, if any.
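
The sketch below shows how the two pragmas might be combined, together with a
hand-unrolled loop that approximates what unrolling by a factor of 4 means.
The function names and sample_data are hypothetical, and the code the compiler
actually generates may differ.  Compilers other than cl470 are expected to
ignore the unknown pragmas, possibly with a warning.

```c
#include <assert.h>

/* Hypothetical sample input: 8 elements, a multiple of 8 as promised below. */
const int sample_data[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Pragma form: the caller guarantees 8 <= n <= 48 and n % 8 == 0. */
int sum_pragma(const int *a, int n)
{
    int i, sum = 0;
    #pragma MUST_ITERATE(8, 48, 8)
    #pragma UNROLL(4)
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-written approximation of unrolling by 4: four copies of the
   loop body per iteration, valid only when n is a multiple of 4. */
int sum_unrolled4(const int *a, int n)
{
    int i, sum = 0;
    assert(n % 4 == 0);         /* same guarantee MUST_ITERATE provides */
    for (i = 0; i < n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    return sum;
}
```

Because the multiple argument is 8, the trip count is also a multiple of 4, so
no clean-up loop is needed after the unrolled body.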


================================================================================
6. New linker command file operator, palign 
================================================================================

The linker will now support the use of a "palign" operator in the 
linker command file. Details are found in the CGTLinker.txt document.


================================================================================
7. 64-bit Integer Support
================================================================================

The TMS470 Compiler now supports the following new data types. The range values
are available as standard macros in the header file limits.h.

-------------------------------------------------------------------------------
Type                 Size      Representation   Range (minimum, maximum)
-------------------------------------------------------------------------------
long long,           64 bits   2's complement   -9223372036854775808
signed long long                                 9223372036854775807

unsigned long long   64 bits   Binary           0
                                                18446744073709551615
-------------------------------------------------------------------------------

The long long data types are stored in register pairs. In memory they are
stored as 64-bit objects at word (32-bit) aligned addresses. The ordering of
the bytes in the 64-bit object depends on the endianness of the target. For
example, on a big-endian target the value 0x0011223344556677 is stored as
follows:

	Address x 	00112233
	Address x+4	44556677

A long long integer constant can have an "ll" or "LL" suffix. Without the
suffix, the value of the constant determines its type.

The formatting rules for long long in C I/O require "ll" in the format string.
For example:

printf("%lld", 0x0011223344556677);
printf("%llx", 0x0011223344556677);

The following new library functions are added:
llabs(), strtoll() and strtoull().
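
The sketch below exercises the pieces described in this section: splitting a
64-bit value into the two 32-bit words of the big-endian layout, parsing with
strtoll(), taking a magnitude with llabs(), and formatting with the "ll"
length modifier.  The helper names are hypothetical, and the code assumes a
32-bit unsigned int.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Word stored at address x on a big-endian target (most significant half). */
unsigned int high_word(unsigned long long v)
{
    return (unsigned int)(v >> 32);
}

/* Word stored at address x+4 on a big-endian target (least significant half). */
unsigned int low_word(unsigned long long v)
{
    return (unsigned int)(v & 0xFFFFFFFFULL);
}

/* Parse a decimal string as long long and return its absolute value. */
long long parse_magnitude(const char *text)
{
    assert(text != NULL);
    return llabs(strtoll(text, NULL, 10));
}

/* Number of characters produced by formatting v with "%llx". */
int hex_digit_count(unsigned long long v)
{
    char buf[17];               /* 16 hex digits plus terminator */
    return sprintf(buf, "%llx", v);
}
```

Note that "%llx" does not print leading zeros, so 0x0011223344556677 formats
as 14 characters rather than 16.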


================================================================================
8. Static Stack Depth Analysis
================================================================================

The static stack depth profiler will provide information to the user about the 
maximum stack depth requirements of their application based on the static 
information available to it in the output file generated by the linker.

The profiler is implemented as a stand-alone application called sdp470.  The 
profiler will take a linked output file as input and produce a listing that 
details the stack usage of all of the functions defined in the application.  If 
an application contains indirect calls and/or reentrant procedures, then a 
configuration file should also be provided as input to the profiler.

The syntax for invoking the static stack depth profiler is as follows:

		sdp470 [-c config] out-file

-c config       Identifies a configuration file to be used by the profiler to
                supply information about indirectly called functions and
                reentrant procedures.

out-file        Identifies the linked output file for the application to be
                analyzed by the profiler.  This file will contain debug
                information about all functions included in the final link
                of an application.

Details of the static stack depth profiler are provided in a separate document,
sdprof.htm (unix platforms) or sdprof.doc (PC platforms).


================================================================================
9. Improved -mn Switch
================================================================================

The existing -mn shell switch is documented as reenabling optimizations 
disabled by the -g option. In previous releases there existed some 
optimizations that were still disabled under -mn. This is no longer the case; 
equivalent code should now be produced with "-g -mn" as compared to when those 
switches are omitted. Note that if only the -g switch is used, a variety of 
optimizations may be disabled to ensure the maximum debugging support possible.


================================================================================
10. Branch Chaining
================================================================================

Support for branch chaining, in 16-BIS mode only, has been added.

a. What is Branch Chaining?

Consider the following code sequence:

LAB1:	BR  L10
	....

LAB2:	BR L10
	....
	....
L10:

If L10 is far away from LAB1 (large offset), the assembler converts the BR at
LAB1 into a branch-around followed by an unconditional branch, a sequence of 2
instructions that is either 4 or 6 bytes long. If instead the branch at LAB1
jumps to LAB2, and LAB2 is close enough that the BR can be replaced by a
single, short branch instruction, the resulting code is smaller, because the
BR at LAB1 becomes one instruction that is 2 bytes long. This method of
branching to branches that jump to the desired target is known as "branch
chaining". Note that LAB2 can in turn jump to another branch if L10 is too far
away from LAB2. Thus, branch chaining can be extended to arbitrary depths.

b. How does it work?

The code generator emits the following new pseudo instructions:

1. BTcc instead of BRcc
Format: BTcc target,#depth

The #depth is an optional argument and if not specified, is set to the default
branch chaining depth. If specified, the chaining depth for this branch
instruction is set to #depth. The assembler issues a warning if #depth is less
than zero and sets the branch chaining depth for this instruction to zero.

2. BQcc instead of Bcc
Format: BQcc target,#depth

The #depth argument works the same way as for the BT instruction.

The BT pseudo instruction replaces the BR (pseudo branch) instruction.
Similarly, BQ replaces B. The assembler performs branch chain optimizations for
these instructions, if branch chaining is enabled. The assembler replaces the
BT and BQ jump targets with the offset to the branch to which these
instructions jump.

The default branch chaining depth is 10 (to prevent longer branch chains from
impeding performance).

c. How to control branch chaining?

The shell supports the command-line argument "-ab num", which controls the
depth of branch chaining. num is the depth of branch chaining. A value of zero
indicates that no branch chaining should be performed (branch chaining
disabled). A negative value for num results in an assembler warning and the
branch chaining depth is taken as zero.

Alternatively, the argument "--max_branch_chain num" can be passed to the
assembler directly.

Note that the pseudo instructions for branch chaining are generated only if
compiling in thumb mode and compiling for code size. In all other cases, these
instructions (BT and BQ) are not generated.

Assembly language programmers can use the BT and BQ instructions to enable the
assembler to perform branch chaining. The programmer can control the branch
chaining depth for each instruction using the second (optional) argument.
Assembly programmers must use the BR and B instructions if they wish to prevent
branch chaining for those branches.

Branch chaining can be turned off by using the --disable_branch_chaining switch.


================================================================================
11. Static Template Instantiation
================================================================================

This ARM release contains an updated shell tool, cl470.exe, with the
following new switch:

--static_template_instantiation

For example: cl470 --static_template_instantiation aa.cpp

With this switch all template entities are instantiated as needed in that
file by the parser.  These instantiations are also given internal linkage.


================================================================================
12. New --verbose switch
================================================================================

By default, the compiler does not print any banner information during
compilation.  For example:

cl470 file.c

does not output any compiler details, such as a version number.  A new switch,
--verbose, has been added to print this information.  For example:

cl470 --verbose file.c
TMS470 C/C++ Compiler             Version 2.53
Tools Copyright (c) 1996-2004 Texas Instruments Incorporated


================================================================================
13. New --default_order linker switch
================================================================================

The default linker algorithm for allocating sections not listed in a linker
command file changed with version 2.52.  The new algorithm is size-based.
As a result, applications that rely on the old behavior can encounter linker
errors.  The old behavior found in version 2.51 and earlier is available
with a new linker switch, --default_order. For example:

cl470 file.c -z --default_order lnk.cmd ...

or 

lnk470 --default_order file.obj lnk.cmd ...


================================================================================
14. Added --align_structs switch *(New in release 2.54)
================================================================================

To replace the old -mw switch which would align a struct to a word boundary,
the following switch has been added:

   --align_structs=<n>

Structure alignment can be forced to a minimum "n" byte boundary where "n" is a 
power of 2 with this switch.  To align all structs to a word boundary use:

   cl470 --align_structs=4

All structs in the file will have at least that alignment, including nested
structs.  The switch only sets a minimum alignment; it does not "pack" data
structures.  It can also break a program if one file is compiled with this
switch and another is not, or if the files use different alignment values.
The offsets of a nested struct could be incorrect in such a case.
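
The layout mismatch can be illustrated on a host compiler.  The sketch below
does not use the switch itself: it uses the C11 _Alignas specifier as a
stand-in for a 4-byte minimum struct alignment, and all struct and function
names are hypothetical.  The exact offsets depend on the ABI, so treat the
numbers as an example rather than as cl470 output.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as seen by a file compiled without a minimum struct alignment. */
struct inner_default { char a; char b; };
struct outer_default { char tag; struct inner_default in; };

/* Host-side stand-in for a file compiled with a 4-byte minimum:
   _Alignas(4) on the first member raises the struct's alignment to 4. */
struct inner_aligned4 { _Alignas(4) char a; char b; };
struct outer_aligned4 { char tag; struct inner_aligned4 in; };

/* Offset of the nested struct under natural alignment. */
size_t nested_offset_default(void)
{
    return offsetof(struct outer_default, in);
}

/* Offset of the nested struct under the 4-byte minimum. */
size_t nested_offset_aligned4(void)
{
    size_t off = offsetof(struct outer_aligned4, in);
    assert(off % 4 == 0);       /* must honor the 4-byte minimum */
    return off;
}
```

On a typical host ABI the nested struct sits at offset 1 in the first layout
and at offset 4 in the second, so two files that disagree about the alignment
would read and write the nested members at different addresses.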
