******************************************************************************
*                Host Intrinsics port for c62, c64, c64plus                  *
*                               readme.txt                                   *
*                                                                            *
******************************************************************************

Description
===========
Host port of the TI C6000 c62, c64 and c64plus intrinsics.

It enables customers to run and prototype DSP code on a host (e.g. PC,
Sparc), where the debug environment is often richer.

New users: Please read the PowerPoint slides in host_intrinsics_overview.ppt.

Tests & Dependencies
====================
DSP Side : Tested on :-
- K_*\ tests : 
	- c64x tests
	- designed to be 'real world' testcase usage of intrinsics
	- work *only on little-endian machines*. Hence they will pass when
	using the TI C6000 DSP, Cygwin gcc, Linux on a PC, MSVC etc. They will
	not work on a Sparc Unix box, since it is big-endian.
	- tested with CCS 3.1 C64xx simulator
- J_*\ tests
	- c64Plus tests
	- designed to be 'real world' testcase usage of intrinsics
	- endian neutral, hence they will work on PC, Linux, Sparc etc.
	- tested with CCS 3.2 Early Adopter Release 5 : Himalaya Simulator
- C6xSimulator_test.[ch] with C6xSimulator_main.[ch] 
	- small unit tests of each and every intrinsic
	- endian neutral
	- tested on PC Microsoft Visual C, Cygwin GCC, PC and Sparc Matlab
	
Host-side : Platforms tested on :- 
	- Cygwin with gcc 3.3.3
	- Linux hosted on a PC
	- Sparc Solaris workstation (for tests that support big-endian)
	- MSVC 6.0


How to build/run
================
DSP-side
- build the .pjt project from the command line, e.g. for K_iir:
- [>] c:\<ccsdirectory>\dosrun.bat
- [>] timake K_iir.pjt DEBUG -a
- open the relevant CCS C6000 simulator
- Load & run Debug\K_iir.out

Host-side
- type 'make' in any of the example K_*/, J_*/ dirs.
- Run ./dbg/<nameoftest>.exe

- You should get identical results for cn[] (natural C) vs. c[] (C with
intrinsics).
- Failures are reported via a 'result failure' message; otherwise a pass
is shown.
- All tests return fail = 0 or 1, where 0 indicates success and 1 failure.
This enables you to script regression tests.
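
The pass/fail convention above can be sketched in C as follows. This is a
hypothetical helper written for illustration, not code taken from the
K_*/J_* tests: the name compare_results() and the exact message format are
assumptions; only the 'result failure' wording and the 0/1 return
convention come from this readme.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the host intrinsics package):
   compares the natural C reference output cn[] with the intrinsic
   output c[] and returns 0 on pass, 1 on failure. */
static int compare_results(const short *cn, const short *c, int n)
{
    int i;
    int fail = 0;

    for (i = 0; i < n; i++) {
        if (cn[i] != c[i]) {
            /* report every mismatching element, not just the first */
            printf("result failure: index %d, cn=%d, c=%d\n",
                   i, cn[i], c[i]);
            fail = 1;
        }
    }
    if (!fail) {
        printf("pass\n");
    }
    return fail;
}
```

A test's main() could then end with "return compare_results(cn, c, N);"
(N being a hypothetical array length), so that a script can check the exit
status of ./dbg/<nameoftest>.exe.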

- Note that there are three -D preprocessor switches to take note of:-
        -DSIMULATION : this is required in a host-side makefile because it
        ensures that, e.g., function prototypes are provided for the
        host-side intrinsics. In the DSP-side examples, function
        prototypes should never be listed for the native DSP intrinsics,
        hence -DSIMULATION is not used there.

        -DLITTLE_ENDIAN_HOST / -DBIG_ENDIAN_HOST : you need to select one
        or the other depending on your host type, e.g. on an x86-hosted
        Linux box it will be -DLITTLE_ENDIAN_HOST, whilst on a Sparc
        Solaris workstation it will be -DBIG_ENDIAN_HOST.

        -DTMS320C62X / -DTMS320C64X / -DTMS320C64PX : select the target
        architecture; this host intrinsics port is supplied for the c62,
        c64 and c64plus architectures.

        - as an example, if you are building for c64x on a PC host you'd
        add to your makefile:-
               -DSIMULATION -DLITTLE_ENDIAN_HOST -DTMS320C64X
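
Choosing the wrong endian switch gives wrong results rather than a build
error, so a host build may want to check the switch against the machine at
run time. The sketch below is hypothetical (host_is_little_endian() and
check_endian_switch() are not part of the package); it relies only on the
LITTLE_ENDIAN_HOST / BIG_ENDIAN_HOST macros described above.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Returns 1 if this host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* byte at the lowest address */
    return first == 0x04;
}

/* Returns 0 if the -D endian switch matches the host, 1 otherwise. */
static int check_endian_switch(void)
{
#if defined(LITTLE_ENDIAN_HOST) && defined(BIG_ENDIAN_HOST)
    fprintf(stderr, "define only one of LITTLE_ENDIAN_HOST/BIG_ENDIAN_HOST\n");
    return 1;
#elif defined(LITTLE_ENDIAN_HOST)
    return host_is_little_endian() ? 0 : 1;
#elif defined(BIG_ENDIAN_HOST)
    return host_is_little_endian() ? 1 : 0;
#else
    fprintf(stderr, "neither LITTLE_ENDIAN_HOST nor BIG_ENDIAN_HOST defined\n");
    return 1;
#endif
}
```

Calling check_endian_switch() at the start of a test's main() and returning
its value on mismatch keeps the same 0/1 convention as the tests.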

Big-endian testing (host-side)
- so that no changes are needed to the examples and unit_test when
evaluating host intrinsics on a big-endian machine (e.g. Sparc), make
goals have been added for the endian-neutral samples. Just do:-
	[>] make debug_be
Presently the only endian-neutral samples are J_cnv_dec/ and unit_test/.
The default goal is still little-endian (i.e. 'make' alone suffices on a
little-endian host).


Misc
====
1. Each of the K_*/ and J_*/ tests requires certain data alignment to run
on the DSP, and the tests themselves depend on this alignment. On the TI
DSP this is satisfied by the DATA_ALIGN pragma, but that pragma is unknown
on a host platform. To provide alignment on a host platform we added, e.g.
for an alignment of 8 bytes, "__attribute__ ((aligned (8)))". This works
on GNU C, and can easily be #ifdef'd out via:-

/* If we're not using GNU C, elide __attribute__ */
#ifndef __GNUC__
#  define  __attribute__(x)  /*NOTHING*/
#endif

However, we elected to leave the host-build warnings about the unknown
DATA_ALIGN pragma in place. This flags to host users that alignment is
important and can affect performance and correctness when the code moves
to the DSP. Also, non-GNU-C host environments need to find a different
way to model the alignment.
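
For illustration, a buffer declared for both builds might look like the
sketch below. The array name coeffs and the helper is_aligned() are
hypothetical; the code assumes a C99 <stdint.h> for uintptr_t (MSVC 6.0
does not ship one). On a GNU C host the DATA_ALIGN line produces the
unknown-pragma warning discussed above (with -Wall), which is intentional.

```c
#include <stdint.h>

/* If we're not using GNU C, elide __attribute__ */
#ifndef __GNUC__
#  define __attribute__(x) /*NOTHING*/
#endif

/* TI C6000: DATA_ALIGN places coeffs on an 8-byte boundary.
   GNU C hosts ignore this pragma (warning under -Wall) and rely on
   the aligned attribute instead. */
#pragma DATA_ALIGN(coeffs, 8)
static short coeffs[8] __attribute__ ((aligned (8))) = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Hypothetical helper: returns 1 if p lies on an n-byte boundary
   (n must be a power of two). */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1u)) == 0u;
}
```

On a non-GNU-C host the attribute disappears, so is_aligned(coeffs, 8) is
no longer guaranteed to hold; that is exactly the case in which a different
way to model the alignment is needed.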

2. The _amemdN() intrinsics do not yet match the behaviour described in the
C6000 specification. For example, the _amemd8 and _amemd8_const
intrinsics tell the compiler to access e.g. an array of shorts with
doubleword accesses, which causes LDDW and STDW instructions to be issued
for the array accesses on the TI C64x/C64x+ DSP. There is no easy way to
model this via an intrinsic on the host side, because (i) these _amemdN()
intrinsics often appear on the left-hand side of an assignment, so a
macro is required instead of a function, and (ii) any wrapper macro around
_amemdN() on the host side may require users to change large portions of
their code. We wanted to avoid this, since customers may have code from
third parties.

3. In some cases you may not have to modify your DSP code at all to
'port' it to the host and use these host intrinsics. This may be the case
when you only use intrinsics that operate on types of 32 bits or less. At
these sizes we found that the type sizes on all the hosts we tested match
those of the c6x, i.e. char = one 8-bit byte, short = 2 bytes,
int = 4 bytes.
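
Since this only holds if the host's basic type sizes really match the c6x,
a build can verify it at compile time. The macro name HOST_STATIC_CHECK
below is hypothetical; it uses the negative-array-size idiom rather than
C11 _Static_assert so that it also works with older, pre-C11 compilers.

```c
#include <limits.h>

/* Hypothetical compile-time check: if cond is false the array size is
   -1 and compilation fails; compilers typically name the array
   (i.e. the failed check) in the error message. */
#define HOST_STATIC_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

/* The c6x sizes that DSP code using <= 32-bit intrinsics relies on */
HOST_STATIC_CHECK(check_char_is_8_bits, CHAR_BIT == 8);
HOST_STATIC_CHECK(check_short_is_2_bytes, sizeof(short) == 2);
HOST_STATIC_CHECK(check_int_is_4_bytes, sizeof(int) == 4);
```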

However, if you use e.g.:-
long _lssub(int src1, long src2)
then long is a 40-bit type on the c6x. This exists on few, if any,
host platforms. Hence, to port to a host platform, the prototype needs
to be of the form:-
int40 _lssub(int32 a, int40 b);
where int40 is a typedef for 'long' on the TI C6000 DSP, and for
'long long' (most hosts) or '__int64' (MSVC 6.0) on the host, where the
40-bit result is sign-extended to 64 bits. If the prototype were simply
ported without regard for this, it would give wrong results on the host,
since long is only 32 bits on many hosts (e.g. Windows, 32-bit Linux).
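
To make the sign-extension requirement concrete, the sketch below shows a
40-bit typedef and a helper that sign-extends the low 40 bits of a value
held in a 64-bit host container. The names my_int40 and sext40() are
hypothetical and are not the definitions in C6xSimulator_base_types.h.
The sketch assumes a C99 compiler with long long (per the text above,
MSVC 6.0 would need __int64 instead); _TMS320C6X is the macro predefined
by the TI C6000 compiler.

```c
#if defined(_TMS320C6X)
typedef long my_int40;       /* 40-bit long on the c6x, per this readme */
#else
typedef long long my_int40;  /* 64-bit container on C99 hosts */
#endif

/* Hypothetical helper: sign-extends bits 0..39 of x so that bit 39
   (the 40-bit sign bit) propagates into the upper bits. */
static my_int40 sext40(my_int40 x)
{
    const unsigned long long mask40 = 0xFFFFFFFFFFULL;  /* bits 0..39 */
    const unsigned long long sign40 = 0x8000000000ULL;  /* bit 39 */
    unsigned long long u = (unsigned long long)x & mask40;

    /* (u ^ sign40) - sign40 maps [0, 2^40) onto [-2^39, 2^39) without
       relying on implementation-defined unsigned-to-signed conversion */
    return (my_int40)((long long)(u ^ sign40) - (long long)sign40);
}
```

For example, sext40(0xFFFFFFFFFFLL) yields -1, whereas a 32-bit host long
cannot even hold bits 32..39 of that value.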

The advice is to grep through your code for uses of 'long', 'double'
etc. and take appropriate action, i.e. change them to the typedefs.

Note that if your DSP code uses C99 types such as uint64_t and you are on
a platform which supports stdint.h (e.g. Linux, Cygwin GNU C), you may
have nothing to do. However, you will likely still need to account for
the 40-bit long case.

4. You should put
#include "C6xSimulator.h"
#include "C6xSimulator_type_modifiers.h"
in your code so that you can use the typedefs that let your code run on
both the TI C6000 DSP and your host platform, e.g. int64_d, int40.

On the TI C6000 DSP the only effect is the inclusion of the typedefs.
On a host platform the intrinsic prototypes etc. are included as well.

C6xSimulator.h has similarities with c6000\cgtools\include\c6x.h.
c6x.h is useful in DSP code to check that you've supplied the
appropriate argument datatypes to c6000 intrinsics. However, it is
not suitable for use in a host environment because the types
long, float, double etc. do not map 1-1 to host environments
(e.g. long = 40 bits on c6x but 32 bits on Windows). Hence a typedef
abstraction is required, which C6xSimulator.h supplies.

Note that
#include "C6xSimulator_type_modifiers.h"
is optional.
This header defines/undefines certain keywords. It is kept in its own
file because different environments may or may not support different
keywords. For example, 'restrict' is new in C99; some environments and
compilers support it whilst others do not, and those that don't should
define restrict away (to nothing). Because of this abstraction, the user
has several options:-
- use this file as is
- don't use this file; instead do the defines/undefines in your makefile
- use this file and modify it to match the keyword support in your
  host environment.
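
For host builds, a type-modifiers header of the kind described above might
handle restrict roughly as follows. This is a hypothetical sketch, not the
contents of C6xSimulator_type_modifiers.h; the fallback to GCC's
__restrict__ spelling is an assumption about which host compilers you use,
and add_arrays() is only an illustrative caller.

```c
/* Pre-C99 compilers: map restrict to a vendor spelling or to nothing */
#if !defined(__STDC_VERSION__) || (__STDC_VERSION__ < 199901L)
#  if defined(__GNUC__)
#    define restrict __restrict__
#  else
#    define restrict /* not supported: define it away */
#  endif
#endif

/* Hypothetical caller: restrict promises dst, a and b do not overlap,
   letting the compiler keep values in registers across the loop. */
static void add_arrays(int *restrict dst, const int *restrict a,
                       const int *restrict b, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Defining restrict away only discards an optimization hint; the code's
behaviour is unchanged as long as the arrays really do not overlap.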


******************************************************************************
                      Release History Table of Contents
******************************************************************************

==============================================================================
Version 0.7 Table of Contents
==============================================================================
0.7-01. Bugs fixed
0.7-02. Known bugs

0.7-01. Bugs fixed

1583  _smpy32 fails with certain inputs
1590  40-bit intrinsics (like lsadd) may return wrong results
1596  Multiply intrinsics mishandle 0 times a negative number
1597  Add support for _rotl intrinsic

0.7-02. Known bugs

==============================================================================
Version 0.6 Table of Contents
==============================================================================
0.6-01. Bugs fixed
0.6-02. Known bugs

0.6-01. Bugs fixed

1484  RESOLVED  The intrinsic _cmpyr1 works incorrectly  
1485  RESOLVED  _lmbd intrinsic has the variable m32 datatyped incorrectly  

0.6-02. Known bugs

==============================================================================
Version 0.5 Table of Contents
==============================================================================
0.5-01. New Features
0.5-02. Bugs fixed
0.5-03. Known bugs

0.5-01. New Features 
- added installers with click-wrap license for Linux & Windows
- added unit tests for itof, ftoi

0.5-02. Bugs fixed
1157  WONTFIX  readme version # is wrong for v0.4  
1180  INVALID  potential MSB issue on _lsadd host intrinsic  
1226  FIXED  mem8 intrinsic uses int40 instead of int64  
1344  FIXED  itof and ftoi produce erroneous result  
1349  FIXED  cnv_dec.c - issue when using -Wall or splint  
1456  FIXED  _mpyhlu uses x2 instead of x2u in multiplication  

0.5-03. Known bugs


==============================================================================
Version 0.4 Table of Contents
==============================================================================
0.4-01. New Features
0.4-02. Bugs fixed
0.4-03. Known bugs

0.4-01. New Features 
- changed the header file hierarchy in c6xsim/ to minimize preprocessor
inclusions (and hence potential collisions) in user-code. We now have: -
  - C6xSimulator_type_modifiers.h : defines/undefines keywords e.g.
    restrict which is a C99 keyword, and hence some platforms may/may-not
    support this. Optional hdr file.
  - C6xSimulator_base_types.h : defines the typedefs needed to attain 
    DSP<->host code portability. Required hdr file (included by 
    file C6xSimulator.h)
  - C6xSimulator.h : prototypes for host intrinsics. Required hdr file.
  - _C6xSimulator_priv.h : only used internally by the host intrinsics
    implementation. Do *not* include this header file in user-code.

  One side-effect of this if you were using a previous version of 
  host intrinsics is that you may need to do...
    #include "C6xSimulator.h"
    #include "C6xSimulator_type_modifiers.h"
  instead of just...
    #include "C6xSimulator.h"

0.4-02. Bugs fixed
Bugzilla #1122 - definition of "_hill" doesn't match the declaration
Bugzilla #1150 - annoying ^M chars in several files 

0.4-03. Known bugs


==============================================================================
Version 0.3 Table of Contents
==============================================================================
0.3-01. New Features
0.3-02. Bugs fixed
0.3-03. Known bugs

0.3-01. New Features 
- added host_intrinsics_overview.ppt to provide FAEs & customers 
with an overview of purpose and how host intrinsics library works.


0.3-02. Bugs fixed
Bugzilla #1071 - fix unions #ifdef protection in C6xSimulatorTypes.h 
                 for dsp build


0.3-03. Known bugs


==============================================================================
Version 0.2 Table of Contents
==============================================================================
0.2-01. New Features
0.2-02. Bugs fixed
0.2-03. Known bugs

0.2-01. New Features 

(a) Implementation change to adopt unions and avoid aliasing at -O2. 

The previous implementation of host intrinsics had code of the following
nature: -

  uint64_d y;
  int64x2u *py;

  py = (int64x2u *)&y;

  py->hi = a;
  py->lo = b;

  return(y);

This implementation has been changed to adopt unions, i.e.

  union reg64 y64;

  y64.x2u.hi = a;
  y64.x2u.lo = b;

  return(y64.x1u_d);

The reason for this change is that the previous code breaks when
strict-aliasing optimizations are enabled, which happens automatically at
-O2 on many host compilers (GCC, for instance, enables -fstrict-aliasing
at -O2).

Aliasing occurs when you can access a single object in more than one way,
such as when two pointers point to the same object or when a pointer points
to a named object. Disambiguating aliases allows the compiler to produce
better code, since it can retain values in registers.

Under the ISO C aliasing rules, the compiler may assume that an lvalue of
the structure type (e.g. int64x2u) and one of the typedef'ed data type
(e.g. uint64_d) never refer to the same object, because they are
incompatible types: dereferencing a pointer that points to an object of a
different (incompatible) type gives undefined results. In contrast, the
union works because it tells the compiler explicitly that its members
share the same storage, so accesses through them are not assumed to be
independent.

Another solution to the aliasing problem is to simply add the
-fno-strict-aliasing flag. However, this prevents the host
compiler from making certain optimizations since it must assume
that any pointer may alias with any other. Hence the code was modified
to use unions instead.
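
A self-contained version of the union technique is sketched below. The
names pair32_sketch, reg64_sketch and pack_hi_lo() are hypothetical
stand-ins for the library's int64x2u / union reg64, whose actual
definitions are not reproduced in this readme. The member order depends on
host endianness, which is why the -DBIG_ENDIAN_HOST switch matters;
without it, this sketch assumes a little-endian host.

```c
#include <stdint.h>

/* Hypothetical hi/lo pair: the word stored at the lower address holds
   the low-order half on little-endian hosts, the high-order half on
   big-endian hosts. */
typedef struct {
#if defined(BIG_ENDIAN_HOST)
    uint32_t hi;
    uint32_t lo;
#else
    uint32_t lo;
    uint32_t hi;
#endif
} pair32_sketch;

/* Both views share the same 8 bytes of storage */
union reg64_sketch {
    uint64_t      x1u;
    pair32_sketch x2u;
};

/* Builds a 64-bit value from two 32-bit halves without pointer casts,
   so it stays correct when strict aliasing is enabled. */
static uint64_t pack_hi_lo(uint32_t a, uint32_t b)
{
    union reg64_sketch y64;

    y64.x2u.hi = a;
    y64.x2u.lo = b;
    return y64.x1u;
}
```

On a little-endian host built without -DBIG_ENDIAN_HOST,
pack_hi_lo(0x11223344, 0x55667788) returns 0x1122334455667788.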

There are *NO* changes in the interface, i.e. users are *NOT*
required to make *ANY* changes to their application code
to leverage the new, improved host intrinsics.

(b) A new folder unit_test was added. It contains the unit test,
updated to run on any host platform. A sample makefile is supplied
for hosts using gcc. If any single intrinsic produces an error,
a message is shown indicating the guilty intrinsic and the
erroneous value. If all intrinsics produce the correct result,
a simple 'pass' message is displayed.

Recall that this unit_test is endian neutral. Flip the -D build
switch from -DLITTLE_ENDIAN_HOST to -DBIG_ENDIAN_HOST if you 
are on e.g. a Sparc.


0.2-02. Bugs fixed
Bugzilla #874 - readme doesnt give details on -D flags needed


0.2-03. Known bugs


==============================================================================
Version 0.00.01 Table of Contents
==============================================================================
0.00.01-01. New Features
0.00.01-02. Bugs fixed
0.00.01-03. Known bugs

0.00.01-01. New Features 

- First release.

0.00.01-02. Bugs fixed

- First release.

0.00.01-03. Known bugs

- First release.