RTSC Interface Primer/Lesson 13

From RTSC-Pedia


offline version generated on 04-Aug-2010 21:08 UTC

Proxy modules — managing target-specific Fir implementations

Applying what you've already learned about RTSC interfaces and proxy modules, this lesson lays out a series of transformations of the Fir module originally introduced back in Lesson 9. Beginning with a portable baseline where correctness trumps efficiency, we'll then proceed to significantly improve the performance of Fir—but at the cost of polluting an otherwise clean body of code with non-portable constructs.

Through skillful use of a proxy, however, we can better isolate target-specific code from an otherwise portable implementation of the Fir module. By the end of this process, Fir will not only have greater flexibility, and thus higher potential for re-use across a broad spectrum of applications, but will also enjoy a quantum improvement in its run-time performance.

1 Establishing a baseline
2 Optimizing for TMS32064+
3 Generalizing through a proxy
4 See also

Establishing a baseline

With several alternative implementations of Fir lying ahead of us, let's first abstract the essence of the module's original Fir.xdc specification into a new RTSC interface named stdsorg.math.IFir.

stdsorg/math/IFir.xdc

/*! Abstract FIR filter */
interface IFir {
 
instance:
 
    /*! Number of data-values per frame */
    config Int frameLen = 64;
 
    /*! Create a new filter
     *
     *  @param(coeffs)      filter coefficients
     *  @param(coeffsLen)   number of coefficients
     */
    create(Int16 coeffs[], Int coeffsLen);
 
    /*! Apply this filter and update its internal history
     *  @param(inFrame)    readonly frame of input data
     *  @param(outFrame)   next frame of output data
     */
    Void apply(Int16 inFrame[], Int16 outFrame[]);
}

We'll now trivially re-work the original acme.filters.Fir module into a derivative named acme.filters2.FirA. Other than what seems like a gratuitous change of name—we'll eventually produce a FirB and FirC module in the same package—the spec of FirA boils down to a more abbreviated form of the original Fir.xdc file from Lesson 9.

acme/filters2/FirA.xdc

import stdsorg.math.IFir;
 
/*! Baseline IFir implementation */
@InstanceInitError
@InstanceFinalize
module FirA inherits IFir {
 
internal:
 
    struct Instance_State {
        Int16 coeffs[];         /* create argument */
        Int coeffsLen;          /* create argument */
        Int frameLen;           /* instance config */
        Int16 history[];        /* filter history */
    }
}

By virtue of inheriting IFir, what persists in the new FirA.xdc spec pertains exclusively to the internal implementation of this module. Here again, other than a simple name change (Fir becomes FirA) the contents of acme/filters2/FirA.c and acme/filters2/FirA.xs remain identical to the original target-implementation and meta-implementation of acme.filters.Fir; if you want to see for yourself, feel free to browse the source in the «examples»/acme/filters2 package directory.
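Since FirA.c is not reproduced in this lesson, the following is only a minimal sketch of what a portable baseline apply function might look like in plain C. The names BaselineFir and baseline_fir_apply are hypothetical, and the sketch assumes the same state layout and Q15 scaling that appear in the FirB listing later in this lesson; it is not the actual source from the examples directory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the module's Instance_State */
typedef struct {
    const int16_t *coeffs;   /* filter coefficients (assumed Q15) */
    int coeffsLen;           /* number of coefficients */
    int frameLen;            /* samples per frame */
    int16_t *history;        /* room for (coeffsLen - 1) + frameLen samples */
} BaselineFir;

/* Straightforward multiply-accumulate: one output sample per outer pass */
void baseline_fir_apply(BaselineFir *fir, const int16_t inFrame[],
                        int16_t outFrame[])
{
    int i, j;

    /* append the new frame after the (coeffsLen - 1) saved samples */
    memcpy(&fir->history[fir->coeffsLen - 1], inFrame,
           fir->frameLen * sizeof (int16_t));

    for (j = 0; j < fir->frameLen; j++) {
        int32_t sum = 0;
        for (i = 0; i < fir->coeffsLen; i++) {
            sum += (int32_t)fir->history[i + j] * fir->coeffs[i];
        }
        /* right-shifting a negative sum is implementation-defined in C;
         * most compilers perform an arithmetic shift */
        outFrame[j] = (int16_t)(sum >> 15);
    }

    /* keep the last (coeffsLen - 1) samples for the next call;
     * memmove is used because the two regions may overlap */
    memmove(fir->history, &fir->history[fir->frameLen],
            (fir->coeffsLen - 1) * sizeof (int16_t));
}
```

With two coefficients of 16384 (0.5 in Q15), a zeroed history, and an input frame of 2, 4, 6, 8, this sketch averages adjacent samples and yields 1, 3, 5, 7.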

Optimizing for TMS32064+

Anyone who's worked with digital signal processors (DSPs) will immediately recognize that FirA, based on the original implementation of acme.filters.Fir from Lesson 9, offers many opportunities for slashing execution time. To improve performance by optimizing for a particular processor, consider the acme.filters2.FirB module, whose spec virtually clones FirA but whose target-implementation diverges considerably inside the apply function.

acme/filters2/FirB.xdc

import stdsorg.math.IFir;
 
/*! Optimized IFir implementation */
@InstanceInitError
@InstanceFinalize
module FirB inherits IFir {
 
internal:
 
    struct Instance_State {
        Int16 coeffs[];         /* create argument */
        Int coeffsLen;          /* create argument */
        Int frameLen;           /* instance config */
        Int16 history[];        /* filter history */
    }
}

acme/filters2/FirB.c

1 2 3

    ...
#include "package/internal/FirB.xdc.h"
 
#include <string.h>
#include <c6x.h>       /* declares TMS32064+ intrinsics */
 
    ...
 
Void FirB_apply(FirB_Object *obj, Int16 inFrame[], Int16 outFrame[])
{
    Int i, j;
    Int32 sum0, sum1;
 
    Int coeffsLen = obj->coeffsLen;
    Int frameLen = obj->frameLen;
 
    Int16 *history = obj->history;
    Int16 *coeffs = obj->coeffs;
 
    Int32 *history2 = (Int32 *)history;
    Int32 *coeffs2 = (Int32 *)coeffs;
 
    memcpy(&history[coeffsLen - 1], inFrame, frameLen * sizeof (Int16));
 
    for (j = 0; j < frameLen / 2; j++) {
        sum0 = sum1 = 0;
        for (i = 0; i < coeffsLen / 2; i++) {
            sum0 += _mpy(history2[i + j], coeffs2[i]);
            sum0 += _mpyh(history2[i + j], coeffs2[i]);
            sum1 += _mpyhl(history2[i + j], coeffs2[i]);
            sum1 += _mpylh(history2[i + j + 1], coeffs2[i]);
        }
        *outFrame++ = (Int16)(sum0 >> 15);
        *outFrame++ = (Int16)(sum1 >> 15);
    }
 
    memcpy(history, &history[frameLen], (coeffsLen - 1) * sizeof (Int16));    
}

Besides using compiler-specific intrinsics that map directly to underlying processor instructions (declared at line 1 and used after line 3), we've also unrolled the for loop beginning at line 2 to further increase opportunities for parallelism. As we'll see in Lesson 14 when we benchmark FirB against our FirA baseline, we've significantly improved runtime performance here, but only by transforming a portable implementation (FirA) into one that now targets a specific processor supported by a specific compiler (FirB).
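To make the unrolled loop structure easier to follow without a DSP compiler at hand, here is a hedged sketch in portable C. The helpers pack2, lo16, hi16, and paired_fir are hypothetical names introduced for illustration only. Where FirB.c reinterprets the Int16 arrays as Int32 arrays through a pointer cast (which relies on the target's memory layout), this sketch packs each pair of samples explicitly, with the first sample in the low half-word, and it returns the raw sums rather than the scaled outputs.

```c
#include <stdint.h>

/* Pack two 16-bit samples into one word, first sample in the low half */
static uint32_t pack2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Sign-extended low and high half-words of a packed word */
static int32_t lo16(uint32_t w)
{
    uint32_t u = w & 0xFFFFu;
    return (u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
}

static int32_t hi16(uint32_t w)
{
    return lo16(w >> 16);
}

/* Two outputs per outer pass and two taps per inner pass;
 * coeffsLen and frameLen are both assumed to be even */
void paired_fir(const uint32_t hist2[], const uint32_t coef2[],
                int coeffsLen, int frameLen, int32_t sums[])
{
    int i, j;

    for (j = 0; j < frameLen / 2; j++) {
        int32_t sum0 = 0, sum1 = 0;
        for (i = 0; i < coeffsLen / 2; i++) {
            uint32_t h = hist2[i + j];
            uint32_t hn = hist2[i + j + 1];
            uint32_t c = coef2[i];
            sum0 += lo16(h) * lo16(c);     /* role of _mpy   */
            sum0 += hi16(h) * hi16(c);     /* role of _mpyh  */
            sum1 += hi16(h) * lo16(c);     /* role of _mpyhl */
            sum1 += lo16(hn) * hi16(c);    /* role of _mpylh */
        }
        sums[2 * j] = sum0;        /* even-numbered output */
        sums[2 * j + 1] = sum1;    /* odd-numbered output */
    }
}
```

Each inner pass consumes one packed coefficient word: sum0 pairs both halves of the current history word with it, while sum1 is shifted by one sample and therefore borrows the low half of the next history word.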

If none of this makes any sense, don't worry about it: knowledge of digital signal process(ing/ors) has never been a prerequisite for working with RTSC. At the same time, do appreciate that processor-dependent or compiler-specific C code not unlike the body of FirB_apply will indeed exist within most embedded application programs; the trick becomes isolating this sort of code as much as possible.

Generalizing through a proxy

We come now to acme.filters2.FirC which, through skillful use of a proxy, transforms the (non-portable) FirB implementation seen above into one that we can now build for any RTSC target or platform. Applying the same design technique we've illustrated with Bench in Lesson 12—where we employed a PClock proxy to ultimately abstract alternate implementations of a getTime function, one of which depended directly upon the TMS32064+ processor—the FirC module specifies a PMathOps proxy that in turn can delegate to a module specifically tailored for the target, platform, or application at hand.

acme/filters2/FirC.xdc

1

import stdsorg.math.IFir;
import stdsorg.math.IMathOps;
 
/*! Generalized IFir implementation */
@InstanceInitError
@InstanceFinalize
module FirC inherits IFir {
 
    /*! Selectable IMathOps service provider */
    proxy PMathOps inherits IMathOps;
 
internal:
 
    struct Instance_State {
        Int16 coeffs[];         /* create argument */
        Int coeffsLen;          /* create argument */
        Int frameLen;           /* instance config */
        Int16 history[];        /* filter history */
    }
}

The new stdsorg.math.IMathOps interface referenced here at line 1 captures the essence of the low-level math operations used directly after line 3 in the earlier FirB.c target-implementation. Later on, we'll examine two different implementations of this interface—one that's portable, and one that's not.

stdsorg/math/IMathOps.xdc

2

/*! Abstract math operations */
interface IMathOps {
 
    /*! Product of low 16-bits of x and y */
    Int32 mpy(Int32 x, Int32 y);
 
    /*! Product of high 16-bits of x and y */
    Int32 mpyh(Int32 x, Int32 y);
 
    /*! Product of high 16-bits of x and low 16-bits of y */
    Int32 mpyhl(Int32 x, Int32 y);
 
    /*! Product of low 16-bits of x and high 16-bits of y */
    Int32 mpylh(Int32 x, Int32 y);
}

FirC target-implementation. Before we investigate some alternate implementations of the new IMathOps interface, consider how introducing the PMathOps proxy at line 1 of FirC.xdc now leads to a portable C implementation of the module itself within the target-domain.

acme/filters2/FirC.c

3

    ...
#include "package/internal/FirC.xdc.h"
 
#include <string.h>
 
    ...
 
Void FirC_apply(FirC_Object *obj, Int16 inFrame[], Int16 outFrame[])
{
    ...
 
    for (j = 0; j < frameLen / 2; j++) {
        sum0 = sum1 = 0;
        for (i = 0; i < coeffsLen / 2; i++) {
            sum0 += FirC_PMathOps_mpy(history2[i + j], coeffs2[i]);
            sum0 += FirC_PMathOps_mpyh(history2[i + j], coeffs2[i]);
            sum1 += FirC_PMathOps_mpyhl(history2[i + j], coeffs2[i]);
            sum1 += FirC_PMathOps_mpylh(history2[i + j + 1], coeffs2[i]);
        }
        *outFrame++ = (Int16)(sum0 >> 15);
        *outFrame++ = (Int16)(sum1 >> 15);
    }
 
    ...
}

Focusing on some small (but important) differences between the FirB and FirC implementations, we've basically replaced direct use of TMS32064+ intrinsics at line 3 of FirB.c with corresponding calls to proxy functions at line 3 of FirC.c; you'll find declarations for the latter functions back at line 2 of the IMathOps interface spec. Equally important, FirC no longer requires the non-portable <c6x.h> intrinsics header brought in at line 1 of FirB.c.
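As a rough analogy in plain C, and not a description of the code RTSC actually generates, the effect of coding against proxy names can be imitated with a pair of macros that resolve client-facing names to one chosen delegate. Every identifier in this sketch (StdOps_mpy, StdOps_mpyh, PMathOps_mpy, PMathOps_mpyh, dot2) is hypothetical.

```c
#include <stdint.h>

/* Sign-extended low and high 16 bits of a 32-bit word */
static int32_t lo16(int32_t w)
{
    uint32_t u = (uint32_t)w & 0xFFFFu;
    return (u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
}

static int32_t hi16(int32_t w)
{
    return lo16((int32_t)((uint32_t)w >> 16));
}

/* One hypothetical delegate, written in portable C */
static int32_t StdOps_mpy(int32_t x, int32_t y)  { return lo16(x) * lo16(y); }
static int32_t StdOps_mpyh(int32_t x, int32_t y) { return hi16(x) * hi16(y); }

/* Stand-in for the binding step: the client-facing names resolve to one
 * delegate, so swapping delegates means changing only these two lines */
#define PMathOps_mpy   StdOps_mpy
#define PMathOps_mpyh  StdOps_mpyh

/* Client code mentions only the proxy names, never a specific delegate */
int32_t dot2(int32_t packedA, int32_t packedB)
{
    return PMathOps_mpy(packedA, packedB) + PMathOps_mpyh(packedA, packedB);
}
```

The point of the analogy is that dot2, like FirC_apply, contains nothing tied to a particular processor; only the binding decides which implementation runs.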

Alternate IMathOps implementations. As with the IClock interface used by Bench in Lesson 12, we'll offer up two rather distinct implementations of IMathOps—one tied to the TMS32064+ intrinsics (MathOps64P), the other expressed in portable C code (MathOpsStd). Though deployed in separate packages, the specs for this pair of modules look virtually identical.

txn/mathops/MathOps64P.xdc

import stdsorg.math.IMathOps;
 
/*! TMS32064+ IMathOps module */
module MathOps64P inherits IMathOps {}

stdsorg/math/MathOpsStd.xdc

import stdsorg.math.IMathOps;
 
/*! Portable IMathOps module */
module MathOpsStd inherits IMathOps {}

Although IMathOps and MathOpsStd already reside in the same package scope, we've still explicitly named stdsorg.math.IMathOps through an import statement in MathOpsStd.xdc. This discipline tracks a similar pattern seen with #include directives in the target-domain and with xdc.useModule directives in the meta-domain—that is, fully-qualified references to spec'd units near the top of the file followed by subsequent use of the unit's short name.

Turning first to MathOps64P, its target-implementation simply leverages the same TMS32064+ intrinsics used directly at line 3 within FirB.c. Just like the earlier Clock64P module from Lesson 12, we've consciously fielded a non-portable implementation dependent upon a specific compiler and processor.

txn/mathops/MathOps64P.c

#include "package/internal/MathOps64P.xdc.h"
 
#include <c6x.h>        /* declares TMS32064+ intrinsics */
 
Int32 MathOps64P_mpy(Int32 x, Int32 y)
{
    return _mpy(x, y);
}
 
Int32 MathOps64P_mpyh(Int32 x, Int32 y)
{
    return _mpyh(x, y);
}
 
Int32 MathOps64P_mpyhl(Int32 x, Int32 y)
{
    return _mpyhl(x, y);
}
 
Int32 MathOps64P_mpylh(Int32 x, Int32 y)
{
    return _mpylh(x, y);
}

By contrast, the MathOpsStd module effectively serves as a portable reference implementation of the IMathOps interface and—rather appropriately—resides in the same package containing the latter's spec.

stdsorg/math/MathOpsStd.c

#include "package/internal/MathOpsStd.xdc.h"
 
Int32 MathOpsStd_mpy(Int32 x, Int32 y)
{
    return (Int16)(x) * (Int16)(y);
}
 
Int32 MathOpsStd_mpyh(Int32 x, Int32 y)
{
    return (Int16)((0xFFFF0000 & x) >> 16) * (Int16)((0xFFFF0000 & y) >> 16);
}
 
Int32 MathOpsStd_mpyhl(Int32 x, Int32 y)
{
    return (Int16)((0xFFFF0000 & x) >> 16) * (Int16)(y);
}
 
Int32 MathOpsStd_mpylh(Int32 x, Int32 y)
{
    return (Int16)(x) * (Int16)((0xFFFF0000 & y) >> 16);
}

As we've already seen in some earlier examples, both the MathOps64P and MathOpsStd modules have empty implementations within the meta-domain.

Besides other processor-specific implementations of the IMathOps interface suitable for packages like txn.mathops, you could likewise imagine other portable implementations contained in stdsorg.mathops or elsewhere. One such implementation—used perhaps during initial program testing—might track the range of values actually passed to these functions at runtime; another might raise runtime errors if values fall outside a prescribed range. In general, we can design these sorts of (portable) modules to stack atop existing IMathOps implementations (portable or otherwise) through introduction of their own IMathOps proxy similarly bound to a downstream delegate.
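The stacking idea can be sketched in plain C with a function pointer standing in for the wrapper's own proxy. This is only an illustration under stated assumptions: the names MpyFxn, std_mpy, RangeTracker, tracker, and tracked_mpy are hypothetical, only one of the four operations is shown, and a real RTSC module would bind its delegate during configuration rather than through a runtime pointer.

```c
#include <stdint.h>

/* Hypothetical C rendering of one IMathOps-style operation */
typedef int32_t (*MpyFxn)(int32_t x, int32_t y);

/* Sign-extended low 16 bits of a 32-bit word */
static int32_t lo16(int32_t w)
{
    uint32_t u = (uint32_t)w & 0xFFFFu;
    return (u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
}

/* Downstream delegate: portable product of the low 16 bits of x and y */
static int32_t std_mpy(int32_t x, int32_t y)
{
    return lo16(x) * lo16(y);
}

/* Wrapper state: its own delegate slot plus the observed argument range */
typedef struct {
    MpyFxn delegate;
    int32_t minSeen;
    int32_t maxSeen;
} RangeTracker;

static RangeTracker tracker = { std_mpy, INT32_MAX, INT32_MIN };

/* Record one argument value in the running range */
static void note_value(int32_t v)
{
    if (v < tracker.minSeen) tracker.minSeen = v;
    if (v > tracker.maxSeen) tracker.maxSeen = v;
}

/* Same signature as the delegate, so a client could bind to either one */
int32_t tracked_mpy(int32_t x, int32_t y)
{
    note_value(x);
    note_value(y);
    return tracker.delegate(x, y);
}
```

Because tracked_mpy has the same signature as the function it wraps, the delegate slot could just as easily point at a processor-specific implementation, which is the layering the paragraph above describes.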

FirC meta-implementation. To complete the picture, the module$use function defined inside FirC.xs binds a suitable IMathOps delegate implementation to the FirC.PMathOps proxy if some client hasn't already done so.

acme/filters2/FirC.xs

1

    ...
function module$use()
{
    Error = xdc.useModule('xdc.runtime.Error');
    Memory = xdc.useModule('xdc.runtime.Memory');
    Program = xdc.useModule('xdc.cfg.Program');
 
    var FirC = xdc.useModule('acme.filters2.FirC');
 
    if (FirC.PMathOps == null) {
        if (Program.build.target.isa == "64P") {
            FirC.PMathOps = xdc.useModule('txn.mathops.MathOps64P');
        }
        else {
            FirC.PMathOps = xdc.useModule('stdsorg.math.MathOpsStd');
        }
    }
}
    ...

Through disciplined choice of canonical names for target-specific implementations of the IMathOps interface, we could then generalize the if statement at line 1 into a closed expression that computes the module's fully-qualified name as a function of the RTSC target used to build the current program.
