9.2. NPU Device Deployment

This guide covers deployment to TI devices with Neural Processing Unit (NPU) hardware acceleration.

9.2.1. NPU-Enabled Devices

The following devices include TI’s TINPU:

Device        Family     NPU Features
----------    -------    ---------------------------------------
F28P55        C2000      8-bit/4-bit inference, up to 60k params
AM13E2        MSPM33C    8-bit inference, Cortex-M33 + NPU
MSPM0G5187    MSPM0      8-bit inference, ultra-low power

9.2.2. NPU Compilation

To compile for the NPU, select an NPU device and an NPU-compatible model, then set the compilation preset to compress_npu_layer_data:

common:
  target_device: 'F28P55'  # NPU device

training:
  model_name: 'CLS_4k_NPU'  # NPU-compatible model

compilation:
  enable: True
  preset_name: 'compress_npu_layer_data'  # NPU optimization

The compress_npu_layer_data preset:

  • Optimizes memory layout for NPU

  • Compresses weight data

  • Generates NPU-specific code

9.2.3. NPU Model Requirements

Models must follow NPU constraints (see NPU Guidelines):

  • Use model names ending in _NPU

  • Channel counts must be multiples of 4

  • Kernel heights ≤ 7

  • Must use INT8 or INT4 quantization
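The constraints above can be checked on the host before training. The following is a minimal sketch: the struct layer_desc_t and the function npu_layer_ok are hypothetical names introduced for illustration and are not part of TI's tooling. The NPU Guidelines remain the authoritative list of constraints.

```c
#include <stdbool.h>

/* Hypothetical layer description, for illustration only. */
typedef struct {
    int channels;       /* channel count of the layer */
    int kernel_height;  /* convolution kernel height */
    int bitwidth;       /* quantization bit width */
} layer_desc_t;

/* Returns true when the layer satisfies the constraints listed above:
 * channels a multiple of 4, kernel height at most 7, INT8 or INT4. */
static bool npu_layer_ok(const layer_desc_t *layer)
{
    if (layer->channels <= 0 || (layer->channels % 4) != 0) return false;
    if (layer->kernel_height < 1 || layer->kernel_height > 7) return false;
    if (layer->bitwidth != 8 && layer->bitwidth != 4) return false;
    return true;
}
```

A layer with 16 channels, kernel height 5, and 8-bit quantization passes; changing the channel count to 6 makes the check fail.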

9.2.4. NPU Compilation Artifacts

After compilation:

.../compilation/artifacts/
├── mod.a                       # Compiled library (includes NPU code)
├── mod.h                       # Model interface
├── model_config.h              # NPU configuration
├── npu_layer_data.bin          # NPU weight data
├── feature_extraction.c        # Feature extraction
└── inference_example.c         # Example code

9.2.5. NPU Initialization

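The NPU must be initialized before the first inference call:

#include <stdint.h>
#include "mod.h"
#include "npu.h"

// Set elsewhere (for example, in a sampling ISR) when new input data is ready
extern volatile uint16_t data_ready;

// Defined in the NPU Inference Code section
void run_npu_inference(void);

void main(void) {
    // Initialize system
    System_Init();

    // Initialize NPU hardware
    NPU_Init();

    // Initialize model (loads weights to NPU)
    mod_init();

    // Now ready for inference
    while (1) {
        if (data_ready) {
            data_ready = 0;  // Clear the flag so each data set is processed once
            run_npu_inference();
        }
    }
}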

9.2.6. NPU Inference Code

#include "mod.h"
#include "feature_extraction.h"

// Buffers
float input_buffer[INPUT_SIZE];
float feature_buffer[FEATURE_SIZE];
float output_buffer[NUM_CLASSES];

void run_npu_inference(void) {
    // 1. Collect sensor data
    collect_sensor_data(input_buffer);

    // 2. Extract features (runs on CPU)
    extract_features(input_buffer, feature_buffer);

    // 3. Run NPU inference
    // NPU handles quantization internally
    mod_inference(feature_buffer, output_buffer);

    // 4. Get prediction
    int prediction = argmax(output_buffer, NUM_CLASSES);

    // 5. Act on result
    handle_prediction(prediction);
}
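The listing above calls argmax without defining it. One possible implementation is sketched below. The signature is an assumption chosen to match the call site, and the generated code may already provide an equivalent helper.

```c
#include <stddef.h>

/* Returns the index of the largest element; ties resolve to the lowest index.
 * Returns -1 for a NULL or empty array. */
int argmax(const float *values, size_t count)
{
    if (values == NULL || count == 0) return -1;

    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (values[i] > values[best]) best = i;
    }
    return (int)best;
}
```

Because only the ordering of the outputs matters, this works whether the model emits raw scores or probabilities.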

9.2.7. NPU Memory Management

NPU requires specific memory regions:

Weight Memory:

NPU weights are stored in dedicated memory:

// Linker command file
MEMORY
{
    NPU_WEIGHTS : origin = 0x00080000, length = 0x00010000
}

SECTIONS
{
    .npu_weights : > NPU_WEIGHTS
}

Activation Memory:

NPU uses scratch memory for intermediate results:

// Allocate NPU scratch buffer
#pragma DATA_SECTION(npu_scratch, ".npu_scratch")
uint8_t npu_scratch[NPU_SCRATCH_SIZE];

9.2.8. NPU Performance

Typical NPU performance on F28P55:

Model          CPU Time    NPU Time    Speedup
-----------    --------    --------    -------
CLS_1k_NPU     2000 µs     150 µs      13x
CLS_4k_NPU     5000 µs     300 µs      17x
CLS_13k_NPU    15000 µs    600 µs      25x

Note: Actual performance depends on model architecture and input size.
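The speedup column is the CPU time divided by the NPU time, rounded to the nearest integer. The sketch below reproduces that arithmetic and derives an upper bound on inference rate. Both function names are hypothetical, and the bound ignores feature extraction, which runs on the CPU and adds to the total latency.

```c
/* Speedup = CPU latency / NPU latency (both in microseconds). */
static double npu_speedup(double cpu_us, double npu_us)
{
    return (npu_us > 0.0) ? (cpu_us / npu_us) : 0.0;
}

/* Upper bound on inferences per second if inference alone occupied the device. */
static double max_inferences_per_second(double latency_us)
{
    return (latency_us > 0.0) ? (1.0e6 / latency_us) : 0.0;
}
```

For CLS_4k_NPU, 5000 / 300 is about 16.7, which the table rounds to 17x.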

9.2.9. NPU Power Considerations

The NPU can be disabled when idle to save power and re-enabled before the next inference:

// Disable NPU when not in use
void enter_low_power(void) {
    NPU_Disable();  // Saves power
}

// Re-enable before inference
void prepare_inference(void) {
    NPU_Enable();
    // May need small delay for NPU to stabilize
    delay_us(10);
}

9.2.10. NPU Debugging

Verify NPU Initialization:

if (NPU_GetStatus() != NPU_STATUS_READY) {
    // NPU initialization failed
    handle_error();
}

Check Inference Results:

Compare NPU results with expected values from training:

#include <math.h>  // fabsf

// Known test input
float test_input[] = {...};
float expected_output[] = {...};

mod_inference(test_input, output_buffer);

// Compare
float max_error = 0.0f;
for (int i = 0; i < NUM_CLASSES; i++) {
    float error = fabsf(output_buffer[i] - expected_output[i]);
    if (error > max_error) max_error = error;
}

// Quantization error should be small; 0.1 is an example threshold
if (max_error > 0.1f) {
    // Unexpected deviation
    debug_print("Max error: %f\n", max_error);
}

9.2.11. NPU Error Handling

Handle NPU errors gracefully:

int run_safe_inference(float* features, float* output) {
    // Check NPU status
    if (NPU_GetStatus() != NPU_STATUS_READY) {
        NPU_Reset();
        if (NPU_GetStatus() != NPU_STATUS_READY) {
            return -1;  // NPU unavailable
        }
    }

    // Run inference
    int result = mod_inference(features, output);

    if (result != 0) {
        // Inference error
        NPU_Reset();
        return -2;
    }

    return 0;  // Success
}

9.2.12. CCS Project Setup for NPU

1. Include NPU Support Files:

From your device SDK, add:

  • NPU driver files

  • NPU header files

  • NPU configuration files

2. Configure Linker:

Ensure the linker command file defines the NPU memory regions and places the .npu_weights and .npu_scratch sections described in NPU Memory Management.

3. Add Compiler Defines:

Project Properties → Build → Compiler → Predefined Symbols
Add: NPU_ENABLED=1
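How the NPU_ENABLED symbol is consumed is not described here. One common pattern, shown as a sketch under that assumption, is to guard NPU-specific code with it so the same source also builds for a CPU-only configuration. The function name npu_path_compiled_in is hypothetical.

```c
/* NPU_ENABLED normally comes from the project's predefined symbols.
 * Default to 0 so the file still builds when it is not defined. */
#ifndef NPU_ENABLED
#define NPU_ENABLED 0
#endif

/* Hypothetical helper: returns 1 if the NPU code path was compiled in, else 0. */
static int npu_path_compiled_in(void)
{
#if NPU_ENABLED
    return 1;
#else
    return 0;
#endif
}
```

In application code, the same #if NPU_ENABLED guard would wrap calls such as NPU_Init() so they are omitted from CPU-only builds.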

9.2.13. Example: Arc Fault on F28P55 NPU

Complete deployment example:

Configuration:

common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'

dataset:
  dataset_name: 'dc_arc_fault_example_dsk'

training:
  model_name: 'ArcFault_model_400_t'
  quantization: 2
  quantization_method: 'QAT'
  quantization_weight_bitwidth: 8
  quantization_activation_bitwidth: 8

compilation:
  enable: True
  preset_name: 'compress_npu_layer_data'

Main Application:

#include "device.h"
#include "mod.h"
#include "feature_extraction.h"
#include "npu.h"

#define SAMPLE_SIZE 1024
#define FEATURE_SIZE 256
#define NUM_CLASSES 2  // Normal, Arc

float adc_buffer[SAMPLE_SIZE];
float feature_buffer[FEATURE_SIZE];
float output_buffer[NUM_CLASSES];

volatile uint8_t inference_flag = 0;

void main(void) {
    // System initialization
    Device_init();
    Device_initGPIO();

    // Initialize ADC for current sensing
    ADC_Init();

    // Initialize NPU
    NPU_Init();

    // Initialize model
    mod_init();

    // Enable interrupts
    EINT;

    while (1) {
        if (inference_flag) {
            // Extract features
            extract_features(adc_buffer, feature_buffer);

            // Run NPU inference
            mod_inference(feature_buffer, output_buffer);

            // Check for arc fault
            if (output_buffer[1] > output_buffer[0]) {
                // Arc detected!
                GPIO_writePin(ALERT_PIN, 1);
                trigger_protection();
            }

            inference_flag = 0;
        }
    }
}

__interrupt void ADC_ISR(void) {
    static uint16_t sample_idx = 0;

    adc_buffer[sample_idx++] = ADC_readResult();

    if (sample_idx >= SAMPLE_SIZE) {
        sample_idx = 0;
        inference_flag = 1;
    }

    ADC_clearInterruptStatus();
}
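In the listing above, the ISR keeps writing into adc_buffer while the main loop is still extracting features from it, so samples from the next window can overwrite data that is being processed. One way to avoid this is a ping-pong (double) buffer. The sketch below is an illustration only: capture, on_adc_sample, and take_ready_buffer are hypothetical names, and it assumes each window is fully processed before the next one fills.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_SIZE 1024

/* Two capture buffers: the ISR fills one while the main loop reads the other. */
static float capture[2][SAMPLE_SIZE];
static volatile uint16_t write_buf = 0;  /* buffer the ISR is currently filling */
static volatile int16_t ready_buf = -1;  /* full buffer awaiting inference, -1 if none */

/* Call from the ADC ISR with each new sample. */
void on_adc_sample(float sample)
{
    static uint16_t idx = 0;

    capture[write_buf][idx++] = sample;
    if (idx >= SAMPLE_SIZE) {
        idx = 0;
        ready_buf = (int16_t)write_buf;           /* hand the full buffer over */
        write_buf = (uint16_t)(write_buf ^ 1u);   /* keep capturing into the other */
    }
}

/* Call from the main loop: returns a full buffer, or NULL if none is ready.
 * On a real target, disable interrupts briefly around the check and clear. */
const float *take_ready_buffer(void)
{
    if (ready_buf < 0) return NULL;

    const float *buf = capture[ready_buf];
    ready_buf = -1;
    return buf;
}
```

The main loop would then pass the returned pointer to extract_features instead of reading adc_buffer directly, and the ISR would call on_adc_sample(ADC_readResult()).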

9.2.14. Troubleshooting NPU Issues

NPU Initialization Fails:

  • Check device is NPU-enabled

  • Verify NPU clock is enabled

  • Ensure NPU memory regions are defined

Incorrect Results:

  • Verify model is NPU-compatible

  • Check quantization settings match

  • Compare with float model on same input

NPU Hangs:

  • Check for memory conflicts

  • Verify buffer alignments

  • Reset NPU and retry
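For the buffer alignment check, a small helper can confirm at startup that a buffer address meets a given alignment. This is a generic sketch: is_aligned is a hypothetical name, and the alignment the NPU actually requires is device-specific and not stated in this guide, so take the value from the device documentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when ptr is a multiple of `alignment`.
 * `alignment` must be a non-zero power of two; otherwise returns false. */
static bool is_aligned(const void *ptr, size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) return false;
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

A call such as is_aligned(npu_scratch, required_alignment) before the first inference turns a silent misalignment into an explicit, reportable failure.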

9.2.15. Next Steps