8.3. Standalone Quantization Examples
The tinyml-modeloptimization package includes standalone Python examples
demonstrating direct use of the quantization wrappers. These examples are
located in tinyml-modeloptimization/torchmodelopt/examples/ and can be
run independently of the Tiny ML Tensorlab YAML-based toolchain.
Note
These examples are for users who want to integrate quantization into their own PyTorch training scripts. For most users, the YAML-based toolchain (see Quantization) is recommended.
8.3.1. Overview
| Example | Dataset | Description |
|---|---|---|
| FMNIST Image Classification | Fashion MNIST | Beginner example: LinearReLU model, QAT, TINPU export, ONNX inference, node renaming |
| Audio Keyword Spotting | Speech Commands v2 | DSCNN model, QAT/PTQ, mixed precision, bias calibration (MLPerf Tiny benchmark) |
| Motor Fault Classification | Motor vibration CSV | CNN on time series, QAT/PTQ, 2b/4b/8b weights, confusion matrix |
| MNIST Digit Classification | MNIST | LeNet-5, QAT/PTQ, 4/8-bit quantization, ONNX export |
| Torque Regression | Torque measurement CSV | CNN regression, QAT/PTQ, R2/SMAPE metrics |
8.3.2. FMNIST Image Classification
A beginner-friendly example using the Fashion MNIST dataset (60,000 training + 10,000 test samples, 10 classes).
Model: Simple neural network with Linear + ReLU layers.
Workflow:
Create train and test dataloaders from torchvision.datasets
Define a classification neural network (LinearReLU stack)
Train and test the float model (5 epochs)
Wrap with TINPUTinyMLQATFxModule for quantization-aware training
Train and test the quantized model
Convert from PyTorch QDQ layers to TI NPU int8 layers
Rename the input node to 'input' for inference compatibility
Export as fmnist_int8.onnx
Key Code:
from tinyml_torchmodelopt.quantization import TINPUTinyMLQATFxModule

# After float training, wrap the model for quantization-aware training:
ti_model = TINPUTinyMLQATFxModule(model, total_epochs=5)

# QAT training loop (same loss/optimizer as float training)
for epoch in range(5):
    for images, targets in train_loader:
        optimizer.zero_grad()
        output = ti_model(images)
        loss = criterion(output, targets)
        loss.backward()
        optimizer.step()

# Convert from QDQ layers to TI NPU int8 layers, then export
ti_model.eval()
ti_model = ti_model.convert()
ti_model.export(dummy_input, 'fmnist_int8.onnx', input_names=['input'])
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/fmnist_image_classification
python fmnist_tinpu_qat.py
8.3.3. Audio Keyword Spotting
An advanced example based on the MLPerf Tiny keyword spotting benchmark. Uses a Depthwise Separable Convolutional Neural Network (DSCNN) to identify 10 keywords from the Google Speech Commands v2 dataset.
Model: DSCNN (2D conv + 4 depthwise separable conv blocks + global average pooling + FC layer).
Dataset: Modified Speech Commands v2 with 12 classes (10 keywords + “unknown” + “silence”). Training/validation/test split: 80%/10%/10%.
Features Demonstrated:
QAT and PTQ workflows
Mixed precision quantization
Bias calibration (bias_calibration_factor)
Cosine annealing LR scheduler
ONNX export for NPU deployment
Workflow:
Download and prepare Speech Commands v2 dataset
Build DSCNN model with batch normalization and dropout
Train float model (10 epochs with cosine LR schedule)
Wrap with TINPUTinyMLQuantFxModule
Perform QAT or PTQ calibration
Export as quant_kws.onnx
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/audio_keyword_spotting
python main.py
8.3.4. Motor Fault Time Series Classification
Demonstrates quantization for time series classification using motor vibration sensor data. The CNN model classifies fault conditions from 3-axis accelerometer readings.
Model: Small CNN for vibration data classification.
Data Format: CSV with columns Vibx, Viby, Vibz, Target.
Rows are segmented into sliding windows controlled by WINDOW_LENGTH
and WINDOW_OFFSET.
Features Demonstrated:
Both TINPU and Generic quantization device types
QAT and PTQ workflows
2-bit, 4-bit, and 8-bit weight quantization
Confusion matrix evaluation
ONNX Runtime inference validation
Configuration (edit constants in script):
QUANTIZATION_METHOD = 'QAT' # or 'PTQ'
WEIGHT_BITWIDTH = 8 # 2, 4, or 8
ACTIVATION_BITWIDTH = 8 # typically 8
QUANTIZATION_DEVICE_TYPE = 'TINPU' # or 'GENERIC'
WINDOW_LENGTH = 256
WINDOW_OFFSET = 64
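To make the windowing behavior concrete, here is a minimal sketch (not the script's actual code; the helper name is illustrative) of how rows are segmented into overlapping windows under WINDOW_LENGTH and WINDOW_OFFSET:

```python
# Illustrative sketch: segment sample rows into overlapping windows
# the way WINDOW_LENGTH / WINDOW_OFFSET control it.
WINDOW_LENGTH = 256
WINDOW_OFFSET = 64

def make_windows(rows, length=WINDOW_LENGTH, offset=WINDOW_OFFSET):
    """Return a list of windows; each window is a slice of `length` rows,
    and consecutive windows start `offset` rows apart."""
    return [rows[start:start + length]
            for start in range(0, len(rows) - length + 1, offset)]

# 1000 rows of (Vibx, Viby, Vibz) readings -> 12 overlapping 256-row windows
rows = [(0.0, 0.0, 0.0)] * 1000
windows = make_windows(rows)
print(len(windows), len(windows[0]))  # 12 256
```

A smaller WINDOW_OFFSET yields more (and more overlapping) training windows from the same recording; WINDOW_OFFSET equal to WINDOW_LENGTH gives non-overlapping windows.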
Script Structure:
| Function | Purpose |
|---|---|
|  | Load CSV data and create sliding windows |
|  | Create the CNN model |
|  | Float and QAT training loop |
|  | PTQ calibration pass |
|  | Select TINPU/Generic and QAT/PTQ wrapper |
|  | Export to ONNX format |
|  | ONNX Runtime inference validation |
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/motor_fault_time_series_classification
python motor_fault_classification_tinpu_quant.py
8.3.5. MNIST Digit Classification
Uses the classic LeNet-5 architecture on MNIST, demonstrating a complete TinyML workflow from training to quantization and ONNX export.
Model: LeNet-5 CNN:
Conv1: 8 filters, 3x3, BatchNorm + ReLU + MaxPool
Conv2: 16 filters, 3x3, BatchNorm + ReLU + MaxPool
FC1: 400 → 120
FC2: 120 → 84
FC3: 84 → 10
Features Demonstrated:
QAT and PTQ workflows
4-bit and 8-bit quantization
Cosine annealing LR scheduler
Float and quantized ONNX export
Complete 14-epoch training pipeline
Workflow:
Download MNIST (28x28 grayscale, 60K train + 10K test)
Normalize with mean=0.1307, std=0.3081
Train LeNet-5 for 14 epochs (SGD + CosineAnnealingLR)
Wrap with TINPUTinyMLQuantFxModule
Perform QAT or PTQ
Validate quantized model accuracy
Export both float and quantized ONNX models
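As a sanity check on the architecture above, the FC1 input size of 400 follows from the conv/pool stack, assuming valid (no-padding) 3x3 convolutions with stride 1 and 2x2 max-pooling, which is the configuration consistent with 400 = 16 x 5 x 5:

```python
# Trace the spatial size of a 28x28 MNIST image through LeNet-5
# (assumption: 3x3 convs with no padding, stride 1; 2x2 max-pool, stride 2).
def conv_out(size, kernel=3):   # valid convolution, stride 1
    return size - kernel + 1

def pool_out(size, window=2):   # 2x2 max-pool, stride 2
    return size // window

size = 28                               # MNIST input is 28x28
size = pool_out(conv_out(size))         # Conv1 (8 filters) + MaxPool -> 13x13
size = pool_out(conv_out(size))         # Conv2 (16 filters) + MaxPool -> 5x5
flattened = 16 * size * size
print(flattened)  # 400, matching FC1: 400 -> 120
```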
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/mnist_lenet5_classification
python main.py
8.3.6. Torque Time Series Regression
Demonstrates quantization for a regression task (continuous value prediction) using sensor time-series data from motor torque measurements.
Model: Small CNN for torque prediction.
Data: CSV dataset with sensor columns and a torque target column.
Available from TI’s public dataset server. Data is segmented into
sliding windows.
Features Demonstrated:
QAT quantization for regression models
R2 and SMAPE evaluation metrics
TINPU and Generic device types
ONNX Runtime inference validation
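For reference, the two regression metrics reported by the example can be computed as follows; this is a minimal sketch using the standard definitions (variable names are illustrative, and the script may use library implementations instead):

```python
# R2 (coefficient of determination) and SMAPE (symmetric mean absolute
# percentage error), the metrics used to evaluate the regression model.
def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def smape(y_true, y_pred):
    # Symmetric MAPE in percent, bounded in [0, 200]
    return 100.0 / len(y_true) * sum(
        2.0 * abs(p - t) / (abs(t) + abs(p))
        for t, p in zip(y_true, y_pred))

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r2_score(y_true, y_pred), 3))  # 0.98
print(round(smape(y_true, y_pred), 2))
```

Higher R2 (closer to 1.0) and lower SMAPE indicate a better fit; comparing these metrics between the float and quantized models quantifies the accuracy cost of quantization.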
Configuration (edit constants in script):
QUANTIZATION_METHOD = 'QAT'
WEIGHT_BITWIDTH = 8
ACTIVATION_BITWIDTH = 8
QUANTIZATION_DEVICE_TYPE = 'TINPU'
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/torque_time_series_regression
python torque_regression_tinpu_quant.py
8.3.7. Quantization Guidance
These best practices apply when using the wrappers directly:
Choosing QAT vs PTQ:
PTQ (safe default): Use QUANTIZATION_METHOD = 'PTQ' with 8-bit weights and activations. Fast, and requires only a calibration pass.
QAT (better accuracy): Switch to QAT if PTQ accuracy degrades, especially for sub-8-bit quantization. Use a smaller learning rate and more epochs.
Sub-8-bit Quantization:
For 4-bit or 2-bit weights: prefer QAT with per-channel weight quantization
Careful tuning of calibration, clipping, and bias correction is important
TINPU prefers symmetric per-channel weight quantization with power-of-two scales
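The power-of-two scale constraint can be sketched as follows; this is an illustrative toy (per-channel in practice, one channel shown, function names invented here), not the wrapper's internal implementation:

```python
import math

# Symmetric 8-bit quantization with a power-of-two scale, the scheme
# TINPU prefers. The scale is the smallest power of two whose integer
# range covers [-max_abs, max_abs].
def pow2_scale(max_abs, bits=8):
    qmax = 2 ** (bits - 1) - 1          # 127 for 8-bit
    return 2.0 ** math.ceil(math.log2(max_abs / qmax))

def quantize(w, scale, bits=8):
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(w / scale)))
    return q * scale                     # dequantized value

weights = [0.31, -0.12, 0.07, -0.25]
scale = pow2_scale(max(abs(w) for w in weights))
print(scale)                             # 0.00390625 == 2**-8
print([quantize(w, scale) for w in weights])
```

Power-of-two scales let the hardware replace rescaling multiplies with bit shifts, which is why TINPU favors them.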
PTQ Calibration:
Use representative inputs (hundreds to a few thousand samples)
Poor calibration causes large activation quantization errors
Ensure calibration data covers the full input distribution
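The points above can be illustrated with a toy example (not the toolchain's API): when the scale is derived from a calibration set that misses the activation tails, out-of-range activations are clipped and incur large error.

```python
# Why calibration coverage matters: the quantization scale is derived
# from the calibration set's max activation; anything beyond that range
# gets clipped at inference time.
def calibrate_scale(samples, qmax=127):
    return max(abs(x) for x in samples) / qmax

def quantize(x, scale, qmax=127):
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale

narrow_calib = [0.1, 0.2, 0.3]        # misses the activation tails
good_calib = [0.1, 0.5, 1.0, 2.0]     # covers the full input range

x = 2.0                               # an activation seen at inference
err_narrow = abs(x - quantize(x, calibrate_scale(narrow_calib)))
err_good = abs(x - quantize(x, calibrate_scale(good_calib)))
print(err_narrow, err_good)           # narrow calibration clips x to 0.3
```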
ONNX Evaluation:
Always validate the exported ONNX model before deploying to device:
import onnxruntime as ort

session = ort.InferenceSession('model_int8.onnx')
# The key must match the model's ONNX input node name ('input' if renamed at export)
prediction = session.run(None, {'input': test_input.numpy()})
8.3.8. Next Steps
Quantization - Quantization via the YAML-based toolchain
Neural Architecture Search - Automatic model architecture search
NPU Device Deployment - Deploy quantized models to NPU devices