8.3. Standalone Quantization Examples
The tinyml-modeloptimization package includes standalone Python examples
demonstrating direct use of the quantization wrappers. These examples are
located in tinyml-modeloptimization/torchmodelopt/examples/ and can be
run independently of the Tiny ML Tensorlab YAML-based toolchain.
Note
These examples are for users who want to integrate quantization into their own PyTorch training scripts. For most users, the YAML-based toolchain (see Quantization) is recommended.
8.3.1. Overview
| Example | Dataset | Description |
|---|---|---|
| FMNIST Image Classification | Fashion MNIST | Beginner example: LinearReLU model, QAT, TINPU export, ONNX inference, node renaming |
| Audio Keyword Spotting | Speech Commands v2 | DSCNN model, QAT/PTQ, mixed precision, bias calibration (MLPerf Tiny benchmark) |
| Motor Fault Classification | Motor vibration CSV | CNN on time series, QAT/PTQ, 2b/4b/8b weights, confusion matrix |
| MNIST Digit Classification | MNIST | LeNet-5, QAT/PTQ, 4/8-bit quantization, ONNX export |
| Torque Regression | Torque measurement CSV | CNN regression, QAT/PTQ, R2/SMAPE metrics |
8.3.2. FMNIST Image Classification
A beginner-friendly example using the Fashion MNIST dataset (60,000 training + 10,000 test samples, 10 classes).
Model: Simple neural network with Linear + ReLU layers.
Workflow:
Create train and test dataloaders from torchvision.datasets
Define a classification neural network (LinearReLU stack)
Train and test the float model (5 epochs)
Wrap with TINPUTinyMLQATFxModule for quantization-aware training
Train and test the quantized model
Convert from PyTorch QDQ layers to TI NPU int8 layers
Rename the input node to 'input' for inference compatibility
Export as fmnist_int8.onnx
Key Code:
from tinyml_torchmodelopt.quantization import TINPUTinyMLQATFxModule

# After float training, wrap the model for quantization-aware training:
ti_model = TINPUTinyMLQATFxModule(model, total_epochs=5)

# QAT training loop (same loss/optimizer as float training)
for epoch in range(5):
    for images, targets in train_loader:
        optimizer.zero_grad()
        output = ti_model(images)
        loss = criterion(output, targets)
        loss.backward()
        optimizer.step()

# Convert from QDQ layers to TI NPU int8 layers, then export
ti_model.eval()
ti_model = ti_model.convert()
ti_model.export(dummy_input, 'fmnist_int8.onnx', input_names=['input'])
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/fmnist_image_classification
python fmnist_tinpu_qat.py
8.3.3. Audio Keyword Spotting
An advanced example based on the MLPerf Tiny keyword spotting benchmark. Uses a Depthwise Separable Convolutional Neural Network (DSCNN) to identify 10 keywords from the Google Speech Commands v2 dataset.
Model: DSCNN (2D conv + 4 depthwise separable conv blocks + global average pooling + FC layer).
Dataset: Modified Speech Commands v2 with 12 classes (10 keywords + “unknown” + “silence”). Training/validation/test split: 80%/10%/10%.
Features Demonstrated:
QAT and PTQ workflows
Mixed precision quantization
Bias calibration (bias_calibration_factor)
Cosine annealing LR scheduler
ONNX export for NPU deployment
Workflow:
Download and prepare Speech Commands v2 dataset
Build DSCNN model with batch normalization and dropout
Train float model (10 epochs with cosine LR schedule)
Wrap with TINPUTinyMLQuantFxModule
Perform QAT or PTQ calibration
Export as quant_kws.onnx
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/audio_keyword_spotting
python main.py
8.3.4. Motor Fault Time Series Classification
Demonstrates quantization for time series classification using motor vibration sensor data. The CNN model classifies fault conditions from 3-axis accelerometer readings.
Model: Small CNN for vibration data classification.
Data Format: CSV with columns Vibx, Viby, Vibz, Target.
Rows are segmented into sliding windows controlled by WINDOW_LENGTH
and WINDOW_OFFSET.
Features Demonstrated:
Both TINPU and Generic quantization device types
QAT and PTQ workflows
2-bit, 4-bit, and 8-bit weight quantization
Confusion matrix evaluation
ONNX Runtime inference validation
Configuration (edit constants in script):
QUANTIZATION_METHOD = 'QAT' # or 'PTQ'
WEIGHT_BITWIDTH = 8 # 2, 4, or 8
ACTIVATION_BITWIDTH = 8 # typically 8
QUANTIZATION_DEVICE_TYPE = 'TINPU' # or 'GENERIC'
WINDOW_LENGTH = 256
WINDOW_OFFSET = 64
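To make the windowing behavior concrete, here is a minimal sketch (not the script's actual code; the helper name is illustrative) of how rows are segmented into overlapping windows under WINDOW_LENGTH and WINDOW_OFFSET:

```python
# Illustrative sketch: segment sample rows into overlapping windows
# the way WINDOW_LENGTH / WINDOW_OFFSET control it.
WINDOW_LENGTH = 256
WINDOW_OFFSET = 64

def make_windows(rows, length=WINDOW_LENGTH, offset=WINDOW_OFFSET):
    """Return a list of windows; each window is a slice of `length` rows,
    and consecutive windows start `offset` rows apart."""
    return [rows[start:start + length]
            for start in range(0, len(rows) - length + 1, offset)]

# 1000 rows of (Vibx, Viby, Vibz) readings -> 12 overlapping 256-row windows
rows = [(0.0, 0.0, 0.0)] * 1000
windows = make_windows(rows)
print(len(windows), len(windows[0]))  # 12 256
```

A smaller WINDOW_OFFSET yields more (and more overlapping) training windows from the same recording; WINDOW_OFFSET equal to WINDOW_LENGTH gives non-overlapping windows.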
Script Structure:
| Function | Purpose |
|---|---|
|  | Load CSV data and create sliding windows |
|  | Create the CNN model |
|  | Float and QAT training loop |
|  | PTQ calibration pass |
|  | Select TINPU/Generic and QAT/PTQ wrapper |
|  | Export to ONNX format |
|  | ONNX Runtime inference validation |
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/motor_fault_time_series_classification
python motor_fault_classification_tinpu_quant.py
8.3.5. MNIST Digit Classification
Uses the classic LeNet-5 architecture on MNIST, demonstrating a complete TinyML workflow from training to quantization and ONNX export.
Model: LeNet-5 CNN:
Conv1: 8 filters, 3x3, BatchNorm + ReLU + MaxPool
Conv2: 16 filters, 3x3, BatchNorm + ReLU + MaxPool
FC1: 400 → 120
FC2: 120 → 84
FC3: 84 → 10
Features Demonstrated:
QAT and PTQ workflows
4-bit and 8-bit quantization
Cosine annealing LR scheduler
Float and quantized ONNX export
Complete 14-epoch training pipeline
Workflow:
Download MNIST (28x28 grayscale, 60K train + 10K test)
Normalize with mean=0.1307, std=0.3081
Train LeNet-5 for 14 epochs (SGD + CosineAnnealingLR)
Wrap with TINPUTinyMLQuantFxModule
Perform QAT or PTQ
Validate quantized model accuracy
Export both float and quantized ONNX models
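As a sanity check on the architecture above, the FC1 input size of 400 follows from the conv/pool stack, assuming valid (no-padding) 3x3 convolutions with stride 1 and 2x2 max-pooling, which is the configuration consistent with 400 = 16 x 5 x 5:

```python
# Trace the spatial size of a 28x28 MNIST image through LeNet-5
# (assumption: 3x3 convs with no padding, stride 1; 2x2 max-pool, stride 2).
def conv_out(size, kernel=3):   # valid convolution, stride 1
    return size - kernel + 1

def pool_out(size, window=2):   # 2x2 max-pool, stride 2
    return size // window

size = 28                               # MNIST input is 28x28
size = pool_out(conv_out(size))         # Conv1 (8 filters) + MaxPool -> 13x13
size = pool_out(conv_out(size))         # Conv2 (16 filters) + MaxPool -> 5x5
flattened = 16 * size * size
print(flattened)  # 400, matching FC1: 400 -> 120
```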
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/mnist_lenet5_classification
python main.py
8.3.6. Torque Time Series Regression
Demonstrates quantization for a regression task (continuous value prediction) using sensor time-series data from motor torque measurements.
Model: Small CNN for torque prediction.
Data: CSV dataset with sensor columns and a torque target column.
Available from TI’s public dataset server. Data is segmented into
sliding windows.
Features Demonstrated:
QAT quantization for regression models
R2 and SMAPE evaluation metrics
TINPU and Generic device types
ONNX Runtime inference validation
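For reference, the two regression metrics reported by the example can be computed as follows; this is a minimal sketch using the standard definitions (variable names are illustrative, and the script may use library implementations instead):

```python
# R2 (coefficient of determination) and SMAPE (symmetric mean absolute
# percentage error), the metrics used to evaluate the regression model.
def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def smape(y_true, y_pred):
    # Symmetric MAPE in percent, bounded in [0, 200]
    return 100.0 / len(y_true) * sum(
        2.0 * abs(p - t) / (abs(t) + abs(p))
        for t, p in zip(y_true, y_pred))

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r2_score(y_true, y_pred), 3))  # 0.98
print(round(smape(y_true, y_pred), 2))
```

Higher R2 (closer to 1.0) and lower SMAPE indicate a better fit; comparing these metrics between the float and quantized models quantifies the accuracy cost of quantization.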
Configuration (edit constants in script):
QUANTIZATION_METHOD = 'QAT'
WEIGHT_BITWIDTH = 8
ACTIVATION_BITWIDTH = 8
QUANTIZATION_DEVICE_TYPE = 'TINPU'
Run:
cd tinyml-modeloptimization/torchmodelopt/examples/torque_time_series_regression
python torque_regression_tinpu_quant.py
8.3.7. Quantization Guidance
These best practices apply when using the wrappers directly:
Choosing QAT vs PTQ:
PTQ (safe default): Use QUANTIZATION_METHOD = 'PTQ' with 8-bit weights and activations. Fast, and requires only a calibration pass.
QAT (better accuracy): Switch to QAT if PTQ accuracy degrades, especially for sub-8-bit quantization. Use a smaller learning rate and more epochs.
Sub-8-bit Quantization:
For 4-bit or 2-bit weights: prefer QAT with per-channel weight quantization
Careful tuning of calibration, clipping, and bias correction is important
TINPU prefers symmetric per-channel weight quantization with power-of-two scales
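The power-of-two scale constraint can be sketched as follows; this is an illustrative toy (per-channel in practice, one channel shown, function names invented here), not the wrapper's internal implementation:

```python
import math

# Symmetric 8-bit quantization with a power-of-two scale, the scheme
# TINPU prefers. The scale is the smallest power of two whose integer
# range covers [-max_abs, max_abs].
def pow2_scale(max_abs, bits=8):
    qmax = 2 ** (bits - 1) - 1          # 127 for 8-bit
    return 2.0 ** math.ceil(math.log2(max_abs / qmax))

def quantize(w, scale, bits=8):
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(w / scale)))
    return q * scale                     # dequantized value

weights = [0.31, -0.12, 0.07, -0.25]
scale = pow2_scale(max(abs(w) for w in weights))
print(scale)                             # 0.00390625 == 2**-8
print([quantize(w, scale) for w in weights])
```

Power-of-two scales let the hardware replace rescaling multiplies with bit shifts, which is why TINPU favors them.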
PTQ Calibration:
Use representative inputs (hundreds to a few thousand samples)
Poor calibration causes large activation quantization errors
Ensure calibration data covers the full input distribution
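The points above can be illustrated with a toy example (not the toolchain's API): when the scale is derived from a calibration set that misses the activation tails, out-of-range activations are clipped and incur large error.

```python
# Why calibration coverage matters: the quantization scale is derived
# from the calibration set's max activation; anything beyond that range
# gets clipped at inference time.
def calibrate_scale(samples, qmax=127):
    return max(abs(x) for x in samples) / qmax

def quantize(x, scale, qmax=127):
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale

narrow_calib = [0.1, 0.2, 0.3]        # misses the activation tails
good_calib = [0.1, 0.5, 1.0, 2.0]     # covers the full input range

x = 2.0                               # an activation seen at inference
err_narrow = abs(x - quantize(x, calibrate_scale(narrow_calib)))
err_good = abs(x - quantize(x, calibrate_scale(good_calib)))
print(err_narrow, err_good)           # narrow calibration clips x to 0.3
```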
ONNX Evaluation:
Always validate the exported ONNX model before deploying to device:
import onnxruntime as ort

session = ort.InferenceSession('model_int8.onnx')
# The key must match the model's ONNX input node name ('input' if renamed at export)
prediction = session.run(None, {'input': test_input.numpy()})
8.3.8. Next Steps
Quantization - Quantization via the YAML-based toolchain
Neural Architecture Search - Automatic model architecture search
NPU Device Deployment - Deploy quantized models to NPU devices