8.5. Post-Training Analysis
After training, Tiny ML Tensorlab provides comprehensive analysis tools to evaluate model performance and understand its behavior.
8.5.1. Overview
Post-training analysis helps you:
Evaluate model accuracy and error patterns
Understand which classes are confused
Select optimal operating thresholds
Verify quantization impact
Generate reports for stakeholders
8.5.2. Enabling Analysis
Analysis is enabled through the testing section:
testing:
    enable: True
    analysis:
        confusion_matrix: True
        roc_curve: True
        class_histograms: True
        error_analysis: True
8.5.3. Output Files
After testing, you’ll find analysis outputs:
.../testing/
├── confusion_matrix_test.png                    # Confusion matrix
├── One_vs_Rest_MultiClass_ROC_test.png          # ROC curves
├── Histogram_Class_Score_differences_test.png   # Score distributions
├── fpr_tpr_thresholds.csv                       # Threshold analysis
├── classification_report.txt                    # Per-class metrics
├── error_samples/                               # Misclassified examples
│   ├── error_001.csv
│   └── ...
└── test_results.json                            # Summary statistics
8.5.4. Confusion Matrix
Shows classification results in matrix form:
              Predicted
               A    B    C
Actual   A    95    3    2
         B     1   97    2
         C     2    1   97
Interpreting:
Diagonal = correct predictions
Off-diagonal = misclassifications
Rows sum to actual class counts
Columns show predicted distribution
Good matrix:
Strong diagonal (high values)
Weak off-diagonal (low values)
Problem indicators:
High off-diagonal values = specific class confusion
Asymmetric confusion = direction-specific errors
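If you want to reproduce or extend the confusion matrix outside the toolchain, a minimal sketch using scikit-learn is shown below. This is an independent recomputation, not necessarily what Tiny ML Tensorlab runs internally, and the label arrays are placeholders for your own test results.

import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder arrays; substitute the true labels and predictions from your test run
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred)
print(cm)  # rows = actual classes, columns = predicted classes

# Per-class recall: diagonal (correct) divided by row sums (actual class counts)
print(cm.diagonal() / cm.sum(axis=1))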
8.5.5. ROC Curves
The Receiver Operating Characteristic (ROC) curve shows the trade-off between:
True Positive Rate (sensitivity)
False Positive Rate (1 - specificity)
Example: One-vs-Rest multi-class ROC curves for arc fault detection, showing the trade-off between sensitivity and specificity at different thresholds.

A simplified sketch of a single ROC curve:

TPR (Sensitivity)
1.0 |         ******
    |      **
    |    **
0.5 |  **
    | **
0.0 +*--------------  FPR
    0.0    0.5    1.0
Key Metrics:
AUC (Area Under Curve): 1.0 = perfect, 0.5 = random
Operating Point: Where you set the threshold
Multi-Class ROC:
For multi-class problems, one-vs-rest ROC shows each class:
Class A: AUC = 0.98
Class B: AUC = 0.95
Class C: AUC = 0.99
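The toolchain writes the ROC plot automatically; to recompute one-vs-rest AUC values yourself, a minimal scikit-learn sketch is given below. The label and score arrays are placeholders, and the scores are assumed to be per-class probabilities such as softmax outputs.

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.preprocessing import label_binarize

# Placeholder inputs: integer labels and per-class scores (e.g. softmax outputs)
y_true = np.array([0, 1, 2, 1, 0, 2])
scores = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.3, 0.6, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6]])

# One-vs-rest: binarize the labels and evaluate each class against the rest
y_bin = label_binarize(y_true, classes=[0, 1, 2])
for c in range(3):
    fpr, tpr, thresholds = roc_curve(y_bin[:, c], scores[:, c])  # points for plotting
    print(f"Class {c}: AUC = {roc_auc_score(y_bin[:, c], scores[:, c]):.3f}")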
8.5.6. Class Score Histograms
Shows distribution of model confidence for each class:
Example: distribution of class score differences, showing model confidence.

Correct predictions:  [=====|=====]  centered at high score differences
Wrong predictions:    [==|==]        centered at low score differences
Interpretation:
Well-separated histograms: Model is confident and correct
Overlapping histograms: Model is uncertain
Wrong predictions at high scores: Confident mistakes (investigate)
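To inspect this separation yourself, the sketch below plots histograms of the top-1 minus top-2 score margin for correct and wrong predictions. This is one reasonable definition of "class score difference"; the scores and labels here are random placeholders for your own test outputs.

import numpy as np
import matplotlib.pyplot as plt

# Random placeholders for per-class scores (rows sum to 1) and true labels
scores = np.random.dirichlet(np.ones(3), size=1000)
y_true = np.random.randint(0, 3, size=1000)

top2 = np.sort(scores, axis=1)[:, -2:]        # two highest scores per sample
margin = top2[:, 1] - top2[:, 0]              # top-1 minus top-2 score difference
correct = scores.argmax(axis=1) == y_true

plt.hist(margin[correct], bins=30, alpha=0.6, label='correct predictions')
plt.hist(margin[~correct], bins=30, alpha=0.6, label='wrong predictions')
plt.xlabel('class score difference')
plt.legend()
plt.savefig('class_score_histogram.png')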
8.5.7. FPR/TPR Thresholds
CSV file for threshold selection:
threshold,tpr,fpr,precision,recall,f1
0.1,0.99,0.15,0.87,0.99,0.93
0.3,0.97,0.08,0.92,0.97,0.94
0.5,0.95,0.03,0.97,0.95,0.96
0.7,0.90,0.01,0.99,0.90,0.94
0.9,0.80,0.00,1.00,0.80,0.89
Using this data:
Choose your priority (minimize FPR or maximize TPR)
Find the threshold that meets your requirement
Use that threshold in deployment code
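A minimal sketch of steps 1 and 2, assuming the CSV columns shown above: load fpr_tpr_thresholds.csv and keep the operating point with the highest TPR among those that satisfy an FPR budget.

import csv

target_fpr = 0.05  # example requirement: at most 5% false positives

# Read the threshold table written during testing (column names as in the example above)
with open('fpr_tpr_thresholds.csv') as f:
    rows = [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]

# Among thresholds meeting the FPR budget, keep the one with the highest TPR
candidates = [r for r in rows if r['fpr'] <= target_fpr]
best = max(candidates, key=lambda r: r['tpr'])  # raises ValueError if nothing qualifies
print(f"Deploy with threshold {best['threshold']}: TPR={best['tpr']}, FPR={best['fpr']}")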
8.5.8. Classification Report
Per-class performance metrics:
Class          Precision   Recall   F1-Score   Support
Normal            0.98      0.96      0.97        500
Fault_A           0.95      0.97      0.96        480
Fault_B           0.97      0.95      0.96        520

Accuracy:                             0.96
Macro Avg         0.97      0.96      0.96       1500
Weighted Avg      0.96      0.96      0.96       1500
Metrics explained:
Precision: Of predicted positives, how many are correct?
Recall: Of actual positives, how many were detected?
F1-Score: Harmonic mean of precision and recall
Support: Number of samples per class
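The table layout matches scikit-learn's classification_report; whether the toolchain calls it internally is not guaranteed, but the sketch below regenerates an equivalent report from your own labels and predictions (the label lists are placeholders).

from sklearn.metrics import classification_report

# Placeholder labels and predictions; substitute your own test results
y_true = ['Normal', 'Fault_A', 'Fault_B', 'Normal', 'Fault_A']
y_pred = ['Normal', 'Fault_A', 'Fault_B', 'Fault_A', 'Fault_A']

# Precision = TP / (TP + FP), Recall = TP / (TP + FN),
# F1 = 2 * Precision * Recall / (Precision + Recall)
print(classification_report(y_true, y_pred, digits=2))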
8.5.9. Error Analysis
Detailed examination of misclassified samples:
testing:
    error_analysis:
        save_errors: True
        max_errors_per_class: 20
Error sample files:
Each saved error sample includes:
Original input data
True label
Predicted label
Model confidence scores
Using error analysis:
Identify patterns in errors
Check for labeling mistakes
Find data collection issues
Improve feature extraction
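As a starting point for pattern-finding, the sketch below tallies (true, predicted) label pairs across the saved error files. It assumes one sample per error_*.csv and uses placeholder column names ('true_label', 'predicted_label'); adjust both to match the actual files.

import csv
import glob
from collections import Counter

# Tally (true, predicted) label pairs across the saved error samples.
# Assumes one sample per file; 'true_label' and 'predicted_label' are placeholder column names.
pairs = Counter()
for path in glob.glob('testing/error_samples/error_*.csv'):
    with open(path) as f:
        row = next(csv.DictReader(f))
        pairs[(row['true_label'], row['predicted_label'])] += 1

for (true_lbl, pred_lbl), count in pairs.most_common():
    print(f"{true_lbl} misclassified as {pred_lbl}: {count} samples")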
8.5.10. Quantized vs Float Comparison
Compare quantized model to float baseline:
testing:
    enable: True
    test_float: True
    test_quantized: True
    compare_results: True
Output:
Float32 Model:
    Accuracy: 99.2%
    F1-Score: 0.992

INT8 Quantized Model:
    Accuracy: 98.8%
    F1-Score: 0.988

Degradation: 0.4%
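The degradation figure is simple arithmetic on the two accuracy values. If you collect predictions from both models yourself, a sketch like the one below reproduces it and also shows where the models disagree (all file names are placeholders):

import numpy as np

# Placeholder file names; substitute the labels and predictions from your own runs
y_true = np.load('test_labels.npy')
pred_float = np.load('predictions_float.npy')
pred_quant = np.load('predictions_quantized.npy')

acc_float = (pred_float == y_true).mean()
acc_quant = (pred_quant == y_true).mean()
print(f"Float accuracy:     {acc_float:.1%}")
print(f"Quantized accuracy: {acc_quant:.1%}")
print(f"Degradation:        {acc_float - acc_quant:.1%}")

# Samples whose prediction changes after quantization are worth inspecting
changed = np.flatnonzero(pred_float != pred_quant)
print(f"{changed.size} samples change prediction after quantization")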
8.5.11. Regression Analysis
For regression tasks, different metrics apply:
testing:
    enable: True
    regression_metrics:
        mse: True
        mae: True
        r2: True
        scatter_plot: True
Output:
Mean Squared Error (MSE): 0.023
Mean Absolute Error (MAE): 0.12
R² Score: 0.95
Max Error: 0.45
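These metrics can be recomputed directly with scikit-learn; a minimal sketch with placeholder arrays:

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Placeholder target and prediction arrays
y_true = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_pred = np.array([0.1, 0.45, 1.1, 1.4, 2.05])

print(f"MSE: {mean_squared_error(y_true, y_pred):.3f}")
print(f"MAE: {mean_absolute_error(y_true, y_pred):.3f}")
print(f"R2:  {r2_score(y_true, y_pred):.3f}")
print(f"Max Error: {np.max(np.abs(y_true - y_pred)):.3f}")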
8.5.12. Anomaly Detection Analysis
For anomaly detection:
testing:
    enable: True
    anomaly_metrics:
        reconstruction_error: True
        threshold_analysis: True
Output:
Normal Data:
    Mean reconstruction error: 0.05
    Std reconstruction error:  0.02

Anomaly Data:
    Mean reconstruction error: 0.35
    Std reconstruction error:  0.15

Recommended threshold: 0.15

At threshold 0.15:
    TPR: 0.92
    FPR: 0.05
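The underlying computation is a simple threshold on per-sample reconstruction error; a minimal sketch with synthetic error values standing in for real model outputs:

import numpy as np

# Synthetic per-sample reconstruction errors standing in for real model outputs
err_normal = np.random.normal(0.05, 0.02, size=1000).clip(min=0)
err_anomaly = np.random.normal(0.35, 0.15, size=200).clip(min=0)

threshold = 0.15  # example operating point

# A sample is flagged as anomalous when its reconstruction error exceeds the threshold
tpr = (err_anomaly > threshold).mean()  # fraction of anomalies correctly flagged
fpr = (err_normal > threshold).mean()   # fraction of normal samples incorrectly flagged
print(f"At threshold {threshold}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")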
8.5.13. Custom Analysis Scripts
For advanced analysis, use the saved model and data:
import torch
import numpy as np

# Load model (assumes the full model object was saved, not just a state_dict)
model = torch.load('path/to/best_model.pt')
model.eval()

# Load test data
test_data = np.load('path/to/test_data.npy')
test_labels = np.load('path/to/test_labels.npy')

# Run inference
with torch.no_grad():
    outputs = model(torch.tensor(test_data, dtype=torch.float32))
    predictions = outputs.argmax(dim=1)

# Custom analysis
# ... your analysis code
8.5.14. Generating Reports
For documentation or stakeholder communication:
testing:
    enable: True
    generate_report: True
    report_format: 'pdf'  # or 'html', 'markdown'
Report includes:
Model summary (architecture, parameters)
Training curves
Test metrics
Confusion matrix
ROC curves
Recommendations
8.5.15. Example: Complete Analysis Configuration
common:
    task_type: 'generic_timeseries_classification'
    target_device: 'F28P55'

dataset:
    dataset_name: 'dc_arc_fault_example_dsk'

data_processing_feature_extraction:
    feature_extraction_name: 'FFT1024Input_256Feature_1Frame_Full_Bandwidth'
    variables: 1

training:
    model_name: 'ArcFault_model_400_t'
    training_epochs: 30
    quantization: 2
    quantization_method: 'QAT'
    quantization_weight_bitwidth: 8
    quantization_activation_bitwidth: 8

testing:
    enable: True
    test_float: True
    test_quantized: True
    analysis:
        confusion_matrix: True
        roc_curve: True
        class_histograms: True
        error_analysis: True
        save_errors: True
        max_errors_per_class: 10
    compare_results: True
8.5.16. Best Practices
Always review confusion matrix: Understand error patterns
Check ROC curves: Ensure good class separation
Analyze errors: Learn from misclassifications
Compare quantized and float results: Verify the accuracy drop is acceptable
Document findings: Record analysis for future reference
8.5.17. Troubleshooting Low Accuracy
If overall accuracy is low:
Check GoF test results (dataset quality)
Try larger model
Increase training epochs
Improve feature extraction
If specific classes have low accuracy:
Check class balance
Investigate error samples
May need more data for those classes
Classes might be inherently similar
If quantized accuracy drops significantly:
Try QAT instead of PTQ
Use more calibration data
Keep sensitive layers at higher precision
Use larger model (more robust to quantization)
8.5.18. Next Steps
Deploy model: CCS Integration Guide
Optimize further: Neural Architecture Search
Review Common Errors if issues arise