8.4. Goodness of Fit
The Goodness of Fit (GoF) test helps you analyze dataset quality and class separability before investing time in model training.
8.4.1. Overview
GoF testing answers:
Are my classes separable in feature space?
Is my feature extraction appropriate?
Will a neural network be able to learn the patterns?
Which classes might be confused?
Running GoF tests before training saves time by identifying data or feature extraction problems early.
8.4.2. Enabling GoF Test
Add the GoF section to your configuration:
common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'
dataset:
  dataset_name: 'your_dataset'
data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  gof_test: True
  frame_size: 256
training:
  enable: True  # Can set to False for GoF-only analysis
8.4.3. Running the Test
Linux/macOS:
cd tinyml-modelzoo
./run_tinyml_modelzoo.sh examples/your_example/config.yaml

Windows:
cd tinyml-modelzoo
run_tinyml_modelzoo.bat examples\your_example\config.yaml
8.4.4. Output Files
GoF test generates analysis files:
.../gof_test/
├── gof_pca_2d.png # PCA visualization
├── gof_tsne_2d.png # t-SNE visualization
├── gof_lda_2d.png # LDA visualization
├── class_separability.csv # Quantitative metrics
├── confusion_potential.csv # Likely confusion pairs
└── feature_importance.csv # Important features
8.4.5. Understanding the Visualizations
PCA Plot (gof_pca_2d.png)
Principal Component Analysis projection:
Example GoF plots:
- Arc fault detection with 256 frame size
- Arc fault detection with 1024 frame size
- Motor bearing fault detection
PC2
 ^
 |   * * *          Class A
 |  * * * *
 |
 |          + + +   Class B
 |         + + + +
 +-------------------> PC1
Well-separated clusters = Good separability
Overlapping clusters = Potential confusion
Scattered points = High variance, harder to classify
t-SNE Plot (gof_tsne_2d.png)
Non-linear dimensionality reduction:
Better at revealing complex cluster structures
Preserves local neighborhoods
May show separability that PCA misses
LDA Plot (gof_lda_2d.png)
Linear Discriminant Analysis:
Maximizes class separation
Shows best linear separation achievable
Most relevant for linear-like classifiers
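A projection like the one in gof_pca_2d.png can be reproduced outside the toolchain. Below is a minimal numpy sketch of the PCA case (the synthetic class data, feature count, and the `gap` heuristic are illustrative assumptions, not the tool's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic classes in a 64-dimensional feature space, standing in for
# extracted FFT-bin features (values are illustrative, not real sensor data).
class_a = rng.normal(loc=0.0, scale=0.5, size=(100, 64))
class_b = rng.normal(loc=2.0, scale=0.5, size=(100, 64))
X = np.vstack([class_a, class_b])

# PCA via SVD: center the data, then project onto the top-2 components.
X_centered = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(X_centered, full_matrices=False)
pca_2d = X_centered @ vt[:2].T  # shape (200, 2), ready to scatter-plot

# Distance between class means in PC space: a large gap corresponds to the
# well-separated clusters described above.
gap = np.linalg.norm(pca_2d[:100].mean(axis=0) - pca_2d[100:].mean(axis=0))
print('class-mean gap in PC space:', gap)
```

Scatter-plotting `pca_2d` colored by class gives the same kind of picture as the generated PNGs; t-SNE and LDA substitute a different projection step.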
8.4.6. Interpreting Results
Class Separability Score:
class_separability.csv:
class_pair,separability_score,overlap_percentage
A-B,0.95,2.3%
A-C,0.82,8.5%
B-C,0.99,0.1%
Score > 0.9: Excellent separability
Score 0.7-0.9: Good separability
Score 0.5-0.7: Moderate (may need better features)
Score < 0.5: Poor (investigate data or features)
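The score bands above are easy to apply programmatically to the generated CSV. A small stdlib sketch (the inline CSV text mirrors the sample shown above; reading the real file from the gof_test output directory works the same way):

```python
import csv
import io

# Hypothetical class_separability.csv contents, same layout as the sample above.
csv_text = """class_pair,separability_score,overlap_percentage
A-B,0.95,2.3%
A-C,0.82,8.5%
B-C,0.99,0.1%
"""

def band(score):
    # The score bands documented in this section.
    if score > 0.9:
        return 'excellent'
    if score >= 0.7:
        return 'good'
    if score >= 0.5:
        return 'moderate'
    return 'poor'

rows = list(csv.DictReader(io.StringIO(csv_text)))
bands = {r['class_pair']: band(float(r['separability_score'])) for r in rows}
print(bands)
```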
Confusion Potential:
confusion_potential.csv:
class_1,class_2,potential_confusion
A,C,high
B,D,low
Identifies which classes are most likely to be confused.
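One way to estimate such confusion pairs yourself is to compare between-class distance with within-class spread; the heuristic and threshold below are illustrative assumptions, not necessarily the metric behind confusion_potential.csv:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Synthetic per-class feature matrices (illustrative stand-ins).
feats = {
    'A': rng.normal(0.0, 1.0, size=(50, 16)),
    'B': rng.normal(5.0, 1.0, size=(50, 16)),
    'C': rng.normal(0.3, 1.0, size=(50, 16)),  # deliberately close to A
}

def confusion_potential(a, b):
    # Heuristic: a small gap between class means relative to within-class
    # spread suggests the pair is likely to be confused.
    between = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    within = a.std() + b.std()
    return 'high' if between < 2 * within else 'low'

for c1, c2 in combinations(feats, 2):
    print(f"{c1},{c2},{confusion_potential(feats[c1], feats[c2])}")
```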
8.4.7. 8-Plot Analysis
GoF generates 8 different visualizations using combinations of:
Transforms: PCA, LDA, t-SNE
Scalings: Standard, MinMax
Feature sets: All features, top features
Plot 1: PCA + Standard scaling + All features
Plot 2: PCA + MinMax scaling + All features
Plot 3: LDA + Standard scaling + All features
Plot 4: LDA + MinMax scaling + All features
Plot 5: t-SNE + Standard scaling + All features
Plot 6: t-SNE + MinMax scaling + All features
Plot 7: PCA + Standard scaling + Top 50 features
Plot 8: LDA + Standard scaling + Top 50 features
Examining all 8 helps identify the best analysis approach.
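The plot listing above is just the enumeration of these combinations, which can be sketched directly:

```python
from itertools import product

# Plots 1-6: every transform x scaling pair on all features;
# plots 7-8: PCA and LDA with Standard scaling on the top-50 features.
plots = [(t, s, 'All features')
         for t, s in product(['PCA', 'LDA', 't-SNE'], ['Standard', 'MinMax'])]
plots += [('PCA', 'Standard', 'Top 50 features'),
          ('LDA', 'Standard', 'Top 50 features')]

for i, (t, s, f) in enumerate(plots, 1):
    print(f"Plot {i}: {t} + {s} scaling + {f}")
```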
8.4.8. Common Patterns
Good Dataset:
- Tight, well-separated clusters
- Consistent within-class variance
- Clear boundaries between classes
Problematic Dataset:
- Overlapping clusters
- Outliers far from clusters
- One class scattered, others tight
Feature Extraction Issue:
- All classes overlap completely
- No structure visible
- Random-looking scatter
8.4.9. Actionable Insights
If classes overlap:
Try different feature extraction:
data_processing_feature_extraction:
  # Try FFT instead of raw
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
Increase feature count:
data_processing_feature_extraction:
  feature_extraction_name: 'Generic_512Input_RAW_512Feature_1Frame'
Review data labeling for errors
If one class is scattered:
Check for mislabeled samples
Consider splitting into sub-classes
Collect more training data for that class
If all classes overlap:
Feature extraction may be inappropriate
Data might not contain discriminative information
Consider domain expertise for better features
8.4.10. Example: Motor Fault GoF Analysis
common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'
dataset:
  dataset_name: 'motor_fault_classification_dsk'
data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3
  gof_test: True
  frame_size: 256
training:
  enable: False  # GoF only, skip training
Expected Good Results:
6 fault classes showing clear separation:
- Normal: tight cluster, well separated
- Contaminated: distinct from normal
- Erosion: some overlap with flaking (similar faults)
- Flaking: some overlap with erosion
- No Lubrication: well separated
- Localized Fault: distinct signature
8.4.11. GoF Without Training
Run GoF analysis only (no model training):
data_processing_feature_extraction:
  gof_test: True
training:
  enable: False
testing:
  enable: False
compilation:
  enable: False
This is useful for:
Rapid dataset evaluation
Feature extraction comparison
Data quality assessment
8.4.12. Comparing Feature Extraction
Run GoF with different feature extraction to compare:
Configuration 1:
data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  gof_test: True
Configuration 2:
data_processing_feature_extraction:
  feature_extraction_name: 'Generic_512Input_RAW_512Feature_1Frame'
  gof_test: True
Compare the visualizations to see which gives better separability.
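Beyond eyeballing the plots, the comparison can be made numerical. The sketch below scores two candidate feature sets with a simple Fisher-style ratio; the metric, data, and the "FFT vs raw" framing are illustrative assumptions, not the toolchain's internal comparison:

```python
import numpy as np

def fisher_score(X, y):
    """Between-class to within-class variance ratio, averaged over features.
    Higher means the classes are easier to separate in this feature space."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = sum((X[y == c].mean(axis=0) - overall_mean) ** 2 for c in classes)
    within = sum(X[y == c].var(axis=0) for c in classes) + 1e-12
    return float((between / within).mean())

rng = np.random.default_rng(2)
y = np.repeat([0, 1], 100)

# Candidate 1: features with a clear class offset (stand-in for FFT bins).
feats_fft = np.vstack([rng.normal(0.0, 1, (100, 64)),
                       rng.normal(3.0, 1, (100, 64))])
# Candidate 2: features where classes mostly overlap (stand-in for raw samples).
feats_raw = np.vstack([rng.normal(0.0, 1, (100, 64)),
                       rng.normal(0.2, 1, (100, 64))])

print('FFT-style features:', fisher_score(feats_fft, y))
print('Raw-style features:', fisher_score(feats_raw, y))
```

The feature extraction with the higher score is the better starting point for training, mirroring what the side-by-side GoF plots show visually.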
8.4.13. Best Practices
Always run GoF first: Before long training runs
Compare multiple feature extractions: Find the best approach
Investigate overlapping classes: May need more/different data
Use domain knowledge: Understand why classes separate (or don’t)
Document findings: GoF results inform model expectations
8.4.14. Limitations
GoF is largely a linear analysis (PCA and LDA; t-SNE is the exception); neural networks can learn non-linear boundaries
Good GoF doesn’t guarantee good model accuracy
Poor GoF may still yield acceptable models with enough complexity
2D projections can hide separability in higher dimensions
Use GoF as a guide, not a definitive answer.
8.4.15. Next Steps
Learn about Feature Extraction options
See Post-Training Analysis for model evaluation
Proceed to training if GoF looks good