8.4. Goodness of Fit

The Goodness of Fit (GoF) test helps you analyze dataset quality and class separability before investing time in model training.

8.4.1. Overview

GoF testing answers:

  • Are my classes separable in feature space?

  • Is my feature extraction appropriate?

  • Will a neural network be able to learn the patterns?

  • Which classes might be confused?

Running GoF tests before training saves time by identifying data or feature extraction problems early.

8.4.2. Enabling GoF Test

Add the GoF section to your configuration:

common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'

dataset:
  dataset_name: 'your_dataset'

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  gof_test: True
  frame_size: 256

training:
  enable: True  # Can set to False for GoF-only analysis

8.4.3. Running the Test

cd tinyml-modelzoo
./run_tinyml_modelzoo.sh examples/your_example/config.yaml

8.4.4. Output Files

GoF test generates analysis files:

.../gof_test/
├── gof_pca_2d.png           # PCA visualization
├── gof_tsne_2d.png          # t-SNE visualization
├── gof_lda_2d.png           # LDA visualization
├── class_separability.csv   # Quantitative metrics
├── confusion_potential.csv  # Likely confusion pairs
└── feature_importance.csv   # Important features

8.4.5. Understanding the Visualizations

PCA Plot (gof_pca_2d.png)

Principal Component Analysis projection:

Example GoF plots (image figures in the rendered documentation):

  • GoF analysis for arc fault detection with 256 frame size

  • GoF analysis for arc fault detection with 1024 frame size

  • GoF analysis for motor bearing fault detection

A schematic PCA projection:

 PC2
  ^
  |    * * *             Class A
  |  * * * *
  |
  |            + + +     Class B
  |          + + + +
  +-------------------> PC1

  • Well-separated clusters = Good separability

  • Overlapping clusters = Potential confusion

  • Scattered points = High variance, harder to classify
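The idea behind the PCA plot can be sketched in a few lines of numpy: center the features, project onto the two largest-eigenvalue directions of the covariance matrix, and check whether class centroids separate. This is a minimal illustration on synthetic data, not the GoF tool's actual implementation.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)              # center each feature
    cov = np.cov(Xc, rowvar=False)       # feature covariance matrix
    _, eigvecs = np.linalg.eigh(cov)     # eigenvectors, ascending eigenvalue order
    top2 = eigvecs[:, [-1, -2]]          # two largest principal components
    return Xc @ top2

rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 0.3, size=(50, 16))   # tight cluster near 0
class_b = rng.normal(3.0, 0.3, size=(50, 16))   # tight cluster near 3
proj = pca_2d(np.vstack([class_a, class_b]))

# Well-separated classes show a centroid gap along PC1 that is large
# relative to the overall spread of the projection:
gap = abs(proj[:50, 0].mean() - proj[50:, 0].mean())
print(gap > proj[:, 0].std())  # True for these well-separated clusters
```

The same gap-versus-spread intuition is what you read off the plot visually: overlapping clusters mean the centroid gap is small compared to within-class scatter.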

t-SNE Plot (gof_tsne_2d.png)

Non-linear dimensionality reduction:

  • Better at revealing complex cluster structures

  • Preserves local neighborhoods

  • May show separability that PCA misses

LDA Plot (gof_lda_2d.png)

Linear Discriminant Analysis:

  • Maximizes class separation

  • Shows best linear separation achievable

  • Most relevant for linear-like classifiers
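What LDA optimizes can be summarized by Fisher's criterion: the ratio of between-class to within-class scatter along a projection direction. The sketch below scores two 1-D projections this way; the exact metric the GoF tool reports may differ.

```python
import numpy as np

def fisher_score(a, b):
    """1-D Fisher criterion: between-class over within-class scatter."""
    between = (a.mean() - b.mean()) ** 2
    within = a.var() + b.var()
    return between / within

rng = np.random.default_rng(1)
well_separated = fisher_score(rng.normal(0.0, 1.0, 200),
                              rng.normal(4.0, 1.0, 200))
overlapping = fisher_score(rng.normal(0.0, 1.0, 200),
                           rng.normal(0.5, 1.0, 200))
print(well_separated > overlapping)  # True
```

A high Fisher score means a linear classifier can already do well; a low score on the LDA plot suggests the best linear boundary will still confuse the classes.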

8.4.6. Interpreting Results

Class Separability Score:

class_separability.csv:
class_pair,separability_score,overlap_percentage
A-B,0.95,2.3%
A-C,0.82,8.5%
B-C,0.99,0.1%

  • Score > 0.9: Excellent separability

  • Score 0.7-0.9: Good separability

  • Score 0.5-0.7: Moderate (may need better features)

  • Score < 0.5: Poor (investigate data or features)
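The thresholds above can be applied to class_separability.csv with a small helper. This sketch parses the example rows from this section using only the standard library; the exact boundary handling (e.g. whether 0.9 counts as "excellent" or "good") is a choice on our part.

```python
import csv
import io

def separability_verdict(score):
    """Map a separability score to the guidance above (boundaries assumed)."""
    if score > 0.9:
        return "excellent"
    if score > 0.7:
        return "good"
    if score > 0.5:
        return "moderate"
    return "poor"

# Sample rows matching the class_separability.csv layout shown above:
sample = """class_pair,separability_score,overlap_percentage
A-B,0.95,2.3%
A-C,0.82,8.5%
B-C,0.99,0.1%
"""
for row in csv.DictReader(io.StringIO(sample)):
    score = float(row["separability_score"])
    print(row["class_pair"], separability_verdict(score))
# A-B excellent
# A-C good
# B-C excellent
```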

Confusion Potential:

confusion_potential.csv:
class_1,class_2,potential_confusion
A,C,high
B,D,low

Identifies which classes are most likely to be confused.

8.4.7. 8-Plot Analysis

GoF generates 8 different visualizations using combinations of:

  • Transforms: PCA, LDA, t-SNE

  • Scalings: Standard, MinMax

  • Feature sets: All features, top features

Plot 1: PCA + Standard scaling + All features
Plot 2: PCA + MinMax scaling + All features
Plot 3: LDA + Standard scaling + All features
Plot 4: LDA + MinMax scaling + All features
Plot 5: t-SNE + Standard scaling + All features
Plot 6: t-SNE + MinMax scaling + All features
Plot 7: PCA + Standard scaling + Top 50 features
Plot 8: LDA + Standard scaling + Top 50 features

Examining all 8 helps identify the best analysis approach.
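If you script your own comparisons across the plots, the eight configurations can be enumerated programmatically. The names below are illustrative labels, not identifiers used by the tool:

```python
from itertools import product

transforms = ["PCA", "LDA", "t-SNE"]
scalings = ["Standard", "MinMax"]

# Plots 1-6: every transform x scaling combination on all features,
# in the same order as the list above.
plots = [(t, s, "All features") for t, s in product(transforms, scalings)]
# Plots 7-8: top-feature variants.
plots += [("PCA", "Standard", "Top 50 features"),
          ("LDA", "Standard", "Top 50 features")]

print(len(plots))  # 8
```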

8.4.8. Common Patterns

Good Dataset:

- Tight, well-separated clusters
- Consistent within-class variance
- Clear boundaries between classes

Problematic Dataset:

- Overlapping clusters
- Outliers far from clusters
- One class scattered, others tight

Feature Extraction Issue:

- All classes overlap completely
- No structure visible
- Random-looking scatter

8.4.9. Actionable Insights

If classes overlap:

  1. Try different feature extraction:

    data_processing_feature_extraction:
      # Try FFT instead of raw
      feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
    
  2. Increase feature count:

    data_processing_feature_extraction:
      feature_extraction_name: 'Generic_512Input_RAW_512Feature_1Frame'
    
  3. Review data labeling for errors

If one class is scattered:

  1. Check for mislabeled samples

  2. Consider splitting into sub-classes

  3. Collect more training data for that class

If all classes overlap:

  1. Feature extraction may be inappropriate

  2. Data might not contain discriminative information

  3. Consider domain expertise for better features

8.4.10. Example: Motor Fault GoF Analysis

common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'

dataset:
  dataset_name: 'motor_fault_classification_dsk'

data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3
  gof_test: True
  frame_size: 256

training:
  enable: False  # GoF only, skip training

Expected Good Results:

6 fault classes showing clear separation:
- Normal: tight cluster, well separated
- Contaminated: distinct from normal
- Erosion: some overlap with flaking (similar faults)
- Flaking: some overlap with erosion
- No Lubrication: well separated
- Localized Fault: distinct signature

8.4.11. GoF Without Training

Run GoF analysis only (no model training):

data_processing_feature_extraction:
  gof_test: True

training:
  enable: False

testing:
  enable: False

compilation:
  enable: False

This is useful for:

  • Rapid dataset evaluation

  • Feature extraction comparison

  • Data quality assessment

8.4.12. Comparing Feature Extraction

Run GoF with different feature extraction to compare:

Configuration 1:

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  gof_test: True

Configuration 2:

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_512Input_RAW_512Feature_1Frame'
  gof_test: True

Compare the visualizations to see which gives better separability.
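Why feature extraction choice matters can be shown with a toy numpy experiment: two classes that differ only in frequency are nearly inseparable as raw samples (random phase cancels the class means) but separate cleanly as FFT-magnitude features. The `separability` metric below is our own gap-over-spread illustration, not the score the GoF tool computes.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(256) / 256.0

def make_class(freq, n=40):
    """n noisy sine windows at `freq` cycles per window, random phase."""
    return np.stack([np.sin(2 * np.pi * freq * t + rng.uniform(0, 2 * np.pi))
                     + 0.1 * rng.normal(size=256) for _ in range(n)])

def separability(A, B):
    """Centroid gap over pooled spread; higher means easier to separate."""
    gap = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
    spread = np.sqrt(A.var(axis=0).sum()) + np.sqrt(B.var(axis=0).sum())
    return gap / spread

raw_a, raw_b = make_class(10), make_class(20)      # differ only in frequency
fft_a = np.abs(np.fft.rfft(raw_a, axis=1))         # FFT-magnitude features
fft_b = np.abs(np.fft.rfft(raw_b, axis=1))

print(separability(fft_a, fft_b) > separability(raw_a, raw_b))  # True
```

The same effect is what you look for when comparing GoF visualizations across configurations: the better feature extraction produces tighter, farther-apart clusters.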

8.4.13. Best Practices

  1. Always run GoF first: Before long training runs

  2. Compare multiple feature extractions: Find the best approach

  3. Investigate overlapping classes: May need more/different data

  4. Use domain knowledge: Understand why classes separate (or don’t)

  5. Document findings: GoF results inform model expectations

8.4.14. Limitations

  • PCA and LDA are linear projections; neural networks can learn non-linear boundaries these plots miss

  • Good GoF doesn’t guarantee good model accuracy

  • Poor GoF may still yield acceptable models with enough complexity

  • 2D projections can hide separability in higher dimensions

Use GoF as a guide, not a definitive answer.

8.4.15. Next Steps