8.5. Goodness of Fit

The Goodness of Fit (GoF) test helps you analyze dataset quality and class separability before investing time in model training.

8.5.1. Overview

GoF testing answers questions such as:

  • Are my classes separable in feature space?

  • Is my feature extraction appropriate?

  • Will a neural network be able to learn the patterns?

  • Which classes might be confused?

Running GoF tests before training saves time by identifying data or feature extraction problems early.

8.5.2. Enabling GoF Test

Add the GoF section to your configuration:

common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'

dataset:
  dataset_name: 'your_dataset'

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  gof_test: True
  frame_size: 256

training:
  enable: True  # Can set to False for GoF-only analysis

8.5.3. Running the Test

cd tinyml-modelzoo
./run_tinyml_modelzoo.sh examples/your_example/config.yaml

8.5.4. Output Files

The GoF test generates the following analysis files:

.../gof_test/
├── gof_pca_2d.png           # PCA visualization
├── gof_tsne_2d.png          # t-SNE visualization
├── gof_lda_2d.png           # LDA visualization
├── class_separability.csv   # Quantitative metrics
├── confusion_potential.csv  # Likely confusion pairs
└── feature_importance.csv   # Important features

8.5.5. Understanding the Visualizations

Example GoF plots (figures):

  • GoF analysis for arc fault detection with 256 frame size

  • GoF analysis for arc fault detection with 1024 frame size

  • GoF analysis for motor bearing fault detection

PCA Plot (gof_pca_2d.png)

Principal Component Analysis projection:

PC2
 ^
 |    * * *        Class A
 |  * * * *
 |
 |            + + +     Class B
 |          + + + +
 +-------------------> PC1
  • Well-separated clusters = Good separability

  • Overlapping clusters = Potential confusion

  • Scattered points = High variance, harder to classify

t-SNE Plot (gof_tsne_2d.png)

Non-linear dimensionality reduction:

  • Better at revealing complex cluster structures

  • Preserves local neighborhoods

  • May show separability that PCA misses

LDA Plot (gof_lda_2d.png)

Linear Discriminant Analysis:

  • Maximizes class separation

  • Shows best linear separation achievable

  • Most relevant for linear-like classifiers
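
The projections can be reproduced outside the tool for quick experiments. A minimal sketch, assuming 64-dimensional features have already been extracted, using only NumPy for the PCA step (the tool's exact preprocessing is not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for extracted features of two classes
class_a = rng.normal(loc=0.0, scale=0.5, size=(100, 64))
class_b = rng.normal(loc=3.0, scale=0.5, size=(100, 64))
X = np.vstack([class_a, class_b])

# PCA: project onto the top-2 eigenvectors of the covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order
components = eigvecs[:, ::-1][:, :2]   # top-2 principal components
projected = Xc @ components            # (200, 2): the points you scatter-plot

# Well-separated classes remain separated along PC1
pc1_a, pc1_b = projected[:100, 0], projected[100:, 0]
print(abs(pc1_a.mean() - pc1_b.mean()) > 1.0)
```

t-SNE and LDA follow the same pattern with their respective transforms (e.g. scikit-learn's TSNE and LinearDiscriminantAnalysis classes).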

8.5.6. Interpreting Results

Class Separability Score:

class_separability.csv:
class_pair,separability_score,overlap_percentage
A-B,0.95,2.3%
A-C,0.82,8.5%
B-C,0.99,0.1%

  • Score > 0.9: Excellent separability

  • Score 0.7-0.9: Good separability

  • Score 0.5-0.7: Moderate (may need better features)

  • Score < 0.5: Poor (investigate data or features)
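
The score column is not formally specified here; a Fisher-style ratio squashed to [0, 1) gives an intuition for what such a score measures (illustrative only, not the tool's exact metric):

```python
import numpy as np

def separability_score(x1, x2):
    # Between-class distance relative to within-class spread,
    # squashed to [0, 1): higher means better separated
    d = np.linalg.norm(x1.mean(axis=0) - x2.mean(axis=0))
    spread = x1.std() + x2.std()
    ratio = d / (spread + 1e-12)
    return ratio / (1.0 + ratio)

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(200, 16))  # class A
b = rng.normal(2.0, 1.0, size=(200, 16))  # class B: far from A
c = rng.normal(0.3, 1.0, size=(200, 16))  # class C: overlaps A

print(f"A-B: {separability_score(a, b):.2f}")  # well separated -> higher
print(f"A-C: {separability_score(a, c):.2f}")  # overlapping    -> lower
```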

Confusion Potential:

confusion_potential.csv:
class_1,class_2,potential_confusion
A,C,high
B,D,low

Identifies which classes are most likely to be confused.
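
The CSV can be consumed programmatically to flag risky pairs. A minimal standard-library sketch (the inline string mirrors the illustrative content above):

```python
import csv
import io

# Inline copy of the illustrative confusion_potential.csv content
raw = """class_1,class_2,potential_confusion
A,C,high
B,D,low
"""
rows = list(csv.DictReader(io.StringIO(raw)))
high_risk = [(r["class_1"], r["class_2"]) for r in rows
             if r["potential_confusion"] == "high"]
print(high_risk)
```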

8.5.7. 8-Plot Analysis

GoF generates 8 different visualizations by combining three processing stages: 2 Transforms x 2 Scalings x 2 Dimensionality Reductions = 8 plots.

Transforms

  • FFT + Abs + Log: Converts the time-series data from the time domain into the frequency domain using a Fast Fourier Transform. Only the positive half of the symmetric FFT output is retained. The absolute value is taken, followed by a log transform to compress large magnitudes.

  • Wavelet Transform (WT): Analyzes the data in both the time and frequency domains simultaneously. This makes it especially effective at capturing localized events in time-series data, such as sudden spikes or anomalies.

Scalings

  • Standard Scaler (Z-score normalization): Standardizes the data by subtracting the mean and dividing by the standard deviation, producing features with mean=0 and standard deviation=1.

  • Min-Max Scaler: Scales each feature to the [0, 1] range. Useful when you want to preserve the relative distances between data points.

Dimensionality Reduction

  • PCA (Principal Component Analysis): A linear method that projects the data into fewer dimensions while preserving as much variance as possible.

  • t-SNE (t-Distributed Stochastic Neighbor Embedding): A non-linear method that maps high-dimensional data into 2D by preserving local neighborhood relationships. Especially good at revealing cluster structures.

The 8 Combinations

Plot 1: FFT+Abs+Log  +  Standard Scaler  +  PCA
Plot 2: FFT+Abs+Log  +  Standard Scaler  +  t-SNE
Plot 3: FFT+Abs+Log  +  MinMax Scaler    +  PCA
Plot 4: FFT+Abs+Log  +  MinMax Scaler    +  t-SNE
Plot 5: WT           +  Standard Scaler  +  PCA
Plot 6: WT           +  Standard Scaler  +  t-SNE
Plot 7: WT           +  MinMax Scaler    +  PCA
Plot 8: WT           +  MinMax Scaler    +  t-SNE
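
The eight pipelines are simply the Cartesian product of the three stages, which can be enumerated directly (the labels are illustrative strings, not tool identifiers):

```python
from itertools import product

transforms = ["FFT+Abs+Log", "WT"]
scalers = ["Standard Scaler", "MinMax Scaler"]
reducers = ["PCA", "t-SNE"]

# The Cartesian product reproduces the plot ordering listed above
pipelines = list(product(transforms, scalers, reducers))
for i, (t, s, r) in enumerate(pipelines, start=1):
    print(f"Plot {i}: {t} + {s} + {r}")
```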

Important

Not all 8 plots need to show separable clusters. Each plot represents a different method of analyzing the time-series data. If any one of the 8 plots shows separable clusters, it is a strong sign that the dataset is suitable for classification.

8.5.8. Common Patterns

Good Dataset:

- Tight, well-separated clusters
- Consistent within-class variance
- Clear boundaries between classes

Problematic Dataset:

- Overlapping clusters
- Outliers far from clusters
- One class scattered, others tight

Feature Extraction Issue:

- All classes overlap completely
- No structure visible
- Random-looking scatter

8.5.9. Actionable Insights

If classes overlap:

  1. Try different feature extraction:

    data_processing_feature_extraction:
      # Try FFT instead of raw
      feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
    
  2. Increase feature count:

    data_processing_feature_extraction:
      feature_extraction_name: 'Generic_512Input_RAW_512Feature_1Frame'
    
  3. Review data labeling for errors

If one class is scattered:

  1. Check for mislabeled samples

  2. Consider splitting into sub-classes

  3. Collect more training data for that class

If all classes overlap:

  1. Feature extraction may be inappropriate

  2. Data might not contain discriminative information

  3. Consider domain expertise for better features

8.5.10. Frame Size Sweeping

Sometimes the default frame_size does not capture enough of the signal to produce meaningful GoF plots. In such cases, sweeping across multiple frame sizes can reveal the right setting.

Arc Fault Classification Example

The Arc Fault dataset (two classes: Arc and Normal) demonstrates this clearly. Starting with a small frame size and progressively increasing it:

frame_size=256   --> Poor: clusters lack purity, no class separation
frame_size=512   --> Still poor: significant overlap persists
frame_size=1024  --> Improving: WT-based plots (5-8) begin showing less overlap
frame_size=2048  --> Better: continued improvement in WT plots
frame_size=4096  --> Good: clear, well-separated clusters visible in
                     Plot 7 (WT + MinMax Scaler + PCA)

Why Larger Frame Sizes Help

The Arc Fault dataset has a high sampling frequency, meaning many data points are recorded per second. With a small frame size, each frame captures only a tiny slice of the signal and misses the broader pattern. Increasing the frame size allows more data points per frame, revealing the true structure of the data.

Recommendation: Start with a frame_size that matches the intended model input size. If plots are inconclusive, try 2x and 4x larger frame sizes. The frame size that produces the clearest separation in the GoF plots is a good indicator of the minimum signal length needed for reliable classification.

# Sweep example: run GoF at multiple frame sizes
data_processing_feature_extraction:
  gof_test: True
  frame_size: 1024   # Try 256, 512, 1024, 2048, 4096
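
The sweep can be scripted by generating one config per frame size. A minimal shell sketch (the generated file is a stripped-down fragment; merge it with your full config, and the run command is left commented out):

```shell
# Generate a GoF config fragment per frame size (illustrative;
# merge with your full config before running)
for fs in 256 512 1024 2048 4096; do
  printf 'data_processing_feature_extraction:\n  gof_test: True\n  frame_size: %s\n' "$fs" \
    > "config_fs${fs}.yaml"
  # ./run_tinyml_modelzoo.sh "config_fs${fs}.yaml"   # uncomment to run each sweep point
done
ls config_fs*.yaml
```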

8.5.11. Multi-Cluster Analysis

When a single class appears as multiple separate clusters in the GoF plots, it does not necessarily mean the data is bad. Multiple clusters per class can arise from real variations within the data collection process.

Motor Fault Four-Class Example: Sampling Frequency

Running GoF on the Motor Fault Four-Class Dataset (frame_size=256) revealed that each class formed roughly 10 separate clusters. Investigation showed that the dataset contained samples collected at 10 different sampling frequencies (10 Hz to 100 Hz). Filtering the dataset to a single frequency (40 Hz) reduced the cluster count from 10 per class to 4 clusters in total, one per actual class. The sampling frequency variation had introduced unwanted structure.

Motor Fault Six-Class Example: Equipment Variation

The Motor Fault Six-Class Dataset showed a similar multi-cluster pattern. However, filtering to a single sampling frequency (40 Hz) still left roughly 3 clusters per class. A deeper look revealed that the data had been collected from 3 different motors. Filtering to a single motor finally produced exactly 6 clusters, matching the actual number of classes.

Common Causes of Multi-Cluster Patterns

  • Different sampling frequencies in the dataset

  • Different environmental conditions during data collection

  • Different equipment or sensors used across collection sessions

  • Varying operational states within the same nominal class

What To Do

  1. Examine metadata (frequency, sensor ID, conditions) for systematic variation.

  2. Filter or stratify the data by the suspected variable and re-run GoF.

  3. If filtering resolves the multi-cluster pattern, consider whether the model should be trained on the full mixed dataset or on a controlled subset.
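
Step 2 can be sketched with plain Python over a hypothetical metadata table (the field names and file names here are assumptions; use whatever metadata accompanies your dataset):

```python
# Hypothetical metadata records accompanying each sample file
samples = [
    {"file": "s001.csv", "label": "flaking", "fs_hz": 10, "motor": "M1"},
    {"file": "s002.csv", "label": "flaking", "fs_hz": 40, "motor": "M1"},
    {"file": "s003.csv", "label": "flaking", "fs_hz": 40, "motor": "M2"},
    {"file": "s004.csv", "label": "normal",  "fs_hz": 40, "motor": "M1"},
]

# Stratify by the suspected variables, then re-run GoF on the subset
subset = [s for s in samples if s["fs_hz"] == 40 and s["motor"] == "M1"]
print([s["file"] for s in subset])  # files to feed the next GoF run
```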

8.5.12. Example: Motor Fault GoF Analysis

common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'

dataset:
  dataset_name: 'motor_fault_classification_dsk'

data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3
  gof_test: True
  frame_size: 256

training:
  enable: False  # GoF only, skip training

Expected Good Results:

6 fault classes showing clear separation:
- Normal: tight cluster, well separated
- Contaminated: distinct from normal
- Erosion: some overlap with flaking (similar faults)
- Flaking: some overlap with erosion
- No Lubrication: well separated
- Localized Fault: distinct signature

8.5.13. GoF Without Training

Run GoF analysis only (no model training):

data_processing_feature_extraction:
  gof_test: True

training:
  enable: False

testing:
  enable: False

compilation:
  enable: False

This is useful for:

  • Rapid dataset evaluation

  • Feature extraction comparison

  • Data quality assessment

8.5.14. Comparing Feature Extraction

Run GoF with different feature extraction to compare:

Configuration 1:

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  gof_test: True

Configuration 2:

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_512Input_RAW_512Feature_1Frame'
  gof_test: True

Compare the visualizations to see which gives better separability.

8.5.15. Best Practices

  1. Always run GoF first: Before long training runs

  2. Compare multiple feature extractions: Find the best approach

  3. Investigate overlapping classes: May need more/different data

  4. Use domain knowledge: Understand why classes separate (or don’t)

  5. Document findings: GoF results inform model expectations

8.5.16. Limitations

  • GoF plots are simple projections; neural networks can learn non-linear decision boundaries that these views miss

  • Good GoF doesn’t guarantee good model accuracy

  • Poor GoF may still yield acceptable models with enough complexity

  • 2D projections can hide separability in higher dimensions

Use GoF as a guide, not a definitive answer.

8.5.17. Next Steps