8.4. Goodness of Fit

The Goodness of Fit (GoF) test helps you analyze dataset quality and class separability before investing time in model training.

8.4.1. Overview

GoF testing answers:

  • Are my classes separable in feature space?

  • Is my feature extraction appropriate?

  • Will a neural network be able to learn the patterns?

  • Which classes might be confused?

Running GoF tests before training saves time by identifying data or feature extraction problems early.

8.4.2. Enabling GoF Test

Add the GoF section to your configuration:

common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'

dataset:
  dataset_name: 'your_dataset'

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  gof_test: True
  frame_size: 256

training:
  enable: True  # Can set to False for GoF-only analysis

8.4.3. Running the Test

cd tinyml-modelzoo
./run_tinyml_modelzoo.sh examples/your_example/config.yaml

8.4.4. Output Files

GoF test generates analysis files:

.../gof_test/
├── gof_pca_2d.png           # PCA visualization
├── gof_tsne_2d.png          # t-SNE visualization
├── gof_lda_2d.png           # LDA visualization
├── class_separability.csv   # Quantitative metrics
├── confusion_potential.csv  # Likely confusion pairs
└── feature_importance.csv   # Important features

8.4.5. Understanding the Visualizations

PCA Plot (gof_pca_2d.png)

Principal Component Analysis projection:

Example GoF plots (image figures in the rendered documentation):

  • GoF analysis for arc fault detection with 256 frame size

  • GoF analysis for arc fault detection with 1024 frame size

  • GoF analysis for motor bearing fault detection

A schematic PCA projection:

 PC2
  ^
  |    * * *             Class A
  |  * * * *
  |
  |            + + +     Class B
  |          + + + +
  +-------------------> PC1

  • Well-separated clusters = Good separability

  • Overlapping clusters = Potential confusion

  • Scattered points = High variance, harder to classify
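The idea behind the PCA plot can be sketched in a few lines of numpy: center the features, project onto the two largest-eigenvalue directions of the covariance matrix, and check whether class centroids separate. This is a minimal illustration on synthetic data, not the GoF tool's actual implementation.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)              # center each feature
    cov = np.cov(Xc, rowvar=False)       # feature covariance matrix
    _, eigvecs = np.linalg.eigh(cov)     # eigenvectors, ascending eigenvalue order
    top2 = eigvecs[:, [-1, -2]]          # two largest principal components
    return Xc @ top2

rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 0.3, size=(50, 16))   # tight cluster near 0
class_b = rng.normal(3.0, 0.3, size=(50, 16))   # tight cluster near 3
proj = pca_2d(np.vstack([class_a, class_b]))

# Well-separated classes show a centroid gap along PC1 that is large
# relative to the overall spread of the projection:
gap = abs(proj[:50, 0].mean() - proj[50:, 0].mean())
print(gap > proj[:, 0].std())  # True for these well-separated clusters
```

The same gap-versus-spread intuition is what you read off the plot visually: overlapping clusters mean the centroid gap is small compared to within-class scatter.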

t-SNE Plot (gof_tsne_2d.png)

Non-linear dimensionality reduction:

  • Better at revealing complex cluster structures

  • Preserves local neighborhoods

  • May show separability that PCA misses

LDA Plot (gof_lda_2d.png)

Linear Discriminant Analysis:

  • Maximizes class separation

  • Shows best linear separation achievable

  • Most relevant for linear-like classifiers
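What LDA optimizes can be summarized by Fisher's criterion: the ratio of between-class to within-class scatter along a projection direction. The sketch below scores two 1-D projections this way; the exact metric the GoF tool reports may differ.

```python
import numpy as np

def fisher_score(a, b):
    """1-D Fisher criterion: between-class over within-class scatter."""
    between = (a.mean() - b.mean()) ** 2
    within = a.var() + b.var()
    return between / within

rng = np.random.default_rng(1)
well_separated = fisher_score(rng.normal(0.0, 1.0, 200),
                              rng.normal(4.0, 1.0, 200))
overlapping = fisher_score(rng.normal(0.0, 1.0, 200),
                           rng.normal(0.5, 1.0, 200))
print(well_separated > overlapping)  # True
```

A high Fisher score means a linear classifier can already do well; a low score on the LDA plot suggests the best linear boundary will still confuse the classes.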

8.4.6. Interpreting Results

Class Separability Score:

class_separability.csv:
class_pair,separability_score,overlap_percentage
A-B,0.95,2.3%
A-C,0.82,8.5%
B-C,0.99,0.1%

  • Score > 0.9: Excellent separability

  • Score 0.7-0.9: Good separability

  • Score 0.5-0.7: Moderate (may need better features)

  • Score < 0.5: Poor (investigate data or features)
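The thresholds above can be applied to class_separability.csv with a small helper. This sketch parses the example rows from this section using only the standard library; the exact boundary handling (e.g. whether 0.9 counts as "excellent" or "good") is a choice on our part.

```python
import csv
import io

def separability_verdict(score):
    """Map a separability score to the guidance above (boundaries assumed)."""
    if score > 0.9:
        return "excellent"
    if score > 0.7:
        return "good"
    if score > 0.5:
        return "moderate"
    return "poor"

# Sample rows matching the class_separability.csv layout shown above:
sample = """class_pair,separability_score,overlap_percentage
A-B,0.95,2.3%
A-C,0.82,8.5%
B-C,0.99,0.1%
"""
for row in csv.DictReader(io.StringIO(sample)):
    score = float(row["separability_score"])
    print(row["class_pair"], separability_verdict(score))
# A-B excellent
# A-C good
# B-C excellent
```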

Confusion Potential:

confusion_potential.csv:
class_1,class_2,potential_confusion
A,C,high
B,D,low

Identifies which classes are most likely to be confused.

8.4.7. 8-Plot Analysis

GoF generates 8 different visualizations using combinations of:

  • Transforms: PCA, LDA, t-SNE

  • Scalings: Standard, MinMax

  • Feature sets: All features, top features

Plot 1: PCA + Standard scaling + All features
Plot 2: PCA + MinMax scaling + All features
Plot 3: LDA + Standard scaling + All features
Plot 4: LDA + MinMax scaling + All features
Plot 5: t-SNE + Standard scaling + All features
Plot 6: t-SNE + MinMax scaling + All features
Plot 7: PCA + Standard scaling + Top 50 features
Plot 8: LDA + Standard scaling + Top 50 features

Examining all 8 helps identify the best analysis approach.
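If you script your own comparisons across the plots, the eight configurations can be enumerated programmatically. The names below are illustrative labels, not identifiers used by the tool:

```python
from itertools import product

transforms = ["PCA", "LDA", "t-SNE"]
scalings = ["Standard", "MinMax"]

# Plots 1-6: every transform x scaling combination on all features,
# in the same order as the list above.
plots = [(t, s, "All features") for t, s in product(transforms, scalings)]
# Plots 7-8: top-feature variants.
plots += [("PCA", "Standard", "Top 50 features"),
          ("LDA", "Standard", "Top 50 features")]

print(len(plots))  # 8
```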

8.4.8. Common Patterns

Good Dataset:

- Tight, well-separated clusters
- Consistent within-class variance
- Clear boundaries between classes

Problematic Dataset:

- Overlapping clusters
- Outliers far from clusters
- One class scattered, others tight

Feature Extraction Issue:

- All classes overlap completely
- No structure visible
- Random-looking scatter

8.4.9. Actionable Insights

If classes overlap:

  1. Try different feature extraction:

    data_processing_feature_extraction:
      # Try FFT instead of raw
      feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
    
  2. Increase feature count:

    data_processing_feature_extraction:
      feature_extraction_name: 'Generic_512Input_RAW_512Feature_1Frame'
    
  3. Review data labeling for errors

If one class is scattered:

  1. Check for mislabeled samples

  2. Consider splitting into sub-classes

  3. Collect more training data for that class

If all classes overlap:

  1. Feature extraction may be inappropriate

  2. Data might not contain discriminative information

  3. Consider domain expertise for better features

8.4.10. Example: Motor Fault GoF Analysis

common:
  task_type: 'generic_timeseries_classification'
  target_device: 'F28P55'

dataset:
  dataset_name: 'motor_fault_classification_dsk'

data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3
  gof_test: True
  frame_size: 256

training:
  enable: False  # GoF only, skip training

Expected Good Results:

6 fault classes showing clear separation:
- Normal: tight cluster, well separated
- Contaminated: distinct from normal
- Erosion: some overlap with flaking (similar faults)
- Flaking: some overlap with erosion
- No Lubrication: well separated
- Localized Fault: distinct signature

8.4.11. GoF Without Training

Run GoF analysis only (no model training):

data_processing_feature_extraction:
  gof_test: True

training:
  enable: False

testing:
  enable: False

compilation:
  enable: False

This is useful for:

  • Rapid dataset evaluation

  • Feature extraction comparison

  • Data quality assessment

8.4.12. Comparing Feature Extraction

Run GoF with different feature extraction to compare:

Configuration 1:

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  gof_test: True

Configuration 2:

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_512Input_RAW_512Feature_1Frame'
  gof_test: True

Compare the visualizations to see which gives better separability.
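Why feature extraction choice matters can be shown with a toy numpy experiment: two classes that differ only in frequency are nearly inseparable as raw samples (random phase cancels the class means) but separate cleanly as FFT-magnitude features. The `separability` metric below is our own gap-over-spread illustration, not the score the GoF tool computes.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(256) / 256.0

def make_class(freq, n=40):
    """n noisy sine windows at `freq` cycles per window, random phase."""
    return np.stack([np.sin(2 * np.pi * freq * t + rng.uniform(0, 2 * np.pi))
                     + 0.1 * rng.normal(size=256) for _ in range(n)])

def separability(A, B):
    """Centroid gap over pooled spread; higher means easier to separate."""
    gap = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
    spread = np.sqrt(A.var(axis=0).sum()) + np.sqrt(B.var(axis=0).sum())
    return gap / spread

raw_a, raw_b = make_class(10), make_class(20)      # differ only in frequency
fft_a = np.abs(np.fft.rfft(raw_a, axis=1))         # FFT-magnitude features
fft_b = np.abs(np.fft.rfft(raw_b, axis=1))

print(separability(fft_a, fft_b) > separability(raw_a, raw_b))  # True
```

The same effect is what you look for when comparing GoF visualizations across configurations: the better feature extraction produces tighter, farther-apart clusters.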

8.4.13. Best Practices

  1. Always run GoF first: Before long training runs

  2. Compare multiple feature extractions: Find the best approach

  3. Investigate overlapping classes: May need more/different data

  4. Use domain knowledge: Understand why classes separate (or don’t)

  5. Document findings: GoF results inform model expectations

8.4.14. Limitations

  • PCA and LDA are linear projections; neural networks can learn non-linear boundaries these plots miss

  • Good GoF doesn’t guarantee good model accuracy

  • Poor GoF may still yield acceptable models with enough complexity

  • 2D projections can hide separability in higher dimensions

Use GoF as a guide, not a definitive answer.

8.4.15. Next Steps