8.4. Feature Extraction
Feature extraction transforms raw sensor data into a representation that helps the neural network learn patterns more effectively.
8.4.1. Overview
Why use feature extraction?
Reduced input size: Compress long time series
Better patterns: Transform to domain where patterns are clearer
Faster inference: Smaller inputs mean faster models
Domain knowledge: Incorporate signal processing expertise
8.4.2. Feature Extraction Pipeline
Raw data flows through two stages: data processing transforms, then feature extraction transforms:
Raw Signal → Data Processing        → Feature Extraction    → Model Input
             (data_proc_transforms)   (feat_ext_transform)
             e.g. SimpleWindow,       e.g. FFT_FE, BINNING,
             Downsample               ABS, LOG_DB, CONCAT
8.4.3. Configuration Parameters
The data_processing_feature_extraction section supports the following
parameters. There are two usage modes: using a preset name, or defining
a custom pipeline.
Core Parameters:
| Option | Description |
|---|---|
| | Preset name (e.g., |
| | List of data processing transforms applied before feature extraction. Common values: |
| | List of feature extraction transforms applied in order. Common values include: |
| | Number of input channels/variables. Supports three formats: an integer (select first N columns), a list of column indices (e.g., |
| | Number of samples per frame (e.g., |
| | Number of output features per frame after transform (e.g., |
| | Number of frames to concatenate (e.g., |
| | Stride between frames as a fraction (e.g., |
Signal Processing Parameters:
| Option | Description |
|---|---|
| | Original sampling rate of the input signal. Used with the |
| | Target sampling rate after downsampling. Used with the |
| | Scaling factor applied to input data (e.g., |
| | Controls the overlap between consecutive frames of data. Without offset (value |
| | Number of frames to skip between selected frames (e.g., |
| | Enable bin normalization ( |
| | Feature stacking mode: |
| | Minimum frequency bin index to include. |
| | Fraction of the signal bandwidth to analyse (e.g., |
Logarithmic Transform Parameters:
| Option | Description |
|---|---|
| | Multiplier for logarithmic scaling (e.g., |
| | Base for logarithm (e.g., |
| | Minimum threshold to avoid log(0) (e.g., |
Fixed-Point (Q15) Parameters:
| Option | Description |
|---|---|
| | Scale factor for Q15 fixed-point quantization (e.g., |
Data Augmentation and Testing:
| Option | Description |
|---|---|
| | Dictionary mapping class names to |
| | Run Goodness of Fit test on extracted features ( |
Output Control:
| Option | Description |
|---|---|
| | Store extracted feature data to disk ( |
| | When set to |
| | Use neural network for feature extraction ( |
Forecasting-Specific Parameters:
| Option | Description |
|---|---|
| | Number of future timesteps to predict (e.g., |
| | List of column indices or names for the target variable(s) to forecast (e.g., |
8.4.4. Preset System
Tiny ML Tensorlab provides predefined feature extraction presets. When using
a preset, simply specify the feature_extraction_name and variables:
data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  variables: 1
Preset Naming Convention:
Generic_<InputSize>Input_<Transform>_<Features>Feature_<Frames>Frame
Example: Generic_1024Input_FFTBIN_64Feature_8Frame
- Input: 1024 samples
- Transform: FFT with binning
- Features: 64 frequency bins
- Frames: 8 temporal frames
- Total: 64 x 8 = 512 features to model
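The naming convention can be unpacked mechanically. The sketch below uses a hypothetical helper, `parse_preset_name` (not part of Tiny ML Tensorlab), that splits a `Generic_*` name into its fields and computes the total feature count. It only understands the `Generic_...` pattern shown above; other preset names are rejected.

```python
import re

# Pattern for the Generic_<InputSize>Input_<Transform>_<Features>Feature_<Frames>Frame convention.
PRESET_PATTERN = re.compile(
    r"^Generic_(\d+)Input_([A-Za-z0-9]+)_(\d+)Feature_(\d+)Frame$"
)


def parse_preset_name(name):
    """Split a Generic_* preset name into its fields (illustrative helper)."""
    match = PRESET_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a Generic_* preset name: {name!r}")
    input_size, transform, features, frames = match.groups()
    return {
        "input_size": int(input_size),
        "transform": transform,
        "features_per_frame": int(features),
        "frames": int(frames),
        "total_features": int(features) * int(frames),
    }


info = parse_preset_name("Generic_1024Input_FFTBIN_64Feature_8Frame")
print(info["total_features"])  # 64 x 8 = 512
```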
8.4.5. Available Presets
FFT-Based Presets:
Best for frequency-domain patterns (vibration, arc faults):
| Preset | Features | Use Case |
|---|---|---|
| | 512 | General purpose |
| | 256 | Smaller input |
| | 256 | Full spectrum |
Raw Time-Domain Presets:
Best for waveform shape patterns:
| Preset | Features | Use Case |
|---|---|---|
| | 512 | Full waveform |
| | 256 | Shorter window |
| | 128 | Compact input |
Application-Specific Presets:
| Preset | Application |
|---|---|
| | Motor fault (3-axis) |
| | Arc fault detection |
| | PIR detection |
8.4.6. Data Processing Transforms
The data_proc_transforms parameter specifies preprocessing steps applied
to raw data before feature extraction.
Important
Data processing transforms (data_proc_transforms) are only applied on
the training side (on PC), whereas feature extraction transforms
(feat_ext_transform) are applied on both the training side (on PC) and
the inference side (on device).
SimpleWindow
Segments continuous data into fixed-size windows:
data_processing_feature_extraction:
  data_proc_transforms: ['SimpleWindow']
  frame_size: 256
  stride_size: 0.01
  variables: 1
Downsample
Reduces the sampling rate of input data:
data_processing_feature_extraction:
  data_proc_transforms: ['Downsample', 'SimpleWindow']
  sampling_rate: 313000
  new_sr: 3130
  frame_size: 256
  stride_size: 0.01
  variables: 1
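In the configuration above, going from 313000 Hz to 3130 Hz is a reduction by a factor of 100. The following sketch illustrates that ratio with plain decimation (keeping every 100th sample). This is an assumption made for illustration: the actual Downsample transform may filter or average the signal first, and the `downsample` function here is hypothetical.

```python
def downsample(signal, sampling_rate, new_sr):
    """Keep every k-th sample, where k = sampling_rate / new_sr.

    Illustrative only: no anti-aliasing filter is applied, and the ratio
    is required to be a whole number to keep the sketch simple.
    """
    if new_sr <= 0 or sampling_rate % new_sr != 0:
        raise ValueError("sampling_rate must be a positive integer multiple of new_sr")
    factor = sampling_rate // new_sr
    return signal[::factor]


raw = list(range(1000))                  # stand-in for 1000 raw samples
reduced = downsample(raw, 313000, 3130)  # factor of 100
print(len(reduced))                      # 10
```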
Multiple transforms can be chained in order:
data_processing_feature_extraction:
  data_proc_transforms:
    - SimpleWindow
    - Downsample
  frame_size: 256
  sampling_rate: 100
  new_sr: 1
  variables: 1
8.4.7. Feature Extraction Transforms
The feat_ext_transform parameter defines the feature extraction pipeline
as an ordered list of transforms. Each step processes the output of the
previous step.
Common Transform Steps:
| Transform | Description |
|---|---|
| | Compute FFT (Fast Fourier Transform) |
| | Keep only positive frequency half of FFT |
| | Apply windowing function |
| | Group frequency bins to reduce feature count |
| | Normalize features |
| | Take absolute value |
| | Convert to logarithmic (dB) scale |
| | Remove DC component |
| | Concatenate frames into final feature vector |
| | Takes the mean of the wave within each frame; optionally removes the DC component if |
| | Convert floating-point to fixed-point Q15 format with saturation |
| | Fixed-point Q15 FFT (for MCU deployment) |
| | Q15 bit-shift scaling (amplify or attenuate) |
| | Q15 magnitude of complex FFT output |
| | Q15 binning with optional normalization |
| | ECG-specific normalization |
Example: FFT with Binning Pipeline:
data_processing_feature_extraction:
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'BINNING', 'LOG_DB', 'CONCAT']
  frame_size: 1024
  feature_size_per_frame: 64
  num_frame_concat: 4
  variables: 1
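To make the order of operations in this pipeline concrete, here is a small pure-Python sketch of the same seven steps on a toy 16-sample frame. It is not the library's implementation. The details of each step are assumptions: DC_REMOVE is modelled as zeroing bin 0, BINNING as the mean of adjacent bins, and LOG_DB as `log_mul * log_base(max(x, log_threshold))`. The function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) DFT; adequate for a small illustrative frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


def frame_features(frame, feature_size_per_frame,
                   log_mul=20, log_base=10, log_threshold=1e-12):
    spectrum = dft(frame)                        # FFT_FE
    half = spectrum[: len(spectrum) // 2]        # FFT_POS_HALF
    half[0] = 0                                  # DC_REMOVE (assumed: zero bin 0)
    mags = [abs(c) for c in half]                # ABS
    width = len(mags) // feature_size_per_frame  # BINNING (assumed: mean per bin group)
    binned = [sum(mags[i * width:(i + 1) * width]) / width
              for i in range(feature_size_per_frame)]
    # LOG_DB (assumed formula); the threshold avoids log(0)
    return [log_mul * math.log(max(v, log_threshold), log_base) for v in binned]


def extract(frames, feature_size_per_frame):
    """CONCAT: join the per-frame features into one flat vector."""
    return [f for frame in frames
            for f in frame_features(frame, feature_size_per_frame)]


# A 16-sample frame holding a pure tone in FFT bin 3.
tone = [math.cos(2 * math.pi * 3 * n / 16) for n in range(16)]
features = extract([tone, tone], feature_size_per_frame=4)
print(len(features))  # 2 frames x 4 features = 8
```

With `feature_size_per_frame: 64` and `num_frame_concat: 4` as in the YAML above, the same arithmetic gives 64 x 4 = 256 values per channel.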
Example: FFT without Binning Pipeline:
data_processing_feature_extraction:
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'LOG_DB', 'CONCAT']
  frame_size: 256
  feature_size_per_frame: 128
  num_frame_concat: 1
  variables: 6
Example: Fixed-Point Q15 Pipeline (for MCU deployment):
data_processing_feature_extraction:
  feat_ext_transform: ['FFT_Q15', 'Q15_SCALE', 'Q15_MAG', 'DC_REMOVE', 'BIN_Q15', 'CONCAT']
  frame_size: 256
  feature_size_per_frame: 16
  num_frame_concat: 8
  q15_scale_factor: 5
  normalize_bin: True
  variables: 1
8.4.8. Custom Feature Extraction
For advanced use cases, use a Custom_* feature extraction name and specify
the transform pipeline manually:
data_processing_feature_extraction:
  data_proc_transforms: ['SimpleWindow']
  feature_extraction_name: 'Custom_Default'
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'WINDOWING', 'BINNING', 'NORMALIZE', 'ABS', 'LOG_DB', 'CONCAT']
  frame_size: 32
  feature_size_per_frame: 8
  num_frame_concat: 8
  variables: 5
You can also configure additional parameters for fine-grained control:
data_processing_feature_extraction:
  data_proc_transforms: []
  feature_extraction_name: 'Custom_MotorFault'
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'BINNING', 'LOG_DB', 'CONCAT']
  frame_size: 1024
  feature_size_per_frame: 64
  num_frame_concat: 4
  normalize_bin: 1
  stacking: '1D'
  offset: 0
  scale: 1
  frame_skip: 1
  log_mul: 20
  log_base: 10
  log_threshold: 1e-100
  variables: 3
8.4.9. Multi-Channel Data
For sensors with multiple axes (e.g., 3-axis accelerometer), set
variables to the number of channels:
data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3
The variables parameter supports three formats:
Integer: Select the first N columns (e.g., variables: 3)
List of indices: Select specific columns (e.g., variables: [0, 2, 4])
List of names: Select columns by name (e.g., variables: ['accel_x', 'accel_y', 'accel_z'])
8.4.10. Forecasting Configuration
Forecasting tasks require specific additional parameters:
data_processing_feature_extraction:
  data_proc_transforms:
    - SimpleWindow
  frame_size: 32
  stride_size: 0.1
  forecast_horizon: 2
  variables: 1
  target_variables:
    - 0
Note
SimpleWindow must be specified in data_proc_transforms for
forecasting tasks.
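As a rough illustration of how frame_size and forecast_horizon relate, the sketch below builds (input window, future target) pairs from a single series. It assumes the target is the forecast_horizon samples immediately following each window and advances one sample at a time, ignoring stride_size. The `make_forecast_pairs` helper is hypothetical and not the library's windowing code.

```python
def make_forecast_pairs(series, frame_size, forecast_horizon):
    """Pair each input window with the samples that immediately follow it."""
    pairs = []
    last_start = len(series) - frame_size - forecast_horizon
    for start in range(last_start + 1):
        window = series[start:start + frame_size]
        target = series[start + frame_size:start + frame_size + forecast_horizon]
        pairs.append((window, target))
    return pairs


pairs = make_forecast_pairs([0, 1, 2, 3, 4, 5], frame_size=3, forecast_horizon=2)
for window, target in pairs:
    print(window, "->", target)
# [0, 1, 2] -> [3, 4]
# [1, 2, 3] -> [4, 5]
```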
8.4.11. Data Augmentation
Use gain_variations to augment training data with gain variations per class:
data_processing_feature_extraction:
  data_proc_transforms:
    - Downsample
    - SimpleWindow
  gain_variations:
    arc: [0.9, 1.1]
    normal: [0.8, 1.2]
  sampling_rate: 313000
  new_sr: 3130
  frame_size: 256
  stride_size: 0.01
  variables: 1
8.4.12. Choosing the Right Preset
Decision Tree:
Is the pattern in frequency content?
|-- Yes --> Use FFT-based preset
|   |-- Need full spectrum? --> FFT_FullBandwidth
|   |-- Reduce features? --> FFTBIN
|-- No --> Use RAW preset
    |-- Need temporal context? --> Multi-frame
    |-- Single snapshot? --> 1Frame
Common Choices by Application:
| Application | Recommended Preset |
|---|---|
| Arc fault detection | |
| Motor bearing fault | |
| ECG classification | |
| Vibration anomaly | |
| Simple waveforms | |
| PIR detection | |
8.4.13. Performance Impact
Feature extraction affects model size and speed:
| Features | Model Input | Model Size | Inference Time |
|---|---|---|---|
| 128 | Small | Smaller | Faster |
| 256 | Medium | Medium | Medium |
| 512 | Large | Larger | Slower |
Trade-off:
More features = more information = potentially better accuracy
Fewer features = faster inference = fits smaller devices
8.4.14. On-Device Feature Extraction
Feature extraction runs on the MCU before inference. The compilation process generates C code for the feature extraction pipeline configured in your YAML.
Memory Usage:
Feature extraction buffers add to memory requirements:
Input buffer: frame_size x variables x sizeof(data_type)
FFT buffer: frame_size x sizeof(data_type)
Output buffer: feature_size_per_frame x num_frame_concat x sizeof(data_type)
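The three formulas above can be evaluated directly for a given configuration. The sketch below is a hypothetical helper; the 2-byte data type in the usage line is an assumption (for example, 16-bit Q15 samples), so substitute the size your deployment actually uses.

```python
def feature_extraction_buffer_bytes(frame_size, variables, feature_size_per_frame,
                                    num_frame_concat, sizeof_data_type):
    """Evaluate the input, FFT and output buffer formulas, in bytes."""
    input_buffer = frame_size * variables * sizeof_data_type
    fft_buffer = frame_size * sizeof_data_type
    output_buffer = feature_size_per_frame * num_frame_concat * sizeof_data_type
    return {
        "input": input_buffer,
        "fft": fft_buffer,
        "output": output_buffer,
        "total": input_buffer + fft_buffer + output_buffer,
    }


usage = feature_extraction_buffer_bytes(
    frame_size=256, variables=3, feature_size_per_frame=16,
    num_frame_concat=8, sizeof_data_type=2,  # assumed 16-bit samples
)
print(usage)  # {'input': 1536, 'fft': 512, 'output': 256, 'total': 2304}
```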
8.4.15. Example Configurations
Arc Fault Classification (using preset):
data_processing_feature_extraction:
  feature_extraction_name: 'FFT1024Input_256Feature_1Frame_Full_Bandwidth'
  variables: 1
Motor Bearing Fault (using preset with override):
data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3
  feature_size_per_frame: 4
Anomaly Detection with Downsampling:
data_processing_feature_extraction:
  data_proc_transforms:
    - SimpleWindow
    - Downsample
  frame_size: 1024
  sampling_rate: 100
  new_sr: 1
  variables: 1
Regression with Simple Windowing:
data_processing_feature_extraction:
  data_proc_transforms:
    - SimpleWindow
  frame_size: 512
  stride_size: 0.1
  variables: 6
Forecasting (PMSM Rotor Temperature):
data_processing_feature_extraction:
  data_proc_transforms:
    - SimpleWindow
  frame_size: 3
  stride_size: 0.4
  forecast_horizon: 1
  variables: 6
  target_variables:
    - 5
Goodness of Fit Testing:
Enable the gof_test parameter to run Goodness of Fit analysis on
extracted features:
data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  gof_test: True
  variables: 3
PCA Visualization of Extracted Features:
PCA (Principal Component Analysis) helps visualize how well the extracted features separate your classes. Well-separated clusters indicate good feature extraction.
[Figure: PCA visualization of extracted features on training data]
[Figure: PCA visualization of extracted features on validation data]
Interpreting PCA plots:
Tight clusters: Features represent the class well
Well-separated clusters: Good class separability
Overlapping clusters: May need different feature extraction
Scattered points: High variance, potentially noisy data
8.4.16. Stacking Modes
The stacking parameter controls how extracted features from multiple input
channels are organized before being fed to the model. This is particularly
relevant for multi-channel sensors (e.g., 3-axis accelerometer).
1D Stacking (stacking: '1D')
Features from all channels are concatenated into a single flat sequence.
Channel 1: [a1, a2, a3]
Channel 2: [b1, b2, b3]
Channel 3: [c1, c2, c3]
1D result: [a1, a2, a3, b1, b2, b3, c1, c2, c3]
2D1 Stacking (stacking: '2D1') – Default
Features are arranged in a 2D matrix with one row per channel.
Channel 1: [a1, a2, a3]
Channel 2: [b1, b2, b3]
Channel 3: [c1, c2, c3]
2D1 result:
[
[a1, a2, a3],
[b1, b2, b3],
[c1, c2, c3]
]
The default stacking mode is '2D1'. Choose '1D' when you want a flat
feature vector (e.g., for fully-connected models), and '2D1' when you want
to preserve the per-channel structure (e.g., for convolutional models).
data_processing_feature_extraction:
  stacking: '2D1' # or '1D'
  variables: 3
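The two layouts can be reproduced with a few lines of Python. This is only an illustration of the shapes described above; `stack_features` is a hypothetical name, not a function of the toolkit.

```python
def stack_features(channels, stacking="2D1"):
    """Arrange per-channel feature lists as a flat vector ('1D') or as rows ('2D1')."""
    if stacking == "1D":
        return [value for channel in channels for value in channel]
    if stacking == "2D1":
        return [list(channel) for channel in channels]
    raise ValueError(f"unknown stacking mode: {stacking!r}")


channels = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]
print(stack_features(channels, "1D"))   # flat list of 9 values
print(stack_features(channels, "2D1"))  # 3 rows, one per channel
```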
8.4.17. Gain Variation Augmentation
Gain variation is a data augmentation technique that multiplies the raw data of each class by a random gain factor drawn from a uniform distribution. This helps the model become robust to amplitude variations in the input signal.
How it works:
For each class, you specify a [min, max] gain range.
During data processing, a random number is drawn from the uniform distribution U(min, max) for each sample.
The raw data of the corresponding class is multiplied by this random gain factor.
Format:
gain_variations: {class1_name: [min, max], class2_name: [min, max], ...}
Example – Gain variation with Downsample and SimpleWindow:
data_processing_feature_extraction:
  data_proc_transforms:
    - Downsample
    - SimpleWindow
  gain_variations:
    arc: [0.9, 1.1]
    normal: [0.8, 1.2]
  sampling_rate: 313000
  new_sr: 3130
  frame_size: 256
  stride_size: 0.01
  variables: 1
Note
Gain variation can be used alongside Downsample or SimpleWindow
data processing transforms. The gain is applied to the raw data before
the feature extraction stage.
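The per-class gain described in this section can be sketched as follows. The `apply_gain_variations` helper is hypothetical, and leaving classes without a configured range unchanged is an assumption rather than documented behaviour.

```python
import random


def apply_gain_variations(samples, gain_variations, rng=None):
    """Scale each (label, data) sample by a gain drawn from U(min, max) for its class."""
    rng = rng or random.Random()
    augmented = []
    for label, data in samples:
        if label in gain_variations:
            low, high = gain_variations[label]
            gain = rng.uniform(low, high)
        else:
            gain = 1.0  # assumed: classes without a range are left unchanged
        augmented.append((label, [x * gain for x in data]))
    return augmented


samples = [("arc", [1.0, -2.0, 0.5]), ("normal", [0.2, 0.4, 0.6])]
ranges = {"arc": [0.9, 1.1], "normal": [0.8, 1.2]}
for label, data in apply_gain_variations(samples, ranges, random.Random(0)):
    print(label, data)
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible between runs.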
8.4.18. Q15 Fixed-Point Transforms
The Q15 transform family enables fixed-point feature extraction suitable for
on-device (MCU) deployment where floating-point operations may be expensive or
unavailable. These transforms operate in Q15 fixed-point format, a 16-bit
signed integer representation where the range [-1, +1) maps to the integers
[-32768, +32767]. A value of exactly +1 is not representable and saturates to +32767.
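The mapping between floating-point values and Q15 integers can be sketched in a few lines. This is a generic illustration of Q15 with saturation, not the toolkit's conversion routine; the function names and the round-to-nearest choice are assumptions.

```python
Q15_MIN = -32768
Q15_MAX = 32767


def float_to_q15(x):
    """Scale by 2**15, round, and saturate to the 16-bit signed range."""
    scaled = int(round(x * 32768))
    return max(Q15_MIN, min(Q15_MAX, scaled))


def q15_to_float(q):
    """Inverse mapping back to a float in [-1, 1)."""
    return q / 32768


print(float_to_q15(0.5))    # 16384
print(float_to_q15(1.0))    # 32767 (saturated)
print(float_to_q15(-1.0))   # -32768
print(q15_to_float(16384))  # 0.5
```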
Available Q15 Transforms:
| Transform | Description |
|---|---|
| | Converts the input waveform from floating-point to fixed-point Q15 format ( |
| | Performs a fixed-point Real FFT (RFFT) on the Q15 input frame to obtain its frequency-domain representation. This is the fixed-point equivalent of |
| | Applies bit-shift-based scaling on the Q15 signal using the |
| | Computes the magnitude of complex Q15 FFT output samples to obtain the real-valued amplitude spectrum. This is the fixed-point equivalent of computing |
| | Groups adjacent Q15 samples into fixed-size bins and computes the average per bin (optionally normalized when |
Typical Q15 Pipeline:
data_processing_feature_extraction:
  feature_extraction_name: 'Custom_Q15_Pipeline'
  feat_ext_transform: ['FFT_Q15', 'Q15_SCALE', 'Q15_MAG', 'DC_REMOVE', 'BIN_Q15', 'CONCAT']
  frame_size: 256
  feature_size_per_frame: 16
  num_frame_concat: 8
  q15_scale_factor: 5
  normalize_bin: True
  variables: 1
Note
The Q15 transforms are designed to mirror the floating-point pipeline on the MCU. When using Q15 feature extraction during training, the same fixed-point operations will be replicated on the target device, ensuring consistent behavior between training and inference.
8.4.19. Frame Offset (Overlap Control)
The offset parameter controls the overlap between consecutive frames
during data windowing.
Without offset (offset: 0):
Frames are created consecutively with no overlap. Each frame starts immediately after the previous frame ends (step size of 1 frame). For a frame of N samples:
Frame 1: [sample_0 ... sample_N-1]
Frame 2: [sample_N ... sample_2N-1]
Frame 3: [sample_2N ... sample_3N-1]
With offset (offset: n):
Consecutive frames overlap because the step between frame starts becomes 1/n of a
frame, where n is the offset value. This creates more training samples from the same data.
offset: 2 --> step = 1/2 --> 50% overlap
offset: 4 --> step = 1/4 --> 75% overlap
For example, with offset: 2 and frame_size: 256:
Frame 1: [sample_0 ... sample_255]
Frame 2: [sample_128 ... sample_383] (50% overlap)
Frame 3: [sample_256 ... sample_511]
data_processing_feature_extraction:
  offset: 2
  frame_size: 256
  variables: 3
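The frame positions implied by a given offset can be computed as below. The sketch assumes the step is frame_size divided by offset (and a full frame when offset is 0), which matches the examples in this section; `frame_starts` is a hypothetical helper.

```python
def frame_starts(num_samples, frame_size, offset=0):
    """Return the start index of every complete frame for a given offset."""
    step = frame_size if offset == 0 else frame_size // offset
    return list(range(0, num_samples - frame_size + 1, step))


print(frame_starts(512, 256, offset=0))  # [0, 256]
print(frame_starts(512, 256, offset=2))  # [0, 128, 256]
print(frame_starts(512, 256, offset=4))  # [0, 64, 128, 192, 256]
```

With 512 samples, offset: 2 yields three frames instead of two, and offset: 4 yields five.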
8.4.20. Analysis Bandwidth
The analysis_bandwidth parameter specifies the fraction of the signal
bandwidth to analyse during feature extraction.
analysis_bandwidth: 1 – analyse the full bandwidth (default).
analysis_bandwidth: 0.5 – analyse only the lower half of the spectrum.
This is useful when only a specific portion of the frequency spectrum contains relevant information for the task.
data_processing_feature_extraction:
  analysis_bandwidth: 1
  frame_size: 1024
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'BINNING', 'LOG_DB', 'CONCAT']
  variables: 1
8.4.21. Feature Extraction Only Mode
Set dont_train_just_feat_ext: True to run only the data processing and
feature extraction stages without proceeding to model training. This is useful
for:
Inspecting extracted features and PCA plots before committing to training.
Iterating quickly on the feature extraction configuration.
Exporting extracted feature data (combine with store_feat_ext_data: True).
data_processing_feature_extraction:
  dont_train_just_feat_ext: True
  store_feat_ext_data: True
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3
8.4.22. Evaluating Feature Extraction Quality
Choosing the right feature extraction pipeline is critical for model performance. Tiny ML Tensorlab generates PCA (Principal Component Analysis) plots on the processed data before feeding it to model training. These plots provide a visual indication of how well the chosen feature extraction separates the data classes.
Key Principle:
If the clusters in the PCA plot are visually separable for a classification task, it will be a relatively easier job for the AI model to learn the decision boundaries. Conversely, overlapping clusters indicate that the model will have a harder time distinguishing between classes.
Different feature extractions work differently per dataset. There is no single best feature extraction – it depends on the nature of the data.
Example – Motor Fault Dataset Comparison:
The following table compares four feature extraction presets on the same motor bearing fault classification dataset (6 classes, 3-axis vibration sensor), trained with the same model and hyperparameters:
| Feature Extraction Preset | Method | Accuracy | F1-Score |
|---|---|---|---|
| | FFTBIN | 100.0% | 1.000 |
| | FFTBIN (1D) | 99.995% | 1.000 |
| | FFT | 97.9% | 0.979 |
| | RAW | 92.3% | 0.922 |
Observations:
FFTBIN (FFT with binning) achieved 100% accuracy – the PCA clusters were clearly separable.
FFT without binning achieved 97.9% – clusters had some overlap.
RAW time-domain features achieved 92.3% – the model had a harder time distinguishing classes without frequency-domain transformation.
This demonstrates that the same dataset can respond very differently to different feature extraction methods. Always compare multiple feature extraction approaches and inspect the PCA plots to find the best configuration for your specific dataset.
Tip
Use dont_train_just_feat_ext: True to quickly iterate through different
feature extraction configurations and compare PCA plots before running full
training.
8.4.23. Best Practices
Match to signal characteristics: FFT for periodic, raw for transient
Start with standard presets: Customize only if needed
Consider device constraints: Fewer features for smaller devices
Test multiple options: Compare accuracy with different presets
Use domain knowledge: Understand what patterns you’re looking for
8.4.24. Next Steps
See Goodness of Fit to analyze dataset quality
Learn about Quantization for model compression
Explore Time Series Classification for classification