8.4. Feature Extraction

Feature extraction transforms raw sensor data into a representation that helps the neural network learn patterns more effectively.

8.4.1. Overview

Why use feature extraction?

  • Reduced input size: Compress long time series

  • Better patterns: Transform to domain where patterns are clearer

  • Faster inference: Smaller inputs mean faster models

  • Domain knowledge: Incorporate signal processing expertise

8.4.2. Feature Extraction Pipeline

Raw data flows through two stages: data processing transforms, then feature extraction transforms:

Raw Signal → Data Processing → Feature Extraction → Model Input
             (data_proc_transforms)   (feat_ext_transform)
             e.g. SimpleWindow,       e.g. FFT_FE, BINNING,
                  Downsample               ABS, LOG_DB, CONCAT

8.4.3. Configuration Parameters

The data_processing_feature_extraction section supports the following parameters. There are two usage modes: using a preset name, or defining a custom pipeline.

Core Parameters:

  • feature_extraction_name: Preset name (e.g., 'Generic_1024Input_FFTBIN_64Feature_8Frame') or a custom name starting with 'Custom_' (e.g., 'Custom_Default', 'Custom_ArcFault'). With a preset, the transform pipeline is predefined. With a Custom_* name, you must specify feat_ext_transform and the related parameters.

  • data_proc_transforms: List of data processing transforms applied before feature extraction. Common values: ['SimpleWindow'], ['Downsample'], ['SimpleWindow', 'Downsample'], ['Downsample', 'SimpleWindow'], or [] (empty).

  • feat_ext_transform: List of feature extraction transforms, applied in order. Common values include 'FFT_FE', 'FFT_POS_HALF', 'WINDOWING', 'BINNING', 'NORMALIZE', 'ABS', 'LOG_DB', 'DC_REMOVE', 'CONCAT', 'RAW_FE', 'TO_Q15', 'FFT_Q15', 'Q15_SCALE', 'Q15_MAG', 'BIN_Q15' and 'ECG_NORMALIZE'.

  • variables: Input channels/variables to use. Three formats are supported: an integer (select the first N columns), a list of column indices (e.g., [0, 2, 4]), or a list of column names (e.g., ['accel_x', 'accel_y', 'accel_z']).

  • frame_size: Number of samples per frame (e.g., 128, 256, 512, 1024).

  • feature_size_per_frame: Number of output features per frame after the transforms (e.g., 8, 16, 32, 64, 128).

  • num_frame_concat: Number of frames to concatenate (e.g., 1, 4, 8). Total features = feature_size_per_frame x num_frame_concat.

  • stride_size: Stride between consecutive windows, expressed as a fraction of frame_size (e.g., 0.01, 0.1, 0.25, 0.5, 1).

Signal Processing Parameters:

  • sampling_rate: Original sampling rate of the input signal. Used by the Downsample data processing transform.

  • new_sr: Target sampling rate after downsampling. Used by the Downsample data processing transform.

  • scale: Scaling factor applied to the input data (e.g., 0.00390625 for 1/256).

  • offset: Controls the overlap between consecutive frames. With offset: 0, frames are created back to back with no overlap (a step of one full frame). With offset: n, frames advance by a fractional step of 1/n of a frame, so consecutive frames overlap. For example, offset: 2 gives 50% overlap.

  • frame_skip: Number of frames to skip between selected frames (e.g., 1, 8).

  • normalize_bin: Enable bin normalization (True or 1).

  • stacking: Feature stacking mode: '2D1' (default) or '1D'.

  • min_bin: Minimum frequency bin index to include.

  • analysis_bandwidth: Fraction of the signal bandwidth to analyse (e.g., 1 for the full bandwidth). Useful when only part of the frequency spectrum is relevant to the task.

Logarithmic Transform Parameters:

  • log_mul: Multiplier for logarithmic scaling (e.g., 20 for dB).

  • log_base: Base of the logarithm (e.g., 10).

  • log_threshold: Minimum threshold to avoid log(0) (e.g., 1e-100).
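Together, these three parameters suggest a dB conversion of the form log_mul · log_base(max(x, log_threshold)). The minimal sketch below assumes exactly that form. It is an illustration and is not taken from the Tiny ML Tensorlab source.

```python
import math


def log_db(value, log_mul=20.0, log_base=10.0, log_threshold=1e-100):
    """Assumed LOG_DB form: log_mul * log_base(max(value, log_threshold))."""
    # Clamp first so that a zero (or negative) magnitude never reaches log().
    clamped = max(value, log_threshold)
    return log_mul * math.log(clamped, log_base)


print(log_db(10.0))  # magnitude 10 with the default dB settings
print(log_db(0.0))   # clamped to log_threshold, so the result stays finite
```

With the defaults, a magnitude of 10 maps to 20 dB. A zero input maps to about -2000 dB (20 · log10(1e-100)) instead of raising an error.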

Fixed-Point (Q15) Parameters:

  • q15_scale_factor: Bit-shift amount used by the Q15_SCALE transform (e.g., 4, 5). Positive values shift left (amplify) and negative values shift right (attenuate).

Data Augmentation and Testing:

  • gain_variations: Dictionary mapping class names to [min_gain, max_gain] ranges for data augmentation. Example: {arc: [0.9, 1.1], normal: [0.8, 1.2]}.

  • gof_test: Run a Goodness of Fit test on the extracted features (True/False).

Output Control:

  • store_feat_ext_data: Store the extracted feature data to disk (True/False).

  • dont_train_just_feat_ext: When True, the pipeline performs only data processing and feature extraction and does not proceed to model training. This is useful for inspecting extracted features, generating PCA plots, and iterating on the feature extraction configuration before committing to a full training run (True/False, default: False).

  • nn_for_feature_extraction: Use a neural network for feature extraction (True/False).

Forecasting-Specific Parameters:

  • forecast_horizon: Number of future timesteps to predict (e.g., 1, 2).

  • target_variables: List of column indices or names of the target variable(s) to forecast (e.g., [0], [5], ['temperature']).

8.4.4. Preset System

Tiny ML Tensorlab provides predefined feature extraction presets. When using a preset, simply specify the feature_extraction_name and variables:

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_1024Input_FFTBIN_64Feature_8Frame'
  variables: 1

Preset Naming Convention:

Generic_<InputSize>Input_<Transform>_<Features>Feature_<Frames>Frame

Example: Generic_1024Input_FFTBIN_64Feature_8Frame
- Input: 1024 samples
- Transform: FFT with binning
- Features: 64 frequency bins
- Frames: 8 temporal frames
- Total: 64 x 8 = 512 features to model
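The naming convention is regular enough to decode mechanically. The sketch below is a hypothetical helper and is not part of Tiny ML Tensorlab. It parses a Generic_* name with a regular expression and recomputes the total feature count. Names outside this convention, such as FFT1024Input_256Feature_1Frame_Full_Bandwidth, are rejected.

```python
import re

# Generic_<InputSize>Input_<Transform>_<Features>Feature_<Frames>Frame
PRESET_PATTERN = re.compile(
    r"^Generic_(?P<input>\d+)Input_(?P<transform>[A-Z0-9]+)_"
    r"(?P<features>\d+)Feature_(?P<frames>\d+)Frame$"
)


def parse_generic_preset(name):
    """Split a Generic_* preset name into its fields plus the total model input size."""
    match = PRESET_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a Generic_* preset name: {name!r}")
    features = int(match["features"])
    frames = int(match["frames"])
    return {
        "input_size": int(match["input"]),
        "transform": match["transform"],
        "features_per_frame": features,
        "frames": frames,
        "total_features": features * frames,
    }


info = parse_generic_preset("Generic_1024Input_FFTBIN_64Feature_8Frame")
print(info["total_features"])  # prints: 512
```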

8.4.5. Available Presets

FFT-Based Presets:

Best for frequency-domain patterns (vibration, arc faults):

  • Generic_1024Input_FFTBIN_64Feature_8Frame: 512 features; general purpose

  • Generic_512Input_FFTBIN_32Feature_8Frame: 256 features; smaller input

  • FFT1024Input_256Feature_1Frame_Full_Bandwidth: 256 features; full spectrum

Raw Time-Domain Presets:

Best for waveform shape patterns:

  • Generic_512Input_RAW_512Feature_1Frame: 512 features; full waveform

  • Generic_256Input_RAW_256Feature_1Frame: 256 features; shorter window

  • Generic_128Input_RAW_128Feature_1Frame: 128 features; compact input

Application-Specific Presets:

  • Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1: motor fault (3-axis)

  • FFT1024Input_256Feature_1Frame_Full_Bandwidth: arc fault detection

  • PIRDetection_125Input_25Feature_25Frame_1InputChannel_2D: PIR detection

8.4.6. Data Processing Transforms

The data_proc_transforms parameter specifies preprocessing steps applied to raw data before feature extraction.

Important

Data processing transforms (data_proc_transforms) are only applied on the training side (on PC), whereas feature extraction transforms (feat_ext_transform) are applied on both the training side (on PC) and the inference side (on device).

SimpleWindow

Segments continuous data into fixed-size windows:

data_processing_feature_extraction:
  data_proc_transforms: ['SimpleWindow']
  frame_size: 256
  stride_size: 0.01
  variables: 1
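The next sketch shows what windowing does to a continuous stream. It assumes that the stride in samples is frame_size × stride_size, rounded down and never less than 1. That interpretation is hypothetical and the code is not the SimpleWindow implementation.

```python
def simple_window(signal, frame_size, stride_size):
    """Cut a 1-D sequence into windows of frame_size samples.

    Assumed stride: int(frame_size * stride_size) samples, at least 1.
    Only full-length windows are kept; a trailing partial window is dropped.
    """
    step = max(1, int(frame_size * stride_size))
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, step)]


samples = list(range(1000))
windows = simple_window(samples, frame_size=256, stride_size=0.01)
print(len(windows), windows[1][0])  # prints: 373 2
```

Under this assumption, stride_size: 0.01 with frame_size: 256 advances only 2 samples per window. The result is heavily overlapping windows and many training examples from a short recording.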

Downsample

Reduces the sampling rate of input data:

data_processing_feature_extraction:
  data_proc_transforms: ['Downsample', 'SimpleWindow']
  sampling_rate: 313000
  new_sr: 3130
  frame_size: 256
  stride_size: 0.01
  variables: 1
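In this example the rate drops by a factor of 313000 / 3130 = 100. The minimal sketch below illustrates that arithmetic with plain decimation, keeping every 100th sample. It is an assumption, not the Downsample transform itself. A practical downsampler normally low-pass filters first to avoid aliasing, and the real transform may filter or average.

```python
def downsample(signal, sampling_rate, new_sr):
    """Reduce the sampling rate by an integer factor by keeping every k-th sample.

    Hypothetical sketch: requires sampling_rate to be an integer multiple of
    new_sr and applies no anti-aliasing filter.
    """
    if sampling_rate % new_sr != 0:
        raise ValueError("sampling_rate must be an integer multiple of new_sr")
    factor = sampling_rate // new_sr
    return signal[::factor]


raw = list(range(313000))                 # one second of samples at 313 kHz
reduced = downsample(raw, 313000, 3130)
print(len(reduced), reduced[:3])          # prints: 3130 [0, 100, 200]
```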

Multiple transforms can be chained; they are applied in the order listed:

data_processing_feature_extraction:
  data_proc_transforms:
  - SimpleWindow
  - Downsample
  frame_size: 256
  sampling_rate: 100
  new_sr: 1
  variables: 1

8.4.7. Feature Extraction Transforms

The feat_ext_transform parameter defines the feature extraction pipeline as an ordered list of transforms. Each step processes the output of the previous step.

Common Transform Steps:

  • FFT_FE: Compute the FFT (Fast Fourier Transform)

  • FFT_POS_HALF: Keep only the positive-frequency half of the FFT

  • WINDOWING: Apply a windowing function

  • BINNING: Group frequency bins to reduce the feature count

  • NORMALIZE: Normalize the features

  • ABS: Take the absolute value

  • LOG_DB: Convert to a logarithmic (dB) scale

  • DC_REMOVE: Remove the DC component

  • CONCAT: Concatenate frames into the final feature vector

  • RAW_FE: Take the mean of the wave within each frame; optionally remove the DC component if dc_remove is True

  • TO_Q15: Convert floating-point values to fixed-point Q15 format with saturation

  • FFT_Q15: Fixed-point Q15 FFT (for MCU deployment)

  • Q15_SCALE: Q15 bit-shift scaling (amplify or attenuate)

  • Q15_MAG: Q15 magnitude of the complex FFT output

  • BIN_Q15: Q15 binning with optional normalization

  • ECG_NORMALIZE: ECG-specific normalization

Example: FFT with Binning Pipeline:

data_processing_feature_extraction:
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'BINNING', 'LOG_DB', 'CONCAT']
  frame_size: 1024
  feature_size_per_frame: 64
  num_frame_concat: 4
  variables: 1
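A floating-point sketch can make the order of the steps in this pipeline concrete. It uses a naive pure-Python DFT in place of a real FFT and relies on several assumptions that are not taken from the Tiny ML Tensorlab source. DC_REMOVE is assumed to zero bin 0. BINNING is assumed to average equal-width groups of adjacent bins. LOG_DB is assumed to compute log_mul · log_base(max(x, log_threshold)).

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform, standing in for FFT_FE."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_bin_features(frame, feature_size_per_frame,
                     log_mul=20.0, log_base=10.0, log_threshold=1e-100):
    """FFT_FE -> FFT_POS_HALF -> DC_REMOVE -> ABS -> BINNING -> LOG_DB for one frame."""
    half = dft(frame)[: len(frame) // 2]                 # FFT_FE, FFT_POS_HALF
    half[0] = 0.0                                        # DC_REMOVE (assumed: zero bin 0)
    magnitude = [abs(c) for c in half]                   # ABS
    width = len(magnitude) // feature_size_per_frame
    if width == 0:
        raise ValueError("feature_size_per_frame exceeds the number of positive bins")
    binned = [sum(magnitude[i * width:(i + 1) * width]) / width   # BINNING (assumed mean)
              for i in range(feature_size_per_frame)]
    return [log_mul * math.log(max(b, log_threshold), log_base)   # LOG_DB
            for b in binned]


def extract(frames, feature_size_per_frame):
    """CONCAT: join the per-frame feature vectors into one model input."""
    features = []
    for frame in frames:
        features.extend(fft_bin_features(frame, feature_size_per_frame))
    return features


# A 64-sample sine at DFT bin 5 lands in feature bin 1 (bins 4..7 when 32 bins become 8).
sine = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
features = extract([sine] * 4, feature_size_per_frame=8)
print(len(features))  # prints: 32
```

With the configuration above (frame_size: 1024, feature_size_per_frame: 64, num_frame_concat: 4), the same steps would turn 512 positive bins into 64 features per frame, giving 256 values in total.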

Example: FFT without Binning Pipeline:

data_processing_feature_extraction:
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'LOG_DB', 'CONCAT']
  frame_size: 256
  feature_size_per_frame: 128
  num_frame_concat: 1
  variables: 6

Example: Fixed-Point Q15 Pipeline (for MCU deployment):

data_processing_feature_extraction:
  feat_ext_transform: ['FFT_Q15', 'Q15_SCALE', 'Q15_MAG', 'DC_REMOVE', 'BIN_Q15', 'CONCAT']
  frame_size: 256
  feature_size_per_frame: 16
  num_frame_concat: 8
  q15_scale_factor: 5
  normalize_bin: True
  variables: 1

8.4.8. Custom Feature Extraction

For advanced use cases, use a Custom_* feature extraction name and specify the transform pipeline manually:

data_processing_feature_extraction:
  data_proc_transforms: ['SimpleWindow']
  feature_extraction_name: 'Custom_Default'
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'WINDOWING', 'BINNING', 'NORMALIZE', 'ABS', 'LOG_DB', 'CONCAT']
  frame_size: 32
  feature_size_per_frame: 8
  num_frame_concat: 8
  variables: 5

You can also configure additional parameters for fine-grained control:

data_processing_feature_extraction:
  data_proc_transforms: []
  feature_extraction_name: 'Custom_MotorFault'
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'BINNING', 'LOG_DB', 'CONCAT']
  frame_size: 1024
  feature_size_per_frame: 64
  num_frame_concat: 4
  normalize_bin: 1
  stacking: '1D'
  offset: 0
  scale: 1
  frame_skip: 1
  log_mul: 20
  log_base: 10
  log_threshold: 1e-100
  variables: 3

8.4.9. Multi-Channel Data

For sensors with multiple axes (e.g., 3-axis accelerometer), set variables to the number of channels:

data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3

The variables parameter supports three formats:

  • Integer: Select first N columns (e.g., variables: 3)

  • List of indices: Select specific columns (e.g., variables: [0, 2, 4])

  • List of names: Select columns by name (e.g., variables: ['accel_x', 'accel_y', 'accel_z'])
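The three formats can be normalised into a single list of column indices. The helper below is hypothetical and shows one way to do that resolution. It is not the library's code.

```python
def resolve_variables(variables, column_names):
    """Map a `variables` setting to a list of column indices.

    Supports the three documented formats: an int N (first N columns),
    a list of indices, or a list of column names.
    """
    if isinstance(variables, int):
        if not 0 < variables <= len(column_names):
            raise ValueError("variables is out of range for this dataset")
        return list(range(variables))
    if all(isinstance(v, int) for v in variables):
        return list(variables)
    if all(isinstance(v, str) for v in variables):
        # list.index raises ValueError for a name that is not a column.
        return [column_names.index(name) for name in variables]
    raise TypeError("a variables list must contain only ints or only strings")


columns = ["accel_x", "accel_y", "accel_z", "gyro_x", "gyro_y"]
print(resolve_variables(3, columns))                      # prints: [0, 1, 2]
print(resolve_variables([0, 2, 4], columns))              # prints: [0, 2, 4]
print(resolve_variables(["accel_y", "gyro_x"], columns))  # prints: [1, 3]
```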

8.4.10. Forecasting Configuration

Forecasting tasks require additional parameters, namely forecast_horizon and target_variables:

data_processing_feature_extraction:
  data_proc_transforms:
  - SimpleWindow
  frame_size: 32
  stride_size: 0.1
  forecast_horizon: 2
  variables: 1
  target_variables:
  - 0

Note

SimpleWindow must be specified in data_proc_transforms for forecasting tasks.
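To make the roles of frame_size and forecast_horizon concrete, the sketch below builds input/target pairs from a univariate series. It is hypothetical and assumes that each input is frame_size consecutive values and each target is the forecast_horizon values that immediately follow the window. The step argument stands in for the window stride.

```python
def make_forecast_pairs(series, frame_size, forecast_horizon, step=1):
    """Build (input window, future targets) pairs from a univariate series.

    Assumption: the target is the next forecast_horizon values after the window.
    """
    pairs = []
    last_start = len(series) - frame_size - forecast_horizon
    for start in range(0, last_start + 1, step):
        window = series[start:start + frame_size]
        target = series[start + frame_size:start + frame_size + forecast_horizon]
        pairs.append((window, target))
    return pairs


series = [float(i) for i in range(40)]
pairs = make_forecast_pairs(series, frame_size=32, forecast_horizon=2)
print(len(pairs), pairs[0][1])  # prints: 7 [32.0, 33.0]
```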

8.4.11. Data Augmentation

Use gain_variations to augment training data with gain variations per class:

data_processing_feature_extraction:
  data_proc_transforms:
  - Downsample
  - SimpleWindow
  gain_variations:
    arc: [0.9, 1.1]
    normal: [0.8, 1.2]
  sampling_rate: 313000
  new_sr: 3130
  frame_size: 256
  stride_size: 0.01
  variables: 1

8.4.12. Choosing the Right Preset

Decision Tree:

Is the pattern in frequency content?
|-- Yes --> Use FFT-based preset
|   |-- Need full spectrum? --> *_Full_Bandwidth preset
|   |-- Reduce features? --> FFTBIN
|-- No --> Use RAW preset
    |-- Need temporal context? --> Multi-frame
    |-- Single snapshot? --> 1Frame

Common Choices by Application:

  • Arc fault detection: FFT1024Input_256Feature_1Frame_Full_Bandwidth

  • Motor bearing fault: Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1

  • ECG classification: ECG2500Input_Roundoff_1Frame

  • Vibration anomaly: Generic_1024Input_FFTBIN_64Feature_8Frame

  • Simple waveforms: Generic_512Input_FFTBIN_32Feature_8Frame

  • PIR detection: PIRDetection_125Input_25Feature_25Frame_1InputChannel_2D

8.4.13. Performance Impact

Feature extraction affects model size and speed:

  • 128 features: small model input, smaller model, faster inference

  • 256 features: medium model input, medium model size, medium inference time

  • 512 features: large model input, larger model, slower inference

Trade-off:

  • More features = more information = potentially better accuracy

  • Fewer features = faster inference = fits smaller devices

8.4.14. On-Device Feature Extraction

Feature extraction runs on the MCU before inference. The compilation process generates C code for the feature extraction pipeline configured in your YAML.

Memory Usage:

Feature extraction buffers add to the memory requirements. Their approximate sizes are:

Input buffer:  frame_size x variables x sizeof(data_type)
FFT buffer:    frame_size x sizeof(data_type)
Output buffer: feature_size_per_frame x num_frame_concat x sizeof(data_type)
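The three formulas can be evaluated for a given configuration to budget RAM. The sketch below applies them directly with example settings of 256-sample frames, 3 channels, and 16 features × 8 frames. It covers only these three buffers. Generated code may need further scratch space that is not counted here.

```python
def feature_extraction_buffer_bytes(frame_size, variables, feature_size_per_frame,
                                    num_frame_concat, bytes_per_value):
    """Evaluate the input, FFT and output buffer formulas, in bytes."""
    input_buffer = frame_size * variables * bytes_per_value
    fft_buffer = frame_size * bytes_per_value
    output_buffer = feature_size_per_frame * num_frame_concat * bytes_per_value
    return {
        "input": input_buffer,
        "fft": fft_buffer,
        "output": output_buffer,
        "total": input_buffer + fft_buffer + output_buffer,
    }


print(feature_extraction_buffer_bytes(256, 3, 16, 8, bytes_per_value=4))  # float32
print(feature_extraction_buffer_bytes(256, 3, 16, 8, bytes_per_value=2))  # Q15 (int16)
```

With 4-byte floats this gives 3072 + 1024 + 512 = 4608 bytes. With 2-byte Q15 values the total halves to 2304 bytes.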

8.4.15. Example Configurations

Arc Fault Classification (using preset):

data_processing_feature_extraction:
  feature_extraction_name: 'FFT1024Input_256Feature_1Frame_Full_Bandwidth'
  variables: 1

Motor Bearing Fault (using preset with override):

data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3
  feature_size_per_frame: 4

Anomaly Detection with Downsampling:

data_processing_feature_extraction:
  data_proc_transforms:
  - SimpleWindow
  - Downsample
  frame_size: 1024
  sampling_rate: 100
  new_sr: 1
  variables: 1

Regression with Simple Windowing:

data_processing_feature_extraction:
  data_proc_transforms:
  - SimpleWindow
  frame_size: 512
  stride_size: 0.1
  variables: 6

Forecasting (PMSM Rotor Temperature):

data_processing_feature_extraction:
  data_proc_transforms:
  - SimpleWindow
  frame_size: 3
  stride_size: 0.4
  forecast_horizon: 1
  variables: 6
  target_variables:
  - 5

Goodness of Fit Testing:

Enable the gof_test parameter to run Goodness of Fit analysis on extracted features:

data_processing_feature_extraction:
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  gof_test: True
  variables: 3

PCA Visualization of Extracted Features:

PCA (Principal Component Analysis) helps visualize how well the extracted features separate your classes. Well-separated clusters indicate good feature extraction.

[Figure: PCA of extracted features on training data]

[Figure: PCA of extracted features on validation data]

Interpreting PCA plots:

  • Tight clusters: Features represent the class well

  • Well-separated clusters: Good class separability

  • Overlapping clusters: May need different feature extraction

  • Scattered points: High variance, potentially noisy data

8.4.16. Stacking Modes

The stacking parameter controls how extracted features from multiple input channels are organized before being fed to the model. This is particularly relevant for multi-channel sensors (e.g., 3-axis accelerometer).

1D Stacking (stacking: '1D')

Features from all channels are concatenated into a single flat sequence.

Channel 1: [a1, a2, a3]
Channel 2: [b1, b2, b3]
Channel 3: [c1, c2, c3]

1D result: [a1, a2, a3, b1, b2, b3, c1, c2, c3]

2D1 Stacking (stacking: '2D1') – Default

Features are arranged in a 2D matrix with one row per channel.

Channel 1: [a1, a2, a3]
Channel 2: [b1, b2, b3]
Channel 3: [c1, c2, c3]

2D1 result:
[
  [a1, a2, a3],
  [b1, b2, b3],
  [c1, c2, c3]
]

The default stacking mode is '2D1'. Choose '1D' when you want a flat feature vector (e.g., for fully-connected models), and '2D1' when you want to preserve the per-channel structure (e.g., for convolutional models).

data_processing_feature_extraction:
  stacking: '2D1'   # or '1D'
  variables: 3
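The two layouts can be reproduced with plain lists. The sketch below follows the diagrams above and is purely illustrative. The real pipeline produces tensors and may add further dimensions, such as a batch axis.

```python
def stack_features(per_channel, stacking="2D1"):
    """Arrange per-channel feature lists in the '1D' or '2D1' layout."""
    if len({len(row) for row in per_channel}) != 1:
        raise ValueError("all channels must have the same, non-zero number of rows")
    if stacking == "1D":
        # One flat sequence: all of channel 1, then channel 2, and so on.
        return [value for row in per_channel for value in row]
    if stacking == "2D1":
        # One row per channel, preserving the per-channel structure.
        return [list(row) for row in per_channel]
    raise ValueError(f"unknown stacking mode: {stacking!r}")


channels = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]
print(stack_features(channels, "1D"))   # a flat list of 9 values
print(stack_features(channels, "2D1"))  # 3 rows of 3 values
```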

8.4.17. Gain Variation Augmentation

Gain variation is a data augmentation technique that multiplies the raw data of each class by a random gain factor drawn from a uniform distribution. This helps the model become robust to amplitude variations in the input signal.

How it works:

  1. For each class, you specify a [min, max] gain range.

  2. During data processing, a random number is drawn from the uniform distribution U(min, max) for each sample.

  3. The raw data of the corresponding class is multiplied by this random gain factor.
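The three steps above can be sketched with the standard random module. This is a hypothetical illustration, not the Tiny ML Tensorlab implementation. It draws one gain per sample from U(min, max) for that sample's class. It also assumes that classes missing from gain_variations are left unchanged.

```python
import random


def apply_gain_variation(samples, labels, gain_variations, rng=None):
    """Scale each sample by a gain drawn from U(min, max) for its class."""
    if rng is None:
        rng = random.Random()
    augmented = []
    for sample, label in zip(samples, labels):
        if label in gain_variations:
            low, high = gain_variations[label]
            gain = rng.uniform(low, high)          # one gain per sample
            sample = [value * gain for value in sample]
        augmented.append(sample)                   # unlisted classes pass through
    return augmented


gains = {"arc": [0.9, 1.1], "normal": [0.8, 1.2]}
data = [[1.0, -1.0], [2.0, 0.5], [0.3, 0.3]]
classes = ["arc", "normal", "other"]
print(apply_gain_variation(data, classes, gains, rng=random.Random(0)))
```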

Format:

gain_variations: {class1_name: [min, max], class2_name: [min, max], ...}

Example – Gain variation with Downsample and SimpleWindow:

data_processing_feature_extraction:
  data_proc_transforms:
  - Downsample
  - SimpleWindow
  gain_variations:
    arc: [0.9, 1.1]
    normal: [0.8, 1.2]
  sampling_rate: 313000
  new_sr: 3130
  frame_size: 256
  stride_size: 0.01
  variables: 1

Note

Gain variation can be used alongside Downsample or SimpleWindow data processing transforms. The gain is applied to the raw data before the feature extraction stage.

8.4.18. Q15 Fixed-Point Transforms

The Q15 transform family enables fixed-point feature extraction for on-device (MCU) deployment, where floating-point operations may be expensive or unavailable. These transforms operate in Q15 format, a 16-bit signed integer representation in which an integer q stands for the value q / 32768. The integers [-32768, +32767] therefore cover the range [-1, 1 - 2^-15], and a value of +1 saturates to +32767.

Available Q15 Transforms:

  • TO_Q15: Converts the input waveform from floating-point to fixed-point Q15 format (-1 maps to -32768, +1 maps to +32767). Values outside this range are saturated (clipped) to the Q15 bounds.

  • FFT_Q15: Performs a fixed-point Real FFT (RFFT) on the Q15 input frame to obtain its frequency-domain representation. This is the fixed-point equivalent of FFT_FE.

  • Q15_SCALE: Applies bit-shift-based scaling to the Q15 signal using the q15_scale_factor parameter. Positive values left-shift (amplify) the signal and negative values right-shift (attenuate) it. This is a power-of-two multiplication or division.

  • Q15_MAG: Computes the magnitude of the complex Q15 FFT output samples to obtain the real-valued amplitude spectrum. This is the fixed-point equivalent of computing ABS on complex FFT output.

  • BIN_Q15: Groups adjacent Q15 samples into fixed-size bins and computes the average per bin (optionally normalized when normalize_bin is set). Results are clipped to the 16-bit range. This is the fixed-point equivalent of BINNING.
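The integer arithmetic behind TO_Q15, Q15_SCALE and BIN_Q15 can be sketched in a few lines. This is a minimal model under stated assumptions, not the device code. It rounds to nearest when converting, uses floor division when averaging, and omits the normalize_bin option. The rounding behaviour of the actual kernels may differ.

```python
Q15_MIN, Q15_MAX = -32768, 32767


def saturate(value):
    """Clip an integer to the signed 16-bit (Q15) range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """TO_Q15-style conversion: multiply by 2**15, round, then saturate."""
    return saturate(round(x * 32768))


def q15_shift(value, shift):
    """Q15_SCALE-style scaling: shift > 0 shifts left (amplify), shift < 0 shifts right."""
    if shift >= 0:
        return saturate(value << shift)
    return value >> -shift  # arithmetic right shift, rounds toward minus infinity


def bin_q15(values, num_bins):
    """BIN_Q15-style binning: average equal-width groups of adjacent samples."""
    width = len(values) // num_bins
    if width == 0:
        raise ValueError("num_bins must not exceed the number of samples")
    return [saturate(sum(values[i * width:(i + 1) * width]) // width)
            for i in range(num_bins)]


print(to_q15(0.5), to_q15(1.0), to_q15(-1.0))  # prints: 16384 32767 -32768
print(q15_shift(1000, 5), q15_shift(2000, 5))  # prints: 32000 32767
print(bin_q15([1, 2, 3, 4, 5, 6, 7, 8], 2))    # prints: [2, 6]
```

The second call shows why the shift amount matters. With q15_scale_factor: 5, any magnitude above 1023 saturates at +32767.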

Typical Q15 Pipeline:

data_processing_feature_extraction:
  feature_extraction_name: 'Custom_Q15_Pipeline'
  feat_ext_transform: ['FFT_Q15', 'Q15_SCALE', 'Q15_MAG', 'DC_REMOVE', 'BIN_Q15', 'CONCAT']
  frame_size: 256
  feature_size_per_frame: 16
  num_frame_concat: 8
  q15_scale_factor: 5
  normalize_bin: True
  variables: 1

Note

The Q15 transforms are fixed-point counterparts of the floating-point transforms and mirror the operations performed on the MCU. When Q15 feature extraction is used during training, the same fixed-point operations are replicated on the target device, which keeps feature extraction consistent between training and inference.

8.4.19. Frame Offset (Overlap Control)

The offset parameter controls the overlap between consecutive frames during data windowing.

Without offset (offset: 0):

Frames are created consecutively with no overlap. Each frame starts immediately after the previous frame ends (a step of one full frame). With N = frame_size:

Frame 1: [sample_0  ... sample_N-1]
Frame 2: [sample_N  ... sample_2N-1]
Frame 3: [sample_2N ... sample_3N-1]

With offset (offset: n):

Frames overlap by adding a fractional step size of 1/n where n is the offset value. This creates more training samples from the same data.

offset: 2  -->  step = 1/2  -->  50% overlap
offset: 4  -->  step = 1/4  -->  75% overlap

For example, with offset: 2 and frame_size: 256:

Frame 1: [sample_0   ... sample_255]
Frame 2: [sample_128 ... sample_383]   (50% overlap)
Frame 3: [sample_256 ... sample_511]

data_processing_feature_extraction:
  offset: 2
  frame_size: 256
  variables: 3
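The relationship between offset and the frame step can be checked numerically. The sketch below computes frame start indices. The integer rounding of frame_size / n is an assumption made for illustration.

```python
def frame_starts(num_samples, frame_size, offset=0):
    """Start index of each full frame for a given offset setting.

    offset 0 -> step of one whole frame (no overlap);
    offset n -> step of frame_size / n samples (assumed integer division).
    """
    step = frame_size if offset == 0 else max(1, frame_size // offset)
    return list(range(0, num_samples - frame_size + 1, step))


print(frame_starts(1024, 256, offset=0))  # prints: [0, 256, 512, 768]
print(frame_starts(1024, 256, offset=2))  # prints: [0, 128, 256, 384, 512, 640, 768]
```

On the same 1024 samples, offset: 2 yields 7 frames instead of 4, which matches the statement that overlap creates more training samples from the same data.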

8.4.20. Analysis Bandwidth

The analysis_bandwidth parameter specifies the fraction of the signal bandwidth to analyse during feature extraction.

  • analysis_bandwidth: 1 – analyse the full bandwidth (default).

  • analysis_bandwidth: 0.5 – analyse only the lower half of the spectrum.

This is useful when only a specific portion of the frequency spectrum contains relevant information for the task.

data_processing_feature_extraction:
  analysis_bandwidth: 1
  frame_size: 1024
  feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'BINNING', 'LOG_DB', 'CONCAT']
  variables: 1
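The effect of analysis_bandwidth can be expressed in FFT bins and hertz. FFT bins are spaced sampling_rate / frame_size apart, and the positive half contains frame_size / 2 bins. The sketch below assumes that the analysed band starts at 0 Hz and is truncated to a whole number of bins. That interpretation is hypothetical. The 3130 Hz rate is borrowed from the Downsample example as an illustration.

```python
def analysis_bins(frame_size, sampling_rate, analysis_bandwidth=1.0):
    """Return (bins kept, bin spacing in Hz, upper edge of the kept band in Hz)."""
    half = frame_size // 2                       # bins kept by FFT_POS_HALF
    kept = int(half * analysis_bandwidth)        # assumed: lowest bins, truncated
    resolution_hz = sampling_rate / frame_size   # spacing between FFT bins
    return kept, resolution_hz, kept * resolution_hz


print(analysis_bins(1024, 3130, 1.0))  # full band, up to the Nyquist frequency
print(analysis_bins(1024, 3130, 0.5))  # lower half of the spectrum
```

With frame_size: 1024 at 3130 Hz, analysis_bandwidth: 1 keeps all 512 bins up to 1565 Hz. A value of 0.5 keeps 256 bins up to 782.5 Hz.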

8.4.21. Feature Extraction Only Mode

Set dont_train_just_feat_ext: True to run only the data processing and feature extraction stages without proceeding to model training. This is useful for:

  • Inspecting extracted features and PCA plots before committing to training.

  • Iterating quickly on the feature extraction configuration.

  • Exporting extracted feature data (combine with store_feat_ext_data: True).

data_processing_feature_extraction:
  dont_train_just_feat_ext: True
  store_feat_ext_data: True
  feature_extraction_name: 'Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1'
  variables: 3

8.4.22. Evaluating Feature Extraction Quality

Choosing the right feature extraction pipeline is critical for model performance. Tiny ML Tensorlab generates PCA (Principal Component Analysis) plots on the processed data before feeding it to model training. These plots provide a visual indication of how well the chosen feature extraction separates the data classes.

Key Principle:

If the clusters in the PCA plot are visually separable for a classification task, the model will have a relatively easy job learning the decision boundaries. Conversely, overlapping clusters indicate that the model will have a harder time distinguishing between classes.

Different feature extractions work differently per dataset. There is no single best feature extraction – it depends on the nature of the data.

Example – Motor Fault Dataset Comparison:

The following table compares four feature extraction presets on the same motor bearing fault classification dataset (6 classes, 3-axis vibration sensor), trained with the same model and hyperparameters:

  • Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_2D1 (FFTBIN): accuracy 100.0%, F1-score 1.000

  • Input256_FFTBIN_16Feature_8Frame_3InputChannel_removeDC_1D (FFTBIN, 1D): accuracy 99.995%, F1-score 1.000

  • Input256_FFT_128Feature_1Frame_3InputChannel_removeDC_2D1 (FFT): accuracy 97.9%, F1-score 0.979

  • Input128_RAW_128Feature_1Frame_3InputChannel_removeDC_2D1 (RAW): accuracy 92.3%, F1-score 0.922

Observations:

  • FFTBIN (FFT with binning) achieved 100% accuracy – the PCA clusters were clearly separable.

  • FFT without binning achieved 97.9% – clusters had some overlap.

  • RAW time-domain features achieved 92.3% – the model had a harder time distinguishing classes without frequency-domain transformation.

This demonstrates that the same dataset can respond very differently to different feature extraction methods. Always compare multiple feature extraction approaches and inspect the PCA plots to find the best configuration for your specific dataset.

Tip

Use dont_train_just_feat_ext: True to quickly iterate through different feature extraction configurations and compare PCA plots before running full training.

8.4.23. Best Practices

  1. Match to signal characteristics: FFT for periodic, raw for transient

  2. Start with standard presets: Customize only if needed

  3. Consider device constraints: Fewer features for smaller devices

  4. Test multiple options: Compare accuracy with different presets

  5. Use domain knowledge: Understand what patterns you’re looking for

8.4.24. Next Steps