5.2. Regression Dataset Format
This guide explains how to format datasets for time series regression tasks.
5.2.1. Directory Structure
Regression datasets use a flat files/ folder structure (not class folders):
my_dataset/
├── files/                          # MUST be named "files"
│   ├── datafile1.csv
│   ├── datafile2.csv
│   └── datafileN.csv
└── annotations/                    # Required for regression
    ├── instances_train_list.txt
    └── instances_val_list.txt
Important
The data directory must be named files/, not classes/ or anything else.
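The layout above can be checked before training with a few lines of Python. This is a minimal sketch, not part of ModelMaker: the function name check_layout and the wording of the messages are hypothetical, and it assumes each annotation file lists one file name per line.

```python
from pathlib import Path

# The two split lists this guide describes as required for regression.
REQUIRED_LISTS = ("instances_train_list.txt", "instances_val_list.txt")


def check_layout(root):
    """Return a list of layout problems for a regression dataset folder.

    Hypothetical helper: verifies the files/ and annotations/ layout
    described above. It does not call ModelMaker.
    """
    root = Path(root)
    files_dir = root / "files"
    ann_dir = root / "annotations"
    problems = []

    if not files_dir.is_dir():
        problems.append("missing files/ directory")

    for list_name in REQUIRED_LISTS:
        list_path = ann_dir / list_name
        if not list_path.is_file():
            problems.append(f"missing annotations/{list_name}")
            continue
        # Assumption: one data file name per line.
        for line in list_path.read_text().splitlines():
            entry = line.strip()
            if entry and not (files_dir / entry).is_file():
                problems.append(f"{list_name} lists {entry}, not found in files/")
    return problems
```

An empty return value means the folder matches the structure shown above; any other result names what is missing.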
5.2.2. Data File Format
Critical: The target value must be in the last column.
feature1,feature2,feature3,...,target
0.5,18.2,45.5,...,0.187
0.6,18.5,45.6,...,0.245
...
Example: Motor Torque Prediction
current_d,current_q,voltage_d,voltage_q,motor_speed,pm_temp,target_torque
-0.450,0.032,18.805,1.499,0.002,24.55,0.187
-0.325,0.045,18.818,1.542,0.003,24.54,0.245
-0.440,0.028,18.876,1.456,0.002,24.54,0.176
Columns 1-6: Input features
Column 7 (last): Target value to predict
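The last-column convention can be illustrated with the standard csv module. The sketch below is only an illustration of how the columns are interpreted, not ModelMaker's loader; split_features_target is a hypothetical name.

```python
import csv
import io


def split_features_target(csv_text):
    """Split CSV text into feature rows and target values.

    Hypothetical helper: treats the last column as the target,
    following the convention described above.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    feature_names, target_name = header[:-1], header[-1]
    features, targets = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = [float(v) for v in row]
        features.append(values[:-1])
        targets.append(values[-1])
    return feature_names, target_name, features, targets


sample = """current_d,current_q,voltage_d,voltage_q,motor_speed,pm_temp,target_torque
-0.450,0.032,18.805,1.499,0.002,24.55,0.187
-0.325,0.045,18.818,1.542,0.003,24.54,0.245
"""
names, target, X, y = split_features_target(sample)
print(target, len(names), y)  # target_torque 6 [0.187, 0.245]
```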
5.2.3. Time Column Handling
Any column whose header contains “time” (case-insensitive) is automatically dropped:
Time,feature1,feature2,target
0.001,0.5,18.2,0.187
0.002,0.6,18.5,0.245
The “Time” column will be removed automatically.
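To preview which columns would survive this rule, the filter can be approximated as a case-insensitive substring match on the header. This is a sketch of the described behavior under that assumption; drop_time_columns is a hypothetical name and the actual matching logic inside ModelMaker may differ.

```python
def drop_time_columns(header, rows):
    """Remove every column whose header contains "time" (any case).

    Hypothetical helper approximating the rule described above.
    """
    keep = [i for i, name in enumerate(header) if "time" not in name.lower()]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


header = ["Time", "feature1", "feature2", "target"]
rows = [[0.001, 0.5, 18.2, 0.187], [0.002, 0.6, 18.5, 0.245]]
print(drop_time_columns(header, rows)[0])  # ['feature1', 'feature2', 'target']
```

Note that under this rule a header such as Timestamp is dropped too, while elapsed_sec is kept.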
5.2.4. Annotation Files (Required)
Unlike classification, regression requires annotation files:
instances_train_list.txt:
datafile1.csv
datafile3.csv
datafile5.csv
instances_val_list.txt:
datafile2.csv
datafile4.csv
instances_test_list.txt (optional):
datafile6.csv
If the annotation files are missing, ModelMaker attempts to auto-generate the splits, but explicit annotation files are recommended for reproducibility.
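Explicit split lists can be generated with a short script. The following is a sketch under stated assumptions: write_split_lists, val_fraction, and seed are hypothetical names, the split is a simple seeded shuffle over whole files, and the output uses the one-file-name-per-line format shown above.

```python
import random
from pathlib import Path


def write_split_lists(dataset_root, val_fraction=0.25, seed=0):
    """Write instances_train_list.txt and instances_val_list.txt.

    Hypothetical helper: splits the CSV files in files/ into train and
    validation lists using a fixed seed so the split is reproducible.
    """
    root = Path(dataset_root)
    names = sorted(p.name for p in (root / "files").glob("*.csv"))
    random.Random(seed).shuffle(names)

    # Keep at least one file for validation.
    n_val = max(1, round(len(names) * val_fraction))
    val, train = sorted(names[:n_val]), sorted(names[n_val:])

    ann = root / "annotations"
    ann.mkdir(exist_ok=True)
    (ann / "instances_train_list.txt").write_text("\n".join(train) + "\n")
    (ann / "instances_val_list.txt").write_text("\n".join(val) + "\n")
    return train, val
```

Splitting by whole file keeps every window from one recording in the same split, which avoids overlap between training and validation data.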
5.2.5. Configuration
dataset:
  enable: True
  dataset_name: 'my_regression_data'
  input_data_path: '/path/to/my_dataset'
  data_dir: 'files'
  annotation_dir: 'annotations'
data_processing_feature_extraction:
  data_proc_transforms: ['SimpleWindow']  # Required!
  frame_size: 128
  stride_size: 0.25
  variables: 6  # Input columns (excluding target)
Important
SimpleWindow transform is required for regression tasks.
5.2.6. Target Processing
The target value (last column) is processed as follows:
Each window of frame_size rows is extracted
The target value is averaged across the window
This averaged value becomes the label for that window
Example with frame_size=4 and a stride of 2 rows:
Window 1: rows 0-3, targets [0.18, 0.24, 0.17, 0.19] → avg = 0.195
Window 2: rows 2-5, targets [0.17, 0.19, 0.22, 0.20] → avg = 0.195
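The two windows above can be reproduced with a plain sliding-window loop. This is a sketch of the averaging rule as described, not the SimpleWindow implementation; make_windows and stride_rows are hypothetical names, and the stride is given directly in rows (2 here, matching windows that start at rows 0 and 2).

```python
def make_windows(features, targets, frame_size, stride_rows):
    """Cut rows into windows and label each with the mean target.

    Hypothetical helper illustrating the target averaging described above.
    """
    windows = []
    last_start = len(targets) - frame_size
    for start in range(0, last_start + 1, stride_rows):
        end = start + frame_size
        label = sum(targets[start:end]) / frame_size
        windows.append((features[start:end], label))
    return windows


targets = [0.18, 0.24, 0.17, 0.19, 0.22, 0.20]
features = [[float(i)] for i in range(len(targets))]  # placeholder inputs
for _, label in make_windows(features, targets, frame_size=4, stride_rows=2):
    print(round(label, 3))  # prints 0.195 for both windows
```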
5.2.7. Complete Example
Dataset structure:
torque_measurement/
├── files/
│   ├── experiment_001.csv
│   ├── experiment_002.csv
│   ├── experiment_003.csv
│   └── experiment_004.csv
└── annotations/
    ├── instances_train_list.txt  # experiment_001.csv, experiment_002.csv
    └── instances_val_list.txt    # experiment_003.csv, experiment_004.csv
experiment_001.csv:
i_d,i_q,u_d,u_q,speed,temp,torque
-0.45,0.03,18.80,1.49,0.002,24.5,0.187
-0.32,0.04,18.81,1.54,0.003,24.5,0.245
...
config.yaml:
common:
  task_type: 'generic_timeseries_regression'
  target_device: 'F28P55'
dataset:
  dataset_name: 'torque_measurement'
  input_data_path: '/data/torque_measurement'
data_processing_feature_extraction:
  data_proc_transforms: ['SimpleWindow']
  frame_size: 128
  stride_size: 0.25
  variables: 6
training:
  model_name: 'REGR_1k_NPU'
  training_epochs: 100
5.2.8. Troubleshooting
“Target column not found” error
The target variable must be in the last column of every CSV file. If your target
is not the last column, either reorder the columns or specify target_variables
explicitly in the configuration. Double-check that no extra trailing delimiter is
creating a phantom empty column after your intended target.
“Insufficient sequence length” error
Each data file must contain enough rows to produce at least one window. The minimum row count per file is frame_size; each additional window needs a further stride (in absolute rows):
rows needed for n windows = frame_size + (n - 1) × (stride as absolute rows)
For example, with frame_size=128 and stride_size=0.25 (which translates to a
stride of 32 rows), you need at least 128 rows per file for one window and 160 rows
for two. If files are shorter, either increase their length or reduce frame_size.
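The relationship between file length, frame_size, and stride_size can be checked numerically. The sketch below assumes, as in the example above, that stride_size is a fraction of frame_size (0.25 × 128 = 32 rows) and that a file with exactly frame_size rows yields one window; stride_in_rows and count_windows are hypothetical names.

```python
def stride_in_rows(frame_size, stride_size):
    """Convert a fractional stride_size into an absolute row count."""
    return max(1, int(frame_size * stride_size))


def count_windows(n_rows, frame_size, stride_size):
    """Number of full windows a file with n_rows rows can produce.

    Assumption: standard sliding window with no padding of short files.
    """
    if n_rows < frame_size:
        return 0
    step = stride_in_rows(frame_size, stride_size)
    return (n_rows - frame_size) // step + 1


print(stride_in_rows(128, 0.25))      # 32
print(count_windows(127, 128, 0.25))  # 0
print(count_windows(128, 128, 0.25))  # 1
print(count_windows(160, 128, 0.25))  # 2
```

Running count_windows over the row counts of every file is a quick way to find files that are too short before training starts.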
“Annotation file missing” error
Unlike classification (where annotations are optional), regression expects an
annotations/ directory containing at least instances_train_list.txt and
instances_val_list.txt. If you omit the annotations folder entirely, ModelMaker
will attempt to auto-generate splits, but providing explicit splits is recommended
for reproducibility.
“Data dimension mismatch” error
All CSV files in the files/ directory must have the same number of columns.
Verify that:
No files have extra or missing columns
The variables parameter in the config matches the actual number of input feature columns (i.e., total columns minus the target column, minus any auto-dropped time column)
Delimiters are consistent across all files
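The checks above can be automated by reading only the header row of each file. This is a sketch, not a ModelMaker utility: check_columns is a hypothetical name, it assumes every file has a header row and comma delimiters, and it approximates the time-column rule as a case-insensitive substring match.

```python
import csv
from pathlib import Path


def check_columns(files_dir, variables):
    """Report files whose input column count differs from `variables`.

    Input columns = all columns, minus any "time" column, minus the
    target (last) column. Hypothetical helper.
    """
    problems = []
    for path in sorted(Path(files_dir).glob("*.csv")):
        with path.open(newline="") as f:
            header = next(csv.reader(f), [])
        kept = [h for h in header if "time" not in h.lower()]
        n_inputs = len(kept) - 1  # the last remaining column is the target
        if n_inputs != variables:
            problems.append(
                f"{path.name}: {n_inputs} input columns, expected {variables}"
            )
    return problems
```

A trailing delimiter in a header shows up here as one extra input column, which is a common cause of this error.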
“Time column” gotcha
Warning
Do not name any feature column with the word “time” (case-insensitive).
Any column whose header contains “time” (e.g., Time, Timestamp,
TIME (microsec)) is automatically dropped during data loading. If you
need a temporal feature, use a name like elapsed_sec or sample_index
instead.
5.2.9. Best Practices
Use consistent column ordering across all files in the dataset. Every CSV should have the same columns in the same order.
Avoid naming columns “time”. Use elapsed_sec or sample_index if you need a temporal reference column, since any column with “time” in its name (including timestamp) is silently dropped.
Ensure numerical-only data. All values (except the optional header row) must be numeric (integers or floats). String values, NaN entries, or missing fields will cause errors during data loading.
Include enough variety in the target range for the model to generalize. If all target values cluster in a narrow range, the model may fail to learn meaningful regression. Aim for training data that covers the full expected operating range of the target variable.
Remove outliers and NaN values before preparing the dataset. Extreme outliers can disproportionately affect MSE-based training.
Use descriptive filenames (e.g., motor_test_001.csv) to make annotation files easier to manage.
Test with a small subset first to validate the dataset format before launching a full training run.
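The numerical-only requirement can be verified before training by scanning each file for cells that are empty, non-numeric, or NaN. This is a minimal sketch using only the standard library; find_bad_cells is a hypothetical name and the reported positions (1-based line number, 0-based column index) are a choice made for this example.

```python
import csv
import io
import math


def find_bad_cells(csv_text, has_header=True):
    """Return (line_number, column_index, raw_value) for each bad cell.

    A cell is bad if it is empty, not parseable as a float, or not
    finite (NaN or infinity). Hypothetical helper.
    """
    reader = csv.reader(io.StringIO(csv_text))
    first_line = 1
    if has_header:
        next(reader, None)  # skip the header row
        first_line = 2

    bad = []
    for line_no, row in enumerate(reader, start=first_line):
        for col, cell in enumerate(row):
            try:
                value = float(cell)
            except ValueError:
                bad.append((line_no, col, cell))
                continue
            if not math.isfinite(value):
                bad.append((line_no, col, cell))
    return bad
```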
5.2.10. Common Issues
“Target not found” error
Ensure the target is in the last column of your CSV.
“No windows generated” error
Check that files have at least frame_size rows.
Poor regression performance
Try different frame_size values
Ensure input features are relevant to target
Normalize extreme values in your data