5.1. Classification Dataset Format
This guide explains how to format datasets for time series classification tasks.
5.1.1. Directory Structure
Classification datasets use a classes/ folder where each subfolder represents
a class:
my_dataset/
└── classes/
    ├── class_A/
    │   ├── sample1.csv
    │   ├── sample2.csv
    │   └── sample3.csv
    ├── class_B/
    │   ├── sample1.csv
    │   └── sample2.csv
    └── class_C/
        └── sample1.csv
Key points:
- Folder names become class labels
- Each CSV file is one sample (or multiple samples if using windowing)
- All files should have the same number of columns
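The folder-to-label convention above can be sketched in a few lines of Python. This is an illustrative helper, not ModelMaker's actual loader: the function name `list_samples` is hypothetical, and it assumes the layout shown above with every file sitting directly inside a class folder.

```python
from pathlib import Path


def list_samples(dataset_root):
    """Return sorted (relative_path, label) pairs found under <dataset_root>/classes/."""
    classes_dir = Path(dataset_root) / "classes"
    samples = []
    for class_dir in sorted(p for p in classes_dir.iterdir() if p.is_dir()):
        for file_path in sorted(p for p in class_dir.iterdir() if p.is_file()):
            # The folder name is the class label (assumption: flat class folders).
            rel = file_path.relative_to(classes_dir).as_posix()
            samples.append((rel, class_dir.name))
    return samples
```

Calling `list_samples("my_dataset")` on the tree above would yield pairs such as `("class_A/sample1.csv", "class_A")`.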
5.1.2. Data File Format
Headerless Format (Simple)
Just numeric values, one measurement per row:
0.523
0.612
0.498
0.701
...
Headered Format (Recommended for Multi-Variable)
First row contains column names:
channel_x,channel_y,channel_z
0.523,0.112,-0.234
0.612,0.098,-0.198
0.498,0.145,-0.267
...
Time Column Handling
Any column whose name contains “time” (case-insensitive) is automatically dropped:
Time,value1,value2
0.001,0.523,0.112
0.002,0.612,0.098
...
The “Time” column will be removed, leaving only value1 and value2.
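To make the rule concrete, here is a minimal sketch of a case-insensitive substring match on the header row. It uses only the standard library and is not ModelMaker's own code; `drop_time_columns` is a hypothetical name, and the sketch assumes a headered, comma-separated file.

```python
import csv
import io


def drop_time_columns(csv_text):
    """Parse headered CSV text and drop every column whose name contains 'time'."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Case-insensitive substring match, as described above (assumed rule).
    keep = [i for i, name in enumerate(header) if "time" not in name.lower()]
    kept_header = [header[i] for i in keep]
    kept_rows = [[float(row[i]) for i in keep] for row in data]
    return kept_header, kept_rows
```

Under a substring rule like this one, a column named `timestamp` or `runtime_ms` would also be dropped, so it may be worth renaming data columns that happen to contain the word.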
5.1.3. Supported File Types
| Extension | Description |
|---|---|
| | Comma-separated values (most common) |
| | Tab or space-separated text |
| | NumPy array (binary, faster loading) |
| | Pickled pandas DataFrame |
5.1.4. Annotations (Optional)
You can optionally provide train/val/test splits using annotation files:
my_dataset/
├── classes/
│   └── ...
└── annotations/
    ├── file_list.txt              # All files (auto-generated if missing)
    ├── instances_train_list.txt   # Training files
    ├── instances_val_list.txt     # Validation files
    └── instances_test_list.txt    # Test files (optional)
File List Format
Each annotation file lists relative paths, one per line:
# instances_train_list.txt
class_A/sample1.csv
class_A/sample2.csv
class_B/sample1.csv
If the annotations/ folder is missing, ModelMaker auto-generates the splits using
split_factor from the config.
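If you prefer to control the splits yourself, the annotation files can be written with a short script. The sketch below is an assumption-laden illustration: `write_annotations` is a hypothetical helper, and it writes only the relative paths, one per line, without any comment line.

```python
from pathlib import Path


def write_annotations(dataset_root, train, val, test=None):
    """Write one relative path per line into <dataset_root>/annotations/."""
    ann_dir = Path(dataset_root) / "annotations"
    ann_dir.mkdir(parents=True, exist_ok=True)
    splits = {
        "instances_train_list.txt": train,
        "instances_val_list.txt": val,
    }
    if test is not None:
        # The test list is optional, so it is only written when provided.
        splits["instances_test_list.txt"] = test
    for filename, paths in splits.items():
        (ann_dir / filename).write_text("".join(f"{p}\n" for p in paths))
    return sorted(splits)
```

The paths passed in are expected to be relative to classes/, for example `class_A/sample1.csv`.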
5.1.5. Configuration
dataset:
    enable: True
    dataset_name: 'my_classification_data'
    input_data_path: '/path/to/my_dataset'  # or URL to .zip
    data_dir: 'classes'                     # Default
    annotation_dir: 'annotations'           # Default (optional)
    split_type: 'amongst_files'
    split_factor: [0.6, 0.3, 0.1]           # train, val, test
data_processing_feature_extraction:
    variables: 3                            # Number of data columns
Note
Zip file structure requirement: When using a zip file as
input_data_path, the zip must contain the classes/ directory
immediately inside it (at the top level). Do not add an extra directory
level such as dataset_name/classes/ inside the zip. The same applies to
the optional annotations/ directory.
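A quick pre-flight check can catch the extra-directory mistake before upload. The following is a minimal sketch using Python's standard `zipfile` module; the helper name `zip_has_top_level_classes` is hypothetical and the check is not part of ModelMaker.

```python
import zipfile


def zip_has_top_level_classes(zip_path):
    """Return True if any archive member path starts with 'classes/'."""
    with zipfile.ZipFile(zip_path) as zf:
        # Zip member names always use forward slashes, regardless of OS.
        return any(name.startswith("classes/") for name in zf.namelist())
```

An archive whose members look like `my_dataset/classes/class_A/sample1.csv` would fail this check, which matches the layout the note above warns against.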
5.1.6. Dataset Splitting Modes
The split_type parameter controls how ModelMaker divides data into train,
validation, and test sets when the annotations/ folder is not provided.
amongst_files (default)
Entire files are assigned to train, validation, or test sets. For example, with
10 files (each having 100 rows) and the default split_factor: [0.6, 0.3, 0.1]:
- 6 files go to training (each retains all 100 rows)
- 3 files go to validation
- 1 file goes to test
Use this mode when each file represents a distinct experiment or recording session.
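The file-level arithmetic above can be reproduced with a small sketch. This is not ModelMaker's implementation: `split_amongst_files` is a hypothetical name, the rounding rule is an assumption, and no shuffling is applied, so the real tool may assign different files to each split.

```python
def split_amongst_files(files, split_factor=(0.6, 0.3, 0.1)):
    """Assign whole files to train, validation, and test by proportion."""
    n = len(files)
    n_train = round(split_factor[0] * n)
    n_val = round(split_factor[1] * n)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    # Whatever remains goes to test, so no file is lost to rounding.
    test = files[n_train + n_val:]
    return train, val, test
```

With 10 files and the default factors, this produces 6 training files, 3 validation files, and 1 test file, as in the example above.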
within_files
Each file is split internally into train, validation, and test portions. For example, with 10 files (each having 100 rows):
- All 10 files appear in every split
- Training portion: first 60 rows of each file
- Validation portion: next 30 rows of each file
- Test portion: last 10 rows of each file
Use this mode when files contain long continuous sequences that can be safely split at arbitrary points.
dataset:
    split_type: 'within_files'     # or 'amongst_files' (default)
    split_factor: [0.6, 0.3, 0.1]  # train, val, test proportions
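For within_files, the same proportions are applied to the rows of each file instead of to the file list. The sketch below illustrates that row-wise slicing for one file; `split_within_file` is a hypothetical helper and the rounding rule is an assumption, not documented ModelMaker behavior.

```python
def split_within_file(rows, split_factor=(0.6, 0.3, 0.1)):
    """Slice one file's rows into contiguous train, validation, and test portions."""
    n = len(rows)
    train_end = round(split_factor[0] * n)
    val_end = train_end + round(split_factor[1] * n)
    # Slices are contiguous and ordered: train first, then validation, then test.
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]
```

For a 100-row file this gives rows 0 to 59 for training, 60 to 89 for validation, and 90 to 99 for test. Because the portions are adjacent in time, windows near the boundaries can be highly correlated, which is one reason to prefer amongst_files when recordings are short.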
5.1.7. Example: 3-Class Vibration Data
Dataset structure:
vibration_dataset/
└── classes/
    ├── normal/
    │   ├── run1.csv
    │   ├── run2.csv
    │   └── run3.csv
    ├── fault_type_A/
    │   ├── fault1.csv
    │   └── fault2.csv
    └── fault_type_B/
        └── fault1.csv
Sample file (normal/run1.csv):
accel_x,accel_y,accel_z
0.012,0.005,-0.982
0.015,0.008,-0.979
0.010,0.003,-0.985
...
Config:
dataset:
    dataset_name: 'vibration_data'
    input_data_path: '/data/vibration_dataset'
data_processing_feature_extraction:
    feature_extraction_name: 'Generic_256Input_FFTBIN_16Feature_8Frame'
    variables: 3  # x, y, z axes
5.1.8. Class Balancing
For best results, try to have similar sample counts per class.
If classes are imbalanced:
- Collect more data for minority classes
- Use data augmentation (gain variation)
- Adjust training parameters
data_processing_feature_extraction:
    gain_variations: {fault_type_A: [0.9, 1.1]}  # Augment minority class
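Before deciding whether augmentation is needed, it can help to measure how uneven the classes are. The sketch below counts samples per class from relative paths such as those in the annotation files; both function names are hypothetical and the ratio is only one simple way to express imbalance.

```python
from collections import Counter


def class_counts(relative_paths):
    """Count samples per class from paths like 'class_A/sample1.csv'."""
    return Counter(path.split("/", 1)[0] for path in relative_paths)


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())
```

For the vibration example (3 normal files, 2 fault_type_A, 1 fault_type_B), the ratio is 3.0. Note that this counts files, not windows, so the effective balance may differ when windowing is used.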
5.1.9. Common Issues
“Dimension mismatch” error
All files must have the same number of columns. Check for:
- Extra header rows
- Missing columns in some files
- Different delimiters
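One way to locate the offending file is to compute the set of column counts seen in each file and flag any file where that set has more than one value, or differs from the others. This is a minimal diagnostic sketch with a hypothetical helper name, assuming delimited text input.

```python
import csv
import io


def column_counts(csv_text, delimiter=","):
    """Return the set of distinct column counts across non-empty rows."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return {len(row) for row in reader if row}
```

A result of `{3}` means every row has three columns. A result of `{1}` on a file you expected to have several columns usually points to a delimiter mismatch.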
“Empty file” error
Ensure files contain actual data, not just headers.
Class not detected
- Check that folder names do not contain special characters
- Ensure files exist in the class folders
- Verify that the file extensions are supported
5.1.10. Key Differences: Classification vs Regression vs Forecasting
The following table summarizes the key structural and behavioral differences across the three time series task types.
| Aspect | Classification | Regression | Forecasting |
|---|---|---|---|
| Folder Structure | | | |
| Target Location | Folder name (implicit) | Last column of each file | Specified via |
| Target Type | Discrete class label | Continuous value (averaged over window) | Future value at |
| Annotations | Optional (auto-generated) | Required | Required |
| Feature Extraction | Supported (FFT, wavelets, etc.) | Supported ( | Not supported (raw time series only) |
| Loss Function | CrossEntropyLoss | MSELoss | HuberLoss |
| Evaluation Metrics | Accuracy, F1-score | MSE, R-squared | SMAPE, R-squared |
| | Can be True | Can be True | Must be False |
| | Optional | Mandatory | Mandatory |
| | N/A | N/A | Required parameter |