5.1. Classification Dataset Format
This guide explains how to format datasets for time series classification tasks.
5.1.1. Directory Structure
Classification datasets use a classes/ folder where each subfolder represents
a class:
my_dataset/
└── classes/
    ├── class_A/
    │   ├── sample1.csv
    │   ├── sample2.csv
    │   └── sample3.csv
    ├── class_B/
    │   ├── sample1.csv
    │   └── sample2.csv
    └── class_C/
        └── sample1.csv
Key points:
- Folder names become class labels
- Each CSV file is one sample (or multiple samples if using windowing)
- All files should have the same number of columns
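The folder-to-label convention above can be sketched in a few lines of Python. This is only an illustration of the layout, not ModelMaker's own loader; the function name `list_samples` is hypothetical.

```python
from pathlib import Path


def list_samples(dataset_root):
    """Return sorted (label, relative_path) pairs found under <dataset_root>/classes/."""
    classes_dir = Path(dataset_root) / "classes"
    samples = []
    for class_dir in sorted(p for p in classes_dir.iterdir() if p.is_dir()):
        for sample in sorted(f for f in class_dir.iterdir() if f.is_file()):
            # The folder name is the class label; the path is relative to classes/.
            samples.append((class_dir.name, sample.relative_to(classes_dir).as_posix()))
    return samples
```

Calling `list_samples("my_dataset")` on the tree shown above would yield pairs such as `("class_A", "class_A/sample1.csv")`.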
5.1.2. Data File Format
Headerless Format (Simple)
Just numeric values, one measurement per row:
0.523
0.612
0.498
0.701
...
Headered Format (Recommended for Multi-Variable)
First row contains column names:
channel_x,channel_y,channel_z
0.523,0.112,-0.234
0.612,0.098,-0.198
0.498,0.145,-0.267
...
Time Column Handling
Any column whose name contains “time” (case-insensitive) is automatically dropped:
Time,value1,value2
0.001,0.523,0.112
0.002,0.612,0.098
...
The “Time” column will be removed, leaving only value1 and value2.
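To preview which columns would survive this rule, the same filter can be approximated with the standard library. This is a minimal sketch that assumes the rule is a plain case-insensitive substring match on the column name, as the wording above suggests; `drop_time_columns` is a hypothetical helper, not part of ModelMaker.

```python
import csv
import io


def drop_time_columns(csv_text):
    """Return (header, rows) with every column whose name contains 'time' removed."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Case-insensitive substring match, so "Time", "timestamp" and "TIME_MS" all match.
    keep = [i for i, name in enumerate(header) if "time" not in name.lower()]
    kept_header = [header[i] for i in keep]
    rows = [[float(row[i]) for i in keep] for row in reader if row]
    return kept_header, rows


header, rows = drop_time_columns(
    "Time,value1,value2\n0.001,0.523,0.112\n0.002,0.612,0.098\n"
)
print(header)  # ['value1', 'value2']
print(rows)    # [[0.523, 0.112], [0.612, 0.098]]
```

Under a substring match, a data column named something like `uptime` would also be dropped, so it may be safer to avoid "time" in the names of columns you want to keep.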
5.1.3. Supported File Types
| Extension | Description |
|---|---|
| .csv | Comma-separated values (most common) |
| .txt | Tab or space-separated text |
| .npy | NumPy array (binary, faster loading) |
| .pkl | Pickled pandas DataFrame |
5.1.4. Annotations (Optional)
You can optionally provide train/val/test splits using annotation files:
my_dataset/
├── classes/
│   └── ...
└── annotations/
    ├── file_list.txt              # All files (auto-generated if missing)
    ├── instances_train_list.txt   # Training files
    ├── instances_val_list.txt     # Validation files
    └── instances_test_list.txt    # Test files (optional)
File List Format
Each annotation file lists relative paths, one per line:
# instances_train_list.txt
class_A/sample1.csv
class_A/sample2.csv
class_B/sample1.csv
If the annotations folder is missing, ModelMaker auto-generates the splits using
the split_factor value from the config.
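If you prefer to control the splits yourself, the annotation files can be written by hand or with a short script. The sketch below is an assumption-laden illustration: it shuffles all files together and cuts them by fraction, which is not necessarily how ModelMaker's own split works, and the helper names `split_files` and `write_annotations` are hypothetical. Only the output file names and the one-path-per-line format come from the layout above.

```python
import random
from pathlib import Path


def split_files(paths, split_factor=(0.6, 0.3, 0.1), seed=0):
    """Shuffle paths and cut them into train/val/test lists by the given fractions."""
    if abs(sum(split_factor) - 1.0) > 1e-9:
        raise ValueError("split_factor must sum to 1.0")
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * split_factor[0])
    n_val = round(len(shuffled) * split_factor[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


def write_annotations(dataset_root, splits):
    """Write instances_<split>_list.txt files, one relative path per line."""
    annotations_dir = Path(dataset_root) / "annotations"
    annotations_dir.mkdir(parents=True, exist_ok=True)
    for name, files in splits.items():
        target = annotations_dir / f"instances_{name}_list.txt"
        target.write_text("".join(f"{path}\n" for path in files))
```

Because the shuffle ignores class membership, a small class can end up missing from one of the splits; checking the resulting lists before training is a good idea.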
5.1.5. Configuration
dataset:
  enable: True
  dataset_name: 'my_classification_data'
  input_data_path: '/path/to/my_dataset'  # or URL to .zip
  data_dir: 'classes'                     # Default
  annotation_dir: 'annotations'           # Default (optional)
  split_type: 'amongst_files'
  split_factor: [0.6, 0.3, 0.1]           # train, val, test
data_processing_feature_extraction:
  variables: 3                            # Number of data columns
5.1.6. Example: 3-Class Vibration Data
Dataset structure:
vibration_dataset/
└── classes/
    ├── normal/
    │   ├── run1.csv
    │   ├── run2.csv
    │   └── run3.csv
    ├── fault_type_A/
    │   ├── fault1.csv
    │   └── fault2.csv
    └── fault_type_B/
        └── fault1.csv
Sample file (normal/run1.csv):
accel_x,accel_y,accel_z
0.012,0.005,-0.982
0.015,0.008,-0.979
0.010,0.003,-0.985
...
Config:
dataset:
  dataset_name: 'vibration_data'
  input_data_path: '/data/vibration_dataset'
data_processing_feature_extraction:
  feature_extraction_name: 'Generic_256Input_FFTBIN_16Feature_8Frame'
  variables: 3  # x, y, z axes
5.1.7. Class Balancing
For best results, try to have similar sample counts per class.
If classes are imbalanced:
- Collect more data for minority classes
- Use data augmentation (gain variation)
- Adjust training parameters
For example, gain variation can be enabled for a minority class in the config:
data_processing_feature_extraction:
  gain_variations: {fault_type_A: [0.9, 1.1]}  # Augment minority class
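Before choosing among these options, it helps to measure how imbalanced the dataset actually is. The following sketch counts samples per class from relative paths of the form `class/file`; the helper names and the ratio metric are illustrative assumptions, not something ModelMaker reports.

```python
from collections import Counter


def class_counts(relative_paths):
    """Count samples per class from paths like 'class_A/sample1.csv'."""
    return Counter(path.split("/", 1)[0] for path in relative_paths)


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())


paths = [
    "normal/run1.csv", "normal/run2.csv", "normal/run3.csv",
    "fault_type_A/fault1.csv", "fault_type_A/fault2.csv",
    "fault_type_B/fault1.csv",
]
counts = class_counts(paths)
print(dict(counts))            # {'normal': 3, 'fault_type_A': 2, 'fault_type_B': 1}
print(imbalance_ratio(counts)) # 3.0
```

The paths here mirror the vibration example: `normal` has three times as many files as `fault_type_B`, which makes the smaller fault classes the natural candidates for more data or augmentation.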
5.1.8. Common Issues
“Dimension mismatch” error
All files must have the same number of columns. Check for:
- Extra header rows
- Missing columns in some files
- Different delimiters
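A quick way to locate the offending file is to compare column counts across the dataset. The sketch below handles comma-separated `.csv` files only and uses hypothetical helper names; it counts raw columns as they appear in the file, so a time column still counts here even though it is dropped later.

```python
import csv
from pathlib import Path


def column_counts(classes_dir):
    """Map each .csv file under classes_dir to the set of row widths it contains."""
    result = {}
    for path in sorted(Path(classes_dir).rglob("*.csv")):
        with path.open(newline="") as handle:
            # Blank lines are skipped; every other row contributes its width.
            widths = {len(row) for row in csv.reader(handle) if row}
        result[path.relative_to(classes_dir).as_posix()] = widths
    return result


def find_mismatches(classes_dir, expected_columns):
    """Return the files whose rows are not all exactly expected_columns wide."""
    return {
        name: widths
        for name, widths in column_counts(classes_dir).items()
        if widths != {expected_columns}
    }
```

A file that uses tabs instead of commas typically shows up with a width of 1, a file with a missing column shows a smaller width, and an empty file shows an empty set.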
“Empty file” error
Ensure files contain actual data, not just headers.
Class not detected
- Check folder names don’t contain special characters
- Ensure files exist in class folders
- Verify file extensions are supported
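These three checks can be automated with a small script. The sketch below rests on two assumptions that should be adjusted to your setup: "special characters" is taken to mean anything outside letters, digits, underscore and hyphen, and the supported extensions are taken to be `.csv`, `.txt`, `.npy` and `.pkl`. The function name `check_class_folders` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: the extensions matching the file types listed in this guide.
SUPPORTED_EXTENSIONS = {".csv", ".txt", ".npy", ".pkl"}
# Assumption: a "safe" folder name uses only letters, digits, "_" and "-".
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def check_class_folders(classes_dir):
    """Return a list of human-readable problems found under classes_dir."""
    problems = []
    for class_dir in sorted(p for p in Path(classes_dir).iterdir() if p.is_dir()):
        if not SAFE_NAME.match(class_dir.name):
            problems.append(f"{class_dir.name}: folder name contains special characters")
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        if not files:
            problems.append(f"{class_dir.name}: no files in class folder")
        for f in files:
            if f.suffix.lower() not in SUPPORTED_EXTENSIONS:
                problems.append(f"{class_dir.name}/{f.name}: unsupported extension")
    return problems
```

An empty list means none of the three conditions was detected; any other result names the folder or file to look at first.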