5.1. Classification Dataset Format

This guide explains how to format datasets for time series classification tasks.

5.1.1. Directory Structure

Classification datasets use a classes/ folder where each subfolder represents a class:

my_dataset/
└── classes/
    ├── class_A/
    │   ├── sample1.csv
    │   ├── sample2.csv
    │   └── sample3.csv
    ├── class_B/
    │   ├── sample1.csv
    │   └── sample2.csv
    └── class_C/
        └── sample1.csv

Key points:

  • Folder names become class labels

  • Each CSV file is one sample (or multiple samples if using windowing)

  • All files must have the same number of columns
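
The layout above can be inspected with a few lines of standard-library Python before handing the dataset to ModelMaker. This is a minimal sketch, not part of ModelMaker itself; the function name list_samples is hypothetical, and it assumes only the classes/ layout described in this section.

```python
from pathlib import Path


def list_samples(dataset_root):
    """Map each class label (folder name) to its sorted sample file names."""
    classes_dir = Path(dataset_root) / "classes"
    samples = {}
    for class_dir in sorted(classes_dir.iterdir()):
        if class_dir.is_dir():
            # The folder name is what becomes the class label.
            samples[class_dir.name] = sorted(
                f.name for f in class_dir.iterdir() if f.is_file()
            )
    return samples
```

For the my_dataset/ tree shown above, list_samples("my_dataset") would return three labels (class_A, class_B, class_C) with three, two, and one file names respectively.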

5.1.2. Data File Format

Headerless Format (Simple)

Just numeric values, one measurement per row:

0.523
0.612
0.498
0.701
...

Headered Format (Recommended for Multi-Variable)

First row contains column names:

channel_x,channel_y,channel_z
0.523,0.112,-0.234
0.612,0.098,-0.198
0.498,0.145,-0.267
...

Time Column Handling

Any column whose name contains “time” (case-insensitive) is dropped automatically:

Time,value1,value2
0.001,0.523,0.112
0.002,0.612,0.098
...

The “Time” column will be removed, leaving only value1 and value2.
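
To preview which columns survive this rule, the matching can be reproduced offline. The sketch below is an illustration of the rule as stated here (a case-insensitive substring match on the column name), not ModelMaker's actual loader; drop_time_columns is a hypothetical helper.

```python
import csv
import io


def drop_time_columns(csv_text):
    """Return (header, rows) without any column whose name contains 'time'."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Keep the indices of columns whose name does not contain "time".
    keep = [i for i, name in enumerate(header) if "time" not in name.lower()]
    kept_header = [header[i] for i in keep]
    rows = [[float(row[i]) for i in keep] for row in reader if row]
    return kept_header, rows


header, rows = drop_time_columns(
    "Time,value1,value2\n0.001,0.523,0.112\n0.002,0.612,0.098\n"
)
print(header)  # ['value1', 'value2']
print(rows)    # [[0.523, 0.112], [0.612, 0.098]]
```

Because the rule is a substring match, a data column named, for example, timestamp or runtime would also match under this reading, so it is safer to avoid “time” in the names of columns that should be kept.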

5.1.3. Supported File Types

Extension   Description
---------   ------------------------------------
.csv        Comma-separated values (most common)
.txt        Tab or space-separated text
.npy        NumPy array (binary, faster loading)
.pkl        Pickled pandas DataFrame

5.1.4. Annotations (Optional)

You can optionally provide train/val/test splits using annotation files:

my_dataset/
├── classes/
│   └── ...
└── annotations/
    ├── file_list.txt              # All files (auto-generated if missing)
    ├── instances_train_list.txt   # Training files
    ├── instances_val_list.txt     # Validation files
    └── instances_test_list.txt    # Test files (optional)

File List Format

Each annotation file lists relative paths, one per line:

# instances_train_list.txt
class_A/sample1.csv
class_A/sample2.csv
class_B/sample1.csv

If the annotations/ folder is missing, ModelMaker generates the splits automatically, using split_type and split_factor from the config.
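
When a fixed, reproducible split is preferred, the annotation files can be written by a small script. The following is a sketch under stated assumptions: write_annotations is a hypothetical helper, the per-class 60/40 train/val ordering is an arbitrary choice for illustration, and only the file names and the relative-path format come from this section.

```python
from pathlib import Path


def write_annotations(dataset_root, train_fraction=0.6):
    """Write train/val list files with paths relative to classes/."""
    root = Path(dataset_root)
    classes_dir = root / "classes"
    ann_dir = root / "annotations"
    ann_dir.mkdir(exist_ok=True)

    train, val = [], []
    for class_dir in sorted(p for p in classes_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        # Assumption: keep at least one training file per non-empty class.
        n_train = max(1, round(len(files) * train_fraction)) if files else 0
        for i, f in enumerate(files):
            rel = f.relative_to(classes_dir).as_posix()  # e.g. class_A/sample1.csv
            (train if i < n_train else val).append(rel)

    (ann_dir / "instances_train_list.txt").write_text("\n".join(train) + "\n")
    (ann_dir / "instances_val_list.txt").write_text("\n".join(val) + "\n")
    return train, val
```

Splitting per class keeps every label represented in the training list, which a purely global split over a small dataset might not.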

5.1.5. Configuration

dataset:
  enable: True
  dataset_name: 'my_classification_data'
  input_data_path: '/path/to/my_dataset'  # or URL to .zip
  data_dir: 'classes'              # Default
  annotation_dir: 'annotations'    # Default (optional)
  split_type: 'amongst_files'
  split_factor: [0.6, 0.3, 0.1]    # train, val, test

data_processing_feature_extraction:
  variables: 3                     # Number of data columns

Note

Zip file structure requirement: When using a zip file as input_data_path, the zip must contain the classes/ directory immediately inside it (at the top level). Do not add an extra directory level such as dataset_name/classes/ inside the zip. The same applies to the optional annotations/ directory.
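
One way to satisfy the zip requirement is to archive the contents of the dataset folder rather than the folder itself, then check the result. This is a minimal sketch using only the Python standard library; the helper names make_dataset_zip and has_top_level_classes are hypothetical, and the output archive should be written outside the dataset folder.

```python
import shutil
import zipfile


def make_dataset_zip(dataset_root, zip_base_name):
    """Zip the contents of dataset_root so classes/ is at the archive top level."""
    # root_dir makes member paths relative to dataset_root itself,
    # so no extra 'my_dataset/' level ends up inside the zip.
    return shutil.make_archive(str(zip_base_name), "zip", root_dir=str(dataset_root))


def has_top_level_classes(zip_path):
    """True if at least one archive member path starts with 'classes/'."""
    with zipfile.ZipFile(zip_path) as zf:
        return any(name.startswith("classes/") for name in zf.namelist())
```

Running has_top_level_classes on an archive that was built from the parent directory (so that its members start with my_dataset/classes/) returns False, which is the layout the note above warns against.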

5.1.6. Dataset Splitting Modes

The split_type parameter controls how ModelMaker divides data into train, validation, and test sets when the annotations/ folder is not provided.

amongst_files (default)

Entire files are assigned to train, validation, or test sets. For example, with 10 files (each having 100 rows) and the default split_factor: [0.6, 0.3, 0.1]:

  • 6 files go to training (each retains all 100 rows)

  • 3 files go to validation

  • 1 file goes to test

Use this mode when each file represents a distinct experiment or recording session.

within_files

Each file is split internally into train, validation, and test portions. For example, with 10 files (each having 100 rows):

  • All 10 files appear in every split

  • Training portion: first 60 rows of each file

  • Validation portion: next 30 rows of each file

  • Test portion: last 10 rows of each file

Use this mode when files contain long continuous sequences that can be safely split at arbitrary points.

dataset:
  split_type: 'within_files'       # or 'amongst_files' (default)
  split_factor: [0.6, 0.3, 0.1]    # train, val, test proportions
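
The arithmetic behind the two modes can be sketched as follows. This only mirrors the examples above; it assumes files and rows are taken in order, and ModelMaker's actual implementation (for instance, whether files are shuffled first) may differ. Both function names are hypothetical.

```python
def split_amongst_files(files, split_factor):
    """Assign whole files to (train, val, test) by proportion."""
    n_train = round(len(files) * split_factor[0])
    n_val = round(len(files) * split_factor[1])
    # Whatever remains after train and val goes to test.
    return (
        files[:n_train],
        files[n_train:n_train + n_val],
        files[n_train + n_val:],
    )


def split_within_file(rows, split_factor):
    """Cut one file's rows into consecutive (train, val, test) portions."""
    n_train = round(len(rows) * split_factor[0])
    n_val = round(len(rows) * split_factor[1])
    return (
        rows[:n_train],
        rows[n_train:n_train + n_val],
        rows[n_train + n_val:],
    )


files = [f"file{i}.csv" for i in range(10)]
train_f, val_f, test_f = split_amongst_files(files, [0.6, 0.3, 0.1])
print(len(train_f), len(val_f), len(test_f))  # 6 3 1

rows = list(range(100))
train_r, val_r, test_r = split_within_file(rows, [0.6, 0.3, 0.1])
print(len(train_r), len(val_r), len(test_r))  # 60 30 10
```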

5.1.7. Example: 3-Class Vibration Data

Dataset structure:

vibration_dataset/
└── classes/
    ├── normal/
    │   ├── run1.csv
    │   ├── run2.csv
    │   └── run3.csv
    ├── fault_type_A/
    │   ├── fault1.csv
    │   └── fault2.csv
    └── fault_type_B/
        └── fault1.csv

Sample file (normal/run1.csv):

accel_x,accel_y,accel_z
0.012,0.005,-0.982
0.015,0.008,-0.979
0.010,0.003,-0.985
...

Config:

dataset:
  dataset_name: 'vibration_data'
  input_data_path: '/data/vibration_dataset'

data_processing_feature_extraction:
  feature_extraction_name: 'Generic_256Input_FFTBIN_16Feature_8Frame'
  variables: 3    # x, y, z axes

5.1.8. Class Balancing

For best results, try to have similar sample counts per class.

If classes are imbalanced:

  • Collect more data for minority classes

  • Use data augmentation (gain variation)

  • Adjust training parameters

data_processing_feature_extraction:
  gain_variations: {fault_type_A: [0.9, 1.1]}  # Augment minority class

5.1.9. Common Issues

“Dimension mismatch” error

All files must have the same number of columns. Check for:

  • Extra header rows

  • Missing columns in some files

  • Different delimiters
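
A quick scan can locate the offending files before training. This is a rough sketch, assuming comma-separated .csv files under classes/; find_column_mismatches is a hypothetical helper that flags any file whose rows do not all have the most common column count. A file saved with tabs parses as a single column here, so delimiter problems show up as well, and empty files are reported with an empty list.

```python
import csv
from collections import Counter
from pathlib import Path


def column_counts(csv_path):
    """Return the set of distinct field counts across non-empty rows."""
    with open(csv_path, newline="") as fh:
        return {len(row) for row in csv.reader(fh) if row}


def find_column_mismatches(dataset_root):
    """Map relative path -> sorted field counts for files that deviate."""
    classes_dir = Path(dataset_root) / "classes"
    per_file = {
        p.relative_to(classes_dir).as_posix(): column_counts(p)
        for p in sorted(classes_dir.rglob("*.csv"))
    }
    tally = Counter(w for widths in per_file.values() for w in widths)
    if not tally:
        # Every file is empty (or there are no files at all).
        return {name: [] for name in per_file}
    expected = tally.most_common(1)[0][0]
    return {
        name: sorted(widths)
        for name, widths in per_file.items()
        if widths != {expected}
    }
```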

“Empty file” error

Ensure files contain actual data, not just headers.

Class not detected

  • Check that folder names do not contain special characters

  • Ensure files exist in class folders

  • Verify file extensions are supported

5.1.10. Key Differences: Classification vs Regression vs Forecasting

The following table summarizes the key structural and behavioral differences across the three time series task types.

Aspect              Classification                   Regression                               Forecasting
------------------  -------------------------------  ---------------------------------------  --------------------------------------
Folder Structure    classes/{class_name}/            files/                                   files/
Target Location     Folder name (implicit)           Last column of each file                 Specified via target_variables
Target Type         Discrete class label             Continuous value (averaged over window)  Future value at forecast_horizon steps
Annotations         Optional (auto-generated)        Required                                 Required
Feature Extraction  Supported (FFT, wavelets, etc.)  Supported (SimpleWindow required)        Not supported (raw time series only)
Loss Function       CrossEntropyLoss                 MSELoss                                  HuberLoss
Evaluation Metrics  Accuracy, F1-score               MSE, R-squared                           SMAPE, R-squared
output_int          Can be True                      Can be True                              Must be False
SimpleWindow        Optional                         Mandatory                                Mandatory
forecast_horizon    N/A                              N/A                                      Required parameter