5. Bring Your Own Data

This section explains how to format your datasets for use with Tiny ML Tensorlab. The toolchain accepts several file formats (CSV, text, NumPy, and pickle) and handles preprocessing automatically.

5.6. Dataset Format Overview

Tiny ML Tensorlab uses different folder structures depending on the task type:

Classification Tasks

dataset_name/
├── classes/
│   ├── class1/
│   │   ├── file1.csv
│   │   └── file2.csv
│   ├── class2/
│   │   └── file1.csv
│   └── classN/
└── annotations/          # Optional - auto-generated if missing
    ├── instances_train_list.txt
    └── instances_val_list.txt
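
Before training, it can help to confirm that every class folder actually contains data. The following is a minimal sketch, not part of Tiny ML Tensorlab itself: the function name `count_files_per_class` is hypothetical, and it relies only on the classification layout shown above.

```python
from pathlib import Path


def count_files_per_class(root):
    """Return {class_name: file_count} for a dataset_name/classes/<class>/ layout.

    Hypothetical helper for illustration; it is not a Tiny ML Tensorlab API.
    """
    classes_dir = Path(root) / "classes"
    if not classes_dir.is_dir():
        raise FileNotFoundError(f"expected a 'classes' folder under {root}")

    counts = {}
    # Each sub-folder of classes/ is treated as one class label.
    for class_dir in sorted(p for p in classes_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts
```

For the example tree above, with classN left empty, `count_files_per_class("dataset_name")` would return `{"class1": 2, "class2": 1, "classN": 0}`, which makes an empty class easy to spot.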

Anomaly Detection Tasks

dataset_name/
├── classes/
│   ├── Normal/           # Training data (normal operation only)
│   │   ├── file1.csv
│   │   └── file2.csv
│   └── Anomaly/          # Test-only data (fault/anomaly samples)
│       ├── file1.csv
│       └── file2.csv
└── annotations/          # Optional - auto-generated if missing

See Anomaly Detection Dataset Format for full details.
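
A quick layout check for the anomaly detection structure might look like the sketch below. It is illustrative only: `check_anomaly_layout` is a hypothetical name, and treating both the Normal and Anomaly folders as required and non-empty is an assumption, not a documented rule of the toolchain.

```python
from pathlib import Path


def check_anomaly_layout(root):
    """Return a list of problems found under dataset_name/classes/.

    Hypothetical helper; assumes both Normal/ and Anomaly/ must exist
    and must each contain at least one file.
    """
    classes_dir = Path(root) / "classes"
    problems = []
    for name in ("Normal", "Anomaly"):
        folder = classes_dir / name
        if not folder.is_dir():
            problems.append(f"missing folder: classes/{name}/")
        elif not any(f.is_file() for f in folder.iterdir()):
            problems.append(f"no data files in classes/{name}/")
    return problems
```

An empty list means the folder names match the structure shown above.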

Regression & Forecasting Tasks

dataset_name/
├── files/               # MUST be named "files"
│   ├── datafile1.csv
│   └── datafileN.csv
└── annotations/         # Required for these tasks
    ├── instances_train_list.txt
    └── instances_val_list.txt
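
Because regression and forecasting datasets require both the `files` folder and the annotation lists, a small pre-flight check can catch a misnamed folder early. This is a sketch under the layout shown above; `find_layout_problems` is a hypothetical helper, and it checks only that the files exist, not their contents.

```python
from pathlib import Path

# File names taken from the directory tree above.
REQUIRED_ANNOTATIONS = ("instances_train_list.txt", "instances_val_list.txt")


def find_layout_problems(root):
    """Return a list of missing items for a regression/forecasting dataset.

    Hypothetical helper for illustration; it is not a Tiny ML Tensorlab API.
    """
    root = Path(root)
    problems = []
    # The data folder must be named exactly "files".
    if not (root / "files").is_dir():
        problems.append("missing folder: files/")
    for name in REQUIRED_ANNOTATIONS:
        if not (root / "annotations" / name).is_file():
            problems.append(f"missing annotation file: annotations/{name}")
    return problems
```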

5.7. Supported File Formats

CSV files (.csv) - Most common, human-readable
Text files (.txt) - Same format as CSV
NumPy arrays (.npy) - Binary format, faster loading
Pickle files (.pkl) - Python-serialized pandas DataFrames
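
To find files that fall outside the four formats listed above, you could scan the dataset by extension. The sketch below is an assumption-laden illustration: `find_unsupported_files` is a hypothetical name, and matching on the file extension alone (case-insensitively) is a simplification that may differ from how the toolchain detects formats.

```python
from pathlib import Path

# Extensions and descriptions taken from the list above.
SUPPORTED_FORMATS = {
    ".csv": "CSV file",
    ".txt": "text file (same format as CSV)",
    ".npy": "NumPy array",
    ".pkl": "pickled pandas DataFrame",
}


def find_unsupported_files(root):
    """Return sorted relative paths of files whose extension is not supported.

    Hypothetical helper; checks extensions only, never file contents.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in SUPPORTED_FORMATS
    )
```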

5.8. Data Sources

You can provide your dataset as:

A local directory path
A local ZIP file path
A remote URL to a ZIP file (automatically downloaded)

Example:

dataset:
  dataset_name: my_dataset
  input_data_path: '/path/to/dataset'  # local directory, ZIP file, or URL to a ZIP file
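
The three kinds of data source can be told apart with a few standard-library checks. The following sketch shows one way to do so; it is not how Tiny ML Tensorlab resolves `input_data_path` internally, and both the function name `classify_data_source` and the returned labels are hypothetical. It also assumes remote sources use an `http` or `https` URL.

```python
import zipfile
from pathlib import Path
from urllib.parse import urlparse


def classify_data_source(input_data_path):
    """Classify a dataset location as a remote URL, local directory, or local ZIP.

    Hypothetical helper for illustration; the labels are not toolchain values.
    """
    text = str(input_data_path)
    # Assumption: remote datasets are served over http or https.
    if urlparse(text).scheme.lower() in ("http", "https"):
        return "remote_zip_url"

    path = Path(text)
    if path.is_dir():
        return "local_directory"
    # is_zipfile inspects the file contents, not just the extension.
    if path.is_file() and zipfile.is_zipfile(path):
        return "local_zip"
    raise ValueError(f"not a directory, ZIP file, or http(s) URL: {text}")
```

Running a check like this before launching a job gives a clear error for a mistyped path instead of a failure partway through loading.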