5. Bring Your Own Data
This section explains how to format your datasets for use with Tiny ML Tensorlab. The toolchain supports various data formats and automatically handles preprocessing.
Contents
- 5.1. Classification Dataset Format
- 5.1.1. Directory Structure
- 5.1.2. Data File Format
- 5.1.3. Supported File Types
- 5.1.4. Annotations (Optional)
- 5.1.5. Configuration
- 5.1.6. Dataset Splitting Modes
- 5.1.7. Example: 3-Class Vibration Data
- 5.1.8. Class Balancing
- 5.1.9. Common Issues
- 5.1.10. Key Differences: Classification vs Regression vs Forecasting
- 5.2. Regression Dataset Format
- 5.3. Forecasting Dataset Format
- 5.4. Anomaly Detection Dataset Format
- 5.5. Data Splitting
5.6. Dataset Format Overview
Tiny ML Tensorlab uses different folder structures depending on the task type:
Classification Tasks
dataset_name/
├── classes/
│   ├── class1/
│   │   ├── file1.csv
│   │   └── file2.csv
│   ├── class2/
│   │   └── file1.csv
│   └── classN/
└── annotations/                  # Optional - auto-generated if missing
    ├── instances_train_list.txt
    └── instances_val_list.txt
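Before training, it can help to confirm that the folder tree matches the classification layout shown above. The following is a minimal sketch using only the Python standard library; `summarize_classes` is a hypothetical helper, not part of Tiny ML Tensorlab, and it assumes every subfolder of `classes/` represents one class.

```python
from pathlib import Path


def summarize_classes(dataset_dir):
    """Return {class_name: number_of_files} for a classification dataset.

    Hypothetical helper (not a Tiny ML Tensorlab API). Assumes the layout
    dataset_dir/classes/<class_name>/<data files>.
    """
    classes_dir = Path(dataset_dir) / "classes"
    if not classes_dir.is_dir():
        raise FileNotFoundError(f"missing 'classes' folder in {dataset_dir}")

    summary = {}
    # Each immediate subfolder of classes/ is treated as one class label.
    for class_dir in sorted(p for p in classes_dir.iterdir() if p.is_dir()):
        summary[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return summary
```

Calling `summarize_classes("/path/to/dataset")` returns a dictionary of file counts per class, which makes empty or misnamed class folders easy to spot.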
Anomaly Detection Tasks
dataset_name/
├── classes/
│   ├── Normal/                   # Training data (normal operation only)
│   │   ├── file1.csv
│   │   └── file2.csv
│   └── Anomaly/                  # Test-only data (fault/anomaly samples)
│       ├── file1.csv
│       └── file2.csv
└── annotations/                  # Optional - auto-generated if missing
See Anomaly Detection Dataset Format for full details.
Regression & Forecasting Tasks
dataset_name/
├── files/                        # MUST be named "files"
│   ├── datafile1.csv
│   └── datafileN.csv
└── annotations/                  # Required for these tasks
    ├── instances_train_list.txt
    └── instances_val_list.txt
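Because the `files` folder name and the annotation lists are mandatory for regression and forecasting, a small pre-flight check can catch layout mistakes early. This is a sketch under stated assumptions: `check_regression_layout` is a hypothetical helper, not part of Tiny ML Tensorlab, and it only verifies that the folder and the two annotation files shown above exist. It does not inspect their contents.

```python
from pathlib import Path

# Annotation files shown in the layout above.
REQUIRED_ANNOTATIONS = ("instances_train_list.txt", "instances_val_list.txt")


def check_regression_layout(dataset_dir):
    """Return a list of problems; an empty list means the layout looks right.

    Hypothetical helper (not a Tiny ML Tensorlab API).
    """
    root = Path(dataset_dir)
    problems = []
    # The data folder must be named exactly "files".
    if not (root / "files").is_dir():
        problems.append("missing 'files' folder")
    for name in REQUIRED_ANNOTATIONS:
        if not (root / "annotations" / name).is_file():
            problems.append(f"missing annotations/{name}")
    return problems
```

An empty return value indicates that the expected folder and annotation files are present.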
5.7. Supported File Formats
- CSV files (.csv) - Most common, human-readable
- Text files (.txt) - Same format as CSV
- NumPy arrays (.npy) - Binary format, faster loading
- Pickle files (.pkl) - Python serialized pandas DataFrames
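To spot files that do not use one of the extensions listed above, a folder can be scanned as follows. This is an illustrative sketch: `split_by_support` is a hypothetical helper, not part of Tiny ML Tensorlab, and it matches extensions exactly as written in the list (lowercase), which is an assumption of this example rather than documented toolchain behavior.

```python
from pathlib import Path

# Extensions taken from the list of supported file formats.
SUPPORTED_SUFFIXES = {".csv", ".txt", ".npy", ".pkl"}


def split_by_support(folder):
    """Return (supported, unsupported) file names found directly in `folder`.

    Hypothetical helper (not a Tiny ML Tensorlab API). Matching is exact and
    case-sensitive, which is an assumption of this sketch.
    """
    supported, unsupported = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subfolders
        if path.suffix in SUPPORTED_SUFFIXES:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported
```

Running this on a class folder or on `files/` gives a quick list of anything that may need converting before use.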
5.8. Data Sources
You can provide your dataset as:
- A local directory path
- A local ZIP file path
- A remote URL to a ZIP file (automatically downloaded)
Example:
dataset:
  dataset_name: my_dataset
  input_data_path: '/path/to/dataset'  # or URL
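The three kinds of data source can be told apart from the `input_data_path` string alone. The sketch below illustrates one way to do that with the standard library. `classify_data_source` and its return labels are hypothetical and do not describe how Tiny ML Tensorlab resolves the path internally. It assumes remote sources use `http` or `https` and that local archives end in `.zip`.

```python
from urllib.parse import urlparse


def classify_data_source(input_data_path):
    """Label a path as 'remote_zip', 'local_zip', or 'local_dir'.

    Hypothetical helper (not a Tiny ML Tensorlab API). Assumes remote
    sources use http/https and local archives end in '.zip'.
    """
    scheme = urlparse(input_data_path).scheme.lower()
    if scheme in ("http", "https"):
        return "remote_zip"
    if input_data_path.lower().endswith(".zip"):
        return "local_zip"
    return "local_dir"
```

For example, `classify_data_source('/path/to/dataset')` yields `'local_dir'`, while a URL beginning with `https://` yields `'remote_zip'`.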