5. Bring Your Own Data
This section explains how to format your datasets for use with Tiny ML Tensorlab. The toolchain supports various data formats and automatically handles preprocessing.
5.5. Dataset Format Overview
Tiny ML Tensorlab uses different folder structures depending on the task type:
Classification Tasks
dataset_name/
├── classes/
│   ├── class1/
│   │   ├── file1.csv
│   │   └── file2.csv
│   ├── class2/
│   │   └── file1.csv
│   └── classN/
└── annotations/              # Optional - auto-generated if missing
    ├── instances_train_list.txt
    └── instances_val_list.txt
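As a rough illustration of the classification layout, the sketch below writes a small dataset into the `classes/<class_name>/` structure using only the Python standard library. The helper `write_classification_dataset`, the sample values, and the single-column CSV content are hypothetical; this section does not specify the column layout the toolchain expects, so adapt the rows to your own data. The `annotations/` folder is deliberately not created, since it is optional for classification tasks.

```python
import csv
import tempfile
from pathlib import Path


def write_classification_dataset(root, samples):
    """Write samples into the <root>/classes/<class>/<file> layout.

    `samples` maps a class name to a list of (file_name, rows) pairs,
    where `rows` is a list of row lists. The column layout is an
    assumption made for this sketch only.
    """
    root = Path(root)
    for class_name, files in samples.items():
        class_dir = root / "classes" / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for file_name, rows in files:
            with open(class_dir / file_name, "w", newline="") as handle:
                csv.writer(handle).writerows(rows)
    return root


# Hypothetical two-class dataset with made-up single-column readings.
demo_samples = {
    "class1": [("file1.csv", [[0.1], [0.2]]), ("file2.csv", [[0.3], [0.4]])],
    "class2": [("file1.csv", [[1.1], [1.2]])],
}
demo_root = write_classification_dataset(
    Path(tempfile.mkdtemp()) / "dataset_name", demo_samples
)

# List the CSV files that were written, relative to the dataset root.
written = sorted(
    p.relative_to(demo_root).as_posix() for p in demo_root.rglob("*.csv")
)
print(written)
```

Each class gets its own folder under `classes/`, so the folder name serves as the label for every file inside it.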
Regression & Forecasting Tasks
dataset_name/
├── files/                    # MUST be named "files"
│   ├── datafile1.csv
│   └── datafileN.csv
└── annotations/              # Required for these tasks
    ├── instances_train_list.txt
    └── instances_val_list.txt
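Because the regression and forecasting layout has two hard requirements (a folder named exactly `files` and a populated `annotations/` folder), it can help to check a dataset before handing it to the toolchain. The following is a minimal sketch of such a check; `check_regression_layout` is a hypothetical helper, not part of Tiny ML Tensorlab, and it only verifies that the expected paths exist, not what the annotation files contain.

```python
import tempfile
from pathlib import Path

# Annotation files named in the layout above.
REQUIRED_ANNOTATIONS = ("instances_train_list.txt", "instances_val_list.txt")


def check_regression_layout(root):
    """Return a list of layout problems; an empty list means none were found."""
    root = Path(root)
    problems = []
    files_dir = root / "files"
    if not files_dir.is_dir():
        problems.append('missing "files" directory')
    elif not any(files_dir.iterdir()):
        problems.append('"files" directory is empty')
    for name in REQUIRED_ANNOTATIONS:
        if not (root / "annotations" / name).is_file():
            problems.append(f"missing annotations/{name}")
    return problems


# Demo: a dataset with data files but no annotations yet.
demo_root = Path(tempfile.mkdtemp()) / "dataset_name"
(demo_root / "files").mkdir(parents=True)
(demo_root / "files" / "datafile1.csv").write_text("0.1\n0.2\n")

problems = check_regression_layout(demo_root)
print(problems)
```

In this demo the check reports both annotation files as missing, which is the case the layout above marks as not allowed for these task types.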
5.6. Supported File Formats
CSV files (.csv) - Most common, human-readable
Text files (.txt) - Same format as CSV
NumPy arrays (.npy) - Binary format, faster loading
Pickle files (.pkl) - Python serialized pandas DataFrames
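To spot files that would not match any of the formats listed above, a quick extension check can be run over a dataset folder. This is a sketch only: `split_by_support` is a hypothetical helper, and it matches lowercase extensions exactly, because this section does not say whether the toolchain accepts variants such as `.CSV`.

```python
from pathlib import Path

# Extensions taken from the list of supported formats above.
SUPPORTED_EXTENSIONS = {
    ".csv": "CSV",
    ".txt": "text (same format as CSV)",
    ".npy": "NumPy array",
    ".pkl": "pickled pandas DataFrame",
}


def split_by_support(paths):
    """Split file paths into (supported, unsupported) lists of file names."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        if path.suffix in SUPPORTED_EXTENSIONS:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported


# Hypothetical file names for illustration.
ok, skipped = split_by_support(
    ["run1.csv", "run2.npy", "notes.md", "frame.pkl", "capture.xlsx"]
)
print(ok)
print(skipped)
```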
5.7. Data Sources
You can provide your dataset as:
A local directory path
A local ZIP file path
A remote URL to a ZIP file (automatically downloaded)
Example:
dataset:
  dataset_name: my_dataset
  input_data_path: '/path/to/dataset' # or URL
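The three kinds of data source can be told apart from the `input_data_path` string alone. The sketch below shows one plausible way to do that. `classify_data_source` and its return labels are hypothetical, and the rules (an `http`/`https` scheme means a remote ZIP, a `.zip` suffix means a local archive, anything else is treated as a directory) are assumptions for illustration, not a description of how Tiny ML Tensorlab resolves the path internally.

```python
from urllib.parse import urlparse


def classify_data_source(input_data_path):
    """Guess which kind of source an input_data_path value refers to."""
    text = str(input_data_path)
    # A web scheme indicates a remote archive to download.
    if urlparse(text).scheme in ("http", "https"):
        return "remote_zip_url"
    # Otherwise a .zip suffix indicates a local archive.
    if text.lower().endswith(".zip"):
        return "local_zip"
    # Anything else is assumed to be a local directory.
    return "local_directory"


# Hypothetical paths for illustration.
for candidate in (
    "/path/to/dataset",
    "/path/to/dataset.zip",
    "https://example.com/dataset.zip",
):
    print(candidate, "->", classify_data_source(candidate))
```

A check like this can be useful before launching a run, for example to confirm that a local path exists or to warn that a download will be triggered.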