5. Bring Your Own Data

This section explains how to format your datasets for use with Tiny ML Tensorlab. The toolchain accepts several file formats (see Supported File Formats) and handles preprocessing automatically.

5.5. Dataset Format Overview

Tiny ML Tensorlab uses different folder structures depending on the task type:

Classification Tasks

dataset_name/
├── classes/
│   ├── class1/
│   │   ├── file1.csv
│   │   └── file2.csv
│   ├── class2/
│   │   └── file1.csv
│   └── classN/
└── annotations/          # Optional - auto-generated if missing
    ├── instances_train_list.txt
    └── instances_val_list.txt
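If your recordings are already in memory, the classification layout above can be produced with a few lines of Python. This is a minimal sketch, not part of Tiny ML Tensorlab: the helper name `write_classification_dataset`, the `fileN.csv` naming and the choice to write CSVs without a header row are assumptions for illustration. It only creates the classes/<class_name>/ folders and leaves the optional annotations/ folder for the toolchain to generate.

```python
import csv
from pathlib import Path
from typing import Dict, List, Sequence

# One row = one time step; one column = one channel (assumed layout).
Row = Sequence[float]


def write_classification_dataset(
    root: Path, samples: Dict[str, List[List[Row]]]
) -> List[Path]:
    """Write each sample as a CSV under root/classes/<class_name>/.

    `samples` maps a class name to a list of samples, where each sample
    is a list of rows. Hypothetical helper: files are named file1.csv,
    file2.csv, ... and written WITHOUT a header row (an assumption).
    Returns the paths of the files written.
    """
    written = []
    for class_name, class_samples in samples.items():
        class_dir = root / "classes" / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i, sample in enumerate(class_samples, start=1):
            path = class_dir / f"file{i}.csv"
            with path.open("w", newline="") as f:
                csv.writer(f).writerows(sample)
            written.append(path)
    return written
```

Because the folder name is the label, adding a new class is just another key in the dictionary.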

Regression & Forecasting Tasks

dataset_name/
├── files/               # MUST be named "files"
│   ├── datafile1.csv
│   └── datafileN.csv
└── annotations/         # Required for these tasks
    ├── instances_train_list.txt
    └── instances_val_list.txt
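For regression and forecasting the annotation lists must be supplied by you. The sketch below shuffles the files in files/ and writes a train/validation split. It assumes each list holds one file name per line, relative to files/; check this against the examples shipped with the toolchain before relying on it. The function name `write_annotation_lists`, the 20% validation fraction and the fixed seed are illustrative choices.

```python
import random
from pathlib import Path
from typing import List, Tuple


def write_annotation_lists(
    root: Path, val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split root/files/ into train/val lists and write them to root/annotations/.

    ASSUMED list format: one file name per line. Hypothetical helper,
    not part of Tiny ML Tensorlab.
    """
    names = sorted(p.name for p in (root / "files").iterdir() if p.is_file())
    if len(names) < 2:
        raise ValueError("need at least two files to make a train/val split")
    rng = random.Random(seed)  # fixed seed -> reproducible split
    rng.shuffle(names)
    # At least one validation file, and at least one training file.
    n_val = min(max(1, round(len(names) * val_fraction)), len(names) - 1)
    val, train = sorted(names[:n_val]), sorted(names[n_val:])
    ann_dir = root / "annotations"
    ann_dir.mkdir(parents=True, exist_ok=True)
    (ann_dir / "instances_train_list.txt").write_text("\n".join(train) + "\n")
    (ann_dir / "instances_val_list.txt").write_text("\n".join(val) + "\n")
    return train, val
```

Splitting by whole files (rather than by rows) keeps every recording entirely in one split, which avoids leaking time-adjacent samples between training and validation.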

5.6. Supported File Formats

  • CSV files (.csv) - Most common, human-readable

  • Text files (.txt) - Same format as CSV

  • NumPy arrays (.npy) - Binary format, faster loading

  • Pickle files (.pkl) - Python-serialized (pickled) pandas DataFrames
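The four formats can hold the same table. The sketch below, which assumes pandas and NumPy are installed, writes one sample in each format and reads it back as a 2-D array. Whether the toolchain expects a header row in CSV/TXT files is not stated here; this example writes one. The helper names `save_all_formats` and `load_any` are hypothetical, and .npy keeps only the values, so column names are lost in that format.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def save_all_formats(df: pd.DataFrame, stem: Path) -> dict:
    """Write one sample table as .csv, .txt, .npy and .pkl next to `stem`."""
    paths = {ext: stem.with_suffix(ext) for ext in (".csv", ".txt", ".npy", ".pkl")}
    df.to_csv(paths[".csv"], index=False)  # comma-separated, header row (assumed)
    df.to_csv(paths[".txt"], index=False)  # same content, .txt extension
    np.save(paths[".npy"], df.to_numpy())  # values only, no column names
    df.to_pickle(paths[".pkl"])            # full DataFrame, dtypes preserved
    return paths


def load_any(path: Path) -> np.ndarray:
    """Read any of the four formats back into a 2-D float array."""
    suffix = path.suffix
    if suffix in (".csv", ".txt"):
        return pd.read_csv(path).to_numpy(dtype=float)
    if suffix == ".npy":
        return np.load(path).astype(float)
    if suffix == ".pkl":
        return pd.read_pickle(path).to_numpy(dtype=float)
    raise ValueError(f"unsupported extension: {suffix}")
```

For example, a DataFrame with hypothetical columns `accel_x` and `accel_y` round-trips to the same (rows, 2) array from all four files.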

5.7. Data Sources

You can provide your dataset as:

  • A local directory path

  • A local ZIP file path

  • A remote URL to a ZIP file (automatically downloaded)

Example:

dataset:
  dataset_name: my_dataset
  input_data_path: '/path/to/dataset'  # or URL
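The three source types can be told apart from the value of input_data_path alone. The sketch below is illustrative only: `classify_source` is a hypothetical helper, not Tiny ML Tensorlab's own resolution logic, and treating only http/https values as remote URLs is an assumption.

```python
import zipfile
from pathlib import Path
from urllib.parse import urlparse


def classify_source(input_data_path: str) -> str:
    """Return 'url', 'zip' or 'directory' for a dataset location.

    Hypothetical helper mirroring the three options above; it does not
    download or extract anything.
    """
    # Remote: only http/https schemes are treated as URLs (assumption).
    if urlparse(input_data_path).scheme in ("http", "https"):
        return "url"
    path = Path(input_data_path)
    if path.is_dir():
        return "directory"
    # Check the file contents, not just the .zip extension.
    if path.is_file() and zipfile.is_zipfile(str(path)):
        return "zip"
    raise ValueError(f"not a directory, ZIP file or URL: {input_data_path}")
```

For example, classify_source('/path/to/dataset') returns 'directory' when that folder exists, and raises ValueError when it does not.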