5.3. Forecasting Dataset Format

This guide explains how to format datasets for time series forecasting tasks.

5.3.1. Directory Structure

Forecasting uses the same structure as regression:

my_dataset/
├── files/                          # MUST be named "files"
│   ├── sequence1.csv
│   ├── sequence2.csv
│   └── sequenceN.csv
└── annotations/                    # Required
    ├── instances_train_list.txt
    └── instances_val_list.txt

5.3.2. Data File Format

All variables (features) are in columns. You specify which to use as inputs and which to forecast via configuration.

Example: Temperature Forecasting

ambient,coolant,current,pm_temp
19.85,18.81,2.28,22.93
19.85,18.79,2.28,22.94
19.85,18.79,2.28,22.94
19.85,18.77,2.28,22.94
...

In this example:

  • Column 0 (ambient): Input feature

  • Column 1 (coolant): Input feature

  • Column 2 (current): Input feature

  • Column 3 (pm_temp): Target to forecast

5.3.3. Key Difference from Regression

Note

In regression, the target is the current value of the last column, averaged across the window. In forecasting, the target is a future value of the specified variable at forecast_horizon steps ahead. This means regression answers “what is the target value right now?” while forecasting answers “what will this variable be in the future?”

Regression: Target is a separate value for each window (in last column)

Forecasting: Target is a future value of an existing variable

Regression: Input [t0...t3] → Predict separate_target_avg
Forecasting: Input [t0, t1, t2] → Predict variable[t3]

Unlike regression where the target column is always the last column, forecasting requires you to explicitly specify which variable(s) to predict via the target_variables configuration parameter.

5.3.4. Configuration

dataset:
  enable: True
  dataset_name: 'my_forecast_data'
  input_data_path: '/path/to/my_dataset'

data_processing_feature_extraction:
  data_proc_transforms: ['SimpleWindow']
  frame_size: 3                    # Lookback (use 3 past values)
  forecast_horizon: 1              # Predict 1 step ahead
  stride_size: 0.4

  # Specify columns by index or name
  variables: [0, 3]                # Use columns 0 and 3 as inputs
  target_variables: [3]            # Forecast column 3

training:
  model_name: 'FCST_LSTM8'
  output_int: False                # Required for forecasting!

5.3.5. Variable Specification Options

Note

target_variables can be specified by column index (0-based, after any time column is removed) or by column name (if the CSV has a header row). You can also mix indices and names, though using one form consistently is recommended. When using indices, remember that any column containing “time” is dropped first, so indices refer to the columns after that removal.

By Column Index (0-based, after time column removal):

variables: [0, 3]           # Use columns 0 and 3
target_variables: [3]       # Forecast column 3

By Column Name (requires CSV header):

variables: ['ambient', 'pm_temp']
target_variables: ['pm_temp']

Multiple Targets (forecast several variables):

variables: [0, 1, 2, 3]
target_variables: [2, 3]     # Forecast columns 2 and 3

5.3.6. Windowing Behavior

With frame_size=3 and forecast_horizon=1:

Data: [v0, v1, v2, v3, v4, v5, v6, ...]

Window 1: Input [v0, v1, v2] → Output [v3]
Window 2: Input [v1, v2, v3] → Output [v4]
Window 3: Input [v2, v3, v4] → Output [v5]
...

5.3.7. Complete Example

Dataset structure:

pmsm_temp_forecast/
├── files/
│   ├── profile_10.csv
│   ├── profile_11.csv
│   └── profile_12.csv
└── annotations/
    ├── instances_train_list.txt
    └── instances_val_list.txt

profile_10.csv:

ambient,coolant,u_d,u_q,i_a,pm
19.850,18.815,1.499,0.032,2.281,22.936
19.850,18.793,1.542,-0.092,2.281,22.941
19.850,18.790,1.456,0.081,2.281,22.944
...

config.yaml:

common:
  task_type: 'generic_timeseries_forecasting'
  target_device: 'F28P55'

dataset:
  dataset_name: 'pmsm_temp'
  input_data_path: '/data/pmsm_temp_forecast'

data_processing_feature_extraction:
  data_proc_transforms: ['SimpleWindow']
  frame_size: 3
  forecast_horizon: 1
  stride_size: 0.4
  variables: ['ambient', 'pm']      # Use ambient and pm as inputs
  target_variables: ['pm']           # Forecast pm temperature

training:
  model_name: 'FCST_LSTM8'
  output_int: False

5.3.8. Important Notes

Warning

  • output_int must be False for forecasting

  • Feature extraction (FFT, wavelets) is not supported

  • The target variable should typically be included in input variables

5.3.9. Minimum Data Requirements

Each file must have at least:

frame_size + forecast_horizon

rows to generate at least one training sample.

5.3.10. Common Issues

“Insufficient sequence length” error

Files need at least frame_size + forecast_horizon rows.

Poor forecasting performance

  • Increase frame_size to capture more history

  • Try LSTM models for complex temporal patterns

  • Ensure sufficient training data