8.1. Neural Architecture Search
Neural Architecture Search (NAS) automatically discovers optimal model architectures for your specific task and device constraints.
8.1.1. Overview
NAS eliminates manual architecture design by:
Searching through possible layer configurations
Evaluating accuracy vs size trade-offs
Optimizing for memory (parameters) or compute (MACs/FLOPs)
Generating TINPU-compatible models directly from your dataset
This is especially valuable for MCUs where architecture choices significantly impact whether a model fits in memory.
Warning
NAS is very compute-intensive. A GPU is required for practical use. Each NAS epoch can take several minutes. Choose the number of epochs wisely to avoid excessive runtimes.
8.1.2. When to Use NAS
Use NAS when:
You don’t know the optimal model size for your task
You need to balance accuracy vs inference speed
You want to find the smallest model that meets accuracy requirements
Manual architecture tuning is time-consuming
Don’t use NAS when:
You have tight time constraints (NAS is slow)
A standard model already works well
You don’t have GPU access
8.1.3. Code Flow
Config Parsing: The YAML config is parsed at runtime. NAS parameters are read from the training section.
NAS Activation: If nas_enabled: True, the NAS module is invoked instead of a static model architecture.
NAS Search: The NAS engine runs for the specified number of epochs (nas_epochs), optimizing for memory or compute as per nas_optimization_mode.
Model Selection: The best architecture found during the search is selected and trained according to the rest of the training configuration.
Model Export: The trained model can be tested and optionally compiled for deployment.
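The activation step above can be sketched as follows. This is illustrative only: the real entry point and function names in tinyml-modelzoo differ, and a plain dict stands in for the parsed YAML.

```python
# Hypothetical sketch of the NAS activation logic; `select_architecture`
# is not a real tinyml-modelzoo function.

def select_architecture(training_cfg):
    """Return which path the trainer would take for a parsed training config."""
    if training_cfg.get('nas_enabled', False):
        # NAS path: search for nas_epochs, then train the winning architecture
        return ('nas', training_cfg['nas_epochs'], training_cfg['nas_optimization_mode'])
    # Static path: use the predefined model architecture
    return ('static', None, None)

cfg = {'nas_enabled': True, 'nas_epochs': 10, 'nas_optimization_mode': 'Memory'}
print(select_architecture(cfg))  # → ('nas', 10, 'Memory')
```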
8.1.4. Configuration
All NAS parameters are specified under the training section of the
YAML configuration file.
Required Parameters:
| Parameter | Type | Description | Example Values |
|---|---|---|---|
| nas_enabled | bool | Enable or disable NAS | True, False |
| nas_epochs | int | Number of epochs for the NAS search | 10, 20 |
| nas_optimization_mode | str | Optimization target | 'Memory', 'Compute' |
| nas_model_size | str | Preset model size that determines search space complexity | 's', 'm' |
Customization Parameters (Optional):
Use these instead of nas_model_size for fine-grained control over the
search space:
| Parameter | Type | Description | Example Values |
|---|---|---|---|
| nas_nodes_per_layer | int | Number of nodes (operations) per layer in the DAG | 4 |
| nas_layers | int | Number of layers in the architecture. Minimum is 3. | 5 |
| nas_init_channels | int | Initial feature map channels for the first conv layer | 8 |
| nas_init_channel_multiplier | int | Channel multiplier for subsequent layers | 2 |
| nas_fanout_concat | int | Number of nodes per layer to concatenate for output | 3 |
Note
Only nas_enabled, nas_epochs, nas_optimization_mode, and
nas_model_size are required for preset mode. The customization
parameters are optional and allow advanced users to define the NAS
search space in detail.
8.1.5. Model Size Presets
When using NAS in preset mode, the nas_model_size parameter selects
a predefined search space configuration. Each preset controls the
complexity and size of architectures explored:
| Preset | Layers | Nodes/Layer | Init Channels | Channel Multiplier | Fanout Concat |
|---|---|---|---|---|---|
|  | 3 | 4 | 1 | 3 | 4 |
|  | 10 | 4 | 1 | 3 | 4 |
|  | 12 | 4 | 4 | 3 | 4 |
|  | 20 | 4 | 4 | 3 | 4 |
|  | 20 | 6 | 8 | 3 | 4 |
These values are set automatically when you specify the preset via
nas_model_size. For more control, use customization mode and set
these parameters manually.
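The interaction of nas_init_channels and nas_init_channel_multiplier can be illustrated with a small sketch. How exactly the multiplier is applied (per layer or per reduction stage) is an internal detail of the NAS engine; this assumes per-stage growth and uses init channels 4 with multiplier 3, as in one of the preset rows above.

```python
# Illustrative only: assumes channel count grows by `multiplier` at each stage.

def channel_schedule(init_channels, multiplier, num_stages):
    """Channel count at each stage under multiplicative growth."""
    return [init_channels * (multiplier ** s) for s in range(num_stages)]

print(channel_schedule(4, 3, 3))  # → [4, 12, 36]
```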
8.1.6. Usage
Preset Mode (Recommended):
training:
nas_enabled: True
nas_epochs: 10
nas_optimization_mode: 'Memory'
nas_model_size: 'm'
Customization Mode:
training:
nas_enabled: True
nas_epochs: 20
nas_optimization_mode: 'Compute'
# Customization mode parameters
nas_nodes_per_layer: 4
nas_layers: 5
nas_init_channels: 8
nas_init_channel_multiplier: 2
nas_fanout_concat: 3
8.1.7. Running NAS
Linux/macOS:
cd tinyml-modelzoo
./run_tinyml_modelzoo.sh examples/your_example/config_nas.yaml
Windows:
cd tinyml-modelzoo
run_tinyml_modelzoo.bat examples\your_example\config_nas.yaml
8.1.8. Example: Full NAS Configuration
common:
task_type: 'generic_timeseries_classification'
target_device: 'F28P55'
dataset:
dataset_name: 'dc_arc_fault_example_dsk'
data_processing_feature_extraction:
feature_extraction_name: 'FFT1024Input_256Feature_1Frame_Full_Bandwidth'
variables: 1
training:
enable: True
training_epochs: 15
batch_size: 256
nas_enabled: True
nas_epochs: 10
nas_optimization_mode: 'Memory'
nas_model_size: 'm'
compilation:
enable: True
8.1.9. Tips
Preset mode is recommended for most users and provides a good balance between search space and ease of use.
Customization mode is for advanced users who want fine-grained control over the architecture search space.
Increasing nas_epochs can improve search results but increases runtime.
Choose nas_optimization_mode based on deployment constraints: use 'Memory' for devices with limited flash/RAM, 'Compute' for latency-sensitive applications.
All NAS parameters can be adjusted in the YAML config without modifying code.
All other training parameters (batch size, learning rate, etc.) are compatible with NAS.
8.1.10. Best Practices
Start Simple: Try standard models first, use NAS only if needed
Use Preset Mode: Start with 's' or 'm' before trying larger presets
GPU Required: NAS without a GPU is impractical
Validate Results: Test the NAS-discovered model thoroughly before deployment
Compare with Standard Models: NAS results may not always beat hand-designed models
8.1.11. Search Algorithm
The NAS implementation uses a Differentiable Architecture Search (DARTS)
approach based on gradient-based optimization. Instead of evaluating
architectures individually, DARTS relaxes the discrete architecture choices
into continuous parameters (called alphas) that are optimized jointly
with model weights.
How DARTS works:
Architecture Parameterization: Each edge in the search DAG has a set of alpha parameters, one per candidate operation. A softmax over these alphas determines the mixture of operations.
Bilevel Optimization: Model weights and architecture parameters are optimized alternately: weights on training data, alphas on validation data.
Genotype Extraction: After search, the highest-alpha operation for each edge is selected, producing a discrete architecture (genotype).
Final Training: The selected architecture is retrained from scratch using standard training parameters.
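The steps above can be sketched for a single edge of the DAG. This is a toy scalar version, not the tensor-valued implementation: a softmax over the alphas weights the candidate operations, and genotype extraction keeps the argmax operation.

```python
import math

# Toy DARTS relaxation for one edge; the real engine mixes tensors, not scalars.

def softmax(alphas):
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas, ops):
    """Weighted sum of candidate operations applied to x."""
    return sum(w * op(x) for w, op in zip(softmax(alphas), ops))

def extract_genotype(alphas, names):
    """Discretize: keep the operation with the highest alpha."""
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]

names = ['none', 'skip_connect', 'conv_bn_relu_3x1']
ops = [lambda x: 0.0, lambda x: x, lambda x: 2.0 * x]  # toy stand-ins
alphas = [0.1, 0.3, 1.5]
print(mixed_op(1.0, alphas, ops))          # continuous mixture during search
print(extract_genotype(alphas, names))     # → 'conv_bn_relu_3x1'
```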
Unrolled Optimization (advanced): Setting unrolled=True in the NAS
engine uses second-order gradient approximation for architecture parameters,
which can improve search quality at the cost of increased computation and
memory.
Resource-Aware Penalties: The search includes penalties for model
complexity. nas_optimization_mode: 'Memory' penalizes parameter count
(flash/RAM usage on MCU), while 'Compute' penalizes MACs/FLOPs
(inference latency).
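The general shape of such a penalty can be sketched as a complexity term added to the task loss. The actual penalty form and weighting inside the NAS engine are not documented here; the weight value below is purely illustrative.

```python
# Hedged sketch of a resource-aware penalty, not the engine's actual formula.

def penalized_loss(task_loss, param_count, mac_count, mode, weight=1e-6):
    if mode == 'Memory':
        complexity = param_count   # flash/RAM proxy
    elif mode == 'Compute':
        complexity = mac_count     # latency proxy
    else:
        raise ValueError(f'unknown nas_optimization_mode: {mode}')
    return task_loss + weight * complexity

print(penalized_loss(0.5, param_count=20_000, mac_count=1_000_000, mode='Memory'))
# → 0.52
```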
8.1.12. Search Space
The NAS search space defines the candidate operations available at each
edge in the architecture DAG. The default CNN search space
(PRIMITIVES_CNN) includes:
PRIMITIVES_CNN = [
'none', # Zero operation (drop connection)
'avg_pool_3x1', # 3x1 average pooling
'max_pool_3x1', # 3x1 max pooling
'skip_connect', # Identity/skip connection
'conv_bn_relu_3x1', # 3x1 convolution + BatchNorm + ReLU
'conv_bn_relu_5x1', # 5x1 convolution + BatchNorm + ReLU
'conv_bn_relu_7x1', # 7x1 convolution + BatchNorm + ReLU
]
These 1D convolution primitives are designed for time series data processed by TI MCUs. The search finds the best combination of these operations for each layer of the network.
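One property worth noting is that all three convolution primitives can preserve the sequence length. The sketch below parses kernel sizes out of the primitive names and derives the length-preserving padding; the actual operations.py implementation may differ.

```python
# Illustrative parsing of primitive names; not the operations.py code itself.

def conv_config(primitive):
    """Kernel size and 'same' padding (odd kernel, stride 1) for a conv primitive."""
    k = int(primitive.split('_')[-1].split('x')[0])  # 'conv_bn_relu_5x1' -> 5
    return {'kernel_size': k, 'padding': k // 2}

for name in ['conv_bn_relu_3x1', 'conv_bn_relu_5x1', 'conv_bn_relu_7x1']:
    print(name, conv_config(name))
```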
Architecture Structure:
Each NAS-discovered architecture consists of:
Normal cells: Preserve spatial dimensions, repeated throughout the network
Reduction cells: Downsample spatial dimensions between stages
Genotype: A named tuple describing the selected operations and their connections for both cell types
A genotype specifies, for each node in a cell, which operation to apply
and which previous node to use as input. The outputs of selected nodes
are concatenated to form the cell’s output (controlled by
nas_fanout_concat).
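A DARTS-style genotype can be sketched as follows. The exact field names in genotypes.py may differ; classic DARTS encodes each node as (operation name, input node index) pairs plus the node indices whose outputs are concatenated.

```python
from collections import namedtuple

# Hypothetical genotype layout; field names follow the original DARTS paper,
# operation names follow PRIMITIVES_CNN above.
Genotype = namedtuple('Genotype', ['normal', 'normal_concat', 'reduce', 'reduce_concat'])

example = Genotype(
    normal=[('conv_bn_relu_3x1', 0), ('skip_connect', 1),
            ('conv_bn_relu_5x1', 1), ('max_pool_3x1', 2)],
    normal_concat=[2, 3],  # node outputs concatenated, per nas_fanout_concat
    reduce=[('max_pool_3x1', 0), ('conv_bn_relu_7x1', 1),
            ('avg_pool_3x1', 1), ('skip_connect', 2)],
    reduce_concat=[2, 3],
)
print(example.normal[0])  # → ('conv_bn_relu_3x1', 0)
```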
8.1.13. NAS Framework Internals
For advanced users and developers, the NAS module is organized as follows:
tinyml_torchmodelopt/nas/
├── architect.py # Bilevel optimization (Architect class)
├── genotypes.py # Search space primitives and genotype defs
├── model.py # Final model (fixed genotype)
├── model_search_cnn.py # Search-phase model (with alpha params)
├── operations.py # Primitive operation implementations
├── train_cnn_search.py # NAS search training loop
└── utils.py # Metrics, parameter counting, checkpointing
Key Components:
Architect (architect.py): Manages bilevel optimization of architecture parameters. Supports standard and unrolled optimization with resource-aware penalties.
Search Model (model_search_cnn.py): The supernet with differentiable architecture parameters (alphas). Supports parsing the learned architecture into a discrete genotype.
Final Model (model.py): Instantiated with a fixed genotype for evaluation and deployment.
Operations (operations.py): Implements all primitive operations (convolutions, pooling, skip connections) used in the search space.
Direct API Usage (advanced):
from tinyml_torchmodelopt.nas.train_cnn_search import search_and_get_model
import torch

# Run NAS search and get the best model
final_model = search_and_get_model(args)
# Save for deployment
torch.save(final_model.state_dict(), 'nas_model.pth')
8.1.14. References
The NAS implementation is based on the following research:
Liu, H., Simonyan, K., & Yang, Y. (2019). DARTS: Differentiable Architecture Search. ICLR 2019.
Ye, P., et al. (2022). β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search.
Bender, G., Liu, H., Chen, B., Chu, G., Cheng, S., Kindermans, P.-J., & Le, Q. V. (2020). Balanced One-shot Neural Architecture Optimization.
8.1.15. Next Steps
Learn about Quantization for model compression
Explore Feature Extraction options
Deploy your model: NPU Device Deployment