8.1. Neural Architecture Search
Neural Architecture Search (NAS) automatically discovers optimal model architectures for your specific task and device constraints.
8.1.1. Overview
NAS eliminates manual architecture design by:
Searching through possible layer configurations
Evaluating accuracy vs size trade-offs
Optimizing for memory (parameters) or compute (MACs/FLOPs)
Generating TINPU-compatible models directly from your dataset
This is especially valuable for MCUs where architecture choices significantly impact whether a model fits in memory.
Warning
NAS is very compute-intensive. A GPU is required for practical use. Each NAS epoch can take several minutes. Choose the number of epochs wisely to avoid excessive runtimes.
8.1.2. When to Use NAS
Use NAS when:
You don’t know the optimal model size for your task
You need to balance accuracy vs inference speed
You want to find the smallest model that meets accuracy requirements
Manual architecture tuning is time-consuming
Don’t use NAS when:
You have tight time constraints (NAS is slow)
A standard model already works well
You don’t have GPU access
8.1.3. Code Flow
Config Parsing: The YAML config is parsed at runtime. NAS parameters are read from the training section.
NAS Activation: If nas_enabled: True, the NAS module is invoked instead of a static model architecture.
NAS Search: The NAS engine runs for the specified number of epochs (nas_epochs), optimizing for memory or compute as per nas_optimization_mode.
Model Selection: The best architecture found during the search is selected and trained according to the rest of the training configuration.
Model Export: The trained model can be tested and optionally compiled for deployment.
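The activation step above can be sketched as follows. This is illustrative only: the real entry point and function names in tinyml-modelzoo differ, and a plain dict stands in for the parsed YAML.

```python
# Hypothetical sketch of the NAS activation logic; `select_architecture`
# is not a real tinyml-modelzoo function.

def select_architecture(training_cfg):
    """Return which path the trainer would take for a parsed training config."""
    if training_cfg.get('nas_enabled', False):
        # NAS path: search for nas_epochs, then train the winning architecture
        return ('nas', training_cfg['nas_epochs'], training_cfg['nas_optimization_mode'])
    # Static path: use the predefined model architecture
    return ('static', None, None)

cfg = {'nas_enabled': True, 'nas_epochs': 10, 'nas_optimization_mode': 'Memory'}
print(select_architecture(cfg))  # → ('nas', 10, 'Memory')
```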
8.1.4. Configuration
All NAS parameters are specified under the training section of the
YAML configuration file.
Required Parameters:
| Parameter | Type | Description | Example Values |
|---|---|---|---|
| nas_enabled | bool | Enable or disable NAS | True, False |
| nas_epochs | int | Number of epochs for the NAS search | 10, 20 |
| nas_optimization_mode | str | Optimization target | 'Memory', 'Compute' |
| nas_model_size | str | Preset model size that determines search space complexity | 's', 'm' |
Customization Parameters (Optional):
Use these instead of nas_model_size for fine-grained control over the
search space:
| Parameter | Type | Description | Example Values |
|---|---|---|---|
| nas_nodes_per_layer | int | Number of nodes (operations) per layer in the DAG | 4 |
| nas_layers | int | Number of layers in the architecture. Minimum is 3. | 5 |
| nas_init_channels | int | Initial feature map channels for the first conv layer | 8 |
| nas_init_channel_multiplier | int | Channel multiplier for subsequent layers | 2 |
| nas_fanout_concat | int | Number of nodes per layer to concatenate for output | 3 |
Note
Only nas_enabled, nas_epochs, nas_optimization_mode, and
nas_model_size are required for preset mode. The customization
parameters are optional and allow advanced users to define the NAS
search space in detail.
8.1.5. Model Size Presets
When using NAS in preset mode, the nas_model_size parameter selects
a predefined search space configuration. Each preset controls the
complexity and size of architectures explored:
| Preset | Layers | Nodes/Layer | Init Channels | Channel Multiplier | Fanout Concat |
|---|---|---|---|---|---|
|  | 3 | 4 | 1 | 3 | 4 |
|  | 10 | 4 | 1 | 3 | 4 |
|  | 12 | 4 | 4 | 3 | 4 |
|  | 20 | 4 | 4 | 3 | 4 |
|  | 20 | 6 | 8 | 3 | 4 |
These values are set automatically when you specify the preset via
nas_model_size. For more control, use customization mode and set
these parameters manually.
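The interaction of nas_init_channels and nas_init_channel_multiplier can be illustrated with a small sketch. How exactly the multiplier is applied (per layer or per reduction stage) is an internal detail of the NAS engine; this assumes per-stage growth and uses init channels 4 with multiplier 3, as in one of the preset rows above.

```python
# Illustrative only: assumes channel count grows by `multiplier` at each stage.

def channel_schedule(init_channels, multiplier, num_stages):
    """Channel count at each stage under multiplicative growth."""
    return [init_channels * (multiplier ** s) for s in range(num_stages)]

print(channel_schedule(4, 3, 3))  # → [4, 12, 36]
```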
8.1.6. Usage
Preset Mode (Recommended):
training:
nas_enabled: True
nas_epochs: 10
nas_optimization_mode: 'Memory'
nas_model_size: 'm'
Customization Mode:
training:
nas_enabled: True
nas_epochs: 20
nas_optimization_mode: 'Compute'
# Customization mode parameters
nas_nodes_per_layer: 4
nas_layers: 5
nas_init_channels: 8
nas_init_channel_multiplier: 2
nas_fanout_concat: 3
8.1.7. Running NAS
Linux/macOS:
cd tinyml-modelzoo
./run_tinyml_modelzoo.sh examples/your_example/config_nas.yaml
Windows:
cd tinyml-modelzoo
run_tinyml_modelzoo.bat examples\your_example\config_nas.yaml
8.1.8. Example: Full NAS Configuration
common:
task_type: 'generic_timeseries_classification'
target_device: 'F28P55'
dataset:
dataset_name: 'dc_arc_fault_example_dsk'
data_processing_feature_extraction:
feature_extraction_name: 'FFT1024Input_256Feature_1Frame_Full_Bandwidth'
variables: 1
training:
enable: True
training_epochs: 15
batch_size: 256
nas_enabled: True
nas_epochs: 10
nas_optimization_mode: 'Memory'
nas_model_size: 'm'
compilation:
enable: True
8.1.9. Tips
Preset mode is recommended for most users and provides a good balance between search space and ease of use.
Customization mode is for advanced users who want fine-grained control over the architecture search space.
Increasing nas_epochs can improve search results but increases runtime.
Choose nas_optimization_mode based on deployment constraints: use 'Memory' for devices with limited flash/RAM, 'Compute' for latency-sensitive applications.
All NAS parameters can be adjusted in the YAML config without modifying code.
All other training parameters (batch size, learning rate, etc.) are compatible with NAS.
8.1.10. Best Practices
Start Simple: Try standard models first, use NAS only if needed
Use Preset Mode: Start with 's' or 'm' before trying larger presets
GPU Required: NAS without a GPU is impractical
Validate Results: Test the NAS-discovered model thoroughly before deployment
Compare with Standard Models: NAS results may not always beat hand-designed models
8.1.11. Search Algorithm
The NAS implementation uses a Differentiable Architecture Search (DARTS)
approach based on gradient-based optimization. Instead of evaluating
architectures individually, DARTS relaxes the discrete architecture choices
into continuous parameters (called alphas) that are optimized jointly
with model weights.
How DARTS works:
Architecture Parameterization: Each edge in the search DAG has a set of alpha parameters, one per candidate operation. A softmax over these alphas determines the mixture of operations.
Bilevel Optimization: Model weights and architecture parameters are optimized alternately: weights on training data, alphas on validation data.
Genotype Extraction: After search, the highest-alpha operation for each edge is selected, producing a discrete architecture (genotype).
Final Training: The selected architecture is retrained from scratch using standard training parameters.
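The steps above can be sketched for a single edge of the DAG. This is a toy scalar version, not the tensor-valued implementation: a softmax over the alphas weights the candidate operations, and genotype extraction keeps the argmax operation.

```python
import math

# Toy DARTS relaxation for one edge; the real engine mixes tensors, not scalars.

def softmax(alphas):
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas, ops):
    """Weighted sum of candidate operations applied to x."""
    return sum(w * op(x) for w, op in zip(softmax(alphas), ops))

def extract_genotype(alphas, names):
    """Discretize: keep the operation with the highest alpha."""
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]

names = ['none', 'skip_connect', 'conv_bn_relu_3x1']
ops = [lambda x: 0.0, lambda x: x, lambda x: 2.0 * x]  # toy stand-ins
alphas = [0.1, 0.3, 1.5]
print(mixed_op(1.0, alphas, ops))          # continuous mixture during search
print(extract_genotype(alphas, names))     # → 'conv_bn_relu_3x1'
```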
Unrolled Optimization (advanced): Setting unrolled=True in the NAS
engine uses second-order gradient approximation for architecture parameters,
which can improve search quality at the cost of increased computation and
memory.
Resource-Aware Penalties: The search includes penalties for model
complexity. nas_optimization_mode: 'Memory' penalizes parameter count
(flash/RAM usage on MCU), while 'Compute' penalizes MACs/FLOPs
(inference latency).
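The general shape of such a penalty can be sketched as a complexity term added to the task loss. The actual penalty form and weighting inside the NAS engine are not documented here; the weight value below is purely illustrative.

```python
# Hedged sketch of a resource-aware penalty, not the engine's actual formula.

def penalized_loss(task_loss, param_count, mac_count, mode, weight=1e-6):
    if mode == 'Memory':
        complexity = param_count   # flash/RAM proxy
    elif mode == 'Compute':
        complexity = mac_count     # latency proxy
    else:
        raise ValueError(f'unknown nas_optimization_mode: {mode}')
    return task_loss + weight * complexity

print(penalized_loss(0.5, param_count=20_000, mac_count=1_000_000, mode='Memory'))
# → 0.52
```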
8.1.12. Search Space
The NAS search space defines the candidate operations available at each
edge in the architecture DAG. The default CNN search space
(PRIMITIVES_CNN) includes:
PRIMITIVES_CNN = [
'none', # Zero operation (drop connection)
'avg_pool_3x1', # 3x1 average pooling
'max_pool_3x1', # 3x1 max pooling
'skip_connect', # Identity/skip connection
'conv_bn_relu_3x1', # 3x1 convolution + BatchNorm + ReLU
'conv_bn_relu_5x1', # 5x1 convolution + BatchNorm + ReLU
'conv_bn_relu_7x1', # 7x1 convolution + BatchNorm + ReLU
]
These 1D convolution primitives are designed for time series data processed by TI MCUs. The search finds the best combination of these operations for each layer of the network.
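One property worth noting is that all three convolution primitives can preserve the sequence length. The sketch below parses kernel sizes out of the primitive names and derives the length-preserving padding; the actual operations.py implementation may differ.

```python
# Illustrative parsing of primitive names; not the operations.py code itself.

def conv_config(primitive):
    """Kernel size and 'same' padding (odd kernel, stride 1) for a conv primitive."""
    k = int(primitive.split('_')[-1].split('x')[0])  # 'conv_bn_relu_5x1' -> 5
    return {'kernel_size': k, 'padding': k // 2}

for name in ['conv_bn_relu_3x1', 'conv_bn_relu_5x1', 'conv_bn_relu_7x1']:
    print(name, conv_config(name))
```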
Architecture Structure:
Each NAS-discovered architecture consists of:
Normal cells: Preserve spatial dimensions, repeated throughout the network
Reduction cells: Downsample spatial dimensions between stages
Genotype: A named tuple describing the selected operations and their connections for both cell types
A genotype specifies, for each node in a cell, which operation to apply
and which previous node to use as input. The outputs of selected nodes
are concatenated to form the cell’s output (controlled by
nas_fanout_concat).
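A DARTS-style genotype can be sketched as follows. The exact field names in genotypes.py may differ; classic DARTS encodes each node as (operation name, input node index) pairs plus the node indices whose outputs are concatenated.

```python
from collections import namedtuple

# Hypothetical genotype layout; field names follow the original DARTS paper,
# operation names follow PRIMITIVES_CNN above.
Genotype = namedtuple('Genotype', ['normal', 'normal_concat', 'reduce', 'reduce_concat'])

example = Genotype(
    normal=[('conv_bn_relu_3x1', 0), ('skip_connect', 1),
            ('conv_bn_relu_5x1', 1), ('max_pool_3x1', 2)],
    normal_concat=[2, 3],  # node outputs concatenated, per nas_fanout_concat
    reduce=[('max_pool_3x1', 0), ('conv_bn_relu_7x1', 1),
            ('avg_pool_3x1', 1), ('skip_connect', 2)],
    reduce_concat=[2, 3],
)
print(example.normal[0])  # → ('conv_bn_relu_3x1', 0)
```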
8.1.13. NAS Framework Internals
For advanced users and developers, the NAS module is organized as follows:
tinyml_torchmodelopt/nas/
├── architect.py # Bilevel optimization (Architect class)
├── genotypes.py # Search space primitives and genotype defs
├── model.py # Final model (fixed genotype)
├── model_search_cnn.py # Search-phase model (with alpha params)
├── operations.py # Primitive operation implementations
├── train_cnn_search.py # NAS search training loop
└── utils.py # Metrics, parameter counting, checkpointing
Key Components:
Architect (architect.py): Manages bilevel optimization of architecture parameters. Supports standard and unrolled optimization with resource-aware penalties.
Search Model (model_search_cnn.py): The supernet with differentiable architecture parameters (alphas). Supports parsing the learned architecture into a discrete genotype.
Final Model (model.py): Instantiated with a fixed genotype for evaluation and deployment.
Operations (operations.py): Implements all primitive operations (convolutions, pooling, skip connections) used in the search space.
Direct API Usage (advanced):
from tinyml_torchmodelopt.nas.train_cnn_search import search_and_get_model
import torch

# Run NAS search and get the best model
final_model = search_and_get_model(args)
# Save for deployment
torch.save(final_model.state_dict(), 'nas_model.pth')
8.1.14. References
The NAS implementation is based on the following research:
Liu, H., Simonyan, K., & Yang, Y. (2019). DARTS: Differentiable Architecture Search. ICLR 2019.
Ye, P., et al. (2022). β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search.
Bender, G., Liu, H., Chen, B., Chu, G., Cheng, S., Kindermans, P.-J., & Le, Q. V. (2020). Balanced One-shot Neural Architecture Optimization.
8.1.15. Next Steps
Learn about Quantization for model compression
Explore Feature Extraction options
Deploy your model: NPU Device Deployment