Co-Designing Neural Architectures And Hardware Accelerators For Maximum Inference Efficiency
Keywords:
Neural Architecture Search, Hardware Co-Design, FPGA Accelerator, Inference Efficiency, Differentiable NAS, Hardware-Aware AI, Edge Accelerator.
Abstract
Designing neural networks independently of, and before, the accelerator hardware they will run on can cause significant efficiency loss. Models tuned for benchmark accuracy may perform poorly in latency, energy, or area utilization because the network architecture is mismatched to the accelerator architecture. This paper introduces NeuHardCo, a framework for co-designing neural architectures and hardware accelerators that searches the model architecture space and the accelerator configuration space jointly to identify the best architecture-hardware pair. NeuHardCo uses a novel dual-agent differentiable NAS in which the architecture parameters are optimized jointly with the hardware configuration parameters, using analytical cost models of accelerator configurations derived from cycle-accurate simulations. Targeting an FPGA-based accelerator and evaluated on the standard ImageNet and CIFAR-100 test sets, NeuHardCo achieved 11.6 ms inference latency at 93.9% accuracy (1.8x lower latency than handcrafted architecture-GPU pairs) while reducing resource usage by 31% compared with running standard NAS networks on a reference hardware accelerator.
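
To make the idea of jointly optimizing architecture and hardware parameters against an analytical cost model concrete, the following toy sketch may help. It is not the NeuHardCo implementation: the candidate operators, their accuracy proxies and MAC counts, the single processing-element (PE) hardware parameter, the cost weights, and all function names are hypothetical, and a single gradient loop with finite-difference gradients stands in for the dual-agent differentiable search.

```python
import math

# Hypothetical candidate operators for one layer: (name, accuracy proxy, relative MACs).
# These numbers are illustrative assumptions, not measurements from the paper.
CANDIDATE_OPS = [
    ("conv3x3", 0.90, 9.0),
    ("conv5x5", 0.93, 25.0),
    ("depthwise3x3", 0.88, 1.5),
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(alpha, log_pe, lat_weight=0.02, area_weight=0.01):
    """Relaxed objective over architecture logits (alpha) and one hardware parameter.

    The analytical cost model assumed here is latency = MACs / PE count and
    area = PE count; a real model would be fitted to simulator data.
    """
    probs = softmax(alpha)
    pe = math.exp(log_pe)  # PE count, parameterized in log space to stay positive
    expected_error = sum(p * (1.0 - acc) for p, (_, acc, _) in zip(probs, CANDIDATE_OPS))
    expected_latency = sum(p * macs for p, (_, _, macs) in zip(probs, CANDIDATE_OPS)) / pe
    return expected_error + lat_weight * expected_latency + area_weight * pe


def numerical_grad(f, params, eps=1e-5):
    """Central finite differences; a stand-in for automatic differentiation."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def co_search(steps=500, lr=0.5):
    """Gradient descent on architecture logits and log(PE count) together."""
    n_ops = len(CANDIDATE_OPS)
    params = [0.0] * n_ops + [0.0]  # uniform operator mix, one PE

    def objective(p):
        return joint_loss(p[:n_ops], p[n_ops])

    for _ in range(steps):
        grads = numerical_grad(objective, params)
        params = [p - lr * g for p, g in zip(params, grads)]

    probs = softmax(params[:n_ops])
    best_index = max(range(n_ops), key=lambda i: probs[i])
    return CANDIDATE_OPS[best_index][0], math.exp(params[n_ops]), objective(params)


if __name__ == "__main__":
    op, pe_count, loss = co_search()
    print(f"selected op: {op}, relaxed PE count: {pe_count:.2f}, loss: {loss:.4f}")
```

Because the operator choice and the PE count appear in the same objective, each gradient step trades accuracy against latency and area for the pair rather than for the network alone, which is the property a sequential design flow lacks.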




