Co-Designing Neural Architectures And Hardware Accelerators For Maximum Inference Efficiency

Harshini. R; Hadasha Nobel Tune; Dawakit Lepcha; Otamirzaev Muzaffar Bakhodir ugli

doi:10.51483/IJAIML.6.4s.2026.476-480

Authors

Harshini. R, Assistant Professor, Department of Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
Hadasha Nobel Tune, Assistant Professor, Department of Commerce, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
Dawakit Lepcha, Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.
Otamirzaev Muzaffar Bakhodir ugli, Turan International University, Namangan, Uzbekistan.

DOI:

https://doi.org/10.51483/IJAIML.6.4s.2026.476-480

Keywords:

Neural Architecture Search, Hardware Co-Design, FPGA Accelerator, Inference Efficiency, Differentiable NAS, Hardware-Aware AI, Edge Accelerator.

Abstract

Designing neural networks independently of, and sequentially before, the target accelerator hardware can cause significant efficiency losses. Models tuned for benchmark accuracy may perform poorly on an accelerator in terms of latency, energy, or area utilization because the network architecture is mismatched with the hardware architecture. This paper introduces NeuHardCo, a framework for the simultaneous design of neural architectures and hardware accelerators that jointly searches the model-architecture and accelerator-configuration spaces to identify the best architecture-hardware pair. NeuHardCo uses a novel dual-agent differentiable NAS in which the network's architecture parameters are optimized jointly with the hardware configuration parameters, guided by analytical cost models of accelerator configurations derived from cycle-accurate simulations. Targeting an FPGA-based accelerator and evaluating on the standard ImageNet and CIFAR-100 test sets, NeuHardCo achieved 11.6 ms inference latency at 93.9% accuracy (1.8x lower latency than handcrafted architecture-GPU pairs) while using 31% fewer hardware resources than standard NAS networks executed on a reference accelerator.
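The abstract does not specify NeuHardCo's search space, its cost models, or how its two agents interact. As a rough illustration only, the sketch below shows a generic DARTS-style continuous relaxation of a joint architecture-hardware search: softmax-weighted logits over candidate operations (one set per layer) and over accelerator options are updated in alternation by gradient descent on an accuracy proxy plus expected latency and area. Every name and number here (`CANDIDATE_OPS`, `HW_OPTIONS`, `FREQ_MHZ`, the one-MAC-per-PE-per-cycle `latency_ms` model, and the alternating "architecture agent" and "hardware agent" updates) is a hypothetical assumption, not the paper's method.

```python
import math

# Hypothetical search space; every number is illustrative, not taken from the paper.
# Candidate op per layer: (name, MACs in millions, proxy accuracy loss).
CANDIDATE_OPS = [
    ("conv3x3", 120.0, 0.20),
    ("conv5x5", 300.0, 0.15),
    ("dwconv3x3", 20.0, 0.30),
]
# Candidate accelerator configuration: (processing-element count, relative area).
HW_OPTIONS = [(64, 1.0), (128, 2.0), (256, 4.0)]
NUM_LAYERS = 4
FREQ_MHZ = 200.0  # assumed accelerator clock frequency


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def latency_ms(macs_m, pes):
    """Toy analytical latency model: one MAC per PE per cycle, full utilization."""
    cycles = macs_m * 1e6 / pes
    return cycles / (FREQ_MHZ * 1e6) * 1e3


def softmax_grad(probs, costs):
    """Exact gradient of sum_i p_i * c_i w.r.t. the logits, where p = softmax(logits)."""
    mean = sum(p * c for p, c in zip(probs, costs))
    return [p * (c - mean) for p, c in zip(probs, costs)]


def search(steps=200, lr=0.5, lam_lat=0.05, lam_area=0.1):
    alphas = [[0.0] * len(CANDIDATE_OPS) for _ in range(NUM_LAYERS)]  # architecture logits
    beta = [0.0] * len(HW_OPTIONS)  # hardware logits
    for _ in range(steps):
        # Hypothetical "architecture agent": update op logits with hardware probabilities fixed.
        q = softmax(beta)
        for layer in range(NUM_LAYERS):
            p = softmax(alphas[layer])
            costs = [
                acc + lam_lat * sum(qh * latency_ms(macs, pes) for qh, (pes, _) in zip(q, HW_OPTIONS))
                for _, macs, acc in CANDIDATE_OPS
            ]
            grad = softmax_grad(p, costs)
            alphas[layer] = [a - lr * g for a, g in zip(alphas[layer], grad)]
        # Hypothetical "hardware agent": update hardware logits with the architecture fixed.
        probs = [softmax(a) for a in alphas]
        hw_costs = []
        for pes, area in HW_OPTIONS:
            exp_lat = sum(
                sum(pi * latency_ms(macs, pes) for pi, (_, macs, _) in zip(p, CANDIDATE_OPS))
                for p in probs
            )
            hw_costs.append(lam_lat * exp_lat + lam_area * area)
        grad = softmax_grad(q, hw_costs)
        beta = [b - lr * g for b, g in zip(beta, grad)]
    # Discretize: keep the most probable op per layer and the most probable hardware option.
    arch = [CANDIDATE_OPS[max(range(len(a)), key=a.__getitem__)][0] for a in alphas]
    pes = HW_OPTIONS[max(range(len(beta)), key=beta.__getitem__)][0]
    return arch, pes


if __name__ == "__main__":
    arch, pes = search()
    print("ops per layer:", arch, "| PE count:", pes)
```

Because the relaxed loss is linear in each probability vector when the other is held fixed, `softmax_grad` gives the exact gradient with respect to the logits, so this toy needs no autodiff library. A real system would instead use an autodiff framework and cost models calibrated against cycle-accurate simulation, as the abstract describes.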
