Energy Aware Neural Pruning Algorithms For Sustainable Large Scale Model Deployment
Keywords:
Green AI, Model Compression, Structural Pruning, Sustainable Deep Learning, Hardware Efficiency, Neural Architecture Optimization.
Abstract
The rapid growth in the size of deep learning models has created new computational and economic difficulties, chiefly because of the large carbon footprint and energy expenditure of inference at the edge and in the cloud. This work addresses these challenges by proposing a framework of energy-aware neural pruning methods aimed at the sustainable deployment of large-scale models. The core problem is the non-linear relationship between parameter count and power consumption: conventional magnitude-based pruning selects weights by size alone and does not account for the energy cost that each part of the architecture incurs. To address this, the proposed method incorporates per-layer energy consumption into the pruning selection criterion. Experimental results show that energy-prioritized structural pruning reduces total power usage by 42.6% in standard transformers and large-scale convolutional neural networks while maintaining an accuracy benchmark of 98.4%. Furthermore, statistical validation shows that, compared with traditional magnitude-based pruning, the approach reduces hardware memory requirements by 31.5% and inference latency by 24.8%. In conclusion, the findings indicate that prioritizing hardware energy parameters in structural pruning moves AI deployment towards a green engineering standard suitable for both embedded IoT devices and high-performance server clusters.
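As an illustration only, the idea of folding per-layer energy into the pruning selection can be sketched as a greedy ranking by importance per unit of energy. This is a minimal sketch under stated assumptions, not the framework evaluated in this work: the `Channel` record, the `importance` metric (for example an L1 norm of the filter weights), the `energy_mj` estimate, and the function `select_channels_to_prune` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Channel:
    """Hypothetical record for one prunable structural unit (filter, head, ...)."""
    layer: str
    index: int
    importance: float  # assumed saliency metric, e.g. L1 norm of the weights
    energy_mj: float   # assumed per-inference energy estimate; must be > 0


def select_channels_to_prune(
    channels: List[Channel], energy_reduction_target: float
) -> Tuple[List[Channel], float]:
    """Greedily pick the units with the lowest importance per unit of energy
    until at least the target fraction of total energy has been removed.

    Returns the pruned units and the fraction of energy actually removed.
    """
    total_energy = sum(c.energy_mj for c in channels)
    budget = energy_reduction_target * total_energy

    # Low importance and high energy cost both push a unit to the front.
    ranked = sorted(channels, key=lambda c: c.importance / c.energy_mj)

    pruned: List[Channel] = []
    saved = 0.0
    for c in ranked:
        if saved >= budget:
            break
        pruned.append(c)
        saved += c.energy_mj
    return pruned, saved / total_energy
```

A pure magnitude criterion would rank by `importance` alone; dividing by the energy estimate is one simple way to make an expensive, moderately important unit a better pruning candidate than a cheap one of similar importance. Other weightings are equally plausible.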




