Efficient Model Compression for Resource-Constrained AI Systems Using Complexity-Driven Pruning
Keywords:
Model Compression, Network Pruning, AI, IoT, Resource-Constrained Systems, Efficient Deep Learning.
Abstract
The rapid advancement of edge AI has enabled intelligent Internet of Things (IoT) applications in areas such as smart healthcare, industrial automation, and agriculture. However, deploying deep learning (DL) models on resource-constrained devices is challenging because of their high computation, memory, and energy demands. To address this, the research proposes a complexity-driven, pruning-based model compression framework for efficient deployment, using a dataset of 6784 layer-level records that capture computational cost, memory usage, and operational impact. Unlike traditional approaches that rely on iterative training, pruning, and retraining cycles, the proposed Kookaburra Optimized Stacked Long Short-Term Memory (KO-StackedLSTM) network performs layer-wise complexity analysis to selectively remove less significant filters, reducing computational overhead without expensive fine-tuning. KO-StackedLSTM uses bio-inspired Kookaburra optimization to remove redundant parameters and a stacked LSTM structure to improve temporal learning while keeping inference efficient and accurate. To further enhance performance, Min–Max normalization is applied to improve data scaling and convergence, and principal component analysis (PCA) reduces input dimensionality while preserving essential features and lowering processing cost. The research also introduces three adaptive compression modes, FLOPs-aware (FA), parameter-aware (PA), and memory-aware (MA), which enable flexible optimization according to the specific resource constraint. It further presents a trade-off analysis between resource use and performance, offering practical insights for real-world deployment. Implemented in Python, the model achieves an accuracy of 96.38% while reducing FLOPs by 83.40%, demonstrating its effectiveness for AI deployment in resource-constrained environments. Overall, the research provides a scalable and efficient solution for real-time inference under limited resources.
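To illustrate how the FA, PA, and MA modes could steer pruning toward different resources, the following is a minimal sketch, not the KO-StackedLSTM procedure. It leaves out the Kookaburra optimizer and the stacked LSTM and substitutes a simple greedy rule. It assumes each filter is described by hypothetical FLOPs, parameter, and memory costs plus an importance score (for example, the L1 norm of its weights). Min–Max normalization places importance and the cost selected by the mode on a common [0, 1] scale. Filters are then ranked by normalized importance minus normalized cost, and the lowest-ranked filters are removed until a target fraction of the chosen resource has been saved. The names `Filter`, `MODE_KEY`, and `select_filters_to_prune` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical per-filter record; the fields are illustrative assumptions,
# not the schema of the layer-level dataset described in the abstract.
@dataclass
class Filter:
    name: str
    flops: float       # floating-point operations attributed to the filter
    params: float      # number of weights in the filter
    memory: float      # memory footprint (arbitrary units in this sketch)
    importance: float  # e.g., L1 norm of the filter weights

# Each compression mode selects which resource cost drives pruning.
MODE_KEY = {"FA": "flops", "PA": "params", "MA": "memory"}

def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def select_filters_to_prune(filters, mode, target_saving):
    """Greedy sketch: remove filters that are unimportant relative to their
    cost under the chosen mode until the removed cost reaches
    target_saving * total cost. Returns (pruned names, removed cost)."""
    key = MODE_KEY[mode]
    costs = [getattr(f, key) for f in filters]
    imp_n = min_max([f.importance for f in filters])
    cost_n = min_max(costs)
    # Lower score means less important relative to what the filter costs.
    scores = [i - c for i, c in zip(imp_n, cost_n)]
    order = sorted(range(len(filters)), key=lambda k: scores[k])
    budget = target_saving * sum(costs)
    pruned, removed = [], 0.0
    for k in order:
        if removed >= budget:
            break
        pruned.append(filters[k].name)
        removed += costs[k]
    return pruned, removed

# Toy data chosen by hand for illustration; not taken from the paper.
filters = [
    Filter("f0", flops=100, params=10, memory=50, importance=0.9),
    Filter("f1", flops=400, params=20, memory=10, importance=0.2),
    Filter("f2", flops=50, params=40, memory=30, importance=0.1),
    Filter("f3", flops=300, params=5, memory=80, importance=0.5),
]
for mode in ("FA", "PA", "MA"):
    print(mode, select_filters_to_prune(filters, mode, target_saving=0.5))
# FA (['f1', 'f3'], 700.0)
# PA (['f2'], 40.0)
# MA (['f3', 'f2'], 110.0)
```

With these toy numbers each mode removes a different set of filters, which is the kind of resource-specific trade-off the three modes are meant to expose. Because the loop stops only once the budget is met, the achieved saving can overshoot the target.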




