Scalable Sparse Model Training Via Computationally Frugal Gradient Approximation Techniques

Authors

  • B. Saraswati, Assistant Professor, Department of Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
  • K. Kanchana, Assistant Professor, Department of Commerce, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
  • Mahmudov Kahramon Shuhratjon Ugli, Turan International University, Namangan, Uzbekistan.
  • Dr. Nishtha Sharma, Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.

Keywords

Sparse Gradient Training, Gradient Compression, Top-K Sparsification, Error Feedback, Distributed Training, Memory Efficiency, Scalable Deep Learning.

Abstract

Training large neural networks with dense, full-precision gradients demands prohibitive memory and compute, so it does not scale on commodity hardware or on clusters connected by low-bandwidth links. Sparse gradient methods aim to make training scalable by transmitting and aggregating only the most significant gradient entries, but they either lose accuracy at high sparsity levels or require impractically large error buffers. This paper proposes FruGrad, a computationally frugal gradient approximation framework for scalable sparse model training. FruGrad has three components: (i) an adaptive top-K gradient selector with momentum-corrected error feedback; (ii) a structured block-sparsity mask aligned with the hardware memory layout, which enables cache-efficient aggregation; and (iii) a sparsity schedule that increases gradient compression as training progresses. Experiments on ResNet-50 (ImageNet), BERT-base (GLUE), and GPT-2 (WikiText-103) show that FruGrad achieves up to a 2.7× speedup, reduces peak memory to 44% of that of dense training, and retains 99.1% of baseline model accuracy at 94% gradient sparsity.
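
For readers unfamiliar with the underlying idea, the following is a minimal sketch of generic top-K gradient sparsification with plain error feedback, plus a simple ramped sparsity schedule. It is not FruGrad's implementation: the abstract does not specify the momentum correction, the block-sparsity mask, or the actual schedule, so none of them is reproduced here. The function names, the linear ramp, and the 0.5 starting sparsity are assumptions made purely for illustration.

```python
import heapq


def sparsity_at(step, total_steps, start=0.5, end=0.94):
    """Hypothetical linear ramp: sparsity grows from `start` to `end`.

    The ramp shape and the `start` value are assumptions, not from the paper.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def topk_with_error_feedback(grad, residual, sparsity):
    """Return (sparse_update, new_residual) for one flat gradient.

    sparse_update maps index -> value for the entries that are kept.
    new_residual holds everything that was not sent this step.
    """
    # Add back what earlier steps left unsent.
    corrected = [g + r for g, r in zip(grad, residual)]
    # Keep at least one entry; k shrinks as sparsity grows.
    k = max(1, round(len(corrected) * (1.0 - sparsity)))
    # Indices of the k largest-magnitude entries.
    keep = heapq.nlargest(k, range(len(corrected)),
                          key=lambda i: abs(corrected[i]))
    sparse_update = {i: corrected[i] for i in keep}
    # Sent entries leave the residual; the rest carry over to the next step.
    new_residual = [0.0 if i in sparse_update else c
                    for i, c in enumerate(corrected)]
    return sparse_update, new_residual


# Usage: two steps on a 4-element gradient at 75% sparsity (k = 1).
residual = [0.0] * 4
update1, residual = topk_with_error_feedback([0.1, -2.0, 0.3, 0.05], residual, 0.75)
update2, residual = topk_with_error_feedback([0.1, 0.0, 0.3, 0.0], residual, 0.75)
```

In the second step the entry at index 2 is selected only because its unsent value from the first step was carried over in the residual. This is the property that lets error feedback preserve small but persistent gradient components at high sparsity.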

Published

2026-06-01

How to Cite

Saraswati, B., Kanchana, K., Ugli, M. K. S., & Sharma, N. (2026). Scalable Sparse Model Training Via Computationally Frugal Gradient Approximation Techniques. International Journal of Artificial Intelligence and Machine Learning, 6(4s), 486–491. Retrieved from https://svedbergopen.com/index.php/ijaiml/article/view/479