Scalable Sparse Model Training Via Computationally Frugal Gradient Approximation Techniques

B. Saraswati; K. Kanchana; Mahmudov Kahramon Shuhratjon Ugli; Dr. Nishtha Sharma

doi:10.51483/IJAIML.6.4s.2026.486-491

Authors

B. Saraswati, Assistant Professor, Department of Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
K. Kanchana, Assistant Professor, Department of Commerce, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
Mahmudov Kahramon Shuhratjon Ugli, Turan International University, Namangan, Uzbekistan.
Dr. Nishtha Sharma, Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.

DOI:

https://doi.org/10.51483/IJAIML.6.4s.2026.486-491

Keywords:

Sparse Gradient Training, Gradient Compression, Top-K Sparsification, Error Feedback, Distributed Training, Memory Efficiency, Scalable Deep Learning.

Abstract

Training large neural networks with dense, full-precision gradients demands prohibitive memory and compute, so it does not scale on commodity hardware or on clusters with low-bandwidth interconnect links. Sparse gradient methods aim to make training scalable by transmitting and aggregating only the most significant gradient entries, yet they suffer from accuracy loss at high sparsity levels or from impractically large memory requirements for error buffers. This paper proposes FruGrad, a computationally frugal gradient approximation framework for scalable sparse model training. FruGrad consists of three components: (i) an adaptive top-K gradient selector with momentum-corrected error feedback; (ii) a structured block-sparsity mask whose layout matches the hardware memory mapping and thus enables cache-efficient aggregation; and (iii) a sparsity schedule that increases gradient compression as training progresses. Extensive experiments on ResNet-50 (ImageNet), BERT-base (GLUE), and GPT-2 (WikiText-103) show that FruGrad achieves up to a 2.7× speedup, reduces peak memory to 44% of that of dense training, and retains 99.1% of baseline model accuracy at 94% gradient sparsity.
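
For readers unfamiliar with the general idea behind component (i), the following is a minimal sketch of generic top-K gradient sparsification with plain error feedback, written with NumPy. It is not FruGrad's selector: the abstract does not specify how K is adapted or how the momentum correction is applied, so both are omitted here. The function name `topk_with_error_feedback` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def topk_with_error_feedback(grad, residual, k):
    """Keep the k largest-magnitude entries of (grad + residual).

    Illustrative only: generic top-K with plain error feedback,
    without the adaptive K or momentum correction named in the abstract.
    Returns (sparse_grad, new_residual).
    """
    # Add back what earlier steps dropped, so no gradient mass is lost.
    corrected = grad + residual
    flat = corrected.ravel()
    k = max(1, min(k, flat.size))

    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]

    # Only the selected entries would be transmitted.
    sparse_flat = np.zeros_like(flat)
    sparse_flat[idx] = flat[idx]
    sparse = sparse_flat.reshape(grad.shape)

    # Everything not transmitted is carried over to the next step.
    new_residual = corrected - sparse
    return sparse, new_residual


if __name__ == "__main__":
    g = np.array([0.1, -2.0, 0.3, 1.5])
    r = np.zeros_like(g)
    sparse, r = topk_with_error_feedback(g, r, k=2)
    print("sent:", sparse)
    print("kept as residual:", r)
```

The residual buffer has the same shape as the gradient, which is the error-buffer memory cost the abstract refers to. By construction, the transmitted part and the new residual always sum to the corrected gradient, so dropped entries are delayed rather than discarded.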
