Asynchronous Weight Update Algorithms for Reduced Communication Overhead in Distributed ML
Keywords: Distributed Machine Learning, Asynchronous Weight Update, Communication Overhead, Adaptive Scheduling, Cloud-Edge Architecture.
Abstract
This work addresses key limitations of traditional synchronous distributed machine learning, namely excessive communication overhead, synchronization delays, and stragglers, by combining an asynchronous weight update mechanism with adaptive communication scheduling to improve training in heterogeneous cloud-edge and IoT environments. The design uses a decentralized architecture in which multiple worker nodes process mini-batches of data in parallel using stochastic optimization. Parameter synchronization is asynchronous and is governed by a dynamic node coefficient that controls each node's communication rate according to network traffic, computing speed, and gradient importance. Stale-gradient compensation and gradient clipping are incorporated to maintain convergence stability. The proposed asynchronous system reduced communication overhead from 100% to 58%, training latency from 420 ms to 265 ms, and convergence time from 48 to 31 epochs. Bandwidth consumption fell from 14.6 GB to 8.1 GB, and scalability efficiency rose to 93%. Classification accuracy improved to 96.2%, precision to 95.1%, and fault tolerance efficiency to 92%. These results indicate that integrating adaptive asynchronous updates with intelligent communication control yields an efficient, scalable way to reduce network cost without sacrificing prediction accuracy.
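
To make the mechanism summarized in the abstract concrete, the following is a minimal single-process sketch, not the authors' implementation. The abstract gives no formulas, so everything here is an assumption chosen for illustration: the names `ParameterServer`, `node_coefficient`, and `should_push`, the product form of the coefficient, the push threshold of 0.25, and the 1/(1 + staleness) damping used as stale-gradient compensation.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def node_coefficient(traffic, speed, grad_norm, ref_norm=1.0):
    """Hypothetical node coefficient in [0, 1]; higher means 'push now'.

    traffic: network load in [0, 1] (1 = saturated link), assumed input.
    speed: relative compute speed of the node in [0, 1], assumed input.
    grad_norm / ref_norm: crude proxy for gradient importance, capped at 1.
    """
    importance = min(grad_norm / ref_norm, 1.0)
    return (1.0 - traffic) * speed * importance


def should_push(coeff, threshold=0.25):
    """Communicate only when the coefficient reaches an assumed threshold."""
    return coeff >= threshold


class ParameterServer:
    """Holds global weights and applies worker gradients without a barrier."""

    def __init__(self, weights, lr=0.1, max_norm=1.0):
        self.weights = list(weights)
        self.version = 0  # incremented on every applied update
        self.lr = lr
        self.max_norm = max_norm

    def pull(self):
        """Return a copy of the weights and the version they correspond to."""
        return list(self.weights), self.version

    def push(self, grad, worker_version):
        """Apply one gradient computed against weights at worker_version."""
        staleness = self.version - worker_version
        # Assumed compensation rule: shrink the step for older gradients.
        damped_lr = self.lr / (1.0 + staleness)
        clipped = clip_by_norm(grad, self.max_norm)
        self.weights = [w - damped_lr * g for w, g in zip(self.weights, clipped)]
        self.version += 1
        return staleness


# Two workers pull the same weights, then push at different times.
server = ParameterServer([0.0, 0.0], lr=0.1, max_norm=1.0)
_, version_a = server.pull()
_, version_b = server.pull()
server.push([3.0, 4.0], version_a)  # fresh gradient, full learning rate
server.push([3.0, 4.0], version_b)  # one update stale, learning rate halved
```

In this sketch the second worker's gradient is one version behind when it arrives, so it moves the weights by half as much as the first. A worker would call `should_push(node_coefficient(...))` before each push and skip communication when the link is congested or the gradient is small, which is one plausible way to trade update frequency for bandwidth.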




