Inference-Time Energy Minimization Through Learnable Numerical Precision In Activation Computation

Authors

  • N. Nivetha, Assistant Professor, Department of Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Tamil Nadu, India.
  • R. Manjula, Assistant Professor, Department of Commerce, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Tamil Nadu, India.
  • Sayfiddinova Muniskhon Fakhriddin kizi, Turan International University, Namangan, Uzbekistan.
  • Dr. Abhishek Sharma, Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.

Keywords:

Learnable Precision, Activation Quantization, Inference Energy, Per-Channel Precision, Binary Gating, Energy-Efficient Inference, Mixed-Precision Neural Networks.

Abstract

Inference energy in a neural network is dominated by arithmetic operations on activation tensors, and this energy depends non-linearly on the numerical precision employed. Fixed-precision quantization applies a single bit width to all activation operations, ignoring the fact that precision requirements vary spatially and across channels within a single layer. In this work, we propose LearnPrec, a framework that minimizes inference-time energy through learnable activation precision. We introduce a per-channel precision selector, a small binary network trained end-to-end with a combined accuracy-energy objective, which chooses 8-bit or 4-bit computation for each activation channel independently of all others. The selector operates at inference time and makes one binary decision per activation channel for each input batch, enabling fine-grained energy savings while leaving the model weights unchanged. On MobileNetV3, EfficientNet-B2, and DeiT-Small, evaluated on the ImageNet-1K, CIFAR-100, and Oxford Pets datasets, LearnPrec reduces inference energy to 19% of the FP32 baseline while preserving 93.5% accuracy, compared with 48% energy and 93.1% accuracy for the fixed-precision INT8 baseline and 31% energy and 91.4% accuracy for the fixed-precision INT4 baseline.
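
The per-channel 8-bit/4-bit decision described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the selector is reduced to a per-channel threshold on the mean absolute activation (parameters `w` and `b` are hypothetical), quantization is symmetric uniform quantize-dequantize, and `energy_proxy` is a made-up cost that grows with the square of the bit width relative to an all-8-bit layer. It does not reproduce the reported energy figures, and gate training (for example with a straight-through estimator) is not shown.

```python
import numpy as np

def fake_quantize(x, bits):
    """Symmetric uniform quantize-dequantize of x to signed `bits`-bit levels."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def select_precision(acts, w, b):
    """Hypothetical selector: one binary gate per channel for the whole batch.

    acts has shape (N, C, H, W). Returns shape (C,), 1 = 8-bit, 0 = 4-bit.
    """
    stats = np.mean(np.abs(acts), axis=(0, 2, 3))  # per-channel statistic
    return (w * stats + b > 0).astype(np.int64)

def mixed_precision_forward(acts, gates):
    """Quantize each channel at the bit width chosen by its gate."""
    out = np.empty_like(acts)
    for c, g in enumerate(gates):
        out[:, c] = fake_quantize(acts[:, c], 8 if g else 4)
    return out

def energy_proxy(gates):
    """Assumed cost model: bits squared, normalized so all-8-bit equals 1.0."""
    bits = np.where(gates == 1, 8, 4)
    return float(np.mean(bits ** 2) / 64.0)

def combined_objective(task_loss, gates, lam):
    """Accuracy-energy trade-off: task loss plus a weighted energy penalty."""
    return task_loss + lam * energy_proxy(gates)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(2, 3, 4, 4))
    acts[:, 0] *= 10.0  # make channel 0 large enough to trigger the 8-bit gate
    gates = select_precision(acts, w=np.ones(3), b=-2.0)
    out = mixed_precision_forward(acts, gates)
    print("gates:", gates, "energy proxy:", energy_proxy(gates))
```

In this sketch the weight `lam` controls how strongly the objective pushes channels toward 4-bit computation; a larger value trades accuracy for lower estimated energy.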

Published

2026-06-01

How to Cite

Nivetha, N., Manjula, R., Fakhriddin kizi, S. M., & Sharma, D. A. (2026). Inference-Time Energy Minimization Through Learnable Numerical Precision In Activation Computation. International Journal of Artificial Intelligence and Machine Learning, 6(4s), 505–509. Retrieved from https://svedbergopen.com/index.php/ijaiml/article/view/482