Intent Aware Policy Optimization Algorithms For Human Agent Cooperative Tasks

Authors

  • Dr. Ponmurugan Panneerselvam, Professor & Dean-Doctoral Studies & IPR, Department of Research, Meenakshi Academy of Higher Education and Research, Tamil Nadu, India.
  • Dr. V.P. Nithya, Associate Professor, Dept. of CSE, Vimal Jyothi Engineering College, Kannur, Kerala, India.
  • Dr. V. Aruna, Assistant Professor, Department of Management Studies, St. Joseph’s Institute of Technology, OMR, Chennai, Tamil Nadu, India.
  • B. Damodaran, Associate Professor, Department of Psychology, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Tamil Nadu, India.
  • Roohee Khan, Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.

Keywords:

human-agent cooperation, intent-aware reinforcement learning, policy optimization, task efficiency, cooperation efficiency, multi-agent systems, human satisfaction

Abstract

Cooperative tasks between humans and agents are increasingly common in collaborative robotics, industrial automation, and rescue operations. Reinforcement learning techniques usually train agents' policies independently of human intentions, which leads to unintended behavior, redundant task handling, and reduced collaboration efficiency. In this paper, we present an Intent-Aware Policy Optimization (IAPO) framework in which real-time human intention prediction and adaptive multi-agent reinforcement learning work together to improve cooperation and task performance. The IAPO framework comprises three modules: the Intent Recognition Module, the Cooperative Task Scheduler, and the Policy Optimization Module. The first two modules generate task priorities from the predicted human intentions, while the third optimizes the agents' policies using the proposed intent-aware reward function. We evaluated the framework in simulated dynamic environments in which multiple agents and humans carried out collaborative tasks. Four metrics were used to compare our approach with baseline approaches: task completion rate (TCR), cooperation efficiency (CE), policy convergence time (PCT), and human satisfaction index (HSI). The framework outperforms the baselines, achieving a TCR of 93%, a CE of 88%, a PCT of 120 iterations, and an HSI of 4.5. These results indicate that integrating human intention into the policy optimization process improves both the quantitative results and the qualitative cooperation between humans and agents. The framework offers flexibility, explainability, and adaptability in use. Further research should apply the framework to a wider range of human-robot scenarios, including online learning.
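
The abstract does not give the form of the intent-aware reward function or the scheduling rule, so the following is only a minimal sketch of the general idea under stated assumptions: intent is represented as a probability per task, priorities are obtained by sorting on that probability, and the reward adds a weighted intent-alignment bonus to the environment's task reward. The names IntentEstimate, task_priorities, intent_aware_reward, and the weight parameter are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class IntentEstimate:
    """Hypothetical intent-recognition output: one probability per task."""
    task_probs: dict[str, float]


def task_priorities(intent: IntentEstimate) -> list[str]:
    """Order tasks by predicted human intent, highest first.

    Assumed scheduling rule; the paper's scheduler may differ.
    """
    return sorted(intent.task_probs, key=intent.task_probs.get, reverse=True)


def intent_aware_reward(
    task_reward: float,
    agent_task: str,
    intent: IntentEstimate,
    weight: float = 0.5,
) -> float:
    """Add an intent-alignment bonus to the environment's task reward.

    Assumed shaping: reward = task_reward + weight * P(intent = agent_task).
    Tasks absent from the intent estimate receive no bonus.
    """
    alignment = intent.task_probs.get(agent_task, 0.0)
    return task_reward + weight * alignment


if __name__ == "__main__":
    intent = IntentEstimate({"lift": 0.7, "sort": 0.2, "scan": 0.1})
    print(task_priorities(intent))
    print(intent_aware_reward(1.0, "lift", intent))
    print(intent_aware_reward(1.0, "scan", intent))
```

In this sketch an agent that picks the task the human most likely intends receives a larger shaped reward than one that picks a low-priority task, which is one simple way to couple intent prediction to policy optimization.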

Published

2026-05-24

How to Cite

Panneerselvam, P., Nithya, V. P., Aruna, V., Damodaran, B., & Khan, R. (2026). Intent Aware Policy Optimization Algorithms For Human Agent Cooperative Tasks. International Journal of Artificial Intelligence and Machine Learning, 6(3s), 75–83. Retrieved from https://svedbergopen.com/index.php/ijaiml/article/view/289