Dynamic Portfolio Optimization Algorithm Using Deep Q Networks for Corporate Financial Decision Systems

Authors

  • Dr. M. Sangeetha, Professor, Department of Information Technology, K.S.Rangasamy College of Technology, Tiruchengode, India.
  • Dr. Sapna Bawankar, Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.
  • Abdumutalliev Abdulakhad Abdusamad Ugli, Turan International University, Namangan, Uzbekistan.
  • Maksuda Dusmetova, Senior Teacher, Urgench State Pedagogical Institute, Urgench, Uzbekistan.
  • Nigora Uralova, PhD, Senior Lecturer, Department of Romance-Germanic Languages, Jizzakh State Pedagogical University, Jizzakh, Uzbekistan.
  • Kattakul Kinjaev, Lecturer, Department of Finance and Tourism, Termez University of Economics and Service, Termez, Uzbekistan.

Keywords:

Deep Q-Networks, Reinforcement Learning, Portfolio Optimization, Financial Decision Systems, Markov Decision Process, Sharpe Ratio Maximization, Dueling Network Architecture, Experience Replay.

Abstract

This paper introduces DQN-PO, a Deep Q-Network Portfolio Optimization framework designed for corporate financial systems that operate in high-frequency, multi-asset trading environments. Standard Mean-Variance Optimization (MVO) handles non-stationarity and transaction costs poorly; DQN-PO addresses both by formulating portfolio rebalancing as a Markov decision process (MDP) with continuous actions. The method uses a state space of 64 features representing price momentum, technical measures, macroeconomic data, and relationships between assets within the portfolio. Overfitting is mitigated by a dueling network architecture, prioritized experience replay, and double Q-learning. In tests on 10 S&P 500 sectors over 2019–2023 (covering the COVID-19 market turmoil period), DQN-PO achieves a Sharpe ratio of 2.14 and a 38.6% annual return, compared with Sharpe ratios of 1.43 for Markowitz MVO, 1.31 for risk parity, and 1.72 for a baseline reinforcement learning PPO agent. Maximum drawdown is limited to −8.3%, against −14.2% for Markowitz MVO. Ablation experiments show significant Sharpe ratio contributions from the dueling architecture (+11%), prioritized experience replay (+8%), and multi-factor states (+19%).
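
As an illustration of two components named in the abstract, the following is a minimal NumPy sketch of the standard dueling aggregation and the standard double Q-learning target. It is not the authors' implementation: the function names, array shapes, the discount factor of 0.99, and the assumption of a finite (discretized) set of rebalancing actions are all hypothetical choices made for this example.

```python
import numpy as np


def dueling_q(value, advantage):
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline.

    value:     array of shape (batch, 1)
    advantage: array of shape (batch, n_actions)
    """
    return value + advantage - advantage.mean(axis=1, keepdims=True)


def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double Q-learning target for a batch of transitions.

    The online network selects the next action; the target network
    evaluates it. gamma=0.99 is an assumed value, not taken from the paper.
    """
    best_action = np.argmax(q_online_next, axis=1)
    next_q = q_target_next[np.arange(len(best_action)), best_action]
    # Terminal transitions (done == 1) contribute only the immediate reward.
    return reward + gamma * (1.0 - done) * next_q


# Hypothetical batch of two transitions with two discrete rebalancing actions.
q_example = dueling_q(np.array([[1.0]]), np.array([[1.0, 2.0, 3.0]]))
targets = double_q_target(
    reward=np.array([1.0, 0.5]),
    done=np.array([0.0, 1.0]),
    q_online_next=np.array([[0.1, 0.9], [0.8, 0.2]]),
    q_target_next=np.array([[2.0, 3.0], [4.0, 5.0]]),
)
```

Separating action selection from action evaluation is what reduces the overestimation bias of plain Q-learning, and subtracting the mean advantage keeps the value and advantage streams identifiable.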

Published

2026-05-12

How to Cite

Sangeetha, D., Bawankar, D. S., Ugli, A. A. A., Dusmetova, M., Uralova, N., & Kinjaev, K. (2026). Dynamic Portfolio Optimization Algorithm Using Deep Q Networks for Corporate Financial Decision Systems. International Journal of Artificial Intelligence and Machine Learning, 6(2s), 261–268. Retrieved from https://svedbergopen.com/index.php/ijaiml/article/view/203
