Dynamic Portfolio Optimization Algorithm Using Deep Q-Networks for Corporate Financial Decision Systems
Keywords:
Deep Q-Networks, Reinforcement Learning, Portfolio Optimization, Financial Decision Systems, Markov Decision Process, Sharpe Ratio Maximization, Dueling Network Architecture, Experience Replay.
Abstract
This paper introduces DQN-PO, a Deep Q-Network Portfolio Optimization framework designed for corporate financial systems that operate in high-frequency, multi-asset trading environments. Standard Mean-Variance Optimization (MVO) copes poorly with non-stationary markets and transaction costs; DQN-PO addresses both by formulating portfolio rebalancing as a Markov Decision Process (MDP) with continuous actions. The state space comprises 64 features covering price momentum, technical indicators, macroeconomic data, and cross-asset relationships within the portfolio. To limit overfitting, the method combines a dueling network architecture, prioritized experience replay, and double Q-learning. In tests on 10 S&P 500 sectors over 2019-2023, a period that includes the COVID-19 market turmoil, DQN-PO achieves a Sharpe ratio of 2.14 and a 38.6% annual return, compared with Sharpe ratios of 1.43 for Markowitz MVO, 1.31 for risk parity, and 1.72 for a baseline PPO reinforcement learning agent. Maximum drawdown is limited to −8.3%, against −14.2% for Markowitz MVO. Ablation experiments show significant Sharpe ratio contributions from the dueling architecture (+11%), prioritized experience replay (+8%), and the multi-factor state representation (+19%).
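For readers who want to reproduce the two headline risk metrics, the following is a minimal sketch of how an annualized Sharpe ratio and a maximum drawdown are commonly computed from a series of periodic returns. The function names, the 252-day annualization factor, and the zero default risk-free rate are illustrative assumptions; the abstract does not state the conventions used in the reported results.

```python
import math
from statistics import mean, stdev
from typing import Sequence

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def sharpe_ratio(returns: Sequence[float],
                 risk_free: float = 0.0,
                 periods: int = TRADING_DAYS) -> float:
    """Annualized Sharpe ratio of per-period simple returns.

    risk_free is the per-period risk-free rate (assumed 0 here).
    Uses the sample standard deviation, so at least two returns are needed.
    """
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded wealth curve.

    Returned as a non-positive fraction, e.g. -0.083 for a drawdown of -8.3%.
    """
    wealth = 1.0   # compounded value of one unit invested
    peak = 1.0     # running maximum of the wealth curve
    worst = 0.0    # most negative drawdown seen so far
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst
```

Under these conventions, a return series that gains 10%, loses 20%, and then gains 5% has a maximum drawdown of −20%, measured from the peak reached after the first period.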




