Reinforcement Learning for Trading Strategies: A Reproducible Comparison with Classical Baselines

Andy Hou

Volume 1, Issue 1

Abstract

This paper presents a hands-on study comparing modern reinforcement learning (RL) approaches, Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO), against classical algorithmic trading baselines (momentum and mean reversion) on equity and crypto time series. Beyond the algorithmic comparison, we describe a production-style backtesting and evaluation pipeline, measure how execution realism (transaction costs, slippage, market impact) affects results, and provide reproducible benchmarks. All code, trained policies, and experiment logs will be provided in an anonymized artifact for review; links are omitted to preserve double-blindness. Our aim is to provide both practical software engineering artifacts (a scalable backtester, CI, experiment orchestration) and a rigorous empirical analysis that future quantitative research can build on.

Keywords

Reinforcement Learning, Algorithmic Trading, DQN, PPO, Momentum Trading, Mean Reversion, Backtesting, Financial Time Series, Transaction Costs, Slippage, Reproducible Research, Quantitative Finance, Crypto Trading, Equity Markets

Corresponding Author

Andy Hou, Independent Researcher, University of Washington, Paul G. Allen School of Computer Science & Engineering, USA.

Citation

Hou, A. (2026). Reinforcement Learning for Trading Strategies: A Reproducible Comparison with Classical Baselines. J Digit Assets Monet Res. 1(1), 01-30.
