Researchers from the Delft University of Technology and the University of Edinburgh have conducted a benchmark study of deep reinforcement learning (RL) algorithms for the container stowage planning problem (CSPP). The work, led by Yunqi Huang and colleagues, targets a problem that is central to maritime transportation and terminal operations, where stowage efficiency directly affects the broader supply chain.
The study highlights the complexity of CSPP, which has traditionally relied heavily on human expertise. While reinforcement learning has emerged as a promising approach to tackle this problem, systematic comparisons across different algorithms have been limited. To bridge this gap, the researchers developed a Gym environment that captures the fundamental features of CSPP. They extended this environment to include crane scheduling, exploring both multi-agent and single-agent formulations.
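To make the idea of a Gym-style stowage environment concrete, here is a minimal sketch in plain Python. It is not the authors' environment: the class name `ToyStowageEnv`, the single-bay layout, the reward values, and the blocking rule are all assumptions made for illustration, and crane scheduling is left out entirely. The sketch only mirrors the reset/step convention used by Gymnasium, where `reset` returns an observation and an info dict and `step` returns observation, reward, terminated, truncated, and info.

```python
import random


class ToyStowageEnv:
    """Toy single-bay stowage environment with a Gym-style interface.

    Hypothetical simplification: containers arrive one at a time, each
    labelled with a discharge-port index (ports are visited in increasing
    order). The agent chooses a stack for each container.
    """

    def __init__(self, n_stacks=3, max_height=3, n_ports=3, seed=0):
        self.n_stacks = n_stacks
        self.max_height = max_height
        self.n_ports = n_ports
        self.rng = random.Random(seed)
        self.stacks = []
        self.queue = []

    def reset(self):
        # Empty bay and a fresh queue that exactly fills it.
        self.stacks = [[] for _ in range(self.n_stacks)]
        n = self.n_stacks * self.max_height
        self.queue = [self.rng.randint(1, self.n_ports) for _ in range(n)]
        return self._obs(), {}

    def _obs(self):
        # Flattened bay, stack by stack from the bottom (0 = empty slot),
        # followed by the port of the next container (0 = queue empty).
        bay = [s[h] if h < len(s) else 0
               for s in self.stacks for h in range(self.max_height)]
        nxt = self.queue[0] if self.queue else 0
        return tuple(bay + [nxt])

    def step(self, action):
        stack = self.stacks[action]
        if len(stack) >= self.max_height:
            # Assumed penalty for a full stack; the container stays queued.
            return self._obs(), -2.0, False, False, {"invalid": True}
        port = self.queue.pop(0)
        # A container below with an earlier port would have to be dug out,
        # so this placement is penalised once (assumed reward shaping).
        reward = -1.0 if any(p < port for p in stack) else 0.0
        stack.append(port)
        terminated = not self.queue
        return self._obs(), reward, terminated, False, {"invalid": False}


def run_episode(env, policy, max_steps=100):
    """Roll out one episode; `policy` maps an observation to a stack index."""
    obs, _ = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            break
    return total
```

A policy here is any function from an observation to a stack index, so a hand-written heuristic and a trained agent can be scored with the same `run_episode` call. A real environment would also need explicit observation and action spaces, as well as the weight, stability, and crane constraints that this sketch ignores.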
Within this framework, the team evaluated five prominent RL algorithms: DQN, QR-DQN, A2C, PPO, and TRPO. The evaluation covered multiple scenarios of varying complexity. The results revealed significant performance gaps among the algorithms, and these gaps became apparent as scenario complexity increased. The finding underscores that both the choice of algorithm and the way the problem is formulated matter for CSPP.
Beyond benchmarking multiple RL methods for CSPP, the study provides a reusable Gym environment that includes crane scheduling, which offers a foundation for future research and practical deployment in maritime logistics. By comparing the algorithms systematically, the researchers aim to guide practitioners and academics toward the most effective approaches for container stowage planning, and ultimately toward more efficient and reliable global supply chains.

