This page covers our full corridor-network study, presented at the ATRD Symposium (preprint on arXiv below). An earlier, more limited version of this work — covering a single corridor, a two-corridor sequence, and a simple split, without zero-shot multi-corridor transfer — appeared at AIAA SciTech 2026; see citation below.
*Aadarsh did this work as a collaborator at MIT.
As autonomous aircraft are introduced at scale and traffic density increases, centralized management becomes insufficient to coordinate the large numbers of crewed and uncrewed aircraft. Dedicated Advanced Air Mobility (AAM) corridors have been proposed for organizing high-density autonomous traffic flows. The desire to scalably provide autonomous aircraft flexibility in trajectory planning motivates the development of decentralized approaches to traffic management in AAM corridors.
In this work, we extend a multi-agent reinforcement learning (MARL) framework for aircraft navigation and self-separation to corridor networks. Policies trained in a single-corridor setting are evaluated zero-shot on increasingly complex multi-corridor networks that combine merges and splits. Experimental results show that the learned behaviors transfer to scenarios with varying traffic density, network geometry, and heterogeneous vehicle performance, without centralized coordination or model retraining.
We formulate the multi-aircraft system as a decentralized partially observable Markov decision process (Dec-POMDP). Each agent observes a local neighborhood and selects actions based on state observations and graph-structured interaction information — no centralized coordinator, no global state.
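As an illustration of what a local neighborhood can mean in practice, the sketch below builds a per-agent neighbor list from 2D positions. It is not the authors' implementation: the function name `local_neighbors`, the use of planar positions, and the `sensing_radius` parameter are all assumptions made for this example, since the page does not state how the neighborhood is defined.

```python
import math
from typing import Dict, List, Sequence, Tuple


def local_neighbors(
    positions: Sequence[Tuple[float, float]],
    sensing_radius: float,  # hypothetical parameter, not taken from the study
) -> Dict[int, List[int]]:
    """Return, for each agent index, the indices of agents within sensing_radius."""
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # Each agent only sees agents inside its own sensing disc;
            # no agent has access to the full list of positions at decision time.
            if math.dist(positions[i], positions[j]) <= sensing_radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


# Three agents on a line: the first two are close, the third is out of range.
example = local_neighbors([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], sensing_radius=2.0)
```

The resulting adjacency lists are one simple way to represent graph-structured interaction information; a learned policy would consume them together with each agent's own state observation.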
Two upstream corridors merge into one downstream corridor. 3 corridors total. (40 agents shown)
Two sequential merging points with compounded interaction density. 5 corridors total. (10 agents shown)
Aircraft diverge into parallel corridors then reconverge. 6 corridors total. (40 agents shown)
Merges, splits, and sequential chains totaling 18 corridors, 3 entries, 2 exits, 8 routes. (10 agents shown)
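The network descriptions above can be read as directed graphs in which each corridor is a node and each merge or split is a set of edges. The sketch below enumerates entry-to-exit routes for two small cases. The corridor labels and the exact topologies are assumptions for illustration (the page gives only corridor counts, not connectivity), and `enumerate_routes` is a hypothetical helper, not part of the authors' code.

```python
from typing import Dict, List


def enumerate_routes(successors: Dict[str, List[str]]) -> List[List[str]]:
    """List every entry-to-exit corridor sequence in an acyclic corridor graph."""
    corridors = set(successors)
    for nexts in successors.values():
        corridors.update(nexts)
    # Entry corridors are those that no other corridor feeds into.
    has_predecessor = {c for nexts in successors.values() for c in nexts}
    entries = sorted(corridors - has_predecessor)

    routes: List[List[str]] = []

    def extend(path: List[str]) -> None:
        nexts = successors.get(path[-1], [])
        if not nexts:  # an exit corridor has no successors
            routes.append(path)
            return
        for nxt in nexts:
            extend(path + [nxt])

    for entry in entries:
        extend([entry])
    return routes


# Assumed topology: two upstream corridors (A, B) merge into one downstream corridor (C).
SINGLE_MERGE = {"A": ["C"], "B": ["C"]}
# Assumed topology: A and B merge into C, then C and D merge into E (5 corridors).
SEQUENTIAL_MERGE = {"A": ["C"], "B": ["C"], "C": ["E"], "D": ["E"]}

merge_routes = enumerate_routes(SINGLE_MERGE)
sequential_routes = enumerate_routes(SEQUENTIAL_MERGE)
```

The same helper would apply to the combined network if its connectivity were written out in this form.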
Traffic demand is varied across 10, 20, 30, and 40 simultaneous aircraft. The heterogeneous case assigns 3 agents a reduced maximum speed of 140 knots (20% slower), while the remaining agents retain 175 knots.
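A minimal sketch of how such a heterogeneous scenario might be configured is shown below. The two speed values come from the text; the function name `assign_max_speeds`, the random choice of which agents are slowed, and the `seed` parameter are assumptions, since the page does not say how the slower agents are selected.

```python
import random
from typing import List

NOMINAL_MAX_SPEED_KTS = 175.0  # from the text
REDUCED_MAX_SPEED_KTS = 140.0  # from the text


def assign_max_speeds(num_agents: int, num_slow: int = 3, seed: int = 0) -> List[float]:
    """Return one maximum speed per agent, with num_slow agents capped at the reduced speed."""
    if num_slow > num_agents:
        raise ValueError("num_slow cannot exceed num_agents")
    rng = random.Random(seed)
    # Assumption: the slowed agents are picked uniformly at random.
    slow_ids = set(rng.sample(range(num_agents), num_slow))
    return [
        REDUCED_MAX_SPEED_KTS if i in slow_ids else NOMINAL_MAX_SPEED_KTS
        for i in range(num_agents)
    ]


speeds = assign_max_speeds(num_agents=10)

# Relative reduction: 1 - 140/175, which is about 0.20 (the "20% slower" in the text).
speed_reduction = 1.0 - REDUCED_MAX_SPEED_KTS / NOMINAL_MAX_SPEED_KTS
```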
Table 1: Performance metrics for the combined corridor network (18 corridors). The learned decentralized policy maintains at least 96% conformance and 97% completion across all traffic levels, with tactical intervention rates between 3.3% and 4.2%.
| # Aircraft | Conformance C% (↑) | Completion S% (↑) | Avg. Speed (kts) (↑) | Tactical Intervention I% (↓) |
|---|---|---|---|---|
| 10 | 98% | 99% | 171.4 | 3.7% |
| 20 | 98% | 99% | 171.9 | 4.2% |
| 30 | 97% | 98% | 172.8 | 3.6% |
| 40 | 96% | 97% | 173.2 | 3.3% |
Faster agents learn to opportunistically overtake slower aircraft in inter-corridor gaps — adaptive speed harmonization emerges without explicit reward engineering. The presence of slower aircraft does not lead to persistent congestion; agents learn to extend their paths outside corridors to avoid boundary violations while overtaking.
3 of 10 agents are capped at 140 knots (20% slower) and are overtaken opportunistically by faster agents.
Presented at the 3rd Annual Northeast Systems and Control Workshop, Princeton University, May 2026.
Project page: jaroan.github.io/jasminejerrya/AAM_Corridor_MARL.html
For any questions, please contact Jasmine.
The earlier, more limited version of this work (single corridor, two-corridor sequence, and a simple split) appeared at AIAA SciTech 2026:
The authors thank the MIT SuperCloud and Lincoln Laboratory Supercomputing Center for high-performance computing resources. This work was supported in part by NASA under grant #80NSSC23M0220 and the University Leadership Initiative (grants #80NSSC21M0071 and #80NSSC20M0163). J.J. Aloor was also supported in part by a MathWorks Fellowship. This research was also sponsored by the Department of the Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Department of the Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.