Multi-agent systems are trained to maximize shared cost objectives, which typically reflect system-level efficiency. However, in the
resource-constrained environments of mobility and transportation systems, efficiency may be achieved at the expense of fairness
— certain agents may incur significantly greater costs or lower rewards compared to others. Tasks could be distributed inequitably,
leading to some agents receiving an unfair advantage while others incur disproportionately high costs. It is, therefore, important to
consider the tradeoffs between efficiency and fairness in such settings.
We consider the problem of fair multi-agent navigation for a group of decentralized agents using multi-agent reinforcement
learning (MARL). We consider the reciprocal of the coefficient of variation of the distances traveled by different agents as a measure
of fairness and investigate whether agents can learn to be fair without significantly sacrificing efficiency (i.e., increasing the total
distance traveled). We find that by training agents using min-max fair distance goal assignments along with a reward term that
incentivizes fairness as they move towards their goals, the agents (1) learn a fair assignment of goals and (2) achieve almost perfect
goal coverage in navigation scenarios using only local observations. For goal coverage scenarios, we find that, on average, the proposed
model yields a 14% improvement in efficiency and a 5% improvement in fairness over a baseline model that is trained using random
assignments. Furthermore, a 21% improvement in fairness can be achieved by the proposed model as compared to a model trained on
optimally efficient assignments; this increase in fairness comes at the expense of only a 7% decrease in efficiency. Finally, we extend
our method to environments in which agents must complete coverage tasks in prescribed formations and show that it is possible to do
so without tailoring the models to specific formation shapes.
Our environment comprises agent, obstacle, and goal entities. Agents navigate to goals so that each agent reaches a unique goal while avoiding collisions.
Here, agents are assigned to goals randomly.
Each agent i is matched to a goal j so that the total cost Cij for all agents is minimized, which here corresponds to minimizing the total distance.
Min‐max fairness is a popular concept that reduces the worst‐case cost in the assignment. We determine a min‐max fair assignment by optimizing the objective min z where z represents the maximum cost assigned to any agent
Overview of the training: In the navigation scenario, we track the path of the agents as an episode progresses. Frame A: Each episode starts with entities initialized randomly. Agent 1’s observation vector and sensing radius are shown. Frames B and C: At every time step, for each agent, the fairness metric Ft is computed along with each agent’s rewards. The agents are assigned goals randomly or based on an optimal or fair distance cost. Frame D: Once an agent reaches the assigned goal, it is given a goal reward Rg and is flagged ”done” for that episode.
(a) RandAssign,NoFairRew (RA)
b) OptAssign,NoFairRew (OA)
(c) FairAssign,NoFairRew (FA)
(d) FairAssign,FairRew (FA+FR)
We calculate the following metrics to determine the performance of our method:
Table 1: Evaluation metrics calculated for the four navigation scenarios shown in Fig. 4. We see that the model that is trained with fair goal assignments and fair rewards (FA+FR) has the best balance of fairness and efficiency (distance traveled).
| Model . | | Fairness, 𝓕 (↑ better) . | | Success, S% (↑ better) . | | Episode fraction, T (↓ better) . | | Distance, D (↓ better) . |
---|---|---|---|---|
RA | 6.50 | 66.7 | 1.00 | 10.81 |
OA | 3.45 | 100.0 | 0.68 | 9.36 |
FA | 4.81 | 100.0 | 0.64 | 9.96 |
FA+FR | 6.14 | 100.0 | 0.62 | 9.82 |
The authors would like to thank the
MIT SuperCloud
and the Lincoln Laboratory Supercomputing Center for providing
high-performance computing resources that have contributed to
the research results reported within this paper.
This work was supported in part by NASA under grant #80NSSC23M0220 and the University Leadership Initiative (grants #80NSSC21M0071 and #80NSSC20M0163),
but this article solely reflects the opinions and conclusions of its authors and not any NASA entity. J.J. Aloor was also supported in part by a Mathworks Fellowship.
This website template was edited heavily following the base from Michaël Gharbi and
Matthew Tannick.