Multi-agent systems are typically trained to optimize shared objectives that reflect system-level efficiency. However, in the resource-constrained environments of mobility and transportation systems, efficiency may be achieved at the expense of fairness: tasks may be distributed inequitably, so that some agents gain an unfair advantage while others incur disproportionately higher costs or receive lower rewards. It is, therefore, important to consider the tradeoff between efficiency and fairness in such settings.
We consider the problem of fair multi-agent navigation for a group of decentralized agents using multi-agent reinforcement learning (MARL). We use the reciprocal of the coefficient of variation of the distances traveled by the agents as a measure of fairness and investigate whether agents can learn to be fair without significantly sacrificing efficiency (i.e., without significantly increasing the total distance traveled). We find that by training agents with min-max fair distance-based goal assignments along with a reward term that
                        incentivizes fairness as they move towards their goals, the agents (1) learn a fair assignment of goals and (2) achieve almost perfect
                        goal coverage in navigation scenarios using only local observations. For goal coverage scenarios, we find that, on average, the proposed
                        model yields a 14% improvement in efficiency and a 5% improvement in fairness over a baseline model that is trained using random
                        assignments. Furthermore, a 21% improvement in fairness can be achieved by the proposed model as compared to a model trained on
                        optimally efficient assignments; this increase in fairness comes at the expense of only a 7% decrease in efficiency. Finally, we extend
                        our method to environments in which agents must complete coverage tasks in prescribed formations and show that it is possible to do
                        so without tailoring the models to specific formation shapes.
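To make the fairness measure concrete, the following minimal Python sketch (our own illustration, not the paper's code) computes the reciprocal of the coefficient of variation of the per-agent distances; the small epsilon is an assumption to guard against division by zero.

```python
import numpy as np

def fairness(distances):
    """Reciprocal of the coefficient of variation of per-agent distances."""
    d = np.asarray(distances, dtype=float)
    return d.mean() / (d.std() + 1e-8)  # higher = more equal distances

# A more even split of travel distances scores higher.
print(fairness([9.0, 10.0, 11.0]))   # relatively fair
print(fairness([2.0, 10.0, 18.0]))   # less fair
```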
                        
                    
Our environment comprises agent, obstacle, and goal entities. The agents must navigate so that each reaches a unique goal while avoiding collisions.
                    
		
Random assignment: agents are assigned to goals randomly.
Optimal assignment: each agent i is matched to a goal j so that the total cost (the sum of Cij over the assigned agent-goal pairs) is minimized, which here corresponds to minimizing the total distance traveled.
Min-max fair assignment: min-max fairness is a popular concept that reduces the worst-case cost in the assignment. We determine a min-max fair assignment by optimizing the objective min z, where z represents the maximum cost Cij assigned to any agent.
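To make the three assignment schemes concrete, here is a short Python sketch (our own illustration, not the paper's implementation) that builds a random, a total-distance-optimal, and a min-max fair assignment from a matrix of agent-to-goal distances. The optimal assignment uses the Hungarian method from scipy; the min-max (bottleneck) assignment is found by brute force over permutations, which is only practical for small teams.

```python
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Toy cost matrix: C[i, j] = distance from agent i to goal j.
agents = rng.uniform(0.0, 10.0, size=(4, 2))
goals = rng.uniform(0.0, 10.0, size=(4, 2))
C = np.linalg.norm(agents[:, None, :] - goals[None, :, :], axis=-1)

# Random assignment: each agent gets a distinct goal, chosen at random.
random_assign = rng.permutation(len(goals))

# Optimal assignment: minimize the total distance (Hungarian method).
_, optimal_assign = linear_sum_assignment(C)

# Min-max fair assignment: minimize the worst per-agent distance.
# Brute force over permutations, which is fine for a handful of agents.
minmax_assign = min(itertools.permutations(range(len(goals))),
                    key=lambda p: max(C[i, j] for i, j in enumerate(p)))

for name, assign in [("random", random_assign),
                     ("optimal", optimal_assign),
                     ("min-max fair", minmax_assign)]:
    dists = [C[i, j] for i, j in enumerate(assign)]
    print(f"{name:13s} total = {sum(dists):.2f}, max = {max(dists):.2f}")
```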
Overview of training: In the navigation scenario, we track the paths of the agents as an episode progresses. Frame A: Each episode starts with entities initialized randomly; agent 1’s observation vector and sensing radius are shown. Frames B and C: At every time step, the fairness metric Ft is computed along with each agent’s rewards. The agents are assigned goals randomly or based on an optimal or min-max fair distance cost. Frame D: Once an agent reaches its assigned goal, it receives a goal reward Rg and is flagged "done" for that episode.
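As a rough sketch of how such a per-step fairness incentive can be combined with the goal reward, the snippet below shapes each agent's reward with the team-level fairness metric Ft. The weights, function name, and exact reward terms are our illustrative assumptions, not the reward function used in the paper.

```python
import numpy as np

FAIR_WEIGHT = 0.1   # assumed weight on the shared fairness term
GOAL_REWARD = 5.0   # assumed goal-reaching bonus, Rg

def step_rewards(step_dists, cum_dists, reached_goal):
    """Illustrative per-step reward: movement penalty + fairness term + goal bonus."""
    cum = np.asarray(cum_dists, dtype=float)
    fairness = cum.mean() / (cum.std() + 1e-8)   # Ft: reciprocal CV of distances so far
    rewards = -np.asarray(step_dists, dtype=float)            # penalize distance moved
    rewards += FAIR_WEIGHT * fairness                         # shared fairness incentive
    rewards += GOAL_REWARD * np.asarray(reached_goal, float)  # goal reward Rg on arrival
    return rewards
```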
                                        (a) RandAssign,NoFairRew (RA)
(b) OptAssign,NoFairRew (OA)
                                        (c) FairAssign,NoFairRew (FA)
                                        (d) FairAssign,FairRew (FA+FR)
We calculate the following metrics to determine the performance of our method:
Table 1: Evaluation metrics calculated for the four navigation scenarios shown in Fig. 4. We see that the model that is trained with fair goal assignments and fair rewards (FA+FR) has the best balance of fairness and efficiency (distance traveled).
| Model | Fairness, 𝓕 (↑ better) | Success, S% (↑ better) | Episode fraction, T (↓ better) | Distance, D (↓ better) |
|---|---|---|---|---|
| RA | 6.50 | 66.7 | 1.00 | 10.81 | 
| OA | 3.45 | 100.0 | 0.68 | 9.36 | 
| FA | 4.81 | 100.0 | 0.64 | 9.96 | 
| FA+FR | 6.14 | 100.0 | 0.62 | 9.82 | 
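For reference, here is a hedged Python sketch of how these four metrics could be aggregated from evaluation-episode logs. The log format, names, and the interpretation of D as the mean per-agent distance are our assumptions, not the paper's evaluation code.

```python
import numpy as np

def evaluate(episodes, max_steps):
    """Aggregate fairness F, success S%, episode fraction T, and distance D.

    Each episode is assumed to be a dict with:
      "distances": per-agent distance traveled,
      "reached":   per-agent flag, True if the agent reached its goal,
      "steps":     steps taken until all goals were covered (capped at max_steps).
    """
    F, S, T, D = [], [], [], []
    for ep in episodes:
        d = np.asarray(ep["distances"], dtype=float)
        F.append(d.mean() / (d.std() + 1e-8))      # reciprocal coefficient of variation
        S.append(100.0 * np.mean(ep["reached"]))   # percentage of agents reaching goals
        T.append(ep["steps"] / max_steps)          # fraction of the episode used
        D.append(d.mean())                         # mean distance per agent (assumed)
    return {name: float(np.mean(vals)) for name, vals in zip("FSTD", (F, S, T, D))}
```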
For any questions, please contact Jasmine.
                    
The authors would like to thank the 
                    MIT SuperCloud 
                    and the Lincoln Laboratory Supercomputing Center for providing 
                    high-performance computing resources that have contributed to 
                    the research results reported within this paper. 
                    This work was supported in part by NASA under grant #80NSSC23M0220 and the University Leadership Initiative (grants #80NSSC21M0071 and #80NSSC20M0163),
                    but this article solely reflects the opinions and conclusions of its authors and not any NASA entity. J.J. Aloor was also supported in part by a Mathworks Fellowship.
                    
                    
 This website template was edited heavily following the base from Michaël Gharbi and
                    Matthew Tannick.