This work is a holistic investigation of how and why reinforcement learning (RL) agents fail to develop planning communication in a cooperative setting. It is also a guide for anyone trying to build a system that learns to use planning communication patterns. We examine the state of the art in multi-agent systems and emergent communication for communication patterns that share information about future actions and plans. To do this, we strategically formulate and test hypotheses that question various aspects of the systems’ motivation or capability. In doing so, we identify elements related to the task, the reward function, the algorithm used, and its architecture that prevent or promote planning communication. The major takeaway is that model-free RL algorithms are inherently ill-suited to producing planning communication, because the learning process is itself the solution to the presented planning problem; a converged model-free algorithm therefore has no further incentive to plan. The second takeaway is that it is difficult to formulate toy tasks in which planning communication yields a relevant advantage over merely signaling partially observable states, because the observed situation must not suggest obvious strategies. With CoMaze, played with an unknown partner, we propose our best candidate for a task that motivates the emergence of planning communication over simple signaling structures. The third takeaway is that RL algorithms can have difficulty determining the connection between earlier communication and later success, which makes it hard to learn to talk about plans. We present a hierarchical architecture that avoids this problem in recurrent networks and discuss other possible solutions. We take the lessons learned in this investigation to build an application in which agents learn to communicate and execute a longer movement plan produced by a theoretical model-based algorithm.
In building this application step by step, we show the effect of different design decisions on the properties of the resulting communication, including the ordering of meaning within learned utterances as well as whether utterances relate to state observations relatively or absolutely. Finally, we present a resolution of the classical choice between cooperative and hierarchical relationships among communicating agents, enabling more productive task-sharing both in executing plans and in micromanagement tasks. Overall, this work offers both insight and tools for shaping RL algorithms towards planning communication.
@phdthesis{doi:10.17170/kobra-202303077591,
  author    = {Ossenkopf, Marie},
  title     = {Learning planning communication in cooperative multi-agent settings},
  keywords  = {004 and Mehragentensystem and Kommunikation and Algorithmus},
  copyright = {https://rightsstatements.org/page/InC/1.0/},
  language  = {en},
  school    = {Kassel, Universität Kassel, Fachbereich Elektrotechnik / Informatik},
  year      = {2023}
}