Between prudence and paranoia: Theory of Mind gone right, and wrong
Nitay Alon, Lion Schulz, Peter Dayan, & Joseph M. Barnby
Agents need to be on their toes when interacting with competitive others in order to avoid being duped. Too much vigilance out of context can, however, be detrimental and lead to paranoia. Here, we offer a formal account of this phenomenon through the lens of theory of mind. We simulate agents with different depths of mentalisation and show how, if well calibrated, deep recursive mentalisation gives rise to both successful deception and reasonable scepticism. However, we also show how, if theory of mind is too sophisticated, agents become paranoid, losing trust and reward in the process. We discuss our findings in light of computational psychiatry and AI safety.
We explore the effects of high levels of opponent mentalisation using the iterated ultimatum game. In this game, a sender and a receiver interact sequentially: the sender offers the receiver a partition of an endowment, and the receiver either accepts or rejects it. Using the interactive partially observable Markov decision problem framework, we recursively solve the task for various degrees of opponent mentalisation. We consider two types of senders: a random one and a threshold one. The latter is characterised by its threshold and its recursive mentalisation level (depth of mentalisation; DoM).
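To make the setup concrete, the following Python sketch simulates one version of the iterated ultimatum game: a random sender and a threshold sender each face a shallow receiver that accepts any offer above a fixed cut-off. The endowment, offer grid, thresholds, and the simplified receiver are illustrative assumptions, not the exact model solved in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
ENDOWMENT = 1.0                       # amount split on each trial (assumed)
OFFERS = np.linspace(0.0, 1.0, 11)    # assumed discrete grid of possible offers
N_TRIALS = 10

def random_sender(_history):
    """Random sender: offers drawn uniformly, independent of past play."""
    return rng.choice(OFFERS)

def threshold_sender(_history, threshold=0.3):
    """Threshold sender (simplified): repeatedly offers its fixed threshold amount,
    keeping ENDOWMENT - threshold for itself."""
    return threshold

def shallow_receiver(offer, cutoff=0.2):
    """Shallowest receiver: accepts any offer above a fixed cut-off,
    without modelling the sender's type or intentions."""
    return offer >= cutoff

def play(sender, receiver, n_trials=N_TRIALS):
    """Run the iterated game and return (sender reward, receiver reward)."""
    history, sender_reward, receiver_reward = [], 0.0, 0.0
    for _ in range(n_trials):
        offer = sender(history)
        accepted = receiver(offer)
        if accepted:
            sender_reward += ENDOWMENT - offer
            receiver_reward += offer
        history.append((offer, accepted))
    return sender_reward, receiver_reward

print(play(random_sender, shallow_receiver))
print(play(threshold_sender, shallow_receiver))
```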
As hypothesised, threshold senders with high DoM deceive receivers with low DoM by pretending to be random senders. In turn, this makes the high-DoM receiver sceptical towards random-like behaviour. While this scepticism is beneficial when interacting with a sophisticated sender, it slows the receiver's ability to correctly identify a genuinely random sender and impairs its ability to exploit a naïve sender.
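The receiver-side inference behind this result can be sketched as a Bayesian update over the sender's type. The offer distributions assumed below (uniform for the random sender, concentrated near the threshold for the threshold sender) are simplifications of the recursive computation in the paper; they nonetheless show how random-looking offers pull the posterior towards the random type, which is exactly the pattern a deceptive high-DoM sender exploits and a yet higher-DoM receiver learns to treat with scepticism.

```python
import numpy as np

OFFERS = np.linspace(0.0, 1.0, 11)    # assumed discrete offer grid

def offer_pmfs(threshold=0.3, noise=0.05):
    """Assumed offer distributions per sender type: uniform for the random
    sender, most mass near the (private) threshold for the threshold sender."""
    random_pmf = np.full(len(OFFERS), 1.0 / len(OFFERS))
    w = np.exp(-0.5 * ((OFFERS - threshold) / noise) ** 2)
    return {"random": random_pmf, "threshold": w / w.sum()}

def update_belief(belief, offer, pmfs):
    """One Bayesian update of P(sender type | observed offer)."""
    idx = int(np.argmin(np.abs(OFFERS - offer)))
    posterior = {t: belief[t] * pmfs[t][idx] for t in belief}
    z = sum(posterior.values())
    return {t: p / z for t, p in posterior.items()}

pmfs = offer_pmfs()
belief = {"random": 0.5, "threshold": 0.5}
# Random-looking offers shift the posterior towards the random type.
for offer in [0.1, 0.7, 0.4, 0.9, 0.3]:
    belief = update_belief(belief, offer, pmfs)
    print({t: round(p, 3) for t, p in belief.items()})
```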
Our work offers lessons to several fields. To the computational cognitive science and psychiatry communities, we offer a computational account of a process contributing to paranoia and a possible factor underlying general psychopathology. To the AI community, we show how theory of mind needs careful calibration to foster a working and trusting partnership between agents. This calibration is particularly important for systems that act in an increasingly social manner, such as LLMs, whose capacity for theory of mind is actively debated. As a result, our work has key implications for AI safety and human-computer interaction.