Animals and artificial agents alike face uncertainty about their environments. This uncertainty can stem from ignorance, forgetting, or unsignalled changes, making exploration critical. However, appropriately trading exploration off against exploitation is notoriously difficult, and no general solution exists for any but the simplest problems (Gittins and Jones, 1979). Humans and other animals are nevertheless capable of efficient exploration, sometimes performing near-optimally (Wilson et al., 2014). Despite the wealth of experimental evidence, little is known about the computational mechanisms that generate exploratory choices in the brain.
A venerable idea from reinforcement learning is that exploratory choices can be planned offline (Sutton, 1991), which in animals could happen during periods of quiet wakefulness and sleep. One promising candidate substrate for such offline planning is hippocampal replay. Indeed, a recent theory (Mattar and Daw, 2018) suggested that hippocampal replay implements an optimised scheme for scheduling planning computations in the brain, prioritising those experiences whose reactivation would most improve future behaviour. This idea has been highly successful in explaining a range of experimental data on replay prioritisation in humans and other animals.
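In schematic terms, the theory scores each candidate backup of a state–action pair $(s_k, a_k)$ by its expected value of backup (EVB): the product of the gain in expected return from updating the policy at $s_k$ and the need, the expected future occupancy of $s_k$. The notation below is illustrative rather than a verbatim restatement of Mattar and Daw's equations:
\[
\mathrm{EVB}(s_k, a_k) \;=\; \underbrace{\mathrm{Gain}(s_k, a_k)}_{\text{improvement in expected return at } s_k} \;\times\; \underbrace{\mathrm{Need}(s_k)}_{\text{expected future occupancy of } s_k}.
\]
Replaying experiences in decreasing order of EVB reproduces, among other phenomena, forward sequences before deliberative choices and reverse sequences after the receipt of reward.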
Despite its success, the theory makes the simplifying assumption that the environment with which the agent interacts is fully known. As such, the patterns of replay it predicts amount to pure exploitation of the assumed knowledge about a given task. In our work (Antonov and Dayan, 2023), we extend the theory to the case of partial observability by explicitly handling the agent's uncertainty about its environment. This allows us to examine how replay prioritisation should be affected by uncertainty and subjective beliefs about a task (Fig. 1). We thereby generate testable predictions for future studies, suggesting that replay might play a role in guiding directed exploration.
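As a schematic illustration of what handling uncertainty could involve (a sketch under assumed notation, not the exact formalism of Antonov and Dayan, 2023): a partially informed agent maintains a posterior belief $b_t(\theta) = p(\theta \mid h_t)$ over unknown environment parameters $\theta$ given its history $h_t$, and the value of a backup is then assessed in expectation under that belief:
\[
\mathrm{EVB}_{b_t}(s_k, a_k) \;=\; \mathbb{E}_{\theta \sim b_t}\!\left[\, \mathrm{Gain}_\theta(s_k, a_k) \,\times\, \mathrm{Need}_\theta(s_k) \,\right].
\]
Under a formulation of this kind, backups that reorient behaviour toward informative states can acquire high priority, offering one route by which replay could guide directed exploration.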