The Power and Pitfalls of Learning Without Feedback

Self-reinforcing learning can help us understand new things, but it can also strengthen false beliefs

October 15, 2024

Imagine a child visiting a farm and seeing sheep and goats for the first time. Their parent points out which is which, helping the child learn to distinguish between the two. But what happens when the child returns to the farm without that guidance? Will they still be able to tell the animals apart? Neuroscientist Franziska Bröker is studying how both humans and machines learn without supervision, like a child on their own, and has uncovered a puzzle: unsupervised learning can either help or hinder progress, depending on what the learner already believes about the task.

In the world of machine learning, algorithms thrive on unlabeled data. They analyze large volumes of information without explicit labels and still manage to learn useful patterns. This success raises a question: if machines can learn so well this way, why do humans struggle in similar situations? According to recent studies, the answer may lie in how we make predictions and reinforce them in the absence of feedback. In other words, the results depend on how closely our inner understanding of a task matches what the task really requires.

The research shows that humans, like machines, use predictions to make sense of new information. For example, if someone believes that woolliness is the key difference between sheep and goats, they might erroneously classify a woolly goat as a sheep. When no one is around to correct this mistake, their wrong prediction gets reinforced, making it even harder to learn the correct difference. This “self-reinforcement” process can lead to a snowball effect: if their initial guess is right, learning improves—but if it’s wrong, they may get stuck in a loop of false beliefs.
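The snowball effect described above can be illustrated with a deliberately minimal sketch. It is not the model used in the research. It assumes a hypothetical learner that weighs two binary cues (woolliness and an unnamed cue that truly separates sheep from goats) and, with no feedback available, nudges its weights toward whatever label it has just predicted. All names (`make_animals`, `self_reinforce`, the starting weights, the learning rate) are illustrative assumptions.

```python
from itertools import product


def make_animals(repeats=50):
    """Balanced toy data (an assumption): each cue is -1 or +1, and only the
    second, 'diagnostic' cue determines the label (+1 = sheep, -1 = goat)."""
    return [((wool, diag), diag)
            for _ in range(repeats)
            for wool, diag in product((-1, 1), repeat=2)]


def predict(weights, cues):
    """Guess sheep (+1) or goat (-1) from a weighted sum of the two cues."""
    score = weights[0] * cues[0] + weights[1] * cues[1]
    return 1 if score >= 0 else -1


def self_reinforce(weights, animals, lr=0.05):
    """No feedback: the learner treats its own guess as if it were correct
    and strengthens the cue weights that produced that guess."""
    w = list(weights)
    for cues, _true_label in animals:  # the true label is never consulted
        guess = predict(w, cues)
        w[0] += lr * guess * cues[0]
        w[1] += lr * guess * cues[1]
    return w


def accuracy(weights, animals):
    """Fraction of animals whose label the learner predicts correctly."""
    hits = sum(predict(weights, cues) == label for cues, label in animals)
    return hits / len(animals)


animals = make_animals()
# Weights are (woolliness, diagnostic cue).
aligned = self_reinforce((0.2, 1.0), animals)     # starts out trusting the right cue
misaligned = self_reinforce((1.0, 0.2), animals)  # starts out trusting woolliness

print("aligned:   ", accuracy(aligned, animals))     # 1.0
print("misaligned:", accuracy(misaligned, animals))  # 0.5
```

In this noise-free toy, both learners become more committed to the cue they started with. The aligned learner stays fully accurate, while the misaligned learner stays at chance and its weight on woolliness keeps growing, which is what makes the error harder to undo later.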

This phenomenon doesn't just apply to animal identification. From learning to play a musical instrument to mastering a new language, the same dynamic can be seen. Without guidance or feedback, people often reinforce incorrect methods, making their mistakes harder to fix later on. The research suggests that unsupervised learning works best when a person's initial understanding is already somewhat aligned with the task. For harder tasks—like learning complex language rules or difficult motor skills—feedback is essential to avoid these traps.

In the end, the mixed results of unsupervised learning tell a bigger story: the question is not whether learning without feedback works, but when and how. As both humans and machines continue to learn in more complex environments, understanding these nuances could lead to better teaching methods, more effective training tools, and perhaps even smarter algorithms that are better at correcting themselves, as we are.

Expertise and unsupervised learning

While laboratory studies reveal various outcomes of unsupervised learning, understanding its implications in real-world learning scenarios requires examining expertise acquisition, which stems from extensive learning with varying degrees of supervision. For instance, radiologists receive structured feedback early in their careers but gradually lose access to explicit supervisory guidance. If unsupervised learning alone could foster expertise, we would expect consistent improvement, but evidence suggests otherwise.

Critics argue that experience is not necessarily predictive of expertise, as it can reflect only seniority without substantial skill improvement. Biases, such as confirmation bias, can further distort unsupervised learning by favoring information that aligns with preconceptions, ultimately hindering progress. Instead, regular feedback on decisions appears necessary for steady improvement. This aligns with the representation-to-task alignment framework, which posits that initial feedback helps learners build accurate mental representations before they can self-regulate their learning effectively.

For instance, in motor skill learning, removing feedback early in the learning process can lead to performance declines, whereas withdrawing it later—when the learner's predictions are more accurate—helps maintain or even improve performance. This highlights that expertise requires not just experience, but also timely supervision during critical learning phases.
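The importance of when feedback is withdrawn can be sketched with a toy simulation. This is an analogy, not a model of motor learning and not the method used in the studies: it assumes a hypothetical categorization learner that weighs two binary cues, starts out trusting the wrong one, receives the correct label for the first few trials, and afterwards reinforces its own guesses. The names `train_with_schedule` and `feedback_trials`, the starting weights, and the learning rate are all illustrative assumptions.

```python
from itertools import product

# Hypothetical toy data: each animal has two binary cues (-1 or +1), and only
# the second, 'diagnostic' cue determines the label (+1 = sheep, -1 = goat).
ANIMALS = [((wool, diag), diag)
           for _ in range(50)
           for wool, diag in product((-1, 1), repeat=2)]


def predict(weights, cues):
    """Guess the label from a weighted sum of the two cues."""
    score = weights[0] * cues[0] + weights[1] * cues[1]
    return 1 if score >= 0 else -1


def train_with_schedule(weights, feedback_trials, lr=0.05):
    """Learn from the true label for the first `feedback_trials` trials,
    then from the learner's own guess once feedback is withdrawn."""
    w = list(weights)
    for trial, (cues, label) in enumerate(ANIMALS):
        target = label if trial < feedback_trials else predict(w, cues)
        w[0] += lr * target * cues[0]
        w[1] += lr * target * cues[1]
    return w


def accuracy(weights):
    """Fraction of animals whose label the learner predicts correctly."""
    return sum(predict(weights, c) == y for c, y in ANIMALS) / len(ANIMALS)


start = (1.0, 0.2)  # (woolliness, diagnostic cue): misaligned with the task
early = train_with_schedule(start, feedback_trials=4)   # feedback withdrawn early
late = train_with_schedule(start, feedback_trials=40)   # feedback withdrawn later

print("feedback withdrawn early:", accuracy(early))  # 0.5
print("feedback withdrawn late: ", accuracy(late))   # 1.0
```

Under these assumptions, four trials of feedback are not enough to shift the learner onto the diagnostic cue, so it stays at chance once it is left alone. After forty trials its predictions are accurate enough that self-reinforcement keeps it on track without further supervision.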

Self-reinforcement in unsupervised learning

Unsupervised learning is often driven by self-reinforcement mechanisms, where learners use their own predictions instead of external validation. This form of learning is well-explored in perceptual and category learning, where Hebbian models demonstrate how unsupervised learning can either enhance or degrade performance depending on how well the learner's representations align with the task. These models have successfully accounted for semi-supervised categorization, such as children's acquisition of linguistic labels, suggesting that self-reinforcement can shape learning trajectories.

However, the absence of feedback, particularly in expertise acquisition, can lead to the perpetuation of incorrect predictions, as demonstrated in stereotyping. Without external correction, individuals may reinforce their own incorrect predictions, a phenomenon modeled by constructivist coding hypotheses. This can lead to persistent errors, as actions are treated as validated even when no feedback is provided, emphasizing the role of selective feedback in moderating unsupervised learning.

Internal feedback and neural mechanisms

Self-reinforcement requires internal learning signals that operate independently of external supervision. While the neural basis of learning through external feedback, such as rewards and punishments, is well understood, the mechanisms that govern self-generated feedback are less clear. Emerging research suggests that brain areas active during external feedback processing are also engaged during inferred feedback, such as when a learner reinforces their own choices. Confidence in one’s decisions, even in the absence of external feedback, appears to be a key driver of self-reinforcement, and subjective rewards can shape learning trajectories by reinforcing past decisions.

These internal feedback mechanisms can lead learners into "learning traps," where they cease exploring alternative strategies and focus solely on exploiting their past choices. Neuroimaging studies reveal that preferences are updated only for remembered choices, further supporting the role of internal feedback in guiding unsupervised learning. Additionally, neural replay—a process where the brain reactivates past experiences during rest—has been linked to self-reinforcement, highlighting its role in refining mental representations without external guidance.

Finding the right balance

The expertise literature, alongside controlled studies of unsupervised learning, supports the idea that self-reinforcement can either enhance or hinder performance based on the alignment between a learner's mental representations and the task at hand. While unsupervised learning has potential, it is not a silver bullet. Instead, its effectiveness depends on the complex interplay between existing knowledge, internal signals and task structure.

Future research should further explore the relationship between unsupervised self-reinforcement and external supervisory signals, especially in real-world learning contexts. This includes investigating how these mechanisms interact in human learning, which may involve a unified learning system rather than the distinct, task-specific algorithms often employed in artificial intelligence. Integrating insights from neuroscience, psychology, and machine learning will help develop more comprehensive models of human learning, leading to instructional designs that better support lifelong learning and prevent overconfidence in erroneous conclusions.

Ultimately, understanding the dynamics of unsupervised learning, including its potential pitfalls, will enhance educational approaches and support the development of expertise in various fields. By balancing self-reinforcement with critical external feedback, we can optimize learning systems to foster deep, lasting expertise while avoiding the traps of unsupervised overconfidence.

 
