Monday, January 8, 2024

Developmental changes in exploration resemble stochastic optimization

Giron, A.P., Ciranka, S., Schulz, E. et al. Developmental changes in exploration resemble stochastic optimization. Nat Hum Behav 7, 1955–1967 (2023). https://doi.org/10.1038/s41562-023-01662-1

Abstract

Human development is often described as a ‘cooling off’ process, analogous to stochastic optimization algorithms that implement a gradual reduction in randomness over time. Yet there is ambiguity in how to interpret this analogy, due to a lack of concrete empirical comparisons. Using data from n = 281 participants ages 5 to 55, we show that cooling off does not apply only to the single dimension of randomness. Rather, human development resembles an optimization process of multiple learning parameters, for example, reward generalization, uncertainty-directed exploration and random temperature. Rapid changes in parameters occur during childhood, but these changes plateau and converge to efficient values in adulthood. We show that while the developmental trajectory of human parameters is strikingly similar to several stochastic optimization algorithms, there are important differences in convergence. None of the optimization algorithms tested were able to discover reliably better regions of the strategy space than adult participants on this task.

Main

Human development has fascinated researchers of biological and artificial intelligence alike. Because it is the only known process that reliably produces human-level intelligence1, there is broad interest in characterizing the developmental trajectory of human learning2,3,4 and in understanding why we observe specific patterns of change5.

One influential hypothesis describes human development as a ‘cooling off’ process4,6,7, comparable to simulated annealing (SA)8,9. SA is a stochastic optimization algorithm named in analogy to a piece of metal that becomes harder to manipulate as it cools off. Initialized with a high temperature, SA starts off highly flexible and willing to accept worse solutions as it explores the optimization landscape. But as the temperature cools down, the algorithm becomes increasingly greedy and more narrowly favours only local improvements, eventually converging on an (approximately) optimal solution. Algorithms with similar cooling mechanisms, such as stochastic gradient descent and its discrete counterpart, stochastic hill climbing (SHC), are abundant in machine learning and have played a pivotal role in the rise of deep learning10,11,12,13.
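To make the cooling mechanism concrete, here is a minimal sketch of simulated annealing on a toy one-dimensional problem. It uses the standard Metropolis acceptance rule (accept a worse candidate with probability exp(-Δ/T)) and a geometric cooling schedule; the schedule, its parameters (`t0`, `cooling`, `steps`) and the quadratic cost are illustrative assumptions, not the setup used in the paper.

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Minimize cost(x) with simulated annealing and a geometric cooling schedule."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        cand = neighbor(x, rng)
        fc = cost(cand)
        delta = fc - fx
        # Metropolis rule: always accept improvements; accept a worse candidate
        # with probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling  # geometric cooling: the search grows steadily greedier
    return best, fbest

# Toy usage: a 1-D quadratic with its minimum at x = 3, Gaussian proposal steps.
best, fbest = simulated_annealing(
    cost=lambda x: (x - 3.0) ** 2,
    neighbor=lambda x, rng: x + rng.gauss(0.0, 0.5),
    x0=-5.0,
)
```

Early on, when `t` is large, uphill moves are frequently accepted; by the final iterations `t` is tiny and the search behaves like pure greedy hill climbing.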

This analogy of stochastic optimization applied to human development is quite alluring: young children start off highly stochastic and flexible in generating hypotheses4,14,15,16 and selecting actions17, which gradually tapers off over the lifespan. This allows children to catch information that adults overlook18, and learn unusual causal relationships adults might never consider4,15. Yet this high variability also results in large deviations from reward-maximizing behaviour3,19,20,21,22, with gradual improvements during development. Adults, in turn, are well calibrated to their environment and quickly solve familiar problems, but at the cost of flexibility, since they experience difficulty adapting to novel circumstances23,24,25,26.

While intuitively appealing, the implications and possible boundaries of the optimization analogy remain ambiguous without a clear definition of the process and a specification of what is being optimized. As a consequence, there is a need for a direct empirical test of the similarities and differences between human development and algorithmic optimization.

Perhaps the most direct interpretation is to apply cooling off to the single dimension of random decision temperature, which controls the amount of noise when selecting actions or sampling hypotheses6,16,27, although alternative implementations are also possible28,29. Evidence from experimental studies suggests that young children are harder to predict than adults17,30, implying greater stochasticity, which is amplified in neurodevelopmental disorders such as attention deficit hyperactivity disorder28 and impulsivity31. However, this interpretation is only part of the story, since developmental differences in choice variability can be traced to changes affecting multiple aspects of learning and choice behaviour. Aside from a decrease in randomness, development is also related to changes in more systematic, uncertainty-directed exploration19,27,32, which is also reduced over the lifespan. Additionally, changes in how people generalize rewards to novel choices27 and in how they integrate new experiences33,34 affect how beliefs are formed and how different actions are valued, which also influences choice variability. In sum, while decision noise certainly diminishes over the lifespan, this is only a single aspect of human development.
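The three dimensions named above (reward generalization, uncertainty-directed exploration and random temperature) can be separated in a simple choice model. The sketch below assumes a common formalization from this literature, which the excerpt itself does not spell out: Gaussian-process regression with an RBF kernel (length-scale `length_scale` as generalization), an upper-confidence-bound bonus (`beta` as directed exploration) and a softmax (`tau` as random temperature). The option grid, observed rewards and the "child" and "adult" parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """RBF similarity between option sets a (n, d) and b (m, d); returns (n, m)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def gp_posterior(x_obs, y_obs, x_all, length_scale, noise=1e-4):
    """GP posterior mean and std at every option; small noise term for stability."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_all, length_scale)   # (n_obs, n_all)
    mean = k_star.T @ np.linalg.solve(K, y_obs)        # k_*^T K^{-1} y
    v = np.linalg.solve(K, k_star)                     # K^{-1} k_*
    var = 1.0 - np.sum(k_star * v, axis=0)             # prior variance is 1 for RBF
    return mean, np.sqrt(np.clip(var, 0.0, None))

def choice_probabilities(mean, std, beta, tau):
    """UCB values (directed exploration) passed through a softmax with temperature tau."""
    ucb = mean + beta * std
    z = (ucb - ucb.max()) / tau    # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical 1-D task: 11 options, rewards observed at options 2 and 8.
x_all = np.arange(11, dtype=float).reshape(-1, 1)
x_obs = np.array([[2.0], [8.0]])
y_obs = np.array([0.2, 0.9])
mean, std = gp_posterior(x_obs, y_obs, x_all, length_scale=1.5)

# Illustrative parameter settings: more directed and random exploration vs. less.
p_child = choice_probabilities(mean, std, beta=0.5, tau=0.5)
p_adult = choice_probabilities(mean, std, beta=0.1, tau=0.05)
```

With the lower `beta` and `tau`, choice mass concentrates on the best-known option (index 8), while the higher settings spread choices across uncertain neighbours, so choice variability here stems from more than the temperature alone.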

Alternatively, one could apply the cooling off metaphor to an optimization process in the space of learning strategies, which can be characterized along multiple dimensions of learning. Development might thus be framed as parameter optimization that tunes the parameters of an individual’s learning strategy, starting with large tweaks in childhood, followed by progressively smaller and more refined adjustments over the lifespan. In the stochastic optimization metaphor, training iterations of the algorithm become a proxy for age.
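A minimal sketch of this framing, assuming a stochastic hill climber whose perturbation size shrinks geometrically: each iteration (a stand-in for age) proposes a random tweak to a parameter vector and keeps it only if fitness does not decrease. The three-parameter vector, the `target` values and the quadratic `fitness` are hypothetical placeholders, not quantities estimated in the paper.

```python
import numpy as np

def stochastic_hill_climb(fitness, theta0, iters=200, step0=1.0, decay=0.97, seed=0):
    """Maximize fitness(theta) with random tweaks whose size shrinks every iteration."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    f = fitness(theta)
    trajectory = [theta.copy()]
    step = step0
    for _ in range(iters):
        cand = theta + rng.normal(0.0, step, size=theta.shape)
        fc = fitness(cand)
        if fc >= f:      # greedy: keep the tweak only if it is at least as good
            theta, f = cand, fc
        trajectory.append(theta.copy())
        step *= decay    # "cooling": later iterations make smaller adjustments
    return theta, np.array(trajectory)

# Hypothetical optimum for (generalization, directed exploration, temperature).
target = np.array([1.0, 0.3, 0.05])
fitness = lambda th: -np.sum((th - target) ** 2)

theta0 = np.array([3.0, 2.0, 1.0])
theta_final, traj = stochastic_hill_climb(fitness, theta0)
change = np.abs(np.diff(traj, axis=0)).sum(axis=1)  # size of each iteration's update
early, late = change[:20].mean(), change[-20:].mean()
```

Plotting `traj` against iteration shows the pattern described above: large, erratic changes early, then a plateau as the step size decays.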

This interpretation connects the metaphor of stochastic optimization with Bayesian models of cognitive development, which share a common notion of gradual convergence35,36. In Bayesian models of development, individuals in early developmental stages possess broad priors and vague theories about the world, which become refined with experience35. Bayesian principles dictate that, over time, novel experiences will have a lesser impact on future beliefs or behaviour as one’s priors become narrower36,37. Observed over the lifespan, this process will result in large changes to beliefs and behaviour early in childhood and smaller changes in later stages, implying a developmental pattern similar to that of the stochastic optimization metaphor. In sum, not only might the outcomes of behaviour be more stochastic during childhood, but the changes to the parameters governing behaviour might also be more stochastic in earlier developmental stages.
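The claim that narrower priors shrink the impact of new experiences can be seen in the textbook conjugate Normal model, used here purely as an illustrative assumption rather than the paper's model. Writing the update in delta-rule form, the gain k = obs_precision / (precision + obs_precision) is the weight the newest observation receives, and it falls as precision accumulates. The prior and the observation sequence are hypothetical.

```python
def update_belief(mu, precision, obs, obs_precision=1.0):
    """Conjugate Normal update of a belief about an unknown mean after one observation.

    Delta-rule form: the gain k is the share of the prediction error that is applied.
    """
    new_precision = precision + obs_precision
    k = obs_precision / new_precision    # shrinks as the belief becomes narrower
    new_mu = mu + k * (obs - mu)         # move toward obs by a fraction k of the error
    return new_mu, new_precision, k

mu, precision = 0.0, 0.1   # hypothetical broad prior (low precision = high variance)
gains = []
for obs in [2.0, -1.0, 0.5, 1.5, 0.0]:
    mu, precision, k = update_belief(mu, precision, obs)
    gains.append(k)
# gains ≈ [0.909, 0.476, 0.323, 0.244, 0.196]
```

The decaying gain plays the same role as the shrinking step size in a cooling optimizer: identical surprises move the belief less and less over time.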

The whole article is at the link.
