Thursday, August 22, 2019

A critique of pure learning and what artificial neural networks can learn from animal brains


The article linked in the tweet: Anthony M. Zador, A critique of pure learning and what artificial neural networks can learn from animal brains, Nature Communications:
Abstract: Artificial neural networks (ANNs) have undergone a revolution, catalyzed by better supervised learning algorithms. However, in stark contrast to young animals (including humans), training such networks requires enormous numbers of labeled examples, leading to the belief that animals must rely instead mainly on unsupervised learning. Here we argue that most animal behavior is not the result of clever learning algorithms—supervised or unsupervised—but is encoded in the genome. Specifically, animals are born with highly structured brain connectivity, which enables them to learn very rapidly. Because the wiring diagram is far too complex to be specified explicitly in the genome, it must be compressed through a “genomic bottleneck”. The genomic bottleneck suggests a path toward ANNs capable of rapid learning.
From the article, on learning:
In ANN research, the term “learning” has a technical usage that is different from its usage in neuroscience and psychology. In ANNs, learning refers to the process of extracting structure—statistical regularities—from input data, and encoding that structure into the parameters of the network. These network parameters contain all the information needed to specify the network. For example, a fully connected network with 𝑁 neurons might have one parameter (e.g., a threshold) associated with each neuron, and an additional 𝑁^2 parameters specifying the strengths of synaptic connections, for a total of 𝑁+𝑁^2 free parameters. Of course, as the number of neurons 𝑁 becomes large, the total parameter count in a fully connected ANN is dominated by the 𝑁^2 synaptic parameters.
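
The parameter count in that passage is easy to check numerically. The following is a minimal sketch, not taken from the article; the function name and the returned dictionary keys are made up for illustration. It counts one threshold per neuron plus N^2 connection strengths and reports what share of the total the connections account for.

```python
def fully_connected_param_count(n_neurons: int) -> dict:
    """Count free parameters in a fully connected network of n_neurons.

    Uses the accounting from the quoted passage: one threshold per neuron
    plus one connection strength for every ordered pair of neurons.
    """
    thresholds = n_neurons
    synapses = n_neurons ** 2
    total = thresholds + synapses
    return {
        "thresholds": thresholds,
        "synapses": synapses,
        "total": total,
        "synaptic_fraction": synapses / total,
    }


# The synaptic share N^2 / (N + N^2) approaches 1 as N grows.
for n in (10, 1_000, 100_000):
    counts = fully_connected_param_count(n)
    print(n, counts["total"], counts["synaptic_fraction"])
```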

There are three classic paradigms for extracting structure from data, and encoding that structure into network parameters (i.e., weights and thresholds). In supervised learning, the data consist of pairs—an input item (e.g., an image) and its label (e.g., the word “giraffe”)—and the goal is to find network parameters that generate the correct label for novel pairs. In unsupervised learning, the data have no labels; the goal is to discover statistical regularities in the data without explicit guidance about what kind of regularities to look for. For example, one could imagine that with enough examples of giraffes and elephants, one might eventually infer the existence of two classes of animals, without the need to have them explicitly labeled. Finally, in reinforcement learning, data are used to drive actions, and the success of those actions is evaluated based on a “reward” signal.

Much of the progress in ANNs has been in developing better tools for supervised learning. If a network has too many free parameters, the network risks “overfitting” data, i.e. it will generate the correct responses on the training set of labeled examples, but will fail to generalize to novel examples. In ANN research, this tension between the flexibility of a network (which scales with the number of neurons and connections) and the amount of data needed to train the network (more neurons and connections generally require more data) is called the “bias-variance tradeoff” (Fig. 1). A network with more flexibility is more powerful, but without sufficient training data the predictions that network makes on novel test examples might be wildly incorrect—far worse than the predictions of a simpler, less powerful network. To paraphrase “Spiderman”: With great power comes great responsibility (to obtain enough labeled training data). The bias-variance tradeoff explains why large networks require large amounts of labeled training data.
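
The overfitting described above can be reproduced with a toy curve-fitting problem. This is a sketch under stated assumptions, not anything from the article: the "true" relationship is assumed to be y = x, the label noise is a fixed alternating offset of 0.3 so the run is deterministic, and polynomials stand in for networks. A two-parameter line plays the simple model; a polynomial with as many parameters as training points plays the flexible one.

```python
def fit_line(xs, ys):
    """Least-squares line (2 free parameters); returns a predictor f(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point (len(xs) free parameters)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a predictor on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_fn(x):
    # Assumed ground truth for this toy problem.
    return x


# Ten training points on [-1, 1] with deterministic alternating label noise.
train_x = [-1 + 2 * i / 9 for i in range(10)]
train_y = [true_fn(x) + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(train_x)]

# Noise-free test points spread over the same interval.
test_x = [-1 + 2 * i / 200 for i in range(201)]
test_y = [true_fn(x) for x in test_x]

line = fit_line(train_x, train_y)
interpolant = fit_interpolant(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
interp_train, interp_test = mse(interpolant, train_x, train_y), mse(interpolant, test_x, test_y)

print("line:        train", line_train, "test", line_test)
print("interpolant: train", interp_train, "test", interp_test)
```

The flexible model reproduces every noisy training label, yet it does worse than the line on the noise-free test points, because it has fit the noise along with the signal.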
Much later:
In this view, supervised learning in ANNs should not be viewed as the analog of learning in animals. Instead, since most of the data that contribute an animal’s fitness are encoded by evolution into the genome, it would perhaps be just as accurate (or inaccurate) to rename it “supervised evolution.” Such a renaming would emphasize that “supervised learning” in ANNs is really recapitulating the extraction of statistical regularities that occurs in animals by both evolution and learning. In animals, there are two nested optimization processes: an outer “evolution” loop acting on a generational timescale, and an inner “learning” loop, which acts on the lifetime of a single individual. Supervised (artificial) evolution may be much faster than natural evolution, which succeeds only because it can benefit from the enormous amount of data represented by the life experiences of quadrillions of individuals over hundreds of millions of years.
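
The two nested loops can be made concrete with a toy model. Everything in this sketch is an assumption made for illustration (the scalar "brain", the three target values, the learning rate, the mutation size); none of it comes from the article. The inner loop is a few steps of gradient descent within one lifetime. The outer loop selects innate starting values by how well individuals perform after that brief learning.

```python
import random

TARGETS = [0.8, 1.0, 1.2]   # hypothetical tasks an individual may face
LEARNING_RATE = 0.1
LIFETIME_STEPS = 3          # the inner loop is short: little time to learn


def lifetime_loss(innate_w: float, target: float) -> float:
    """Inner loop: a few gradient steps on (w - target)^2 from the innate value."""
    w = innate_w
    for _ in range(LIFETIME_STEPS):
        w -= LEARNING_RATE * 2 * (w - target)
    return (w - target) ** 2


def fitness_loss(innate_w: float) -> float:
    """Average post-learning loss over the tasks (lower means fitter)."""
    return sum(lifetime_loss(innate_w, t) for t in TARGETS) / len(TARGETS)


def evolve(generations: int = 30, population_size: int = 20, seed: int = 0):
    """Outer loop: select innate starting points by how well they support learning."""
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(population_size)]
    best_loss_history = []
    for _ in range(generations):
        population.sort(key=fitness_loss)
        best_loss_history.append(fitness_loss(population[0]))
        survivors = population[: population_size // 2]
        # Survivors persist unchanged; offspring are mutated copies of them.
        offspring = [w + rng.gauss(0.0, 0.3) for w in survivors]
        population = survivors + offspring
    population.sort(key=fitness_loss)
    best_loss_history.append(fitness_loss(population[0]))
    return population[0], best_loss_history


best_innate_w, history = evolve()
print("best innate starting value:", best_innate_w)
print("best loss, first vs last generation:", history[0], history[-1])
```

Because survivors are carried over unchanged, the best loss can never get worse from one generation to the next. What the outer loop tunes is the starting point for learning, not the learned result itself.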
And so:
The importance of innate mechanisms suggests that an ANN solving a new problem should attempt as much as possible to build on the solutions to previous related problems. Indeed, this idea is related to an active area of research in ANNs, “transfer learning,” in which connections pre-trained in the solution to one task are transferred to accelerate learning on a related task. For example, a network trained to classify objects such as elephants and giraffes might be used as a starting point for a network that distinguishes trees or cars. However, transfer learning differs from the innate mechanisms used in brains in an important way. Whereas in transfer learning the ANN’s entire connection matrix (or a significant fraction of it) is typically used as a starting point, in animal brains the amount of information “transferred” from generation to generation is smaller, because it must pass through the bottleneck of the genome. Passing the information through the genomic bottleneck may select for wiring and plasticity rules which are more generic, and which therefore are more likely to generalize well. For example, the wiring of the visual cortex is quite similar to that of the auditory cortex (although each area has idiosyncrasies). This suggests that the hypothesized canonical cortical circuit provides, with perhaps only minor variations, a foundation for the wide variety of tasks that mammals perform. Neuroscience suggests that there may exist more powerful mechanisms—a kind of generalization of transfer learning—which operate not only within a single sensory modality like vision, but across sensory modalities and even beyond.
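
The contrast between transferring a whole connection matrix and passing a compact rule through a bottleneck can be put in rough numbers. This sketch rests on an assumption the article does not make: the "rule" is modeled as two small per-neuron factor tables whose products generate the full matrix (a low-rank factorization). That is only a stand-in for whatever wiring rules a genome actually encodes, and all names here are hypothetical.

```python
def transferred_parameter_counts(n_neurons: int, rule_rank: int) -> dict:
    """Compare the size of a full connection matrix with a compact generating rule."""
    full_matrix = n_neurons ** 2
    # Assumed rule: rule_rank numbers per neuron as sender, and again as receiver.
    compact_rule = 2 * n_neurons * rule_rank
    return {
        "full_matrix": full_matrix,
        "compact_rule": compact_rule,
        "compression": full_matrix / compact_rule,
    }


def expand_rule(senders, receivers):
    """Generate the n x n connection matrix from two n x r factor tables."""
    return [
        [sum(s * r for s, r in zip(sender_row, receiver_row)) for receiver_row in receivers]
        for sender_row in senders
    ]


print(transferred_parameter_counts(n_neurons=1000, rule_rank=10))

# Tiny example: 4 neurons described by 8 numbers instead of 16.
senders = [[1], [2], [0], [-1]]
receivers = [[1], [0], [2], [1]]
for row in expand_rule(senders, receivers):
    print(row)
```

A matrix generated this way is far more constrained than an arbitrary one, which is the trade the passage describes: less information transferred, in exchange for rules that are more generic.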

A second observation from neuroscience follows from the fact that the genome doesn’t encode representations or behaviors directly or optimization principles directly. The genome encodes wiring rules and patterns, which then must instantiate behaviors and representations. It is these wiring rules that are the target of evolution. This suggests wiring topology and network architecture as a target for optimization in artificial systems. Classical ANNs largely ignored the details of network architecture, guided perhaps by theoretical results on the universality of fully connected three-layer networks. But of course, one of the major advances in the modern era of ANNs has been convolutional neural networks (CNNs), which use highly constrained wiring to exploit the fact that the visual world is translation invariant. The inspiration for this revolutionary technology was in part the structure of visual receptive fields. This is the kind of innate constraint that in animals would be expected to arise through evolution; there might be many others yet to be discovered. Other constraints on wiring and learning rules are sometimes imposed in ANNs through hyperparameters, and there is an extensive literature on hyperparameter optimization. At present, however, ANNs exploit only a tiny fraction of possible network architectures, raising the possibility that more powerful, cortically-inspired architectures remain to be discovered.
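
The constrained wiring of a convolutional layer can be shown in one dimension. This is a minimal sketch with made-up names and data, using a circular (wrap-around) signal to keep the arithmetic exact. The same three weights are applied at every position, so shifting the input simply shifts the output. Strictly speaking this property is translation equivariance, which is what lets a CNN exploit the translation invariance of the visual world.

```python
def circular_conv1d(signal, kernel):
    """Apply the same small kernel at every position (the weights are shared)."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, steps):
    """Circularly shift a signal to the right by `steps` positions."""
    steps %= len(signal)
    return signal[-steps:] + signal[:-steps] if steps else list(signal)


signal = [0, 1, 3, 0, 0, 2, 0, 0]
kernel = [1, -2, 1]

shift_then_filter = circular_conv1d(shift(signal, 3), kernel)
filter_then_shift = shift(circular_conv1d(signal, kernel), 3)
print(shift_then_filter == filter_then_shift)

# Free parameters: shared kernel versus a fully connected layer on the same input.
print(len(kernel), "weights versus", len(signal) ** 2)
```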
Might I make three suggestions? First, read what David Hays and I wrote about the brain some years ago:
William Benzon and David Hays, Principles and Development of Natural Intelligence, Journal of Social and Biological Structures, Vol. 11, No. 8, July 1988, 293-322, https://www.academia.edu/235116/Principles_and_Development_of_Natural_Intelligence.
In connection with that article you should read this blog post, which is built around out-takes from that article:
Vehicularization: A Control Principle in a Complex Modal Animal (w/ new note on King Kong), New Savanna, Nov. 7, 2019, https://new-savanna.blogspot.com/2012/08/vehicularization-control-principle-in.html.
Then take a look at some notes I did on what I call "attractor nets", network structures over dynamical systems where the edges correspond to the attractors in those systems and nodes are logical operators (and, or) over combinations of attractors in different systems:
Attractor Nets, Series I: Notes Toward a New Theory of Mind, Logic and Dynamics in Relational Networks, Working Paper, 52 pp., https://www.academia.edu/9012847/Attractor_Nets_Series_I_Notes_Toward_a_New_Theory_of_Mind,_Logic,_and_Dynamics_in_Relational_Networks.
Alas, as the title indicates, these are just notes. But it's the best I could do. They need the attention of someone with mathematical and computational skills that I lack.
