Tuesday, April 12, 2022

Meet the Pioneers [TALENT SEARCH] & a home run on neural nets over neural nets

Bumping this to the top just to remind myself of Bhav Ashok's work.

From the Pioneer blog (H/t Tyler Cowen):

We’re excited to announce the winners of the first Pioneer Tournament.

In the short 3 months since its launch, Pioneer has garnered a global reach. Our first tournament featured applicants from 100 countries, ranging from 12 to 87 years old. Almost half of our players hailed from countries like India, UK, Canada, Nigeria, Germany, South Africa, Singapore, France, Turkey, and Kenya. Projects were spread across almost every industry -- AI research, physics, chemistry, cryptocurrency and more.

We started this company to find the curious outsiders of the world. We think we’re off to a good start.

They've announced 17 winners for this round, including Leonard Bogdonoff, who received an Emergent Ventures grant (New Savanna post here) from Tyler Cowen (who is an advisor to the program). The winners are mostly tech-oriented and range in age from 16 to 29.

My favorite, which simply reflects my interests:
Bhav Ashok (25, Singapore & USA, @bhavashok)

Bhav came up with a way to train a neural network to compress other neural networks. The compression process produces a faster and smaller network without sacrificing accuracy. This has the potential to vastly improve response time in self-driving cars, satellite imaging, and mobile apps.

Noteworthy: By age 12, Bhav was programming and making money selling graphics add-ons for games. At 16, he started doing research in machine learning for bioinformatics. He left Singapore for Austin, where he created TexteDB and founded a company. Bhav later enrolled in the Masters program at CMU, earned the highest GPA in his program, founded a computer vision club, and worked on various research projects.

Here's an arXiv preprint for the compression scheme (arXiv:1709.06030v2 [cs.LG]):
N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning
Anubhav Ashok, Nicholas Rhinehart, Fares Beainy, Kris M. Kitani
(Submitted on 18 Sep 2017 (v1), last revised 17 Dec 2017 (this version, v2))

While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger `teacher' network as input and outputs a compressed `student' network derived from the `teacher' network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large `teacher' model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward -- a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input `teacher' network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller `teacher' networks can be used to rapidly speed up training on larger `teacher' networks.

For some comments on the idea, though not this version, see this page.
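
To make the two-stage, reward-driven idea more concrete, here is a minimal REINFORCE-style sketch of the first stage (layer removal) in Python with PyTorch. This is my own toy illustration, not the authors' code: evaluate_student is a stand-in for actually building, distilling, and testing a student network, and the reward shaping is an assumption rather than the paper's exact formula.

import torch
import torch.nn as nn

NUM_LAYERS = 16  # layers in a hypothetical 'teacher' network

# Policy: per-layer keep/remove probabilities. The paper uses a recurrent
# policy over the layer sequence; independent Bernoulli decisions per layer
# are enough for a sketch.
keep_logits = nn.Parameter(torch.zeros(NUM_LAYERS))
optimizer = torch.optim.Adam([keep_logits], lr=0.05)

def evaluate_student(keep_mask):
    # Stand-in for building, distilling, and testing the compressed student.
    # Returns (relative_accuracy, compression); both are toy numbers here.
    kept = keep_mask.sum().item()
    compression = 1.0 - kept / NUM_LAYERS          # fraction of layers removed
    relative_accuracy = max(0.0, 1.0 - 0.5 * compression ** 2)  # toy degradation model
    return relative_accuracy, compression

for step in range(200):
    dist = torch.distributions.Bernoulli(logits=keep_logits)
    keep_mask = dist.sample()                      # 1 = keep the layer, 0 = remove it
    acc, comp = evaluate_student(keep_mask)
    # Reward favors students that stay accurate while removing layers
    # (my shaping; the paper combines accuracy and compression differently).
    reward = acc * comp * (2.0 - comp)
    # REINFORCE: raise the log-probability of the sampled decisions, scaled by reward.
    loss = -dist.log_prob(keep_mask).sum() * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("learned keep probabilities:", torch.sigmoid(keep_logits).detach())

The paper's second stage, shrinking each remaining layer, would run the same policy-gradient loop over per-layer size choices instead of keep/remove decisions.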

Why do I like this work? Because, to quote James Brown, it feels good. Why does it feel good? Because I think the neocortex does something like that, and does it perhaps two or three levels deep. And why do I think that?

Well, that would take a bit of explaining, and a really good explanation would likely range into areas beyond my technical competence, both in neuroscience and computation. Still, take a look at a paper Dave Hays and I published some years ago [1]. Here's the abstract:
The phenomena of natural intelligence can be grouped into five classes, and a specific principle of information processing, implemented in neural tissue, produces each class of phenomena. (1) The modal principle subserves feeling and is implemented in the reticular formation. (2) The diagonalization principle subserves coherence and is the basic principle, implemented in neocortex. (3) Action is subserved by the decision principle, which involves interlinked positive and negative feedback loops, and resides in modally differentiated cortex. (4) The problem of finitization resolves into a figural principle, implemented in secondary cortical areas; figurality resolves the conflict between propositional and Gestalt accounts of mental representations. (5) Finally, the phenomena of analysis reflect the action of the indexing principle, which is implemented through the neural mechanisms of language.

These principles have an intrinsic ordering (as given above) such that implementation of each principle presupposes the prior implementation of its predecessor. This ordering is preserved in phylogeny: (1) mode, vertebrates; (2) diagonalization, reptiles; (3) decision, mammals; (4) figural, primates; (5) indexing, Homo sapiens sapiens. The same ordering appears in human ontogeny and corresponds to Piaget's stages of intellectual development, and to stages of language acquisition.

Principles 2 through 5 are implemented in cortical tissue and its afferent and efferent linkages to subcortical tissue. Think of coherence (2) as being implemented in something like a conventional neural net. Action (3), finitization (4), and analysis (5) would then be implemented through compression over downstream networks.

So, Tyler Cowen's first class had at least one member in one of my sweet zones (graffiti), and Pioneer's first class has a member in another sweet zone (neural computation). When and where will the third turn up?

* * * * *

[1] William Benzon and David Hays, Principles and Development of Natural Intelligence, Journal of Social and Biological Structures, Vol. 11, No. 8, July 1988, 293-322. Academia.edu: https://www.academia.edu/235116/Principles_and_Development_of_Natural_Intelligence. SSRN: https://ssrn.com/abstract=1504212.
