Sara Hooker, The Hardware Lottery, arXiv:2009.06489v1 [cs.CY] 14 Sep 2020
Abstract: Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly. This historical treatment is odd given that hardware and software have frequently determined which research ideas succeed (and fail). This essay introduces the term hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is superior to alternative research directions. Examples from early computer science history illustrate how hardware lotteries can delay research progress by casting successful ideas as failures. These lessons are particularly salient given the advent of domain specialized hardware which makes it increasingly costly to stray off of the beaten path of research ideas. This essay posits that the gains from progress in computing are likely to become even more uneven, with certain research directions moving into the fast-lane while progress on others is further obstructed.
From the article (pp. 2-3):
Today, in contrast to the necessary specialization in the very early days of computing, machine learning researchers tend to think of hardware, software and algorithm as three separate choices. This is largely due to a period in computer science history that radically changed the type of hardware that was made and incentivized hardware, software and machine learning research communities to evolve in isolation.
Hooker goes on to talk about “the general purpose era”, based on the so-called von Neumann architecture of a single processor accessing large banks of memory:
The predictable increases in compute and memory every two years meant hardware design became risk-adverse. Even for tasks which demanded higher performance, the benefits of moving to specialized hardware could be quickly eclipsed by the next generation of general purpose hardware with ever growing compute. (p. 3)
She goes on to point out that the algorithmic concepts underlying neural networks were invented between 1963 and 1989, but implementation was hampered by the single-processor architecture, which was not well suited to the kind of processing required by neural networks. Then in the early 2000s graphics processing units (GPUs), which had been introduced in the 1970s to accelerate graphics processing for video games, movies, and animation, were repurposed for processing neural networks. Rapid graphics processing requires the parallel execution of relatively simple instruction sequences, and so does neural network computation. Thus (p. 6): “A striking example of this jump in efficiency is a comparison of the now famous 2012 Google paper which used 16,000 CPU cores to classify cats (Le et al., 2012) to a paper published a mere year later that solved the same task with only two CPU cores and four GPUs (Coates et al., 2013).”
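To make the parallelism point concrete, here is a minimal Python/NumPy sketch (my own illustration, not from Hooker's paper): applying a neural-network layer to a batch of inputs is a single matrix multiply in which every output element can be computed independently. A lone scalar processor must in effect grind through the multiply-adds one at a time, while a GPU's many simple cores can do them in parallel.

```python
import numpy as np

# Toy dense layer: 64 inputs -> 32 outputs, applied to a batch of 32 examples.
# (Sizes are arbitrary, chosen only for illustration.)
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))   # batch of inputs
W = rng.standard_normal((64, 32))   # layer weights

# What a single scalar processor effectively does: one multiply-add at a time.
y_serial = np.zeros((32, 32))
for i in range(32):          # for each example in the batch
    for j in range(32):      # for each output unit
        for k in range(64):  # accumulate over the inputs
            y_serial[i, j] += x[i, k] * W[k, j]

# The same computation as one matrix multiply. All 32 * 32 outputs are
# independent, which is exactly the data-parallel pattern GPUs accelerate.
y_parallel = x @ W

assert np.allclose(y_serial, y_parallel)
```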
Languages built for single-processor architectures were not well suited for use on GPUs, which required the development of new languages. Meanwhile, most mainstream research was focused on symbolic approaches, which were compatible with single-processor architectures and existing languages, such as LISP and PROLOG.
Back on the hardware side, tensor processing units (TPUs) have more recently been developed for deep learning. And so it goes.
In many ways, hardware is catching up to the present state of machine learning research. Hardware is only economically viable if the lifetime of the use case lasts more than 3 years (Dean, 2020). Betting on ideas which have longevity is a key consideration for hardware developers. Thus, co-design effort has focused almost entirely on optimizing an older generation of models with known commercial use cases. For example, matrix multiplies are a safe target to optimize for because they are here to stay — anchored by the widespread use and adoption of deep neural networks in production systems. (p. 7)
But (p. 7), “There is still a separate question of whether hardware innovation is versatile enough to unlock or keep pace with entirely new machine learning research directions.”
You see the problem. We are well past the era where one hardware model more or less suits all applications. Specialized hardware is better for various purposes, and it requires suitably specialized software. But hardware is more expensive and slower to develop than software, so development is biased toward potential commercial markets where the cost can be recouped through sales.
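To see concretely why matrix multiplies are such a safe optimization target, here is a rough sketch (my own, with arbitrary layer sizes, not from the paper): in a small feed-forward network's forward pass, essentially all of the arithmetic is in the matrix multiplies; the elementwise nonlinearities are a rounding error by comparison.

```python
import numpy as np

# Toy two-layer MLP forward pass; sizes are arbitrary, chosen for illustration.
batch, d_in, d_hidden, d_out = 32, 784, 1024, 10
rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_in))
W1 = rng.standard_normal((d_in, d_hidden))
W2 = rng.standard_normal((d_hidden, d_out))

h = np.maximum(x @ W1, 0.0)   # matrix multiply + elementwise ReLU
y = h @ W2                    # matrix multiply

# Rough operation counts: the matmuls dwarf the elementwise nonlinearity,
# which is why accelerators such as GPUs and TPUs optimize matmul above all.
matmul_flops = 2 * batch * d_in * d_hidden + 2 * batch * d_hidden * d_out
relu_ops = batch * d_hidden
print(f"matmul FLOPs: {matmul_flops:,}   ReLU ops: {relu_ops:,}")
# roughly 52 million floating-point operations vs. about 33 thousand comparisons
```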
Hooker goes on to point out that, however successful artificial neural networks have been so far, they are quite different from biological neural networks (pp. 8-9):
... rather that there are clearly other models of intelligence which suggest it may not be the only way. It is possible that the next breakthrough will require a fundamentally different way of modelling the world with a different combination of hardware, software and algorithm. We may very well be in the midst of a present day hardware lottery.
That is, current hardware may well stand in the way of further advances. After further discussion of both hardware and software issues, Hooker concludes (p. 11):
Today the hardware landscape is increasingly heterogeneous. This essay posits that the hardware lottery has not gone away, and the gap between the winners and losers will grow increasingly larger. In order to avoid future hardware lotteries, we need to make it easier to quantify the opportunity cost of settling for the hardware and software we have.