Saturday, July 9, 2022

Visualization and biology – here’s an opportunity for the picture-drawing and code-writing capabilities of AI tech

James Somers, I should have loved biology. He starts off by telling us how dry as dust high school biology was. FWIW, I thought it was OK, especially when we had an exam where, instead of answering multiple-choice and short-answer questions, we found the biology classroom set up with stations, each with an exhibit of some kind and some questions associated with it. We students then went from one exhibit to another until we’d been through them all. But I digress.

After going on and on about what doesn’t work, he finally gets down to it:

I’ve never come across a subject so fractal in its complexity. It reminds me of computing that way. A day of programming might involve constructing an elaborate regular expression, investigating a file descriptor leak, debugging a race condition in the application you just wrote, and thinking through the interface of a module. Everywhere you look—the compiler, the shell, the CPU, the DOM—is an abstraction hiding lifetimes of work. Biology is like this, just much, much worse, because living systems aren’t intentionally designed. It’s all a big slop of global mutable state. Control is achieved by upregulating this thing while turning down the promoter of that thing’s repressor. You think you know how something works—like when I thought I had a handle on the neutrophil, an important front-line player in the innate immune system—only to learn that it comes in several flavors, and more are still being discovered, and some of them seem to do the opposite of the ones you thought you knew. Everything in biology is like this. It’s all exceptions to the rule.

But biology, like computing, has a bottom, and the bottom is not abstract. It’s physical. It’s shapes bumping into each other. In fact the great revelation of twentieth-century molecular biology was the coupling of structure to function. An aperiodic crystal that forms paired helices is the natural store of heredity because of its ability to curl up and unwind and double itself with complements. Hemoglobin, the first protein studied in full crystallographic detail, was shown to be an efficient store of energy because of how oxygen atoms snap into its body like Legos, each snap widening the remaining slots, so that it loads itself up practically at a gulp. Most proteins are like this. The ones that drive locomotion twist like little motors; the ones that contract muscles climb and compress each other. Cells, too, are constantly in conversation, and the language they speak is shape. It’s keys entering locks: a protein might straddle the cell membrane, and when a cytokine (that’s a kind of signaling molecule) docks with it, it changes its shape, so that its grip loosens on some other molecule on the interior side of the membrane, as though fumbling a football—that football might be a signal itself, on its way to the nucleus.

I think my understanding of biology was too flow-charty in high school. I knew that DNA → RNA → protein and that this was called “gene expression,” but I was confused on the basics, like, how did genes actually “turn on”? And once they were on, were they on for good? It’s clearer when you think physically.

What we need is physical understanding. Yes! Think like an engineer.

How do you develop a physical understanding of biology? I like pictures. One of my favorite books is called The Machinery of Life, by David Goodsell. It’s full of gorgeous hand-drawn illustrations. [...]

What makes the book work is that it’s basically a re-introduction to molecular biology with the following premise: the cell is a very fast and crowded place, full of little machines, most of them protein, which you understand by taking a close look. It does an especially terrific job through insets like the above relating things at different scales. “Imagine your room filled with grains of rice. That will give you an idea of the billion or so cells that make up your fingertip.”

Somers continues on that theme for a while and then turns a corner: If we’re going to improve things, we’re going to need more pictures, lots of them, good ones. From that it follows that we need better tools for making pictures:

But I wonder whether it should be easier for regular people to create useful illustrations. Consider how easy it is to write, tooling-wise: on the web, you are only ever one click away from a Markdown-enabled textarea that allows you to create and publish pretty, hyperlinked documents. Anyone with a keyboard can contribute a few sentences to Wikipedia or answer a question on Stack Exchange. Drawing, by contrast, is hard, and animating is at least an order of magnitude harder. And yet these media are essential for understanding biological processes.

So what do we do?

It’s telling that when I was recently on a Zoom with a PhD student who was explaining RNA-seq, he pulled out his iPad Pro and essentially made a Khan Academy lecture as he talked, drawing along the way. These tools need to become more common and cheaper.

But we also need more software like pattern brushes in Adobe Illustrator, BioRender, and CellPAINT to make it un-tedious to draw complex objects. We need more software like Molecular Maya, but simplified even further, à la Victor’s Stop Drawing Dead Fish, to make animating accessible to anyone who can gesture.

I wonder whether these new AI drawing programs, like DALL-E, have something to contribute here.

Furthermore:

Of course we need to teach more people how to draw. It’s an underrated skill. And how to write vividly, as in the wonderful books above.

But biology is uniquely suited to simulation—it’s a world of machines that are too small to see. The trouble is, it requires too much specialized skill to create three-dimensional interactive simulations. We need a toolkit that’s like MockMechanics, or Minecraft, that maybe even is Minecraft, but focused on biology. Or something much better.

Again, I’m seeing a role for these new AI engines. They write code as well.

So, you train a drawing engine on a large library of carefully curated and captioned images of biological entities of all kinds and at all scales. Do whatever you have to do to fine-tune it. You end up with a tool that lets the user describe the kind of image they need, and the tool produces it. You probably want to make it user-customizable; see my working paper, PowerPoint Assistant: Augmenting End-User Software through Natural Language Interaction. At the same time, you train a code-writing engine to write simulations of these biological entities and processes.
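
To give a sense of the smallest thing such a code-writing engine might be asked for, here is a minimal Python sketch of the hemoglobin behavior Somers describes above, where each oxygen that binds makes the next one easier. It uses the Hill equation, a standard textbook approximation rather than anything taken from Somers’s essay, and the numbers (a half-saturation pressure of about 26 mmHg and a Hill coefficient of about 2.8) are approximate textbook values that I am assuming for illustration. The function and constant names are my own.

```python
def hill_saturation(p_o2, p50, n):
    """Fraction of binding sites occupied at oxygen partial pressure p_o2 (mmHg).

    p50 is the pressure at which half the sites are occupied.
    n is the Hill coefficient: n == 1 means no cooperativity,
    n > 1 means each bound oxygen makes further binding easier.
    """
    if p_o2 < 0 or p50 <= 0 or n <= 0:
        raise ValueError("p_o2 must be >= 0; p50 and n must be > 0")
    return p_o2 ** n / (p50 ** n + p_o2 ** n)


# Assumed, approximate textbook values for adult human hemoglobin.
HEMOGLOBIN_P50 = 26.0
HEMOGLOBIN_N = 2.8

# Hypothetical comparison molecule: same p50, but no cooperativity.
NON_COOPERATIVE_N = 1.0


def compare(pressures):
    """Return (pressure, cooperative, non_cooperative) saturation triples."""
    return [
        (
            p,
            hill_saturation(p, HEMOGLOBIN_P50, HEMOGLOBIN_N),
            hill_saturation(p, HEMOGLOBIN_P50, NON_COOPERATIVE_N),
        )
        for p in pressures
    ]


if __name__ == "__main__":
    print("pO2 (mmHg)  cooperative  non-cooperative")
    for p, coop, flat in compare([5, 10, 26, 40, 100]):
        print(f"{p:>10}  {coop:>11.2f}  {flat:>15.2f}")
```

Run as a script, it prints the two curves side by side. The cooperative one stays low at low oxygen pressure and then climbs steeply, which is the “loads itself up practically at a gulp” behavior in numerical form. A real educational tool would pair output like this with pictures, which is the whole point.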

You could revolutionize education and, beyond that, thinking and communication about biology at all levels. The same approach might do as much for computing, which is also a very visual domain. See my encyclopedia article, Visual Thinking.

Alas, schools are not particularly good markets for innovative technology that improves education. Would industry buy it? Would they fund the development?

H/t Tyler Cowen.
