NEW SAVANNA: “Junk” DNA is not Junk

Thursday, September 6, 2012

Here are the opening paragraphs from an article in today’s New York Times:

Among the many mysteries of human biology is why complex diseases like diabetes, high blood pressure and psychiatric disorders are so difficult to predict and, often, to treat. An equally perplexing puzzle is why one individual gets a disease like cancer or depression, while an identical twin remains perfectly healthy.

Now scientists have discovered a vital clue to unraveling these riddles. The human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as “junk” but that turn out to play critical roles in controlling how cells, organs and other tissues behave. The discovery, considered a major medical and scientific breakthrough, has enormous implications for human health because many complex diseases appear to be caused by tiny changes in hundreds of gene switches.

The findings, which are the fruit of an immense federal project involving 440 scientists from 32 laboratories around the world, will have immediate applications for understanding how alterations in the non-gene parts of DNA contribute to human diseases, which may in turn lead to new drugs. They can also help explain how the environment can affect disease risk. In the case of identical twins, small changes in environmental exposure can slightly alter gene switches, with the result that one twin gets a disease and the other does not.

I don’t know quite what to make of that.

I’m not at all surprised that “junk” DNA turns out to have useful functions. I’ve pretty much assumed that for some time, two or three decades at least.

Of course, I’m not a biologist, so my assumptions don’t count for much. Still, I do wonder why it’s taken biologists so long to figure this out.

I don’t know just when I first read about junk DNA. Nor do I recall just when I first read about regulatory genes, nor when I first read that some so-called junk DNA seems to have a regulatory function. But some time back in there I also came to the conclusion that much or perhaps most of the junk DNA probably has some function. And, for all I know, some biologists thought that back then as well, whenever “then” was, exactly.

How could I have possibly reached such a conclusion? On the one hand it seemed unlikely that the vast majority of the genome was just dead weight. Why carry it around if it’s just junk? That doesn’t make adaptive sense.

But that’s the least of it. I didn’t see how one could possibly “build” an organism with nothing but genes coding for proteins. That’s like building a cathedral with nothing but recipes for making bricks and mortar, techniques for cutting stone, making colored glass and hundreds of other materials. Sure, you need the bricks and glass and all the rest, but you also need some notion of how it all goes together to make the cathedral. That’s what regulatory DNA seemed to be doing and, so it seemed to me, lots of regulation was called for.

But that’s not how I thought about it back then; that’s just a helpful analogy. What I actually thought about was language. I figured that protein-coding DNA was more or less analogous to content words, that is, words that refer to things in the world (rocks, paper, scissors, mountains, vegetables, nations, etc.). Correlatively, the so-called junk DNA was analogous to function words (prepositions, pronouns, articles, conjunctions, modals, etc.), the words that indicate syntax, and to the otherwise hidden grammatical machinery that binds contentives into discourse.

There is, of course, a problem with this analogy. There are many more content words than function words in any language, while junk DNA seems far more extensive than protein-coding DNA. Still, the basic point holds: you do need something to hold the pieces together in the right conformation.

In any event, it’s one thing to entertain such a vague conjecture and quite another actually to prove it.

Later in the article:

The thought before the start of the project, said Thomas Gingeras, an Encode researcher from Cold Spring Harbor Laboratory, was that only 5 to 10 percent of the DNA in a human being was actually being used.

The big surprise was not only that almost all of the DNA is used but also that a large proportion of it is gene switches. Before Encode, said Dr. John Stamatoyannopoulos, a University of Washington scientist who was part of the project, “if you had said half of the genome and probably more has instructions for turning genes on and off, I don’t think people would have believed you.”

By the time the National Human Genome Research Institute, part of the National Institutes of Health, embarked on Encode, major advances in DNA sequencing and computational biology had made it conceivable to try to understand the dark matter of human DNA. Even so, the analysis was daunting — the researchers generated 15 trillion bytes of raw data. Analyzing the data required the equivalent of more than 300 years of computer time.

Addendum: Here's a link to some of the technical articles on Encode. They're open access in Nature.