Feb 18, 2004
Malorye A. Branca
Bio-IT World Inc.
After years of frustration, genetic sleuths are finally homing in on genes for complex diseases, with the help of new, and some not so new, tools and strategies. Companies at the forefront are betting on a target bonanza.
Bill Smith's DNA has been studied by hundreds of scientists around the world, but not because of any intriguing genetic disease he harbors. Smith (name withheld for privacy) is a member of a "CEPH" family. Twenty years ago, about 60 large multigeneration families donated their DNA to the Centre d'Etude du Polymorphisme Humain (CEPH) in Paris, the brainchild of Nobel laureate Jean Dausset. Each family had three living generations, comprising eight children, two parents, and four grandparents. Most, like Smith's, were from Utah. The DNA of these 600 or so people was a critical resource in the mapping studies that identified the genes for cystic fibrosis, muscular dystrophy, and scores of other genetic disorders over the past two decades.
Smith signed on for CEPH when he was just 19. "I thought it sounded interesting," he says. "And if we could make a difference in helping medical science, why not do it?"
Now he has that chance again. Only this time, the scientific stakes are even higher.
Some University of Utah scientists have re-recruited most of the CEPH families for a novel project that involved developing an exhaustive list of their physical characteristics. "It was strange, but kind of fun," Smith says. "They looked at all kinds of things, like what you could taste and the shape of your body." Now, nonprofit GenData offers those data as part of a package of services.
GenData embodies one of several new approaches that are speeding the discovery of genes underlying the world's most common diseases. The work matters, not only because of the physical toll these diseases take, but also because this field has been "in some disrepute for the last 10 years," as Lon Cardon, principal research fellow of the Wellcome Trust Centre for Human Genetics, puts it.
Most common diseases, such as heart disease, diabetes, and stroke, involve a mysterious combination of genetic and environmental influences. As a result, they have proved intractable to the traditional gene-sleuthing approach of linkage analysis, in which the DNA of sick and healthy family members is compared to uncover disease genes.
The advantage of family linkage studies is that relatives' genomes are similar overall, so it is easier to spot the rare differences between them. But the challenge in studying complex diseases is to find the common genetic variants underlying them, and these diseases may in fact be collections of related disorders, sparked by different factors.
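Linkage studies are classically scored with a LOD score: the base-10 logarithm of how much more likely the observed inheritance pattern is if a marker and a disease gene are linked (recombination fraction θ below 0.5) than if they are not (θ = 0.5). The sketch below covers only the simplest textbook case, phase-known meioses in which each child can be scored as recombinant or not; real pedigree software also models penetrance, missing genotypes, and unknown phase. The function names and the family counts in the example are hypothetical.

```python
import math

def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Phase-known two-point LOD score at recombination fraction theta.

    Compares the likelihood of the observed meioses under linkage (theta < 0.5)
    with their likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    n = recombinants + nonrecombinants
    log_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked

def max_lod(recombinants: int, nonrecombinants: int, steps: int = 500):
    """Grid-search theta in (0, 0.5) and return (best_theta, best_lod)."""
    thetas = [0.5 * k / steps for k in range(1, steps)]
    scores = [(theta, lod_score(recombinants, nonrecombinants, theta)) for theta in thetas]
    return max(scores, key=lambda pair: pair[1])

# Hypothetical family data: 2 recombinants among 20 informative meioses.
print(round(lod_score(2, 18, 0.1), 2))  # 3.2
print(max_lod(2, 18))
```

By the classical convention, a LOD of 3 or more (odds of 1,000 to 1 in favor of linkage) is taken as evidence of linkage; the hypothetical data above just clear that bar, with the best estimate of θ at 0.1.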
In the alternative approach, association studies, scientists track small, discrete variations called single nucleotide polymorphisms (SNPs) across the genomes of large numbers of unrelated people, comparing how often each variant turns up in patients (cases) and in healthy people (controls). The SNPs need not cause the disease themselves: a SNP that lies close to a disease gene tends to be inherited along with it, so it can serve as a physical marker for that gene, helping scientists deconstruct even the most complicated diseases.
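A common way to compare cases and controls at a single SNP is a chi-square test on allele counts, treating each person's two copies of the SNP as separate observations. The article does not say which tests the groups described here use; this is a minimal sketch, and the counts in the example (500 cases and 500 controls, so 1,000 alleles per group) are hypothetical.

```python
import math

def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternate and reference allele.
    Returns (chi_square, p_value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele frequency were the same in both groups.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical SNP: alternate allele on 40% of case alleles vs 30% of control alleles.
chi2, p = allelic_chi_square(400, 600, 300, 700)
print(f"chi-square = {chi2:.2f}, p = {p:.1e}")
```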
In the late 1990s, when high-throughput genotyping finally came along and millions of SNPs began streaming into public databases, many more groups started doing case/control studies. But results have not lived up to expectations.
"Many associations are found, but few can be validated," says Anthony Brookes of the Center for Genomics and Bioinformatics at Sweden's Karolinska Institute. In response, science journals established much stricter criteria for publishing association studies (see "Freely associating." Nature Genetics 22, 1-2; 1999). To this day, many are "wary" of the studies, says Dennis Drayna, acting section chief at the National Institute on Deafness and Other Communication Disorders. "I know at least one journal that just won't publish them at all!"
The reasons for this mess are numerous. There are tales of sloppy genotyping, bad study design, and mysterious associations that crop up in certain subgroups but not the broader population. But the most oft-cited problems are insufficient samples, bad markers, and flawed data analysis.
With millions of possible variations in any individual's genome, spurious associations are certain to occur. "Even if the study is done well, the law of probability means that not all the associations you find will be real," says David Goldstein, a geneticist at University College, London (UCL).
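Goldstein's point is the multiple-testing problem, and the basic arithmetic fits in a few lines. This sketch assumes a hypothetical per-test cutoff of 0.05; the second function also assumes the tests are independent, which is unrealistic for SNPs that travel together. The Bonferroni correction shown is one standard, conservative remedy; the article does not say which correction, if any, each group applies.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected chance 'hits' if no tested SNP is truly associated with disease."""
    return num_tests * alpha

def prob_at_least_one_false_positive(num_tests: int, alpha: float) -> float:
    """Chance of one or more false hits, assuming the tests are independent."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(num_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test cutoff that caps the chance of ANY false hit at family_alpha."""
    return family_alpha / num_tests

for m in (1, 100, 100_000):
    print(m,
          expected_false_positives(m, 0.05),
          round(prob_at_least_one_false_positive(m, 0.05), 3),
          bonferroni_threshold(m))
```

With 100,000 SNPs, a 0.05 cutoff yields about 5,000 chance associations even in a flawless study, while the Bonferroni-corrected cutoff drops to 5 × 10^-7 per SNP.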
Many scientists are still arguing about statistical methods, clamoring for cheaper genotyping tools, and debating whether they should do association studies or stick with slower, but more fruitful, family linkage studies.
Meanwhile, several groups are stepping forward to claim they've already solved the problem. "We've turned the corner, and we finally know how to do these studies," says Drayna. The methods vary from group to group, but all use larger collections of patient samples, collect better data about physical characteristics, and take a multistep approach to finding associations between genes and complex diseases.
Getting It Right
As senior vice president of genetic research at GlaxoSmithKline (GSK), Allen Roses is leading the most aggressive genomics charge in the entire industry. Roses left a senior academic position at Duke University specifically to reinvent this field. "I spent 30 years doing linkage studies," he says. "Even when you finally find something, it's usually not a good drug target."
"At GlaxoWellcome, our goal was to patent the 30 to 50 genes most likely to be targets for major diseases," he says. Prior to the merger with GlaxoWellcome in 2000, SmithKlineBeecham "held the world's indoor record for number of disease genes patented," Roses says.
Roses and his team aren't fiddling around with high concepts: "We're only interested in targets with direct application to pharmaceuticals." That means genes that are medically important and whose function can be tested in a high-throughput screen.
They began by listing all the genes that are common targets for drugs, and came up with 1,600. To test four to five variants in each gene requires 6,500 SNPs. Next, they picked 12 major disease categories. In terms of the sample numbers, "We decided to go for overkill," Roses says. GSK is testing 500 people with each condition, and 500 without. After the first scan, they assume many of their observed associations are false, and dutifully repeat the scan in a different set of patients and healthy controls.
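Why repeat the scan? A rough simulation makes the logic concrete. It assumes, hypothetically, that none of the 6,500 SNPs is truly associated with disease and that each scan uses a p-value cutoff of 0.05; neither assumption comes from GSK. Under that null hypothesis a well-calibrated test produces p-values spread uniformly between 0 and 1, so the sketch draws them directly instead of simulating genotypes. The function name is hypothetical.

```python
import random

def simulate_null_two_stage(num_snps: int, alpha: float, seed: int = 0) -> tuple[int, int]:
    """Count chance hits in a first scan and in an independent replication scan.

    Every SNP is assumed to have no real effect, so each p-value is a uniform draw.
    Returns (hits_in_first_scan, hits_that_also_replicate).
    """
    rng = random.Random(seed)
    passed_first = [snp for snp in range(num_snps) if rng.random() < alpha]
    # The replication uses fresh patients and controls, so each p-value is a new draw.
    passed_both = [snp for snp in passed_first if rng.random() < alpha]
    return len(passed_first), len(passed_both)

first, replicated = simulate_null_two_stage(num_snps=6_500, alpha=0.05)
print(f"chance hits in first scan: {first}; still standing after replication: {replicated}")
```

On average about 325 SNPs (6,500 × 0.05) pass the first scan by luck alone, but only about 16 (6,500 × 0.05 × 0.05) also pass an independent replication.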
As Roses points out, each scan is "a lot of genotypes." He estimates they'll do about 80 million genotypes in the first pass across the dozen diseases. Last year, they delivered the first full-scan results for asthma, diabetes, and schizophrenia. Roses promptly shuttled the data off to the appropriate GSK therapeutic discovery centers, where they are being evaluated.
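The scale of a scan like this comes down to multiplication: 6,500 SNPs typed in 500 patients and 500 controls for each of 12 diseases. This sketch assumes every sample is typed at every SNP with no failed calls; the function name is hypothetical. It also prices the result at the roughly 10 cents a genotype that the article cites as the going rate, leaving out instruments, sample collection, and labor.

```python
def genotypes_per_pass(num_snps: int, cases: int, controls: int, diseases: int) -> int:
    """Each person in each disease study is typed once at every SNP."""
    return num_snps * (cases + controls) * diseases

# Figures reported for the GSK first-pass scan; replication scans come on top.
total = genotypes_per_pass(num_snps=6_500, cases=500, controls=500, diseases=12)
print(f"{total:,} genotypes")  # 78,000,000 genotypes

# Reagent cost at about 10 cents a genotype, computed in whole cents
# to avoid floating-point rounding.
cost_dollars = total * 10 // 100
print(f"${cost_dollars:,}")  # $7,800,000
```

The product, about 78 million, matches the 80-million estimate only when summed over all 12 diseases; a single disease comes to 6.5 million genotypes per pass.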
Underlying this big science is some excruciatingly detailed work. GSK had to hammer out clear definitions of each disease with its various collaborators. If a patient is diabetic, for example, "We've worked out everything, such as whether they have ankle jerks or not," Roses says. A missing ankle-jerk reflex is a sign of diabetic neuropathy, a nerve-damage complication that arises in a subset of patients. Attention to nomenclature is a critical part of classifying the physical features associated with each new genetic variation.
Finally, they needed a top-flight informatics system to deal with all the data. "The difference between us and everyone else is that we not only know what to do, but we have all the pieces — reagent, patients, controls, etc. — collected over more than six years to do it," he says.
Most people agree with him. "What Roses is doing is the ultimate goal," Cardon says. But at about 10 cents a genotype, plus $1 million or more in up-front instrument costs, few groups can afford an endeavor on this scale.
Two exceptions are Sequenom and Perlegen, which have their own high-throughput genotyping platforms. But even they are dependent on collaborations to move beyond the gene-scan phase into real drug discovery. Perlegen (see "Taking Data Storage to Infinity — and Beyond," Jan. 2003 Bio·IT World, page 52) has collaborations with several large pharmaceutical firms, including three deals with Pfizer. Sequenom just inked its first discovery collaboration around its own targets. That deal, with Procter & Gamble, means "osteoporosis research does not cost Sequenom any more money," says Jay Lichter, Sequenom's executive vice president of business development.
Both companies say the "size matters" strategy is reaping big rewards. Perlegen chief scientific officer David Cox describes a large-scale screen for Pfizer, looking for genes that regulate high-density lipoprotein (HDL) levels. In both the initial and the validation screen, the cholesteryl ester transfer protein (CETP) gene popped up. It's a well-known target already, but Cox says the finding proves Perlegen's approach works and that "CETP is a good target for Pfizer." Pfizer already has a CETP inhibitor in development, and is embarking on a massive trial to test that drug in combination with its top-selling statin, Lipitor.
In its diabetes study, Sequenom is looking at about 2,000 patients. The company has done huge scans in a dozen diseases, using 28,000 to 84,000 SNPs each time. For eight of those 12 diseases, it has already done the replications, too. "Things replicate, and we see SNPs that have the same effect in multiple populations," Lichter says.
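One common way to check that a SNP has "the same effect in multiple populations" is to estimate an effect size, such as the allelic odds ratio, separately in each population and see whether the estimates and their confidence intervals agree. The article does not say which statistic Sequenom reports; the counts below are hypothetical, and the interval uses Woolf's log-scale approximation.

```python
import math

def allelic_odds_ratio(case_alt: int, case_ref: int, control_alt: int, control_ref: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Allelic odds ratio with a Woolf (log-scale) 95% confidence interval."""
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    # Standard error of log(OR) from the four allele counts.
    se_log = math.sqrt(1 / case_alt + 1 / case_ref + 1 / control_alt + 1 / control_ref)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high

# Hypothetical allele counts (case alt, case ref, control alt, control ref).
populations = {
    "population_1": (400, 600, 300, 700),
    "population_2": (380, 620, 290, 710),
}
for name, counts in populations.items():
    or_, low, high = allelic_odds_ratio(*counts)
    print(f"{name}: OR = {or_:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Odds ratios of similar size whose intervals both exclude 1 are the kind of pattern that counts as replication; a formal test of heterogeneity across populations would be the natural next step.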
Celera Diagnostics is doing similar studies on the Luminex platform. Celera's advantage is a trove of SNP data generated during the race to sequence the human genome. The company has since been validating SNPs across a much larger number of individuals.
A "discovery" set of 500 patients and 500 controls "gives us the power to detect even relatively small but important effects," says Tom White, chief scientific officer at Celera Diagnostics. To speed things up, they quickly test any preliminary hits in a second population, even while data from the first scan are still being completed. The company recently released data on three novel genes linked to heart disease. Ultimately, it will "knock off one test, one gene at a time," White says. These could become targets for its parent company, Celera Genomics.
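White's remark about statistical power can be made concrete with the standard normal-approximation formula for comparing two allele frequencies. The frequencies (30 percent in controls, 35 percent in cases) and the two significance thresholds are hypothetical, not Celera's, and the sketch ignores the negligible chance of a "significant" result in the wrong direction.

```python
from statistics import NormalDist

def allelic_test_power(p_control: float, p_case: float,
                       n_cases: int, n_controls: int, alpha: float) -> float:
    """Approximate power of a two-sided test comparing allele frequencies.

    Uses the normal approximation for a difference of two proportions,
    counting two alleles per person.
    """
    std_normal = NormalDist()
    alleles_case = 2 * n_cases
    alleles_control = 2 * n_controls
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Spread of the observed difference if there were no true effect (pooled frequency).
    pooled = (p_case * alleles_case + p_control * alleles_control) / (alleles_case + alleles_control)
    se_null = (pooled * (1 - pooled) * (1 / alleles_case + 1 / alleles_control)) ** 0.5
    # Spread of the observed difference when the true frequencies are p_case and p_control.
    se_alt = (p_case * (1 - p_case) / alleles_case
              + p_control * (1 - p_control) / alleles_control) ** 0.5
    return std_normal.cdf((abs(p_case - p_control) - z_crit * se_null) / se_alt)

for alpha in (0.05, 5e-7):
    power = allelic_test_power(0.30, 0.35, n_cases=500, n_controls=500, alpha=alpha)
    print(f"alpha = {alpha:g}: power = {power:.3f}")
```

Under these assumptions, 500 cases and 500 controls give roughly two-thirds power at a nominal 0.05 threshold but well under 1 percent at a stringent 5 × 10^-7 cutoff, which helps explain why groups lean on quick replication in a second population rather than a single, very strict scan.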
The mixed approach of Iceland's deCODE Genetics is unusual, and it is getting renewed attention. The company starts with traditional linkage methods, drawing on comprehensive data collected from Icelanders, whose health and genealogy have been well documented. That gives the company a starting point for the real study. "We end up with a [chromosome] region that is just five to six megabases, and do association studies there," says deCODE CEO Kari Stefansson. In those case/control studies, "We saturate the region with about 100 markers," he says.
This strategy has pointed deCODE to the most important genes and has helped the company map more than 20 key disease genes. Competitors caution that deCODE's genes may not be relevant in non-Icelandic populations. "One of the only validations they published was in Scottish subjects. That's still Viking stock," White says.
That charge draws an expletive-laden response from Stefansson, who adds that the company has done multiple validations in other groups, including Americans. Besides, he says, with heavy sarcasm, "We Icelanders are an excellent animal model for humans," and "this is exactly the way you find common disease genes."
Drayna agrees that deCODE's approach is the right one. "The hybrid strategy is providing a light at the end of the tunnel," he says. His NIH group followed that path to find an intriguing gene that allows people to taste the chemical phenylthiocarbamide (see Kim, U.K. et al., Science 299, 1221-1225; 2003). His collaborators included Mark Leppert, one of the scientists from the University of Utah who helped found GenData. They did their initial "house-to-house" search for likely gene suspects using CEPH families, then tested their findings in unrelated people. The results pointed clearly to a tiny spot on chromosome 7q, and three SNPs in particular.
Tricks of the Trade
Whether they are doing linkage, association, or both, everyone agrees that not all SNPs are alike. Some are passed on in blocks. They "work as a pack of wolves," Cox says, "traveling together." Cardon is philosophical about it: "Yet again, we come to realize that the genome is not random," he says. Scientists beware: Markers picked from one region may not work as well as those from another.
That's why people are searching for tricks that can make marker selection easier.
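Geneticists call this tendency of nearby SNPs to be inherited together linkage disequilibrium (LD). Two standard pairwise measures are D' and r², both computed from haplotype frequencies. This is a minimal sketch assuming those haplotype frequencies are already known; in practice they must first be estimated from genotype data, a step skipped here. The function name is hypothetical.

```python
def pairwise_ld(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for allele A at one SNP and allele B at another.

    p_ab -- frequency of haplotypes carrying both A and B
    p_a  -- frequency of allele A at the first SNP
    p_b  -- frequency of allele B at the second SNP
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic (0 < frequency < 1)")
    d = p_ab - p_a * p_b  # excess of AB haplotypes over what independence predicts
    # Largest |D| these allele frequencies allow, used to scale D into [0, 1].
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Two SNPs that always travel together: only AB and ab haplotypes exist.
print(pairwise_ld(p_ab=0.5, p_a=0.5, p_b=0.5))   # (0.25, 1.0, 1.0)
# Two SNPs inherited independently: AB occurs at the product of the allele frequencies.
print(pairwise_ld(p_ab=0.25, p_a=0.5, p_b=0.5))  # (0.0, 0.0, 0.0)
```

A high r² means one SNP can stand in for the other in an association study, so markers in a tightly linked block are partly redundant, while a region where LD breaks down quickly needs more of them.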
The $100-million HapMap project, launched in 2002, aims to "elucidate the underlying genomic structure," says Cardon, a team member (see "The International HapMap Project." Nature 426, 789-796; 2003). Its scientists are analyzing DNA from several hundred people from Nigeria, Japan, China, the United States, and Europe. Data are already flowing into the public Web site, www.hapmap.org. "We're raising more questions than answers right now," Cardon says, although what they are seeing is "shedding light on all these irreproducible results." He expects a clearer view of that genomic structure within a year, which should help people pick better SNPs.
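The payoff of knowing the genome's block structure is cheaper marker selection: if several SNPs are in strong LD, typing one "tag" SNP can stand in for the rest. One widely used heuristic, not necessarily what HapMap or any group in this article uses, is to pick tags greedily. This sketch assumes pairwise r² values are already in hand; the SNP names, r² values, and the 0.8 cutoff are hypothetical.

```python
def greedy_tag_snps(snps: list[str], r2: dict[tuple[str, str], float],
                    threshold: float = 0.8) -> list[str]:
    """Greedily choose tag SNPs until every SNP has r^2 >= threshold with some tag.

    snps      -- SNP identifiers to cover
    r2        -- pairwise r^2 keyed by (snp_x, snp_y) in either order; missing pairs count as 0
    threshold -- minimum r^2 for one SNP to stand in for another
    """
    def ld(x: str, y: str) -> float:
        if x == y:
            return 1.0
        return r2.get((x, y), r2.get((y, x), 0.0))

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that covers the most SNPs not yet tagged (earliest in the list wins ties).
        best = max(snps, key=lambda s: sum(ld(s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if ld(best, t) >= threshold}
    return tags

snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {("snp1", "snp2"): 0.95, ("snp2", "snp3"): 0.90,
      ("snp1", "snp3"): 0.85, ("snp3", "snp4"): 0.20}
print(greedy_tag_snps(snps, r2))  # ['snp1', 'snp4']
```

Here four SNPs collapse to two tags, because snp1 captures both snp2 and snp3. Greedy selection is simple and usually effective, though it is not guaranteed to find the smallest possible tag set.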
Another major development is the rise in biobanks, many modeled after the one established by deCODE six years ago.
Drayna's study is one of the first to exploit the new data now offered by GenData. The company is led by Mark Leppert and Stephen Prescott, renowned scientists at the University of Utah, where more than 30 disease genes have been identified. The idea is to marry gene data from the CEPH families with exquisitely detailed medical information, and integrate other data from the state's Cancer Registry, vital statistics, the university, and the Huntsman Cancer Institute. About 180 physical characteristics were recorded in 47 CEPH family members. The Utah Population Database (UPDB) now contains more than 7 million records from a growing number of sources, backed by bioinformatics tools and sophisticated software to run it all.
Researchers can now gain access to the data, and some can recruit willing family members for studies. Prescott envisions that the data will be used to "study any disease of interest" and to understand "responsiveness to therapy or adverse events." The Utah researchers are themselves already using the UPDB to search for genes related to the complex skin disorder psoriasis.
Leaders in the race to find disease genes, like GSK and deCODE, emphasize the absolute necessity of having such detailed patient data. Each complex disease hides multiple subsets of causes. "Psoriasis is probably multiple diseases," Leppert says, "depending on where it occurs on the body."
Some experts, like Joe Terwilliger of the Columbia University Genome Center, worry that the technology is still just not up to the task. They believe the easy fruit has already fallen, and genes for common diseases will be extremely hard to find. "People just do these studies to justify buying expensive machines," says Terwilliger, who advises researchers to stick to basics, like family or founder population studies.
Others are far more optimistic. "The community is sorting out how to do this," says UCL's Goldstein.
With so many approaches, it will be fascinating to see, in the end, who was right about which are best. Equally interesting will be the final drug target tally. According to GSK's vice president of discovery and pipeline genetics, Eric Lai, speaking at a recent conference, "Once companies like us and Sequenom find all the targets, there won't be any left." Such a gene sweep would undoubtedly change the shape of the pharmaceutical industry, as well as the future of medicine.
Copyright © 2004, Bio-IT World Inc.