Genetic Future : Why do genome-wide scans fail?
Why do genome-wide scans fail?
Category: genome-wide association studies
Posted on: September 15, 2008 9:02 AM, by Daniel MacArthur
Reposted from the old Genetic Future domain. The successes of genome-wide association studies (GWAS) in identifying genetic risk factors for common diseases have been heavily publicised in the mainstream media - barely a week goes by these days that we don't hear about another genome scan that has identified new risk genes for diabetes, lupus, cardiac disease, or any of the other common ailments of Western civilisation.
Some of this publicity is well-founded: for the first time in human history, we have the power to identify the precise genetic differences between human beings that contribute to variation in disease susceptibility. If we can document all of the factors, both genetic and environmental, that result in common disease we will be able to target early interventions to the individuals who are most susceptible. Every GWAS success brings us closer to the long-awaited era of personalised medicine.
But while the media trumpet the successes of genome scans, little attention is paid to their failures. The fact remains that despite the hundreds of millions of dollars spent on genome-wide association studies, most of the genetic variance in risk for most common diseases remains undiscovered. Indeed, some common diseases with a strong heritable component, such as bipolar disease, have remained almost completely resistant to GWAS.
Where is this heritable risk hiding? It now seems likely that it's lurking in a number of different places, with the fraction of the risk in each category varying from disease to disease. This post serves as a generic list of the dark regions of the genome currently inaccessible to GWAS, with some discussion of the techniques that will likely prove useful in mapping risk variants in these areas.
http://scienceblogs.com/geneticfuture/2008/09/why_do_genomewide_scans_fail.php (1 of 8) [27/09/2008 14:49:28]
Alleles with small effect sizes
The problem: The ability to simultaneously examine hundreds of thousands of variants throughout the genome is both the strength and the weakness of the GWAS approach. The power of GWAS is that they provide a relatively unbiased examination of the entire genome for common risk variants; their weakness is that in doing so, they swamp the signal from true risk variants with statistical noise from the vast numbers of markers that aren't associated with disease. To separate true signals from noise, researchers have to set an exceptionally high threshold that a marker needs to exceed before it is accepted as a likely disease-causing candidate. That reduces the problem of false positives, but it also means that any true disease markers with small effects are lost in the background noise.
The solution: This seems to be one problem that will need to be solved, at least to some extent, with sheer brute force. By increasing the numbers of samples in their disease and control groups researchers will steadily dial down the statistical noise from non-associated markers until even disease genes with small effects stand out above the crowd. As the cost of genotyping (and sequencing) tumbles ever downward such an approach will become more and more feasible; however, the logistical challenge of collecting large numbers of carefully-ascertained patients will always be a serious obstacle.
Rare variants
The problem: Current genome scan technology relies heavily on the "common disease, common variant" (CDCV) assumption, which states that the genetic risk for common disease is mostly attributable to a relatively small number of common genetic variants. This is largely an assumption of convenience: firstly, our catalogue of human genetic variation (built up by efforts such as the HapMap project) is largely restricted to common variants, since rare variants are much harder to identify; and secondly, chip-makers have restrictions on how many different SNPs they can analyse on a single chip, so the natural tendency has been to cram in the high-frequency variants that capture the largest proportion of genetic variation per probe. There is also some theoretical justification for this assumption based on models of human demographic history, but these models are themselves based on numerous assumptions, and the argument may not apply equally to all common human diseases.
In any case, everyone agrees that some non-trivial fraction of the genetic risk of common diseases will be the result of rare variants, and the latest results from GWAS in a variety of diseases have failed to provide unambiguous support for the CDCV hypothesis. Whatever the proportion of variance that turns out to be explained by rare variants, current GWAS technologies are essentially powerless to unravel it.
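As a rough illustration of why small effects and rare variants both slip under the threshold, the sketch below estimates the power of a simple allelic test using a normal approximation. Everything in it is an assumption chosen for illustration rather than taken from any particular study: the Bonferroni-style threshold of 0.05 divided by 500,000 markers, the odds ratios, the allele frequencies, the cohort sizes and the function names.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def bonferroni_alpha(n_markers, family_alpha=0.05):
    """Per-marker significance threshold after a Bonferroni correction."""
    return family_alpha / n_markers


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided test comparing allele frequencies.

    Each person contributes two alleles, so the sample sizes are doubled.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    std_err = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    shift = abs(p1 - p0) / std_err           # expected size of the test statistic
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # how far a marker must stand out
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    alpha = bonferroni_alpha(500_000)  # assumed chip size
    scenarios = [
        ("common variant, modest effect", 0.30, 1.2),
        ("rare variant, modest effect", 0.005, 1.2),
        ("rare variant, large effect", 0.005, 3.0),
    ]
    for label, freq, odds_ratio in scenarios:
        for n in (1_000, 10_000, 100_000):
            power = allelic_test_power(n, n, freq, odds_ratio, alpha)
            print(f"{label:32s} n={n:>7,d} cases/controls  power={power:.4f}")
```

Under these assumptions, a common variant with a modest odds ratio goes from almost undetectable to almost certain detection as the cohort grows from 1,000 to 10,000 cases and controls, while a rare variant with the same odds ratio stays close to invisible at both sizes. The numbers should be read as an order-of-magnitude guide only, since real studies tag variants indirectly and lose further power in doing so.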
The solution: Increasing sample sizes may help a little, but the fundamental problem is the inability of current chips to tag rare variation. Short-term, the solution will be higher-density SNP chips incorporating lower frequency variants identified by large-scale sequencing projects like the 1000 Genomes Project. However, such approaches will have diminishing returns: as chip-makers lower the frequency of the variants on their chips, the number of probes that will have to be added to capture a reasonable fraction of total genetic variation will increase exponentially, with each new probe adding only a minute increase in power. Ultimately, the answer lies in large-scale sequencing, which will provide a complete catalogue of every variant in the genomes of both patients and controls. The problem here is not so much the sequencing itself - the costs of sequencing are currently plummeting due to massive investment in rapid sequencing technologies - but in the interpretation. Whole new analytical techniques will be required to convert these data into useful information.
Population differences
The problem: Over the last 50 to 100 thousand years modern humans have enthusiastically colonised much of the world's landmass. Each wave of expansion has carried with it a fraction of the genetic variation of its ancestral population, along with a few novel variants acquired through mutation. In each new habitat encountered, natural selection has acted to increase the frequency of variants that provided an advantage, and cull those that were harmful, while the rest of the genome passively gained and lost genetic variation. The end result is a set of human populations that, while extremely similar across the genome as a whole, can carry quite different sets of genetic variants relevant to disease. In addition, the correlation between markers close together in the genome (known as linkage disequilibrium) can also differ between populations, so that a marker that is tightly correlated with a disease variant in one population may be only weakly associated in other groups.
These differences have profound implications for disease gene mapping efforts. As a result of this variation, markers that are associated with disease in one population can never be assumed to show the same associations in other human groups (this will be especially true for rare variants, of course). Current GWAS have been dominated by subjects of Western European ancestry, and our understanding of genetic risk variants in non-European populations is almost non-existent. In addition, these differences mean that mixing people with different ancestries together in a disease cohort can seriously confound the identification of causative genes - in certain situations, such mixing can greatly increase the risk of false positive findings.
The solution: For GWAS results to be universally applicable, they will need to be performed in cohorts from a wide range of populations. Data-sets such as the HapMap project, the Human Genome Diversity Panel and the powerful new 1000 Genomes Project will provide information about the patterns of genetic variation in diverse populations that is needed to design the assays for GWAS. A greater challenge will be collecting the large numbers of ancestry-homogeneous samples - both well-validated disease patients and healthy controls - required for GWAS approaches to be successful.
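The confounding effect of mixing ancestries can be shown with a small worked example. The sketch below uses two hypothetical populations, A and B, with made-up allele frequencies and cohort sizes. The marker has no effect on disease in either population, and the calculation uses expected allele counts rather than random sampling so that the result is reproducible. The function names are illustrative only.

```python
import math


def allelic_chi_square(case_alt, case_total, control_alt, control_total):
    """Pearson chi-square (1 degree of freedom) for a 2x2 table of allele counts."""
    a, b = case_alt, case_total - case_alt
    c, d = control_alt, control_total - control_alt
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# Hypothetical cohorts: (marker allele frequency, number of cases, number of controls).
# Within each population, cases and controls share the same allele frequency,
# so the marker is unrelated to disease.
POPULATIONS = {"A": (0.2, 700, 300), "B": (0.5, 300, 700)}


def allele_counts(freq, n_cases, n_controls):
    """Expected (case_alt, case_total, control_alt, control_total) allele counts."""
    return (2 * n_cases * freq, 2 * n_cases, 2 * n_controls * freq, 2 * n_controls)


def stratified_tests(populations):
    """Chi-square statistic computed separately inside each population."""
    return {name: allelic_chi_square(*allele_counts(*spec))
            for name, spec in populations.items()}


def pooled_test(populations):
    """Chi-square statistic after lumping all populations into one cohort."""
    totals = [0.0, 0, 0.0, 0]
    for spec in populations.values():
        for i, count in enumerate(allele_counts(*spec)):
            totals[i] += count
    return allelic_chi_square(*totals)


if __name__ == "__main__":
    for name, stat in stratified_tests(POPULATIONS).items():
        print(f"population {name}: chi-square={stat:.3f}  p={p_value_1df(stat):.3g}")
    pooled = pooled_test(POPULATIONS)
    print(f"pooled cohort: chi-square={pooled:.3f}  p={p_value_1df(pooled):.3g}")
```

Analysed separately, neither population shows any association. Pooled, the marker looks strongly associated with disease simply because population A is over-represented among cases and carries the allele at a lower frequency. This is the sense in which ancestry-homogeneous cohorts (or an explicit correction for ancestry) are needed.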
This sample-collection challenge is likely to be particularly acute for African populations, where linkage disequilibrium is lower and genetic diversity much higher than in other regions (thus requiring larger numbers of markers and individuals to identify disease variants); and of course, in Africa and much of the rest of the world, local governments typically have much more pressing issues than genome scans to spend their limited health budgets on.
Epistatic interactions
The problem: Most current genetic approaches assume that genetic risk is additive - in other words, that the presence of two risk factors in an individual will increase risk by the sum of the two factors by themselves. However, there's no reason to expect that this will always be the case. Epistatic interactions, in which combined risk is greater (or less) than the sum of the risk from individual genes, are difficult to identify with genome scans and even harder to untangle. If epistasis is strong, then just a few genes - each with a weak effect by itself, well below the threshold of a scan - could in concert explain a large chunk of genetic risk. Such a situation would be largely invisible to current approaches.
The solution: Large sample sizes, and clever analytical techniques. I'm not going to attempt a more detailed answer as this area is well outside my knowledge zone - but fortunately, it's an active area of research (see, for instance, the Epistasis Blog). I'd welcome any comments from people who know more about epistasis than I do about the likely scope of this problem and the methods that will be used to resolve it.
Copy number variation
The problem: One of the great surprises of the last five years has been the discovery of widespread, large-scale insertions and deletions of DNA, known as copy number variations (CNVs), in even healthy genomes. CNVs are now known to account for a substantial fraction of human genetic variation, and have been shown to play a role in variation in human gene expression and in human evolution. It seems highly likely that CNVs will be responsible for a non-trivial proportion of common disease risk. However, our understanding of these variants is still in its infancy. The chips currently used in GWAS, which interrogate single base-pair variations between individuals known as SNPs, can be used to detect a small proportion of CNVs indirectly (by looking for distortions of signal intensity or inheritance patterns), and may effectively "tag" a fraction of the remainder (by using SNPs that are very close to the CNV, and therefore tend to be inherited along with it). However, the vast majority of copy number variation remains invisible to current GWAS technology.
The solution: High-resolution tiling arrays - chips containing millions of probes, each of which binds to a small region of the genome - can be used to explore CNVs in some areas of the genome, but they break down for the large fraction of the genome containing repetitive elements. Ultimately, the complete detection of CNVs from patients and controls will require whole-genome sequencing, preferably using methods with much longer read lengths than the current crop of rapid sequencing technologies.
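To give a feel for what "distortions of signal intensity" means in practice, here is a toy sketch of intensity-based CNV calling. It is not any chip vendor's or published tool's algorithm: the cutoffs, the minimum run length, the probe intensities and the function names are all assumptions made for illustration. The only fixed arithmetic is that one copy instead of two gives a log2 intensity ratio of about -1, and three copies instead of two gives about +0.58.

```python
import math


def log2_ratios(sample_intensity, reference_intensity):
    """Per-probe log2 ratio of sample signal to a two-copy reference signal."""
    return [math.log2(s / r) for s, r in zip(sample_intensity, reference_intensity)]


def classify(ratio, loss_cutoff, gain_cutoff):
    """Label a single probe as 'loss', 'gain' or None (no evidence of change)."""
    if ratio <= loss_cutoff:
        return "loss"
    if ratio >= gain_cutoff:
        return "gain"
    return None


def call_segments(ratios, loss_cutoff=-0.5, gain_cutoff=0.4, min_probes=3):
    """Return (start, end_exclusive, label) for runs of consecutive probes
    that share the same non-neutral label and span at least min_probes."""
    states = [classify(r, loss_cutoff, gain_cutoff) for r in ratios]
    segments = []
    start = 0
    for i in range(1, len(states) + 1):
        # Close the current run when the label changes or the data end.
        if i == len(states) or states[i] != states[start]:
            if states[start] is not None and i - start >= min_probes:
                segments.append((start, i, states[start]))
            start = i
    return segments


if __name__ == "__main__":
    # Made-up intensities for 13 neighbouring probes along one chromosome.
    reference = [1000] * 13
    sample = [1000, 1020, 980, 500, 510, 490, 505, 1000, 480, 1500, 1490, 1510, 1000]
    for start, end, label in call_segments(log2_ratios(sample, reference)):
        print(f"probes {start}-{end - 1}: {label}")
```

Requiring several consecutive probes to agree is one simple way to avoid calling a CNV from a single noisy probe (probe 8 in the example is ignored for that reason). It also hints at why repetitive regions are hard: where probes cannot be placed uniquely, there is no run of reliable signals to examine.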
Epigenetic inheritance
The problem: Not all inherited information is carried in the DNA sequence of the genome; a child also receives "epigenetic" information from its parents in the form of chemical modifications of DNA that can alter the expression of genes - and thus physical traits - without changing the sequence. Although epigenetic inheritance is known to occur, the degree to which it influences human physical variation and disease risk is essentially totally unknown. All existing technologies used in GWAS are based on DNA sequence, and thus don't detect epigenetic variation. It is even invisible to full-genome sequencing.
The solution: It first needs to be established that epigenetically inherited variations do actually contribute a non-trivial fraction of human disease risk. If so, techniques currently being developed to identify these variants in a high-throughput fashion could be used to perform EWAS (epigenome-wide association studies).
Disease heterogeneity
The problem: Some "diseases" are actually simply collections of symptoms, which may stem from multiple, distinct genetic causes. Lumping patients with fundamentally different conditions into a single patient cohort for a GWAS is a recipe for failure: even if there are strong genetic risk factors for each one of the separate conditions, each of these will be drowned out by the noise from the other, unrelated diseases. The problem is that for some diseases - particularly mental illnesses, where causation lurks deep within the complex and poorly-understood human brain - the knowledge and tools required to separate patients into distinct sub-categories simply may not exist yet.
The solution: The geneticists can't fix this one - it will take a combined effort from clinicians and medical researchers to break down complex diseases into useful diagnostic categories, which can then each be subjected to separate genetic analysis. In the cancer arena, conditions previously lumped together as one entity have now been separated using new technologies such as gene expression arrays; similar approaches will no doubt prove fruitful in a range of other diseases, although the inaccessibility of brain tissue will make it more difficult to apply such approaches to mental illness.
The future of genetic association studies
Current chip-based technologies for genome-wide analysis, while having some success in identifying the lowest-hanging genetic fruit for many common diseases, seem to have already started to run up against barriers that are unlikely to be overcome by simply increasing sample sizes. These technologies should really be regarded as little more than a placeholder for whole-genome sequencing, which should become affordable enough to use for large-scale association studies within 3-5 years. The application of cheap, rapid sequencing technology is likely to generate a harvest of new disease genes that far exceeds the yield of current GWAS, by providing simultaneous access to both the rare variants and copy number variations that are inaccessible to current chip-based approaches.
However, building a more complete catalogue of the heritable variants that drive common disease risk will require more than just cheap sequencing: it will also take advances in clinical diagnostics to better sub-categorise patients into homogeneous groups, as well as new and powerful analytical approaches to cope with the torrent of sequence data, and to efficiently identify epistatic interactions between disease variants. To have any chance of picking out variants of small effect from whole-genome sequencing data, sample sizes will have to be enormous - massive cohorts currently being assembled, such as the 500,000-person UK Biobank and a similar NIH-funded study currently in the works, will provide essential raw material for the selection of participants. Naturally, to be applicable to humanity as a whole, cohorts will need to be gathered separately from many different human populations. Finally, epigenetic variation remains a wild-card of uncertain significance, which will need to be tackled with a different set of high-throughput technologies (although it's likely that many of these will feed on advances in high-throughput sequencing).
Although I probably sound pretty negative about GWAS, I want to emphasise that the current problems are the result of technological limitations that will soon disappear. Barring global catastrophe, within the lifetimes of most of those reading this post we will have a near-complete catalogue of the genetic variants influencing the risk of most of the common diseases that plague the industrialised world (and, hopefully, many of those that plague the rest of humanity). Together with parallel advances in medical science, this catalogue will provide an unprecedented ability to predict, treat and potentially completely eliminate a host of common diseases. It will also bring social and ethical challenges of unprecedented magnitude - but that's a topic for another post...
Comments
I'm not searching through your old archives right now, but it would be nice to read a post on "where genome-wide scan succeed." Given the current state of the technology and analyses, are there commonalities to the successes? Are there particular types of conditions where this approach is more likely to succeed? There would be some overlap with this text, but it would be a nice parallel. Posted by: bsci | September 15, 2008 12:03 PM
Your study confirms the sad consequence of neglecting the structure and the function of the epigenetic control system of the organism. Indeed, the subject requires focusing on the physical carrier of this control system and nobody wants to challenge the current physical paradigm. The subject is discussed in our book presented at www.misaha.com Best regards, Savely Savva Posted by: Savely Savva | September 15, 2008 2:59 PM
bsci - I'm currently working on precisely that post. :-) Posted by: Daniel MacArthur | September 15, 2008 8:07 PM
Add dyslexia to the list. Too many variables. Posted by: gillt | September 15, 2008 11:22 PM
Nicholas Wade interviews Daniel Goldstein of Duke University on related subjects in today's NYT. Link via Razib at GNXP, who adds useful context. Posted by: AMac | September 16, 2008 10:18 AM