One of the main questions we sought to address in the paper we recently published in Nature [1] concerns the relative contribution of common alleles (technically speaking, those with an allele frequency over 5%) and lower-frequency alleles to predisposition to type 2 diabetes (T2D). We concluded that the evidence is increasingly stacking up in favour of the view that most of the genetic risk of T2D can be attributed to common alleles that are widely shared within, and between, human populations.

In this blog, I try to explain what that evidence is, why this question matters, and what the answer tells us about the forces that might underlie the exploding prevalence of this condition.

Two contrasting views of genetic architecture

This question has been the subject of much debate over the years, and has surfaced in many different forms: is the genetic architecture of common, complex diseases like type 2 diabetes best described in terms of the joint effects of a large number of shared variants of small effect, or alternatively as a jumble of genetically distinct (quasi-)Mendelian syndromes?

It can be traced back to the biometrician vs Mendelian debate of over a century ago, which featured different, apparently irreconcilable, views regarding the basis for common, continuous, inherited traits. Bateson and others in the Mendelian camp were focused on the discrete effects of segregating variation that could be shown to be consistent with Mendel’s “laws”. The biometricians (led by Pearson, Weldon and others) were struggling to see how those rules could be applied to continuous traits like height. Ultimately, these opposing views were reconciled by Fisher and colleagues, who demonstrated how the combined effects of multiple variants of small effect, each of them observing the rules of Mendelian segregation, could underpin continuous traits. This was termed the “infinitesimal” model. The concept of a continuous scale of liability (with disease defined when some threshold on that scale is exceeded) means that this model is equally applicable to apparently discrete traits such as diabetes or schizophrenia.
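For readers who like to see the idea in action, here is a minimal Python sketch of the liability threshold model. It is an illustration only, not an analysis from the paper: the number of loci (200), the risk-allele frequency (0.3), the equal split between genetic and environmental variance, and the 10% prevalence are all assumptions chosen for convenience.

```python
import random
import statistics

N_LOCI = 200      # assumed number of risk loci, all with the same small effect
RISK_FREQ = 0.3   # assumed risk-allele frequency at every locus


def simulate_liabilities(n_people=2000, seed=1):
    """Liability = number of risk alleles carried + Gaussian environmental noise."""
    rng = random.Random(seed)
    # Variance of the risk-allele count across the 2 * N_LOCI allele copies.
    genetic_var = 2 * N_LOCI * RISK_FREQ * (1 - RISK_FREQ)
    # Assumption: the environment contributes as much variance as the genes.
    env_sd = genetic_var ** 0.5
    liabilities = []
    for _ in range(n_people):
        # Each allele copy is inherited independently, as Mendel's rules require.
        risk_alleles = sum(rng.random() < RISK_FREQ for _ in range(2 * N_LOCI))
        liabilities.append(risk_alleles + rng.gauss(0.0, env_sd))
    return liabilities


liabilities = simulate_liabilities()

# Theoretical mean and standard deviation of liability under the assumptions above.
expected_mean = 2 * N_LOCI * RISK_FREQ
expected_sd = (2 * (2 * N_LOCI * RISK_FREQ * (1 - RISK_FREQ))) ** 0.5

# Place the disease threshold so that about 10% of a normal distribution exceeds it.
threshold = statistics.NormalDist(expected_mean, expected_sd).inv_cdf(0.90)
affected_fraction = sum(x > threshold for x in liabilities) / len(liabilities)

print(f"mean liability: {statistics.mean(liabilities):.1f}")
print(f"fraction above threshold: {affected_fraction:.3f}")
```

Every locus here follows simple Mendelian rules, yet the summed liability is close to a bell curve, and the discrete "affected" label arises only from where the threshold is drawn.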

Large-scale genetic data arrives

The advent of genome-wide association analyses provided empirical data to reinvigorate this debate.

Some were convinced that common diseases were largely the consequence of common variation along the lines of the infinitesimal model. This “common disease, common variant” argument was based on an important (but underappreciated) fact about human variation. Whilst the number of rare variant sites massively exceeds that of common variant sites (if we sequence enough humans we should find rare alleles at all sites where heterozygosity is compatible with life), most of the genetic differences between two individuals are to be found at common sites. If this sounds counterintuitive, try thinking about it this way. Whilst there are many, many rare allele sites across the genome, each of us is boringly wild-type (i.e. carries two copies of the “normal” allele) at virtually all of those sites, and they contribute little to interindividual variation. The question about the frequency spectrum of the variants influencing a common disease like diabetes then becomes: is the set of variants that influence diabetes risk somehow different from the broad swath of overall variation?
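The arithmetic behind this can be sketched in a few lines of Python. The sketch assumes the textbook neutral expectation that the number of variable sites at which an allele is seen i times in a sample is proportional to 1/i; real human data depart from this (recent population growth adds even more rare sites), and the sample size of 10,000 chromosomes is arbitrary.

```python
def rare_variant_shares(n_chromosomes=10_000, cutoff_percent=5):
    """Return (share of variable sites, share of pairwise differences)
    contributed by sites with minor allele frequency <= cutoff_percent."""
    n = n_chromosomes
    rare_sites = all_sites = 0.0
    rare_diffs = all_diffs = 0.0
    for copies in range(1, n):
        # Assumed neutral expectation: the number of sites carrying this many
        # copies of the variant allele is proportional to 1 / copies.
        sites = 1.0 / copies
        # Chance that two distinct chromosomes drawn at random differ at such a site.
        differ = 2 * copies * (n - copies) / (n * (n - 1))
        all_sites += sites
        all_diffs += sites * differ
        # Integer comparison avoids floating-point trouble exactly at the cutoff.
        if 100 * min(copies, n - copies) <= cutoff_percent * n:
            rare_sites += sites
            rare_diffs += sites * differ
    return rare_sites / all_sites, rare_diffs / all_diffs


rare_site_share, rare_difference_share = rare_variant_shares()
print(f"share of variable sites at or below 5%: {rare_site_share:.2f}")
print(f"share of pairwise differences from those sites: {rare_difference_share:.2f}")
```

Under these idealised assumptions, sites at or below 5% frequency make up roughly 70% of all variable sites but account for only about 10% of the differences between two randomly chosen chromosomes.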

The contrary view (sometimes referred to as “common disease, rare variant”) holds that diseases like diabetes are really a collection of discrete syndromes in which risk is dominated by rare, large-effect variants private to an individual and his or her relatives. After all, we already know some such syndromes (the various forms of maturity onset diabetes of the young are often hard to distinguish clinically from common forms of T2D): perhaps there are just way more of those waiting to be found? Mary-Claire King, in a recent review, invoked Tolstoy to illustrate this concept: “Every unhappy family is unhappy in its own way” [2]. Others have used the term “clan genomics” to describe the view that the genetic contribution to common phenotypic variation would be dominated by rare alleles of recent origin and large effect that cluster within closely-related individuals [3]. The concept of synthetic association (the idea that common variant signals detected by GWAS could be driven by multiple rare causal variants) comes from the same well-spring of Mendelian focus [4]. One often hears at meetings the (apparently) accepted “wisdom” that “type 2 diabetes is really a collection of diseases”, the implication being that we will, at some point in the future, be able to shatter this diagnostic monolith into its constituent discrete diseases (type 2A, type 2B …, type 2Z).

Image: High-throughput sequencing with an Illumina HiSeq X sequencing system

Does it matter?

This may seem an abstract, almost Jesuitical, argument (“how many variants dance on the head of a pin?”). But the answer matters a great deal more than appears at first glance. It matters in terms of defining the best strategies for uncovering risk variants and using them for mechanistic insight. And it matters with regard to the most effective ways for using genetic information to develop more personalised (precision, individualised) strategies for treatment and prevention.

The case for common variants

The genome-wide association study (GWAS) approaches of the past decade have, for technical reasons, been biased towards the detection of association signals emanating from common variants. It’s no surprise, therefore, that almost all of the 100 or so genome-wide significant signals for T2D appear to be driven by common, shared variants. At the same time, those 100 signals explain only around 10% of the overall genetic contribution to T2D [5].

Is this “missing” heritability down to a long tail of common variant effects, or does it represent a “hidden iceberg” of rare variant effects that have been invisible to common variant-biased discovery efforts?

The transition from array-based GWAS to sequence-based GWAS makes it possible to address this question, since it brings variants of all frequencies into view. In the paper just published in Nature, we have been able to start to pursue this strategy, and, for the first time, make a fair comparison of the contribution of those different types of allele to T2D risk.

I won’t go into the detail of what we discovered, but instead summarise the various lines of evidence that we and others have collected, all of which appear to be pointing towards the same conclusion:

  • In sequence-based studies, we and others (notably our colleagues at Decode in Iceland [6]) have found only a handful of rare or low frequency alleles of large effect. This is true for analyses conducted at the single variant level but also for efforts to improve power through gene-level aggregation of rare alleles;
  • In conventional GWAS, the inclusion of ever larger sample sizes, and ever better imputation reference panels (both of which offer more traction for low frequency variants at least) has failed to deliver large numbers of lower frequency signals [7][9];
  • In exome array studies, which have made a subset of rare and low frequency variants of high biological pertinence available for high volume genotyping, discoveries have been limited to common variant signals; in the present study, we estimate that coding variants in the 0.1% to 5% frequency range, though far more numerous than those in the 5-50% range, make a considerably smaller contribution to individual risk of diabetes;
  • In fine mapping studies, there are few, if any, robust examples where focused genotyping has resolved the original common variant signal to a lower-frequency causal variant of larger effect [8]; in the present study, we extend this analysis to rare variants and again fail to find any evidence for the synthetic association model;
  • In trans-ethnic association studies, it is remarkable how many of the GWAS loci identified in one ethnic group can be detected in others: this would not be expected if these were driven by rare alleles [9];
  • In GWAS, we can show that a long tail of signals below genome-wide significance makes a substantial contribution to overall diabetes risk [10], mirroring similar work for other traits performed by Peter Visscher and colleagues using the GCTA approach [11];
  • In simulation studies, where one models the allele frequency spectrum expected of T2D risk alleles under different assumptions regarding selective pressure, the distribution of association signals observed empirically is most obviously consistent with a model of limited selection pressure, and domination by common risk-alleles [12].

What this means for evolutionary selection

The conclusions seem clear. For T2D, genetic risk is predominantly driven by common risk variants that are widely shared both within and between populations. In other words, the allele frequency spectrum of diabetes risk variants (in terms of their contribution to interindividual variation in diabetes risk) mirrors the distribution of genetic variation at large. This in turn indicates that the variants that influence T2D risk have been under limited selection in human prehistory. (If a set of risk alleles were under very strong negative selection, any that arose through mutation would have limited longevity, and would constantly be eradicated from the population: the only risk alleles you would see would be those that arose recently, and they would be rare, private and unique to a lineage. Of course, that’s exactly what you see with Mendelian alleles.)
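The logic of that parenthesis can be illustrated with a deliberately simple Python sketch. It is a deterministic, haploid mutation-selection model that ignores genetic drift, and the mutation rate, selection coefficient and starting frequency are hypothetical values picked for illustration rather than estimates for any T2D variant.

```python
def frequency_after(generations, selection, mutation_rate=1e-8, start=0.2):
    """Track a risk allele's frequency under recurrent mutation and selection.

    selection is the relative fitness cost of carrying the risk allele
    (0.0 means no selection against it).
    """
    q = start
    for _ in range(generations):
        # Selection lowers the allele's share of the next generation;
        # mutation then converts a small fraction of non-risk alleles.
        q = (q * (1 - selection) + mutation_rate * (1 - q)) / (1 - selection * q)
    return q


selected = frequency_after(5000, selection=0.01)
unselected = frequency_after(5000, selection=0.0)
print(f"with a 1% fitness cost: {selected:.2e}")
print(f"with no fitness cost:   {unselected:.4f}")
```

In this toy model the selected allele, though it starts out common, settles at the ratio of mutation rate to selection coefficient (here one in a million), whereas the unselected allele stays close to its starting frequency of 20%.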

On one level, that’s not a surprise. After all, T2D appears to be a relatively recent arrival as a major global contributor to disease burden, and its negative impact on health is mostly (though not exclusively) experienced in post-reproductive years. Both of those should limit the extent to which diabetes itself could have resulted in selective pressure. Diabetes risk alleles might, of course, have been the subject of adverse selection on the basis of some other, pleiotropic, impact, but the evidence points against that. It has been suggested that T2D risk alleles have been advantageous in human prehistory (the “thrifty genotype” concept) but efforts to detect the resonance of balancing selection in large-scale genetic data have not been successful [13].

Image: DNA double helix, illustration. Credit: Anna Tanczos, Wellcome Images. Copyrighted work available under Creative Commons by-nc-nd 4.0.

What our findings do NOT say

Let me finish by pointing out some of the (hopefully obvious) caveats.

First, what we find for T2D may or may not be relevant to other complex traits. Certainly, one would expect that rarer alleles would have a proportionately larger impact for complex diseases of earlier onset, with more profound effects on morbidity, mortality and fecundity (such as autism or schizophrenia).

Second, we have only just started our exploration of the rare variant space. The number of subjects we have been able to examine through sequencing remains far lower than the number for whom we have common variant GWAS data. Our studies are thus far from comprehensive, and so far we provide a far more systematic exploration of the contribution of lower-frequency variants than of the really rare ones.

Third, just because low-frequency and rare alleles don’t dominate the spectrum of risk for T2D does not mean that we should shut down the sequencers. Just as Mendelian alleles have provided powerful new insights into the biology of health and disease, the identification of high impact alleles influencing T2D risk (whether common or rare, though most of them will, of course, be the latter) will highlight key processes involved in the maintenance of metabolic homeostasis. Such pathways provide opportunities for the design of novel preventative and therapeutic approaches. The impact of protein truncating variants in the SLC30A8 gene provides an excellent example [14]. Many more such examples are likely to flow as more individuals, of more diverse ethnic origin, are sequenced.

Fourth, whilst it’s easier to write from an oppositional perspective (common vs rare, nature vs nurture, complex vs Mendelian), the truth of course is far more inclusive. The genetic contribution to individual risk of T2D is influenced by common variants, by low-frequency variants, and by rare variants. T2D predisposition is also subject to the effects of events, exposures and experiences in early life, childhood, adolescence, adulthood and beyond. Somatic mutation almost certainly plays a role. Transgenerational epigenetic inheritance may be involved. To understand individual risk, and to define the mechanistic basis of T2D, we will first need to examine, dissect, characterise and quantify the contribution of each. Only once we have done that will we be able to generate a comprehensive model of T2D pathogenesis.



1 Fuchsberger C et al. The genetic architecture of type 2 diabetes. Nature, 11 July 2016
2 McClellan J, King MC. Genetic heterogeneity in human disease. Cell 2010;141:201
3 Lupski JR et al. Clan genomics and the complex architecture of human disease. Cell 2011;147:32
4 Dickson SP et al. Rare variants create synthetic genome-wide associations. PLoS Biol 2010;8:e1000294
5 Morris AP et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics 2012;44:981-990
6 Steinthorsdottir V et al. Identification of low-frequency and rare sequence variants associated with elevated or reduced risk of type 2 diabetes. Nature Genetics 2014;46:294
7 Morris AP et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics 2012;44:981-990
8 Gaulton KJ et al. Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci. Nature Genetics 2015;47:1415-25
9 Mahajan A et al. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nature Genetics 2014;46:234-244
10 Morris AP et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics 2012;44:981-990
11 Yang J et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nature Genetics 2015;47:1114
12 Agarwala V et al. Evaluating empirical bounds on complex disease genetic architecture. Nature Genetics 2013;45:1418
13 Ayub Q et al. Revisiting the thrifty gene hypothesis via 65 loci associated with susceptibility to type 2 diabetes. American Journal of Human Genetics 2014;94,1
14 Flannick J et al. Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nature Genetics 2014;46:357

Robert Turner Professor of Diabetic Medicine, Group Head, Wellcome Trust Centre for Human Genetics, Group Head / PI, Grant Holding Senior Scientist, Consultant Physician.