© 2010 Nature America, Inc. All rights reserved.
NATURE GENETICS ADVANCE ONLINE PUBLICATION
LETTERS
Endometriosis is a common gynecological disease associated
with pelvic pain and subfertility. We conducted a genome-wide
association study (GWAS) in 3,194 individuals with surgically
confirmed endometriosis (cases) and 7,060 controls from
Australia and the UK. Polygenic predictive modeling showed
significantly increased genetic loading among 1,364 cases with
moderate to severe endometriosis. The strongest association
signal was on 7p15.2 (rs12700667) for ‘all’ endometriosis
(P = 2.6 × 10−7, odds ratio (OR) = 1.22, 95% CI 1.13–1.32)
and for moderate to severe disease (P = 1.5 × 10−9, OR =
1.38, 95% CI 1.24–1.53). We replicated rs12700667 in an
independent cohort from the United States of 2,392 self-
reported, surgically confirmed endometriosis cases and 2,271
controls (P = 1.2 × 10−3, OR = 1.17, 95% CI 1.06–1.28),
resulting in a genome-wide significant P value of 1.4 × 10−9
(OR = 1.20, 95% CI 1.13–1.27) for ‘all’ endometriosis in
our combined datasets of 5,586 cases and 9,331 controls.
rs12700667 is located in an intergenic region upstream of the
plausible candidate genes NFE2L3 and HOXA10.
Endometriosis (MIM131200) is a disease affecting 6–10% of women
of reproductive age1 with substantial annual health costs2 and health
burden for individuals3,4. Common symptoms include chronic pelvic
pain, severe dysmenorrhea (painful periods) and subfertility. The
causes of endometriosis remain uncertain despite over 50 years of
hypothesis-driven research. Disease severity is classified using the
revised American Fertility Society (rAFS) system5, assigning affected
individuals to one of four stages (stages I–IV, defined as minimal to
severe disease) based on lesion size and associated pelvic adhesions.
However, it remains unclear whether the disease progresses through
these stages, and it has been suggested that small lesions (present in
disease stages I and II) represent an epiphenomenon rather than a
disease entity6. Endometriosis risk is influenced by genetic factors7–14
and has an estimated heritability of around 51%.
We genotyped 3,194 unrelated cases with surgically confirmed
endometriosis recruited by the International Endogene Consortium,
IEC (QIMR, Australia dataset, n = 2,270; Oxford, UK dataset, n =
924)15, using the Illumina Human670Quad BeadArray (Online
Methods). We assessed disease stage from surgical records using
the rAFS classification system5,15 and grouped the subjects into two
phenotypes: stage A (stage I or II disease or some ovarian disease with
a few adhesions; n = 1,686, 52.7%) or stage B (stage III or IV disease;
n = 1,364, 42.7%), or unknown (n = 144, 4.6%) (Supplementary Table 1).
Illumina Human610Quad control genotypes for QIMR cases were
available for 1,870 individuals in an adolescent twin study16,17. For
the Oxford cases, we obtained Illumina Human1M-Duo genotypes
for 5,190 UK population controls from the Wellcome Trust Case
Control Consortium (WTCCC2). Although endometriosis affects
only women, the Australian and UK control sets included men to
maximize the power of the association detection on the autosomal
chromosomes (Online Methods). We detected no significant autosomal
allele frequency differences between the male and female control
samples (Supplementary Fig. 1), indicating that the association
signals would not be influenced by a differing female to male ratio in
the cases and controls.
Studies to date have established that endometriosis is heritable
but have not addressed the genetic burden for different disease
stages. We used the GWAS data to assess genetic loading in cases in
two complementary ways. Using a new method18, we estimated the
proportion of variation in case-control status that can be explained
Genome-wide association study identifies a locus at
7p15.2 associated with endometriosis
Jodie N Painter1,13, Carl A Anderson2,3,13, Dale R Nyholt4,13, Stuart Macgregor5, Jianghai Lin6, Sang Hong Lee5,
Ann Lambert6, Zhen Z Zhao1, Fenella Roseman6, Qun Guo7, Scott D Gordon8, Leanne Wallace1, Anjali K Henders1,
Peter M Visscher5, Peter Kraft9,10, Nicholas G Martin8, Andrew P Morris2, Susan A Treloar1,11,14,
Stephen H Kennedy6,14, Stacey A Missmer7,9,12,14, Grant W Montgomery1,14 & Krina T Zondervan2,6,14
1Molecular Epidemiology, Queensland Institute of Medical Research, Herston, Queensland, Australia. 2Genetic and Genomic Epidemiology Unit, Wellcome Trust
Centre for Human Genetics, University of Oxford, Oxford, UK. 3Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. 4Neurogenetics
Laboratory, Queensland Institute of Medical Research, Herston, Queensland, Australia. 5Queensland Statistical Genetics, Queensland Institute of Medical Research,
Herston, Queensland, Australia. 6Nuffield Department of Obstetrics and Gynaecology, University of Oxford, John Radcliffe Hospital, Oxford, UK. 7Channing Laboratory,
Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA. 8Genetic Epidemiology, Queensland Institute of
Medical Research, Herston, Queensland, Australia. 9Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA. 10Department of
Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA. 11Centre for Military and Veterans’ Health, The University of Queensland, Mayne Medical
School, Queensland, Australia. 12Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women’s Hospital and Harvard Medical School,
Boston, Massachusetts, USA. 13These authors contributed equally to this work. 14These authors jointly directed this work. Correspondence should be addressed to
K.T.Z. (krina.zondervan@well.ox.ac.uk) or J.N.P. (jodie.painter@qimr.edu.au).
Received 14 May; accepted 17 November; published online 12 December 2010; doi:10.1038/ng.731
by considering all SNPs simultaneously through inference of
distant relatedness from marker data and comparing it to case-control
status (Online Methods). The proportion of variation in case-control
status explained by the GWAS data was highly significant for both
‘all’ and stage B endometriosis (Table 1 and Supplementary Table 2).
The estimate for stage B endometriosis (0.34, s.e. = 0.04) was significantly
higher than that for stage A endometriosis (0.15, s.e. = 0.04; Table 1).
We also assessed the genetic loading of the different stages using a
prediction approach (Online Methods)19 in which we used the Oxford
data as a discovery set to identify increasingly large SNP sets ranked
on their significance of association (‘allele specific scores’) and used
these scores to predict disease status in target samples from QIMR.
The discovery and target sets were then reversed (Supplementary
Fig. 2). Oxford ‘all’ endometriosis predicted endometriosis in the
QIMR sample, with the smallest P value (P = 8.4 × 10−6) obtained for
a score set including ~75% of the SNPs (Fig. 1). This result was highly
significant, although the proportion of variance explained was small
(maximum Nagelkerke r2 of 0.007; 0.7% of the variance). For stage
B cases, the proportion of variance explained by most score sets was
higher; for example, the score set including the ~20% most associated
SNPs (P = 3.5 × 10−7) explained 1.3% of the variance, consistent with
a greater (polygenic) genetic loading for stage B disease.
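
The prediction approach described above can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: it assumes genotypes are coded as risk-allele counts (0, 1 or 2), that per-SNP log odds ratios and P values come from the discovery set, and that the score is the mean weighted allele count over SNPs passing a P-value threshold. The function names (`allele_scores`, `fit_logistic_loglik`, `nagelkerke_r2`) are hypothetical.

```python
import math


def allele_scores(genotypes, log_odds, p_values, threshold):
    """Mean log-OR-weighted risk-allele count per individual.

    genotypes: list of rows, one per target individual, each a list of
               allele counts (0/1/2), one per SNP.
    log_odds, p_values: per-SNP estimates from the discovery dataset.
    Only SNPs with discovery P < threshold enter the score.
    """
    keep = [j for j, p in enumerate(p_values) if p < threshold]
    if not keep:
        raise ValueError("no SNP passes the threshold")
    return [sum(row[j] * log_odds[j] for j in keep) / len(keep)
            for row in genotypes]


def _sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic_loglik(x, y, n_iter=50):
    """Maximised log-likelihood of y ~ intercept + slope * x.

    Fitted by Newton-Raphson on the two coefficients; assumes the
    cases and controls are not perfectly separated by x.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        loglik += yi * math.log(p) + (1.0 - yi) * math.log(1.0 - p)
    return loglik


def nagelkerke_r2(score, status):
    """Nagelkerke's pseudo R2 for predicting case status (0/1) from score."""
    n = len(status)
    p0 = sum(status) / n
    ll_null = n * (p0 * math.log(p0) + (1.0 - p0) * math.log(1.0 - p0))
    ll_full = fit_logistic_loglik(score, status)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_r2
```

Repeating `allele_scores` over a series of thresholds (for example P < 0.01 up to P < 0.75) and passing each score vector to `nagelkerke_r2` gives one variance-explained value per score set, which is the kind of quantity plotted in Figure 1.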
We performed two genome-wide association analyses stratified by
dataset (QIMR and Oxford) using (i) 3,194 ‘all’ endometriosis cases
and (ii) 1,364 stage B cases, given their substantially greater genetic
loading (Online Methods). For ‘all’ endometriosis, we observed the
strongest signal for rs12700667 in an intergenic region on chromosome
7p15.2 (P = 2.6 × 10−7, OR = 1.22, 95% CI 1.13–1.32; Table 2).
As predicted from our quantitative genetic analyses, we observed
stronger signals of association across the
genome for stage B disease compared to
‘all’ endometriosis (Supplementary Fig. 3).
The 7p15.2 signal for stage B endometriosis
was considerably stronger, producing P =
1.5 × 10−9, OR = 1.38, 95% CI 1.24–1.53
(Table 2) for rs12700667 and P = 6.0 × 10−8,
OR = 1.34, 95% CI 1.21–1.49 for the nearby
SNP rs7798431 (r2 = 0.87). A second strong
association was found for rs1250248 (2q35)
within FN1 (P = 3.2 × 10−8) (Supplementary
Table 3). Results for the SNPs rs12700667,
rs7798431 and rs1250248 remained genome-
wide significant after adjustment for multiple
testing in the two non-independent genome-
wide association analyses using permutation
(Online Methods). Only one of the permuted
genome-wide association analyses produced an independent
P value less than that observed for rs12700667 (P = 0.001). The
SNPs rs12700667 and rs7798431 lie in a narrow region of strong LD
(r2 > 0.8) that extends approximately 48 kb. Following imputation using
1000 Genomes Project and HapMap data (Fig. 2 and Supplementary
Note), conditioning on the effect of rs12700667 in logistic regression
analysis showed no other independent associations with ‘all’ or stage B
endometriosis in the region.
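
As a rough consistency check on the figures reported above, the Wald z statistic and two-sided P value implied by an odds ratio and its 95% CI can be recovered on the log scale. This is a sketch under the assumption that the interval is a symmetric normal-approximation interval for the log OR; the published P values come from the authors' stratified analysis, so only order-of-magnitude agreement should be expected from rounded inputs. The function name `wald_from_ci` is hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio, ci_low, ci_high):
    """Return (z, two-sided P) implied by an OR and its 95% CI."""
    log_or = math.log(odds_ratio)
    # Half-width of the CI on the log scale equals Z_975 * standard error.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    z = log_or / se
    # Two-sided normal tail probability: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Stage B result for rs12700667: OR = 1.38, 95% CI 1.24-1.53
z, p = wald_from_ci(1.38, 1.24, 1.53)
print(f"z = {z:.2f}, P = {p:.1e}")
```

For the stage B estimate this gives z of about 6 and a P value on the order of 10−9, in line with the reported P = 1.5 × 10−9.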
In addition to the three genome-wide significant SNPs, we genotyped
70 SNPs that produced nominal evidence of association with
‘all’ (P < 1.0 × 10−4) or stage B endometriosis (P < 1.0 × 10−4 in stage B
and P < 1.0 × 10−3 in ‘all’ endometriosis analyses; Online Methods) in
an independent IEC dataset comprising 2,392 self-reported surgically
confirmed cases from the Nurses’ Health Study II (NHSII) and 2,271
controls from GWAS of breast cancer20 and kidney function from
Table 1 Estimates of proportion of variation due to common genetic variants for ‘all’ endometriosis and stage A or B disease using genome-wide SNP data from cases and controls (a)

Phenotypes          Cases    Controls   Proportion of variation (s.e.)   P
All endometriosis   3,154    6,981      0.27 (0.04)                      4.4 × 10−16
Stage B             1,347    6,981      0.34 (0.04)                      4.4 × 10−16
Stage A             1,666    6,981      0.15 (0.04)                      2.6 × 10−4

(a) Proportion of variation and associated P values for the likelihood ratio test were estimated
using a linear mixed model incorporating 203,826 SNPs from the GWA panel after additional
QC. Case and control numbers are slightly lower than for the GWA analyses due to the stricter
QC measures (Online Methods). Stage A and stage B estimates of the variance explained are
significantly different from each other (P = 1.8 × 10−3, using a two-sample t-test, which is
conservative since the control samples are the same). Results were verified by prediction of
individual genetic risk using QIMR and Oxford as alternate “discovery” and “target” datasets
(Supplementary Table 2).
[Figure 1: bar charts. Panel a, ‘All’ endometriosis; panel b, Stage B endometriosis. x axis: proportion of top SNPs included in prediction (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75); y axis: proportion of variance explained (R2), 0 to 0.020. P values for each R2 appear above each bar.]
Figure 1 Allele-specific score prediction for endometriosis, using the
Oxford population as the discovery dataset and the QIMR population as the
target dataset. Results for ‘all’ endometriosis are shown in a, and results
for stage B endometriosis are shown in b. Bars show the variance explained in the
target dataset on the basis of allele-specific scores derived in the discovery
dataset for eight significance thresholds (P < 0.01, P < 0.05, P < 0.1,
P < 0.2, P < 0.3, P < 0.4, P < 0.5 and P < 0.75, plotted left to right in
each study). The y axis indicates Nagelkerke’s pseudo R2 representing the
proportion of variance explained. The number above each bar is the P value
for the target dataset analysis. This figure shows that the results were not
driven by a few highly associated regions, indicating a substantial number
of common variants underlying disease.
Table 2 GWAS, replication and meta-analysis results for rs12700667

Analysis                       Cases/controls   Risk allele (A) frequency in controls   P            OR (95% CI)        Heterogeneity test P value
1. GWA – all endometriosis
   QIMR                        2,270/1,870      0.73                                    1.5 × 10−5   1.25 (1.13–1.38)   –
   Oxford                      924/5,190        0.74                                    3.9 × 10−3   1.19 (1.06–1.34)   –
   Combined                    3,194/7,060      0.74                                    2.6 × 10−7   1.22 (1.13–1.32)   0.56
2. GWA – stage B
   QIMR                        910/1,870        0.73                                    8.3 × 10−7   1.40 (1.22–1.60)   –
   Oxford                      454/5,190        0.74                                    4.2 × 10−4   1.35 (1.14–1.60)   –
   Combined                    1,364/7,060      0.74                                    1.5 × 10−9   1.38 (1.24–1.53)   0.75
3. Replication NHSII –
   all endometriosis (a)       2,392/2,271      0.73                                    1.2 × 10−3   1.17 (1.06–1.28)   –
4. Meta-analysis
   All endometriosis (1 + 3)   5,586/9,331      0.74                                    1.4 × 10−9   1.20 (1.13–1.27)   0.64

(a) Stage was unknown for cases in the NHSII replication cohort, though it was estimated to include ~40% stage B cases21.
the Nurses’ Health Study (NHS) I and II.
Stage information was not available for NHSII
cases, but the proportion likely to have stage
B disease has been estimated at approximately
40% (ref. 21), similar to that observed in the
QIMR case set (Supplementary Table 1).
Association with ‘all’ endometriosis for the two SNPs on 7p15.2 was
replicated in the US dataset, with P = 1.2 × 10−3, OR = 1.17, 95%
CI 1.06–1.28 for rs12700667 and P = 1.6 × 10−3, OR = 1.17, 95%
CI 1.06–1.28 for rs7798431 (Supplementary Table 3). There was no
evidence (nominal P ≤ 0.05) for replication of rs1250248 (FN1) or
association with the remaining 70 SNPs (Supplementary Table 3).
Analysis of all 5,586 cases and 9,331 controls from the combined
QIMR, Oxford and NHS cohorts further confirmed association
between ‘all’ endometriosis and 7p15.2, producing P = 1.4 × 10−9,
OR = 1.20, 95% CI 1.13–1.27 for rs12700667 and P = 1.1 × 10−7,
OR = 1.18, 95% CI 1.11–1.25 for rs7798431 (Table 2). Although effect
sizes from discovery datasets may be inflated22, the similarity of ORs
for ‘all’ endometriosis in our discovery (GWAS) and replication datasets
(Table 2) suggests this type of bias has not played a major role.
Assuming the estimated OR of 1.20 and allele frequency of 0.74 for
the rs12700667 A allele, a multiplicative risk model and a population
prevalence of 8% (refs. 10,21,23), the estimated percentage of ‘all’
endometriosis variance explained by rs12700667 was 0.36, or 0.69%
of the estimated 51% heritability of endometriosis9.
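
The combined estimate for rs12700667 can be approximately reproduced from the per-cohort odds ratios in Table 2 with a fixed-effect, inverse-variance meta-analysis on the log OR scale. This is a sketch under assumptions: the standard errors are back-calculated from the rounded 95% CIs, and the authors' own stratified analysis may use a different estimator, so small differences in the last digit are expected. The function names are hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Log OR and its standard error, back-calculated from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    return math.log(odds_ratio), se


def fixed_effect_meta(studies):
    """Inverse-variance pooling of (OR, ci_low, ci_high) tuples.

    Returns the pooled OR, its 95% CI, and Cochran's Q with its
    degrees of freedom (number of studies minus one).
    """
    estimates = [log_or_and_se(*s) for s in studies]
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - Z_975 * pooled_se),
        "ci_high": math.exp(pooled + Z_975 * pooled_se),
        "q": q,
        "df": len(studies) - 1,
    }


# 'All' endometriosis, rs12700667 (Table 2): QIMR, Oxford, NHSII
result = fixed_effect_meta([
    (1.25, 1.13, 1.38),
    (1.19, 1.06, 1.34),
    (1.17, 1.06, 1.28),
])
# With three studies Q has 2 degrees of freedom, for which the
# chi-square survival function has the closed form exp(-Q / 2).
p_het = math.exp(-result["q"] / 2.0)
print(f"OR = {result['or']:.2f} "
      f"({result['ci_low']:.2f}, {result['ci_high']:.2f}), "
      f"heterogeneity P = {p_het:.2f}")
```

With these rounded inputs the pooled OR is close to the reported 1.20 and the heterogeneity P value is close to the reported 0.64, which supports the observation that the discovery and replication effect sizes are similar.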
The associated SNPs are located in a ~924-kb intergenic region
containing at least one noncoding RNA (AK057379), predicted transcripts
and regulatory elements, and a miRNA (hsa-mir-148a) ~88 kb
upstream of rs12700667. The closest gene, NFE2L3, which is highly
expressed in placenta, is located ~331 kb downstream of rs12700667.
Two endometriosis candidate genes, HOXA10 and HOXA11
(refs. 24,25), encoding members of the homeobox A family of transcription
factors that play a role in uterine development, lie ~1.35 Mb
downstream of this SNP.
Among reported candidate gene associations for endometriosis14,
the only gene with P < 10−3 for SNPs in the GWAS data was
PGR on chromosome 11 (Supplementary Table 3), but the result for
the SNP in this gene was not significant in the replication stage. A
recent genome-wide association scan in Japanese women reported
significant association of endometriosis with rs10965235 (P = 5.8 ×
10−12, OR = 1.44), located on chromosome 9p21, and possible associations
with rs13271465 on 8p22 and rs16826658 on 1p36 (ref. 26).
The Japanese GWAS did not report our 7p15.2 signal among their 100
top SNPs followed up for replication, but with 1,423 cases and 1,318
controls, they would have had only 13% power to detect the effect
of rs12700667 with P ≤ 1.8 × 10−4 (Online Methods). We found no
evidence for association with rs10965235 (which is monomorphic
in individuals of European descent, reflecting the different genetic
(ancestral) backgrounds between the studies) or any other SNP in LD
(r2 > 0.5 in the HapMap Japanese JPT population) in the QIMR
and Oxford data (Supplementary Table 4). We also found no evidence
of association with 8p22. We did find evidence for replication of
rs7521902 on 1p36, which is close to WNT4, for both ‘all’ endometriosis
(P = 9.0 × 10−5, OR = 1.16, 95% CI 1.08–1.25) and stage B cases
(P = 7.5 × 10−6, OR = 1.25, 95% CI 1.13–1.38), with the stronger signal
in stage B providing additional empirical evidence for the benefit in
examining stage B cases. Importantly, a meta-analysis of the QIMR and
Oxford ‘all’ endometriosis OR with the reported Japanese OR of 1.25
(95% CI 1.12–1.39) for rs7521902 produced a genome-wide significant
P value of 4.2 × 10−8 (OR = 1.19, 95% CI 1.12–1.27). The frequency of
the rs7521902 risk allele (A) was 0.57 and 0.51 in the Japanese GWAS
cases and controls, respectively, and 0.26 and 0.24 in our combined
GWAS cases and controls, respectively. WNT4 is important for development
of the female reproductive tract27, ovarian follicle development
and steroidogenesis28,29, making it a plausible biological candidate.
We have identified a new locus on chromosome 7p15.2 that is significantly
associated with risk of endometriosis in women of European
ancestry, and we confirm a previously reported suggestive association
for SNPs close to the WNT4 locus. Our analyses also demonstrate
a higher genetic loading for moderate to severe (stage B) endometriosis,
and consistent with these results, we observed the strongest
association signals with stage B disease. Our predictive modeling
demonstrates that there are additional common variants contributing
to risk for this disease and that future larger studies enriched for
laparoscopically confirmed moderate to severe cases will be better
powered to identify risk loci and aberrant pathways contributing to
the development of endometriosis.
URLs. ECR Browser, http://ecrbrowser.dcode.org/; SNPTESTv2,
http://www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html;
1000 Genomes Project, http://www.1000genomes.org/; HapMap,
http://hapmap.ncbi.nlm.nih.gov/.
METHODS
Methods and any associated references are available in the online
version of the paper at http://www.nature.com/naturegenetics/.
Note: Supplementary information is available on the Nature Genetics website.
ACKNOWLEDGMENTS
We acknowledge with appreciation all the women who participated in the QIMR,
OXEGENE and NHS studies. We thank Endometriosis Associations for
supporting the study recruitment. We also thank the many hospital directors
and staff, gynecologists, general practitioners and pathology services in Australia,
the UK and the United States who provided assistance with confirmation of
diagnoses. We thank S. Nicolaides and the Queensland Medical Laboratory for
pro bono collection and delivery of blood samples and other pathology services
for assistance with blood collection.
[Figure 2: association plots around rs12700667. Panel a, ‘All’ endometriosis; panel b, Stage B endometriosis. Left y axis: −log10 P; right y axis: recombination rate (cM/Mb), 0 to 100; SNPs shaded by r2 (0.2 to 0.8); x axis: position, 25.0 to 26.5.]