This article synthesizes current evidence on the performance of polygenic risk scores (PRS) across diverse endometriosis subphenotypes, addressing a critical gap in precision medicine for this complex disease. We explore the foundational genetic architecture of endometriosis and its subtypes, detail methodological approaches for PRS construction and application, and critically evaluate performance variations across ovarian, peritoneal, infiltrating, and other clinical presentations. The review further examines limitations in current PRS models for predicting specific clinical manifestations and explores integrative approaches combining PRS with epigenetic markers and inflammatory biomarkers. For researchers and drug development professionals, this analysis provides essential insights into the potential of PRS for patient stratification, subtype-specific risk prediction, and guiding targeted therapeutic development.
Endometriosis, a chronic gynecological condition affecting approximately 10% of reproductive-aged women, demonstrates a substantial genetic component, with heritability estimates ranging from 47% to 51% based on twin studies [1] [2]. This technical review synthesizes current understanding of heritability patterns, polygenic risk score (PRS) performance across endometriosis subphenotypes, and associated molecular mechanisms. We examine methodologies for estimating genetic contribution, from traditional familial risk assessment to advanced genomic approaches, including methylation risk score (MRS) modeling and expression quantitative trait loci (eQTL) analysis. The integration of polygenic risk scores with epigenetic data demonstrates enhanced predictive power over genetic risk assessment alone, highlighting the complex interplay between inherited variants and regulatory mechanisms in disease pathogenesis. This synthesis provides researchers and drug development professionals with a comprehensive framework for advancing personalized diagnostic and therapeutic strategies in endometriosis.
Endometriosis is characterized by the presence of endometrial-like tissue outside the uterine cavity, leading to chronic pelvic pain, infertility, and reduced quality of life [3]. The diagnostic delay for this condition ranges from 7 to 12 years, contributing to its significant socioeconomic burden and underscoring the urgent need for better understanding of its genetic architecture to enable early detection and intervention [3]. While retrograde menstruation remains a prevailing theory of pathogenesis, this alone cannot explain why only some individuals develop the condition, pointing to substantial genetic predisposition factors [1].
The genetic basis of endometriosis has been elucidated through various study designs, including twin studies, which have established heritability estimates of approximately 50% [1] [2]. More recent genome-wide association studies (GWAS) have identified multiple risk loci, with the largest meta-analysis to date comprising 60,674 cases and 701,926 controls, identifying 42 genome-wide significant loci that explain up to 5.01% of disease variance [4]. This review systematically examines the methodologies for estimating heritability, familial risk patterns, and the performance of polygenic risk scores across endometriosis subphenotypes, providing critical insights for researchers and drug development professionals working to advance precision medicine in this complex disease.
Traditional approaches to estimating endometriosis heritability have relied on familial aggregation and twin studies, which provide the foundation for understanding the disease's genetic component:
Twin Studies: The classic twin study design comparing concordance rates between monozygotic and dizygotic twins has provided fundamental heritability estimates ranging from 47% to 51% [1] [2]. These studies establish that genetic factors explain approximately half of the variation in endometriosis risk within populations.
Familial Risk Patterns: First-degree relatives of affected women have a 7- to 10-fold increased risk of developing endometriosis compared to the general population [1]. This increased familial risk provides further evidence for a significant genetic component in disease susceptibility.
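As a rough illustration of how a twin design yields a heritability figure, the sketch below applies Falconer's textbook approximation, h² = 2(r_MZ − r_DZ). This is an assumption for illustration: the cited twin studies may have used more elaborate variance-component models, and the two correlations are hypothetical values chosen only so the result falls inside the 47-51% range quoted above.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: twice the MZ-DZ gap in twin correlations."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical liability correlations, NOT values reported in [1] or [2].
r_mz, r_dz = 0.52, 0.27
h2 = falconer_h2(r_mz, r_dz)
print(f"Approximate heritability: {h2:.2f}")
```

The formula rests on the usual twin-model assumptions (equal shared environments for both twin types, purely additive genetic effects), so it is best read as a first approximation.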
Advanced genomic methodologies have refined heritability estimation and enabled dissection of specific genetic contributions:
Genome-Wide Complex Trait Analysis (GCTA): This method uses genome-wide SNP data to estimate the proportion of phenotypic variance explained by all common SNPs. For endometriosis, SNP-based heritability estimates are approximately 26%, indicating that common genetic variants account for about half of the overall heritability [4].
Omics Residual Maximum Likelihood (OREML): This approach quantifies the variance captured by different relationship matrices. Analyses using OREML have demonstrated that DNA methylation profiles in endometrial tissue capture 19.58% of variance in endometriosis status, while common genetic variants capture 28.83% [5]. When both are included in the model, DNAm accounts for 12.18% of variance independent of genetics [5].
Table 1: Heritability Estimates from Different Methodological Approaches
| Methodology | Heritability Estimate | Sample Characteristics | Reference |
|---|---|---|---|
| Twin Studies | 47-51% | Australian twin cohort | [1] [2] |
| SNP-based Heritability | ~26% | GWAS meta-analysis (60,674 cases) | [4] |
| DNA Methylation Capture | 19.58% | Endometrial tissue (908 samples) | [5] |
| Common Genetic Variants | 28.83% | Endometrial tissue (908 samples) | [5] |
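The statement that common variants account for about half of the overall heritability can be checked with simple arithmetic on the figures in Table 1. The helper name below is illustrative only.

```python
def share_of_twin_h2(snp_h2: float, twin_h2: float) -> float:
    """Fraction of twin-based heritability captured by common SNPs."""
    return snp_h2 / twin_h2


snp_h2 = 0.26                    # SNP-based heritability [4]
twin_h2_range = (0.47, 0.51)     # twin-based heritability [1] [2]

shares = [share_of_twin_h2(snp_h2, h2) for h2 in twin_h2_range]
for h2, share in zip(twin_h2_range, shares):
    print(f"twin h2 = {h2:.2f}: common SNPs capture {share:.0%}; "
          f"{h2 - snp_h2:.2f} of liability variance remains unexplained")
```

The remainder is the portion of heritability not tagged by common SNPs, which may reflect rare variants, structural variation, or non-additive effects.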
Polygenic risk scores aggregate the effects of multiple genetic variants to quantify individual disease susceptibility. The development of PRS for endometriosis involves specific methodological considerations:
SNP Selection and Weighting: PRS typically incorporates genome-wide significant variants from large-scale GWAS. A 14-SNP PRS derived from a meta-analysis of 17,045 cases and 191,596 controls has been validated across multiple cohorts [6]. More recent approaches utilize Bayesian methods (SBayesR) for effect size adjustment, excluding the MHC region to avoid spurious associations [2].
Performance Metrics: In surgically confirmed cases, the 14-SNP PRS demonstrated an odds ratio (OR) of 1.59 per standard deviation increase (p = 2.57×10⁻⁷) [6]. When validated in the UK Biobank, the same PRS showed an OR of 1.28 (p < 2.2×10⁻¹⁶) [6]. The discriminative accuracy, while statistically significant, remains insufficient for standalone clinical utility, highlighting the need for integration with other biomarkers.
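A minimal sketch of the scoring step, assuming the common additive form in which each risk-allele dosage (0, 1, or 2) is multiplied by its GWAS log odds ratio and the products are summed. The SNP identifiers, weights, and genotypes are placeholders, not the 14 variants or effect sizes used in [6].

```python
from statistics import mean, pstdev


def polygenic_score(dosages: dict, weights: dict) -> float:
    """Weighted sum of risk-allele dosages; missing SNPs contribute zero."""
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def standardize(scores: list) -> list:
    """Convert raw scores to z-scores so effects read as 'per SD'."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical per-allele log odds ratios.
weights = {"snp_a": 0.18, "snp_b": 0.17, "snp_c": 0.16}
# Hypothetical risk-allele dosages for three individuals.
cohort = [
    {"snp_a": 0, "snp_b": 1, "snp_c": 2},
    {"snp_a": 1, "snp_b": 1, "snp_c": 0},
    {"snp_a": 2, "snp_b": 2, "snp_c": 1},
]

raw = [polygenic_score(person, weights) for person in cohort]
z = standardize(raw)
print([round(v, 2) for v in raw], [round(v, 2) for v in z])
```

Standardizing within the analysis cohort is what makes the odds ratios above interpretable as effects per standard deviation of the score.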
Endometriosis encompasses diverse subphenotypes with distinct genetic architectures:
Anatomic Subtypes: PRS performance varies across endometriosis locations. The strongest associations are observed for ovarian endometriosis (OR = 1.72, p = 6.7×10⁻⁵) and infiltrating disease (OR = 1.66, p = 2.7×10⁻⁹), compared to peritoneal endometriosis (OR = 1.51, p = 2.6×10⁻³) [6]. This pattern is consistent with subtype-specific differences in genetic architecture, although the odds ratios are similar in magnitude.
Disease Stages: Genetic correlation analyses reveal that advanced-stage endometriosis has a stronger genetic component than minimal/mild disease [4]. Notably, ovarian endometriosis demonstrates a different genetic basis compared to superficial peritoneal disease [4].
Comorbidity Patterns: PRS phenome-wide association studies (PheWAS) reveal shared genetic architecture between endometriosis and other pain conditions, including migraine, back pain, and multi-site pain [2] [4]. This suggests that genetic factors contribute to the central sensitization observed in chronic pain patients with endometriosis.
Table 2: Polygenic Risk Score Performance Across Endometriosis Subphenotypes
| Subphenotype | Odds Ratio per SD PRS | P-value | Cohort |
|---|---|---|---|
| Overall Endometriosis | 1.59 | 2.57×10⁻⁷ | Surgically confirmed cohort |
| Ovarian Endometriosis | 1.72 | 6.7×10⁻⁵ | Combined Danish cohorts |
| Infiltrating Endometriosis | 1.66 | 2.7×10⁻⁹ | Combined Danish cohorts |
| Peritoneal Endometriosis | 1.51 | 2.6×10⁻³ | Combined Danish cohorts |
| UK Biobank Validation | 1.28 | <2.2×10⁻¹⁶ | UK Biobank (2,967 cases) |
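To see why odds ratios of this size translate into modest discrimination, the sketch below converts an odds ratio per SD into two derived quantities. Both conversions are approximations under stated assumptions: the score is taken to be normally distributed with equal variance in cases and controls, the log odds are taken to be linear in the score, and the case-control mean difference is approximated by ln(OR). None of the resulting AUC values are reported in [6].

```python
import math


def odds_ratio_at(or_per_sd: float, k_sd: float) -> float:
    """Odds ratio for someone k SD above the mean, assuming log-linearity."""
    return or_per_sd ** k_sd


def approx_auc(or_per_sd: float) -> float:
    """AUC = Phi(d / sqrt(2)) with d ~ ln(OR), written via the error function."""
    d = math.log(or_per_sd)
    return 0.5 * (1.0 + math.erf(d / 2.0))


# Odds ratios per SD taken from Table 2.
table_2 = {"overall": 1.59, "ovarian": 1.72, "infiltrating": 1.66,
           "peritoneal": 1.51, "uk_biobank": 1.28}

for label, or_sd in table_2.items():
    print(f"{label:>12}: OR at +2 SD = {odds_ratio_at(or_sd, 2):.2f}, "
          f"approximate AUC = {approx_auc(or_sd):.2f}")
```

Under these assumptions even the strongest subtype association corresponds to an AUC well below the levels usually expected of a standalone diagnostic test, which is in line with the conclusion drawn above.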
The combination of polygenic risk scores with epigenetic markers enhances predictive power:
Methylation Risk Scores (MRS): MRS developed from endometrial methylation data can achieve an area under the receiver-operating characteristic curve (AUC) of 0.6748 [5]. The combination of MRS and PRS consistently outperforms PRS alone in classification accuracy [5].
Tissue-Specific Effects: Analyses of endometriosis-associated genetic variants acting as expression quantitative trait loci (eQTLs) reveal tissue-specific regulatory effects [7]. In reproductive tissues (uterus, ovary), regulated genes are enriched for hormonal response, tissue remodeling, and adhesion pathways, whereas in intestinal tissues and blood, immune and epithelial signaling genes predominate [7].
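The AUC reported for methylation risk scores above can be read as the probability that a randomly chosen case scores higher than a randomly chosen control. A minimal sketch of that rank-based calculation follows; the scores are hypothetical and the function is a generic illustration, not the evaluation code from [5].

```python
def rank_auc(case_scores: list, control_scores: list) -> float:
    """Share of case-control pairs where the case scores higher (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for three cases and three controls.
cases = [0.9, 0.4, 0.7]
controls = [0.1, 0.5, 0.3]
auc = rank_auc(cases, controls)
print(f"AUC = {auc:.3f}")
```

The same function could be applied to a PRS, an MRS, or a combined score on a held-out test set to compare their classification accuracy on equal terms.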
Genetic and epigenetic studies have implicated several key molecular pathways in endometriosis pathogenesis:
Hormonal Signaling Pathways: Genes involved in estrogen biosynthesis (CYP19A1) and signaling (ESR1, GREB1) show strong associations with endometriosis [8] [3]. Progesterone resistance is mediated through reduced progesterone receptor expression and disrupted signaling pathways [3].
Inflammatory and Immune Pathways: Regulatory variants in immune-related genes (IL-6, MICB) demonstrate significant enrichment in endometriosis cohorts [9]. These variants modulate inflammatory responses and may contribute to the immune dysregulation characteristic of endometriosis.
Developmental Pathways: WNT4, a critical gene in reproductive tract development, contains polymorphisms associated with increased endometriosis risk [8]. This suggests that disruptions in developmental programming may contribute to disease susceptibility.
The following diagram illustrates the integration of genetic and environmental factors in endometriosis pathogenesis through these key signaling pathways:
Environmental factors modulate genetic risk through epigenetic mechanisms:
Endocrine-Disrupting Chemicals (EDCs): Exposure to EDCs can alter gene expression via DNA methylation changes in endometriosis-associated genes [9]. Regulatory variants in genes like IL-6 and CNR1 overlap with EDC-responsive regions, suggesting gene-environment interactions exacerbate disease risk [9].
Ancient Regulatory Variants: Some endometriosis-associated regulatory variants, including Neandertal-derived methylation sites in IL-6, show significant enrichment in modern patients [9]. These ancient variants may modulate immune and inflammatory responses that interact with contemporary environmental exposures.
The polygenic risk score phenome-wide association study (PRS-PheWAS) approach enables comprehensive assessment of pleiotropic effects:
Cohort Definition: The workflow involves curating unrelated European individuals from biobanks (e.g., 159,855 males and 188,221 females from UK Biobank), with sensitivity analyses in females without endometriosis diagnoses (n = 182,789) [2].
PRS Calculation: SBayesR weightings are applied to adjusted GWAS summary statistics, excluding the MHC region. PRS is calculated using the --score function of PLINK 1.9 and converted to z-scores for analysis [2].
Association Testing: Associations with phecodes, blood/urine biomarkers, and reproductive factors are tested using logistic regression (for phecodes) or linear regression (for biomarkers), adjusting for age and the first 10 genetic principal components [2].
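The association step can be sketched as a covariate-adjusted logistic regression of one binary phecode on the standardized PRS. The example below is a simplified stand-in: it fits the model by plain gradient ascent in pure Python, uses a single standardized age covariate where the published workflow also adjusts for 10 genetic principal components (these would simply be further columns), and runs on simulated data with hypothetical effect sizes.

```python
import math
import random


def fit_logistic(X, y, lr=0.5, steps=1000):
    """Gradient ascent on the mean log-likelihood; returns [intercept, betas...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            resid = target - 1.0 / (1.0 + math.exp(-eta))
            grad[0] += resid
            for j, v in enumerate(row):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Simulated cohort: hypothetical true effects, not estimates from [2].
rng = random.Random(0)
X, y = [], []
for _ in range(500):
    prs_z, age_z = rng.gauss(0, 1), rng.gauss(0, 1)
    eta = -1.0 + 0.6 * prs_z + 0.2 * age_z
    y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    X.append([prs_z, age_z])

intercept, beta_prs, beta_age = fit_logistic(X, y)
print(f"OR per SD of PRS, adjusted for age: {math.exp(beta_prs):.2f}")
```

In a full PheWAS this fit would be repeated for every phecode, with linear regression substituted for continuous biomarkers, and the resulting p-values corrected for the number of tests.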
The following workflow diagram illustrates the key steps in PRS-PheWAS analysis:
Methylation risk score development for endometriosis involves specific analytical steps:
Quality Control and Covariate Adjustment: Following methylation quality control, samples are assessed for technical covariates (age, processing institution, genetic ancestry) significantly associated with DNA methylation principal components [5]. Surrogate variable analysis removes batch effects and hidden sources of variation.
MRS Construction: MRS is developed using multiple models (elastic net, lasso, ridge regression) with performance evaluation through training/test set splits based on independent cohort institutions [5]. The best-performing MRS incorporates 746 DNAm sites.
Variance Partitioning: Omics residual maximum likelihood (OREML) analyses quantify the proportion of variance in endometriosis status captured by DNA methylation independent of common genetic variants [5].
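Splitting training and test sets by institution, as described for MRS construction, keeps every sample from a given site on one side of the split so that site-specific batch effects cannot leak into the evaluation. A minimal sketch follows, with hypothetical sample records and site labels.

```python
def split_by_institution(samples: list, test_institutions: set):
    """Hold out whole institutions: no site contributes to both sets."""
    train = [s for s in samples if s["institution"] not in test_institutions]
    test = [s for s in samples if s["institution"] in test_institutions]
    return train, test


# Hypothetical sample records; institution names are placeholders.
samples = [
    {"id": "s1", "institution": "site_A", "case": 1},
    {"id": "s2", "institution": "site_A", "case": 0},
    {"id": "s3", "institution": "site_B", "case": 1},
    {"id": "s4", "institution": "site_C", "case": 0},
    {"id": "s5", "institution": "site_C", "case": 1},
]

train, test = split_by_institution(samples, {"site_C"})
print(len(train), "training samples;", len(test), "test samples")
```

A penalized model such as elastic net would then be fitted on the training set only, with the held-out institutions used once for the final performance estimate.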
Table 3: Essential Research Reagents and Resources for Endometriosis Genetic Studies
| Resource/Reagent | Specification | Research Application | Reference |
|---|---|---|---|
| GWAS Summary Statistics | Sapkota et al. 2017 meta-analysis (14,926 cases; 189,715 controls) combined with FinnGen Release 8 (13,456 cases, 100,663 controls) | PRS weight derivation using SBayesR | [2] |
| GTEx v8 Database | Tissue-specific eQTL data from 53 non-diseased human tissues, including uterus, ovary, vagina, colon, ileum, and whole blood | Functional mapping of endometriosis-associated variants | [7] |
| DNA Methylation Array | Genome-wide methylation profiling of endometrial tissue (318 controls, 590 cases) | Methylation risk score development and epigenetic quantitative trait loci detection | [5] |
| UK Biobank | Comprehensive health records and genetic data of ~500,000 individuals, including ICD10 diagnoses, biomarker data, and reproductive histories | PRS-PheWAS analysis and validation across multiple subphenotypes | [2] |
| Genomics England 100,000 Genomes | Whole-genome sequencing data from rare disease programs, including endometriosis patients | Identification of regulatory variants and ancient introgressed alleles | [9] |
The integration of heritability estimates with polygenic risk scoring across endometriosis subphenotypes provides powerful insights for advancing precision medicine approaches. While significant progress has been made in identifying genetic risk variants, several challenges and opportunities remain:
Improved Subphenotype Stratification: Future research should focus on refining genetic risk prediction for specific endometriosis subtypes, particularly deep infiltrating and ovarian endometriosis, which demonstrate distinct genetic architectures [6] [4].
Multi-omics Integration: Combining PRS with epigenetic markers, such as methylation risk scores, consistently enhances predictive power over genetic information alone [5]. The development of integrated risk models that incorporate genetic, epigenetic, and environmental factors will be essential for improving early detection and risk stratification.
Functional Validation: Advanced techniques including CRISPR-based screening and organoid models will be critical for validating the functional impact of identified genetic variants and their role in disease pathogenesis [9] [3].
Diverse Population Applications: Current genetic studies predominantly focus on European and East Asian populations. Expanding research to include diverse ancestral backgrounds is essential for ensuring equitable application of genetic discoveries across all populations.
The field of endometriosis genetics has evolved from initial heritability estimates to sophisticated polygenic risk assessment across subphenotypes. By leveraging these advances, researchers and drug development professionals can accelerate the development of personalized diagnostic and therapeutic strategies for this complex condition.
Endometriosis, a chronic and inflammatory gynecological condition affecting approximately 10% of reproductive-aged women, represents a substantial healthcare challenge characterized by diagnostic delays and complex etiology [10]. The disease demonstrates a significant heritable component, estimated at 47-52% from twin and family studies, prompting extensive research to uncover its genetic underpinnings [11]. Genome-wide association studies (GWAS) have emerged as a powerful hypothesis-free approach to identify common genetic variants underlying this complex condition. To date, multiple GWAS and meta-analyses across diverse populations have identified several genome-wide significant loci, with WNT4, VEZT, and GREB1 representing consistently replicated regions [11]. These discoveries provide crucial insights into biological pathways dysregulated in endometriosis and form the foundation for developing polygenic risk scores (PRS) aimed at predicting individual disease risk and understanding its clinical subphenotypes.
The translation of GWAS findings into clinically useful tools requires careful consideration of effect sizes, population-specific frequencies, and functional mechanisms of identified variants. This technical review comprehensively examines the key endometriosis susceptibility loci, their biological mechanisms, and their collective contribution to polygenic risk prediction across disease subphenotypes. We integrate fine-mapping data, functional genomic evidence, and multi-ancestry validation to provide researchers and drug development professionals with a rigorous resource for understanding endometriosis genetics and its applications in stratified medicine.
Table 1: Key Endometriosis Susceptibility Loci Identified through GWAS
| Locus/ Gene | Lead SNP | Risk Allele (Frequency) | Odds Ratio (95% CI) | P-value | Primary Function |
|---|---|---|---|---|---|
| WNT4 | rs7521902 | A (0.49) | 1.20 (1.14-1.26) | 1.8×10⁻¹⁵ | Reproductive tract development, estrogen response |
| VEZT | rs10859871 | C (0.74) | 1.19 (1.14-1.25) | 4.7×10⁻¹⁵ | Cell adhesion, tumor suppressor |
| GREB1 | rs13394619 | NA | NA | 4.5×10⁻⁸ | Estrogen-regulated growth, steroid receptor cofactor |
| CDKN2B-AS1 | rs1537377 | C (0.57) | 1.17 (1.11-1.23) | 1.5×10⁻⁸ | Cell cycle regulation |
| FN1 | rs1250248 | A (0.18) | 1.87 (1.34-2.61) | 0.002 | Extracellular matrix formation |
| 7p15.2 | rs12700667 | A (0.79) | 1.22 (1.13-1.32) | 1.6×10⁻⁹ | Intergenic regulatory region |
Table 2: Sub-phenotype Associations for Key Endometriosis Loci
| Locus | Stage I/II Association | Stage III/IV Association | Ovarian Endometriosis | Infiltrating Endometriosis |
|---|---|---|---|---|
| WNT4 | Moderate | Stronger | Yes | Yes |
| GREB1 | Limited | Stronger | Yes | Not reported |
| FN1 | Significant (P=0.0066) | Less pronounced | Not reported | Not reported |
| VEZT | Moderate | Stronger | Yes | Not reported |
Meta-analyses of GWAS datasets encompassing over 11,500 cases and 32,600 controls have confirmed six loci with genome-wide significance (P < 5 × 10⁻⁸), with most showing consistent directional effects across populations of European and Japanese ancestry [11]. The WNT4 locus demonstrates particularly strong and consistent associations, with fine-mapping studies identifying rs3820282 as a likely causal variant that introduces a high-affinity estrogen receptor alpha-binding site, dramatically increasing WNT4 transcription in endometrial stroma following estrogen stimulation [12] [13]. This mechanism represents a classic example of how non-coding regulatory variants can influence disease susceptibility through altered hormone response.
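As a small worked example, the genome-wide significance threshold can be applied to the P-values transcribed from the susceptibility loci table above. The dictionary simply restates that table; no additional data are assumed.

```python
GENOME_WIDE_THRESHOLD = 5e-8

# P-values as listed in the susceptibility loci table.
locus_p_values = {
    "WNT4": 1.8e-15,
    "VEZT": 4.7e-15,
    "GREB1": 4.5e-8,
    "CDKN2B-AS1": 1.5e-8,
    "FN1": 0.002,
    "7p15.2": 1.6e-9,
}

significant = [name for name, p in locus_p_values.items()
               if p < GENOME_WIDE_THRESHOLD]
below_threshold = [name for name in locus_p_values if name not in significant]

print("Genome-wide significant:", significant)
print("Not genome-wide significant:", below_threshold)
```

Among the tabulated values, the FN1 entry is the only one that does not reach the threshold, so it is better read as a candidate association than as a genome-wide significant locus.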
The GREB1 locus exhibits equally sophisticated regulation, functioning as a steroid receptor cofactor in a feedforward mechanism that governs differential hormone action in endometrial function versus endometriosis pathology [14]. In normal endometrial physiology, GREB1 controls progesterone responses in uterine stroma, affecting receptivity and decidualization, while in endometriosis, estrogen-induced GREB1 modulates estrogen-dependent gene expression to promote lesion growth [14]. This cell-type and context-specific functionality highlights the complexity of translating GWAS signals into mechanistic understanding.
Figure 1: Molecular Mechanisms of WNT4 and GREB1 in Endometriosis Pathogenesis
Functional genomic approaches have been essential for moving from statistical associations to biological mechanisms. For the WNT4 locus, CRISPR/Cas9-generated mouse models demonstrate that the human risk allele increases uterine Wnt4 transcription in proestrus and estrus by 1.48-3.27 log2 fold, specifically in endometrial stromal fibroblasts underlying the luminal epithelium [12]. This spatiotemporal specificity highlights the importance of the uterine microenvironment in mediating genetic risk. RNAscope in situ hybridization confirms this stromal-specific upregulation, which subsequently downregulates epithelial proliferation and induces progesterone-regulated pro-implantation genes [12].
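Because the expression change is reported on a log2 scale, it can help to convert it to a linear fold change. The short sketch below does that for the reported range.

```python
def log2fc_to_fold(log2_fold_change: float) -> float:
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fold_change


reported_range = (1.48, 3.27)   # log2 fold changes reported in [12]
linear_range = [log2fc_to_fold(v) for v in reported_range]
print(" to ".join(f"{v:.2f}x" for v in linear_range))
```

On a linear scale this corresponds to roughly a 2.8-fold to 9.6-fold increase in uterine Wnt4 transcription.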
For the GREB1 locus, chromatin immunoprecipitation sequencing (ChIP-seq) in human endometrial stromal cells (HESCs) reveals that GREB1 binds to over 2,000 genomic regions, approximately 50% of which are co-occupied by the progesterone receptor [14]. GREB1 knockdown impairs progesterone-induced FOXO1 expression and reduces PR occupancy on target genes, demonstrating its role as an essential PR cofactor [14]. This molecular function explains why GREB1 loss severely compromises female fertility in mouse models through impaired uterine responses to steroid hormones.
Table 3: Polygenic Risk Score Performance Across Endometriosis Studies
| Study Population | Sample Size (Cases/Controls) | Number of SNPs in PRS | Odds Ratio per SD (95% CI) | Variance Explained (R²) |
|---|---|---|---|---|
| Surgically Confirmed (Danish) | 249/348 | 14 | 1.59 (1.32-1.91) | Not reported |
| Danish Twin Registry | 140/316 | 14 | 1.50 (1.22-1.84) | Not reported |
| UK Biobank | 2,967/256,222 | 14 | 1.28 (1.23-1.33) | Not reported |
| Combined Danish | 389/664 | 14 | 1.57 (1.37-1.80) | Not reported |
| Greek Population | 166/168 | 2 (FN1, GREB1) | 1.87 (FN1 rs1250248) | Not reported |
Polygenic risk scores for endometriosis aggregate the effects of multiple susceptibility variants into a single predictive metric. Most studies have utilized 14 genome-wide significant SNPs identified from large meta-analyses, achieving statistically significant but clinically modest risk discrimination [6]. In Danish populations, each standard deviation increase in PRS was associated with 1.57-fold increased odds of endometriosis (P = 2.5×10⁻¹¹), with similar effects across major subtypes: ovarian (OR=1.72), infiltrating (OR=1.66), and peritoneal (OR=1.51) [6]. Notably, the same PRS was not associated with adenomyosis, suggesting distinct genetic architectures for these related gynecological conditions [6].
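One way to see how the two Danish cohort estimates relate to the combined figure is an inverse-variance fixed-effect pooling of the odds ratios and confidence intervals in Table 3. This is an illustrative approximation only: standard errors are back-calculated from the rounded 95% intervals, and the published combined estimate was presumably derived from the pooled individual-level data rather than by this method.

```python
import math


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float):
    """Recover the log odds ratio and its SE from a 95% confidence interval."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return math.log(odds_ratio), se


def fixed_effect(estimates: list):
    """Inverse-variance weighted mean of log odds ratios and its SE."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


# Odds ratios per SD with 95% CIs, as listed in Table 3.
cohorts = [
    log_or_and_se(1.59, 1.32, 1.91),   # surgically confirmed cohort
    log_or_and_se(1.50, 1.22, 1.84),   # Danish Twin Registry
]

pooled_log_or, pooled_se = fixed_effect(cohorts)
pooled_or = math.exp(pooled_log_or)
ci = (math.exp(pooled_log_or - 1.96 * pooled_se),
      math.exp(pooled_log_or + 1.96 * pooled_se))
print(f"Pooled OR per SD = {pooled_or:.2f} ({ci[0]:.2f}-{ci[1]:.2f})")
```

The pooled value lands close to, though not exactly on, the reported combined estimate of 1.57 (1.37-1.80), which is what one would expect given the rounding of the inputs.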
The discriminative accuracy of current endometriosis PRS remains insufficient for standalone clinical utility, with one study finding inverse associations between PRS and disease spread that lost significance when calculated as p-for-trend [10]. This indicates that current PRS constructions may not adequately capture the genetic basis of severe disease presentations. However, PRS consistently demonstrate association with endometriosis risk irrespective of clinical diagnosis, suggesting they measure genetic liability beyond manifested disease [15].
Robust PRS analysis requires stringent quality control procedures for both base GWAS data and target genotypes. Key considerations include:
Advanced PRS methods that incorporate multi-ancestry data and functional annotations show promise for improving predictive performance. For coronary artery disease, such approaches have increased the proportion of individuals identified with 3-fold increased risk from 8.3% to 20.0% of the population [17]. Similar methodologies applied to endometriosis could substantially enhance risk stratification, particularly if integrated with clinical risk factors and biomarkers.
CRISPR/Cas9-mediated genome editing provides a powerful approach for validating human genetic associations in mouse models. The protocol for modeling the WNT4 rs3820282 variant involves:
This approach confirmed that the human risk allele significantly upregulates uterine Wnt4 expression specifically during proestrus and estrus, mirroring the estrogen-responsive regulation suspected in humans. Two independent knock-in lines showed consistent phenotypes, strengthening evidence for causality [12].
Primary human endometrial stromal cell (HESC) models enable detailed molecular characterization of risk variants:
Application of this pipeline demonstrated that GREB1 physically interacts with progesterone receptor following progestin treatment and is required for optimal PR occupancy at key target genes like FOXO1 [14]. Cut&Run sequencing further defined the GREB1 cistrome, revealing extensive overlap with PR binding sites in endometrial stroma.
Table 4: Key Research Reagents for Endometriosis Genetic Studies
| Reagent/Tool | Specific Application | Function/Utility | Example Use Case |
|---|---|---|---|
| CRISPR/Cas9 | Genome editing | Introduction of precise human risk variants into mouse genome | WNT4 rs3820282 functional validation [12] |
| Primary HESCs | In vitro modeling | Patient-derived stromal cells for hormone response studies | GREB1-PR interaction analysis [14] |
| RNAscope | Spatial transcriptomics | Localization of gene expression in tissue context | WNT4 stromal-specific expression [12] |
| ChIP/Cut&Run | Epigenomic profiling | Mapping transcription factor binding and chromatin states | GREB1 and PR cistrome definition [14] |
| TaqMan assays | Genotyping | Accurate SNP allele discrimination | Case-control association studies [18] |
| Illumina arrays | Genotyping | Genome-wide variant detection | PRS calculation and validation [10] |
The integration of GWAS discoveries with functional genomics has illuminated key pathways in endometriosis pathogenesis, particularly those involving steroid hormone response, developmental patterning, and cellular growth regulation. The well-established loci near WNT4, VEZT, and GREB1 represent the tip of the genetic iceberg, with emerging evidence suggesting numerous additional loci await discovery through expanded sample sizes and diverse population inclusion.
Future research priorities should include:
The observation that lower testosterone levels may be causal for endometriosis highlights how genetic studies can reveal unexpected biological insights with translational potential [15]. As GWAS sample sizes expand and functional characterization methods advance, genetic discoveries will increasingly inform diagnostic stratification, prognostic assessment, and targeted therapeutic development for this complex condition.
Figure 2: Research Pipeline from Genetic Discovery to Clinical Translation in Endometriosis
Endometriosis represents a common inflammatory gynecological disorder affecting approximately 10% of reproductive-aged women worldwide, characterized by the presence of endometrial-like tissue outside the uterine cavity [19] [20]. This complex disease manifests through distinct subphenotypes that demonstrate unique clinical and molecular characteristics: superficial peritoneal endometriosis (PE), ovarian endometrioma (OE), and deep infiltrating endometriosis (DIE) [21] [22]. These subphenotypes are increasingly recognized as clinicopathologically distinct entities with potentially different underlying pathophysiological mechanisms [19]. Within the context of polygenic risk score (PRS) performance research, understanding these subphenotypes becomes paramount, as genetic susceptibility may vary across different manifestations of the disease. The traditional revised American Society for Reproductive Medicine (rASRM) classification system stages endometriosis from minimal (Stage I) to severe (Stage IV) based on surgical findings but correlates poorly with pain symptoms and treatment outcomes [22] [20]. This limitation has driven research toward molecular stratification approaches that may better reflect disease heterogeneity and inform personalized therapeutic strategies.
The recognition of distinct subphenotypes has emerged from observations that lesions at different anatomical locations exhibit varied clinical behavior, histopathological features, and molecular profiles [19]. Superficial peritoneal implants represent the earliest and most common form, while ovarian endometriomas form cysts within the ovaries, and deep infiltrating endometriosis penetrates into retroperitoneal structures [22]. This subphenotype framework provides a critical foundation for investigating the genetic architecture of endometriosis, particularly as it relates to PRS performance across different disease manifestations. Research indicates that these subphenotypes may represent distinct molecular entities rather than a disease continuum, with implications for both biomarker development and therapeutic targeting [21] [19].
The peritoneal fluid microenvironment reflects the inflammatory milieu associated with endometriosis and reveals distinct molecular profiles across subphenotypes. Multiplex immunoassays of 48 cytokines in peritoneal fluid from laparoscopically-confirmed cases have identified unique cytokine signatures that distinguish endometriosis subphenotypes with greater accuracy than traditional staging systems (p < 0.0001) [21] [19].
Table 1: Distinct Cytokine Signatures Differentiating Endometriosis Subphenotypes
| Comparison | Signature Size | Key Cytokines | Pathway Associations |
|---|---|---|---|
| PE vs. OE | 6 cytokines | IL-1α, IL-7, IL-8, MCP-1, MIF, TNF-α | Angiogenesis, immune cell recruitment |
| OE vs. DIE | 7 cytokines | IL-1α, IL-1RA, IL-8, IL-12p40, IL-12p70, IL-16, TNF-α | Inflammation, cell proliferation |
| PE vs. DIE | 6 cytokines | IL-8, IL-12p70, IL-16, MCP-1, MIF, TNF-α | ERK1/2, AKT, MAPK, STAT4 signaling |
Pathway analysis of these cytokine signatures has revealed associations with critical signaling pathways including ERK1/2, AKT, MAPK, and STAT4, which are linked to angiogenesis, cell proliferation, migration, and inflammation in the subphenotypes [19]. The clear separation of subphenotypes based on peritoneal fluid cytokines (cumulative principal component scores: 77% to 92%) significantly outperforms separation based on disease stages (43% to 59%), highlighting the molecular distinctness of these clinical entities [21]. These findings suggest that the subphenotypes may represent different biological processes and inflammatory microenvironments rather than a continuum of disease severity.
The identification of subphenotype-specific cytokine signatures follows a standardized experimental workflow:
Sample Collection: Peritoneal fluid (PF) is collected during laparoscopic surgery from women with and without endometriosis. Participants are stratified according to subphenotype: PE, OE, or DIE, with confirmation by histological examination [19].
Cytokine Analysis: PF samples are analyzed using validated multiplex immunoassays (e.g., Luminex platform) capable of simultaneously quantifying 48 cytokines, chemokines, and growth factors. The assay includes technical replicates and appropriate controls to ensure reproducibility [21] [19].
Data Processing: Raw fluorescence data is converted to concentration values using standard curves for each analyte. Values below detection limits are handled using appropriate statistical methods, and data normalization is performed to account for technical variability.
Statistical Analysis: Partial least squares regression (PLSR) is employed to identify cytokine signatures that optimally distinguish between subphenotypes. Model performance is evaluated using cumulative principal component scores, with significance testing via permutation tests [19].
Pathway Analysis: Bioinformatic tools (e.g., Ingenuity Pathway Analysis, DAVID) are used to map differentially expressed cytokines to biological pathways and networks, revealing subphenotype-specific molecular mechanisms [21].
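The permutation testing mentioned in the statistical analysis step can be illustrated with a much simpler univariate version: shuffling subphenotype labels to test whether one cytokine differs in mean concentration between two groups. This sketch is not the multivariate PLSR procedure used in [19], and the concentrations are hypothetical.

```python
import random


def permutation_p(group_a: list, group_b: list, n_perm: int = 2000,
                  seed: int = 0) -> float:
    """Two-sided label-permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_gap(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    pooled = list(group_a) + list(group_b)
    observed = mean_gap(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point ties.
        if mean_gap(pooled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical peritoneal fluid concentrations (pg/mL) of one cytokine.
group_oe = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7]
group_pe = [9.8, 10.4, 11.1, 9.2, 10.9, 10.1]

p_value = permutation_p(group_oe, group_pe)
print(f"Permutation p-value: {p_value:.4f}")
```

The same label-shuffling idea extends to multivariate models by refitting the model on each permuted dataset and comparing the observed performance with the permutation distribution.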
Polygenic risk scores aggregate the effects of multiple genetic variants to quantify an individual's genetic susceptibility to a disease. For endometriosis, PRS has demonstrated utility across all major subphenotypes, though with varying effect sizes. Research using a 14-variant PRS derived from genome-wide association studies (GWAS) has revealed that genetic risk factors contribute to all types of endometriosis rather than specific locations [6].
Table 2: Polygenic Risk Score Performance Across Endometriosis Subphenotypes
| Subphenotype | Odds Ratio (OR) | P-value | Cohort | Sample Size |
|---|---|---|---|---|
| Overall Endometriosis | 1.57 | 2.5 × 10⁻¹¹ | Danish Combined | 389 cases, 664 controls |
| Ovarian Endometrioma | 1.72 | 6.7 × 10⁻⁵ | Danish Combined | 75 cases |
| Deep Infiltrating | 1.66 | 2.7 × 10⁻⁹ | Danish Combined | 210 cases |
| Peritoneal | 1.51 | 2.6 × 10⁻³ | Danish Combined | 60 cases |
| Overall Endometriosis | 1.28 | < 2.2 × 10⁻¹⁶ | UK Biobank | 2,967 cases, 256,222 controls |
Notably, the PRS was not associated with adenomyosis (OR = 1.07, p = 0.71), suggesting that while adenomyosis shares histological features with endometriosis, it is not driven by the same common genetic risk variants [6]. This specificity supports the biological distinction between these conditions and highlights the potential of PRS for differential risk prediction. The somewhat lower odds ratio in the UK Biobank (1.28) likely reflects differences in case ascertainment, as this cohort relied on ICD-10 codes from hospital records rather than surgical confirmation [6].
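Whether the subtype odds ratios in Table 2 differ from one another can be checked approximately from the reported values alone. The sketch below back-calculates a standard error from each odds ratio and two-sided p-value, then applies a simple z-test to the difference in log odds ratios. Several assumptions are made here that are not taken from [6]: a Wald-type test underlies each p-value, the reported figures are exact rather than rounded, and the two estimates are independent (they are not strictly so, because the subtype analyses share controls). All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_or_and_p(odds_ratio, p_value):
    """Approximate SE of log(OR) implied by a two-sided Wald p-value."""
    z = abs(_STD_NORMAL.inv_cdf(p_value / 2.0))
    return abs(math.log(odds_ratio)) / z


def ci95(odds_ratio, se):
    """Approximate 95% confidence interval on the odds-ratio scale."""
    beta = math.log(odds_ratio)
    return math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


def difference_p_value(or_1, se_1, or_2, se_2):
    """Two-sided p-value for log(OR1) - log(OR2), assuming independence."""
    z = (math.log(or_1) - math.log(or_2)) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


# Values copied from Table 2 (Danish combined cohort).
se_ovarian = se_from_or_and_p(1.72, 6.7e-5)
se_peritoneal = se_from_or_and_p(1.51, 2.6e-3)
ci_ovarian = ci95(1.72, se_ovarian)
p_difference = difference_p_value(1.72, se_ovarian, 1.51, se_peritoneal)
```

Because the subtype case counts are small (60 to 210), the implied intervals are wide, which is worth keeping in mind before reading the ordering of subtype effect sizes as biologically meaningful.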
PRS phenome-wide association studies (PheWAS) have revealed intriguing pleiotropic effects of endometriosis genetic risk, including an association with lower testosterone levels [2]. This relationship was consistent across sexes, suggesting fundamental biological connections rather than consequences of the disease itself. Mendelian randomization analysis further supported a potential causal effect of lower testosterone on endometriosis risk, with implications for understanding disease mechanisms [2].
The genetic correlation between endometriosis and testosterone levels points to a role for endocrine pathways in disease pathogenesis. Lower testosterone may create a permissive environment for the establishment or growth of ectopic lesions, possibly through effects on inflammation, immune function, or cellular proliferation [2]. These findings align with clinical observations of altered hormonal profiles in endometriosis patients and suggest that genetic risk may operate partially through endocrine mechanisms.
Pathway analysis of molecular data from endometriosis subphenotypes has revealed activation of distinct signaling cascades that may drive disease pathogenesis and progression. These pathways represent potential targets for subphenotype-specific therapeutic interventions and provide mechanistic insights into the observed clinical differences.
The ERK1/2, AKT, and MAPK pathways emerge as central regulators across subphenotypes, with variations in their activation patterns and downstream effects [19]. These pathways integrate signals from cytokines, growth factors, and hormonal stimuli to control critical cellular processes including proliferation, survival, and invasion. In deep infiltrating endometriosis, which demonstrates the most aggressive behavior, these pathways show heightened activation, potentially explaining the invasive characteristics of this subphenotype [21].
STAT4 signaling, particularly prominent in peritoneal and deep infiltrating endometriosis, links inflammatory cytokines to transcriptional programs that may perpetuate the disease microenvironment [19]. This pathway plays important roles in immune cell differentiation and function, suggesting immune involvement in subphenotype determination. Additionally, angiogenesis-related pathways driven by VEGF-A and other factors appear differentially activated across subphenotypes, reflecting variations in vascularization requirements for different lesion environments [19].
Table 3: Essential Research Reagents for Endometriosis Subphenotype Investigation
| Reagent Category | Specific Examples | Research Application | Considerations |
|---|---|---|---|
| Multiplex Immunoassay Kits | Luminex 48-plex cytokine panels | Simultaneous quantification of inflammatory mediators in peritoneal fluid | Validate detection limits for low-abundance analytes |
| DNA Genotyping Arrays | Illumina Global Screening Array, Infinium Asian Screening Array | Genome-wide SNP data for PRS calculation | Ensure coverage of endometriosis-associated loci |
| RNA Extraction Kits | Qiagen RNeasy, TRIzol-based methods | Gene expression analysis from lesion tissue | Address challenges of fibrotic tissue in DIE samples |
| Pathway Inhibitors | ERK1/2 inhibitors (SCH772984), AKT inhibitors (MK-2206) | Functional validation of signaling pathways in model systems | Test specificity to avoid off-target effects |
| Antibody Panels | CD45 (immune cells), CD31 (endothelial cells), cytokeratin (epithelial cells) | Immunophenotyping of lesion microenvironment | Optimize for formalin-fixed paraffin-embedded tissue |
| Cell Culture Media | Specific formulations for endometrial stromal cells | In vitro models of lesion establishment and growth | Consider hormone supplementation to mimic menstrual cycle |
The selection of appropriate research reagents is critical for investigating the molecular distinctions between endometriosis subphenotypes. Multiplex immunoassay platforms enable comprehensive cytokine profiling that has been instrumental in identifying subphenotype-specific inflammatory signatures [21] [19]. For genetic studies, high-density genotyping arrays provide the data necessary for polygenic risk score calculation, with careful consideration of ancestry-matched reference panels to ensure accurate risk prediction across diverse populations [6] [23].
Functional studies require well-validated pathway inhibitors and cell culture models that recapitulate key aspects of each subphenotype. For instance, deep infiltrating endometriosis models should prioritize invasive capacity, while ovarian endometrioma models might focus on cyst formation mechanisms [22]. Antibody panels for tissue staining must be optimized for the unique microenvironment of endometriosis lesions, which often contain mixed cell populations and substantial fibrotic components [22] [20].
The delineation of endometriosis subphenotypes—ovarian, peritoneal, and deep infiltrating—represents a crucial advance in understanding this heterogeneous condition. Molecular evidence increasingly supports the concept that these are distinct entities with unique cytokine signatures, signaling pathway activation, and partially non-overlapping genetic architectures [21] [6] [19]. The performance of polygenic risk scores across all subphenotypes indicates shared genetic susceptibility, while variation in effect sizes suggests additional subphenotype-specific genetic factors yet to be fully characterized [6].
Future research directions should include larger subphenotype-stratified GWAS to identify genetic variants with subtype-specific effects, potentially revealing biological mechanisms unique to each form of the disease. Integration of multi-omics approaches—genomics, transcriptomics, proteomics—will provide a more comprehensive understanding of the molecular networks underlying each subphenotype [23]. Additionally, development of refined PRS models that incorporate subphenotype information could enhance predictive accuracy and clinical utility.
From a translational perspective, these findings highlight the potential for subphenotype-specific therapeutic approaches targeting the distinct signaling pathways and inflammatory environments characteristic of each form [19]. The association between endometriosis genetic risk and testosterone levels further suggests endocrine pathways that might be modulated for prevention or treatment [2]. As our understanding of endometriosis subphenotypes continues to evolve, so too will opportunities for personalized risk assessment and targeted interventions aligned with the specific molecular drivers of each patient's disease.
Endometriosis, a chronic systemic disease characterized by the presence of endometrial-like tissue outside the uterine cavity, affects approximately 10% of women of reproductive age worldwide [22] [3]. The diagnostic pathway for this condition remains challenging, with an average delay of 7 to 12 years from symptom onset to definitive surgical diagnosis [22] [3]. This condition exhibits a substantial genetic component, with heritability estimates ranging from 47% to 51% [2]. Beyond its hallmark symptoms of pelvic pain and infertility, endometriosis frequently co-occurs with multiple other conditions, including pain disorders, osteoarthritis, and various autoimmune diseases [2]. Understanding the shared genetic architecture between endometriosis and these comorbid conditions provides not only insights into underlying biological mechanisms but also opportunities for improving polygenic risk score (PRS) performance across endometriosis subphenotypes.
The complex genetic landscape of endometriosis is characterized by polygenic inheritance, where numerous genetic variants collectively contribute to disease susceptibility. Recent genome-wide association studies (GWAS) have identified multiple risk loci associated with endometriosis, including genes such as WNT4, VEZT, and GREB1 [6] [3]. The aggregation of these susceptibility variants into polygenic risk scores offers a powerful approach to quantify genetic predisposition. However, the performance of endometriosis PRS varies across different disease subtypes and comorbid conditions, reflecting the underlying genetic heterogeneity [6]. This technical review examines the genetic sharing between endometriosis and its comorbid conditions, with particular emphasis on methodological approaches for investigating these relationships and their implications for refining PRS stratification in endometriosis research.
The genetic basis of endometriosis has been elucidated through large-scale GWAS meta-analyses, revealing significant associations across multiple genomic loci. The most recent and well-powered GWAS have identified 42 independent loci associated with endometriosis risk, which collectively explain up to 5.01% of disease variance [2]. These findings build upon earlier studies that initially identified 14 genome-wide significant single nucleotide polymorphisms (SNPs) from a meta-analysis comprising over 17,000 cases [6]. The identified loci implicate genes involved in sex hormone signaling (ESR1, GREB1, WNT4), developmental processes (HOXA10), and inflammatory pathways (IL1A) [6] [2] [3].
Table 1: Key Genetic Loci Associated with Endometriosis Risk
| Genomic Locus | Nearest Gene | Putative Function | Associated Endometriosis Subtypes |
|---|---|---|---|
| 1p36.12 | WNT4 | Sex hormone signaling, development | Ovarian, infiltrating [6] |
| 2p25.1 | GREB1 | Estrogen regulation | All subtypes [6] |
| 12q21.2 | VEZT | Cell adhesion | Ovarian, peritoneal [3] |
| 6q25.1 | ESR1 | Estrogen receptor | All subtypes [2] |
| 7p15.2 | HOXA10 | Developmental processes | Infiltrating [3] |
| 2q13 | IL1A | Inflammatory response | Peritoneal [3] |
The performance of polygenic risk scores derived from these established loci varies across endometriosis subtypes. In a study evaluating a 14-SNP PRS, each standard deviation increase in PRS was associated with endometriosis overall (OR = 1.57, p = 2.5 × 10⁻¹¹), with varying effect sizes across subtypes: ovarian (OR = 1.72, p = 6.7 × 10⁻⁵), infiltrating (OR = 1.66, p = 2.7 × 10⁻⁹), and peritoneal (OR = 1.51, p = 2.6 × 10⁻³) [6]. This differential performance across subtypes highlights the genetic heterogeneity within endometriosis and underscores the need for subtype-specific PRS optimization.
A key finding from recent PRS phenome-wide association studies (PheWAS) is the association between genetic liability to endometriosis and altered testosterone levels [2]. This relationship was identified through a multi-step analytical approach that integrated PRS-PheWAS with Mendelian randomization to infer causal directionality. The findings suggest that lower testosterone levels may be causal for both endometriosis and clear cell ovarian cancer, revealing a shared hormonal mechanism that extends beyond traditional estrogen-centric models of endometriosis pathophysiology [2].
The hormonal interplay involves multiple pathways, including altered steroidogenesis in endometrial stromal cells, estrogen-induced overexpression of nicotinamide N-methyltransferase (NNMT), and progesterone resistance mediated through dysregulated FKBP4 expression and microRNA-29c regulation [3]. These pathways not only contribute to endometriosis pathogenesis but also represent potential shared mechanisms with other hormone-sensitive conditions.
Diagram 1: Shared Genetic Pathways in Endometriosis and Comorbid Conditions. This diagram illustrates how genetic risk variants for endometriosis influence hormonal and immune pathways, contributing to both endometriosis and related autoimmune comorbidities through shared biological mechanisms.
PRS-PheWAS represents a powerful methodological approach for investigating the pleiotropic effects of genetic liability to endometriosis across a broad spectrum of traits and conditions. This technique involves testing the association between an endometriosis PRS and multiple phenotypes in large biobanks, such as the UK Biobank [2]. The fundamental advantage of this approach is that it can identify genetic associations irrespective of disease diagnosis status, thereby capturing effects that may be present in undiagnosed individuals.
The standard workflow for PRS-PheWAS in endometriosis research involves several key steps: (1) derivation of PRS weightings from large-scale GWAS summary statistics using Bayesian methods such as SBayesR; (2) calculation of individual PRS in target cohorts; (3) association testing between PRS and multiple phenotypic categories, including ICD-10 diagnostic codes, blood and urine biomarkers, and reproductive factors; and (4) stratification by sex to identify sex-specific effects [2]. This approach has revealed numerous associations between genetic liability to endometriosis and other conditions, including migraine, irritable bowel syndrome, and depression [2].
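Step (3) of this workflow tests one PRS against many phenotypes, so the resulting p-values need multiple-testing control. The source does not state which correction was applied in [2]; the sketch below assumes the Benjamini-Hochberg false discovery rate procedure as one common choice. The phenotype labels and p-values are placeholders, not results from any study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Placeholder association p-values for four hypothetical phenotypes.
phenotypes = ["phenotype_a", "phenotype_b", "phenotype_c", "phenotype_d"]
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [name for name, q in zip(phenotypes, adjusted_p) if q < 0.05]
```

Running the same correction separately within each sex stratum, as in step (4), keeps the error rate interpretable per stratum but makes cross-stratum comparisons of "significant" lists sensitive to differences in sample size.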
Mendelian randomization (MR) has emerged as an essential technique for investigating potential causal relationships between endometriosis and comorbid conditions. MR uses genetic variants as instrumental variables to assess causality while minimizing confounding from environmental factors [24] [25]. The standard MR approach requires that the genetic variants satisfy three key assumptions: (1) they are robustly associated with the exposure, (2) they are independent of confounders, and (3) they affect the outcome only through the exposure [24].
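To make the instrumental-variable logic concrete, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate from per-variant summary statistics. IVW is a standard two-sample MR estimator, but the source does not specify which estimators were used in each cited study, so treat this as an illustration under that assumption. The function name and the input numbers are hypothetical.

```python
import math


def ivw_estimate(instruments):
    """Fixed-effect IVW causal estimate.

    instruments: list of (beta_exposure, beta_outcome, se_outcome) tuples,
    one per genetic variant, with effects aligned to the same allele.
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in instruments)
    denominator = sum(bx ** 2 / se ** 2 for bx, _, se in instruments)
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Made-up summary statistics in which every variant implies a ratio of 0.5.
toy_instruments = [
    (0.10, 0.05, 0.01),
    (0.20, 0.10, 0.02),
    (0.30, 0.15, 0.01),
]
causal_beta, causal_se = ivw_estimate(toy_instruments)
causal_or = math.exp(causal_beta)  # odds ratio per unit of exposure
```

The estimate is only interpretable as causal if the three assumptions above hold for every variant; IVW in particular is biased when pleiotropic effects do not average to zero.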
In the context of autoimmune disease and osteoarthritis, recent MR studies have employed univariable, multivariable, and two-step mediation analyses [24] [25]. These analyses have identified several autoimmune diseases with potential causal relationships to osteoarthritis, including celiac disease (OR = 1.061, 95% CI = 1.018-1.105, p = 0.005), Crohn's disease (OR = 1.235, 95% CI = 1.149-1.327, p = 9.44 × 10⁻⁹), ankylosing spondylitis (OR = 2.63, 95% CI = 1.21-5.717, p = 0.015), rheumatoid arthritis (OR = 1.082, 95% CI = 1.034-1.133, p = 0.001), and ulcerative colitis (OR = 1.175, 95% CI = 1.068-1.294, p = 0.001) [24] [25]. These findings demonstrate the utility of MR for elucidating shared genetic mechanisms across seemingly distinct disease domains.
Diagram 2: Mendelian Randomization Framework for Causal Inference. This diagram illustrates the three key assumptions of Mendelian randomization used to investigate causal relationships between autoimmune diseases and osteoarthritis, applicable to endometriosis comorbidity research.
Cross-phenotype meta-analysis (CPMA) is an advanced approach for identifying shared genetic architecture across multiple autoimmune diseases. Applied to ten pediatric autoimmune diseases, this technique revealed 27 genome-wide significant loci, of which 22 were shared by at least two diseases and 19 by at least three [26]. These shared loci predominantly map to biological pathways involved in immune processes, including cell activation, proliferation, and signaling systems [26].
The CPMA approach enables researchers to identify clusters of autoimmune diseases with similar genetic architectures, providing insights into potential shared therapeutic targets. For instance, one study identified that rheumatoid arthritis and ankylosing spondylitis form one distinct cluster, while multiple sclerosis and autoimmune thyroid disease form another, with type 1 diabetes showing similarities to both groups [27]. These patterns of genetic sharing have important implications for understanding the comorbid relationships between endometriosis and specific autoimmune conditions.
Endometriosis demonstrates significant genetic overlap with various autoimmune diseases, suggesting shared etiological pathways. Large-scale genetic studies have identified substantial pleiotropy, with many endometriosis risk loci also conferring susceptibility to autoimmune conditions [2] [26]. The genetic relationship appears to be particularly strong for certain autoimmune diseases, including rheumatoid arthritis, systemic lupus erythematosus, and inflammatory bowel disease [2] [27].
Table 2: Patterns of Genetic Sharing Between Endometriosis and Autoimmune Diseases
| Autoimmune Disease | Shared Genetic Loci | Key Shared Biological Pathways | Implications for Endometriosis PRS |
|---|---|---|---|
| Rheumatoid Arthritis | PTPN22, CTLA4, TNFAIP3 | T-cell signaling, immune regulation | Informs inflammatory subphenotype PRS [27] |
| Systemic Lupus Erythematosus | IL1A, IL-1 family genes | Innate immunity, cytokine signaling | Relevant for systemic manifestations [2] |
| Inflammatory Bowel Disease | NOD2, ATG16L1 | Autophagy, microbial defense | Guides GI symptom subphenotyping [27] |
| Celiac Disease | SH2B3, IL2-IL21 region | Immune cell differentiation | Informs nutrient malabsorption comorbidity [24] |
| Ankylosing Spondylitis | IL23R, ERAP1 | IL-23/Th17 pathway, antigen presentation | Relevant for axial pain components [24] |
The shared genetic architecture between endometriosis and autoimmune diseases predominantly involves pathways related to immune cell differentiation, cytokine signaling, and innate immunity [27]. Notably, many of the shared loci demonstrate stronger expression in specific immune cell types, such as B cells, offering potential targets for therapeutic interventions that could simultaneously address both endometriosis and comorbid autoimmune conditions [26].
The relationship between endometriosis and osteoarthritis represents a compelling example of genetic sharing across traditionally distinct disease categories. Recent Mendelian randomization studies have provided evidence for a potential causal relationship between several autoimmune diseases and osteoarthritis [24] [25]. While osteoarthritis has historically been considered a degenerative "wear-and-tear" condition, these genetic findings support the involvement of inflammatory processes in its pathogenesis, creating conceptual overlap with endometriosis.
Transcriptome analysis has revealed that metabolism-related pathways play a key role in the comorbidity between autoimmune diseases and osteoarthritis [24] [25]. This observation aligns with findings in endometriosis, where metabolic reprogramming has been increasingly recognized as a contributor to disease pathogenesis. The genetic overlap between these conditions suggests the potential for shared therapeutic approaches targeting inflammatory and metabolic pathways.
Chronic pain represents a central feature of endometriosis that frequently co-occurs with other chronic pain conditions, suggesting shared genetic vulnerability to pain sensitization. While specific genetic variants underlying this relationship remain to be fully elucidated, PRS-PheWAS studies have identified associations between genetic liability to endometriosis and other pain-related conditions, including migraine and irritable bowel syndrome [2].
The genetic sharing between endometriosis and other pain disorders may involve pathways related to neuroinflammation, central sensitization, and altered pain processing. Emerging evidence suggests that the genetic liability to endometriosis can influence pain perception and sensitization independent of the physical disease manifestation, as demonstrated by associations observed in males who carry endometriosis genetic risk factors but do not develop the condition [2]. This finding highlights the potential for using genetic data to identify individuals at risk for chronic pain syndromes beyond traditional diagnostic boundaries.
Objective: To identify pleiotropic associations between genetic liability to endometriosis and a broad range of phenotypes in large biobanks.
Materials and Reagents:
Procedure:
Interpretation: Associations identified in all three cohorts (females, males, and females without diagnosis) suggest pleiotropic effects independent of disease manifestation, while sex-specific associations highlight the importance of hormonal or anatomical factors.
Objective: To assess potential causal relationships between endometriosis and comorbid conditions using genetic instruments.
Materials and Reagents:
Procedure:
Interpretation: A consistent effect across multiple MR methods with no evidence of directional pleiotropy supports a potential causal relationship. Significant heterogeneity may indicate subtype-specific effects or pleiotropic mechanisms.
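The heterogeneity mentioned in this interpretation is commonly quantified with Cochran's Q across the per-variant Wald ratios. The sketch below is a minimal version under the assumption of first-order weights (outcome standard errors only); the function name and the input values are hypothetical and not drawn from any cited study.

```python
def cochran_q(instruments):
    """Cochran's Q, degrees of freedom, and I-squared for MR instruments.

    instruments: list of (beta_exposure, beta_outcome, se_outcome) tuples.
    """
    weights = [(bx / se) ** 2 for bx, _, se in instruments]
    ratios = [by / bx for bx, by, _ in instruments]  # per-variant Wald ratios
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(instruments) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


# Made-up instruments: consistent ratios versus clearly discordant ratios.
consistent = [(0.10, 0.05, 0.01), (0.20, 0.10, 0.02), (0.30, 0.15, 0.01)]
discordant = [(0.10, 0.05, 0.01), (0.20, -0.10, 0.02), (0.30, 0.30, 0.01)]
q_consistent, df_consistent, i2_consistent = cochran_q(consistent)
q_discordant, df_discordant, i2_discordant = cochran_q(discordant)
```

Under the null of no heterogeneity, Q is compared against a chi-square distribution with `df` degrees of freedom; a Q well above `df` is the pattern the interpretation note flags as possible pleiotropy or subtype-specific effects.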
Table 3: Key Research Reagents and Resources for Genetic Sharing Studies
| Resource Category | Specific Examples | Application in Endometriosis Genetics | Key Features |
|---|---|---|---|
| GWAS Summary Statistics | Sapkota et al. 2017 meta-analysis [2], FinnGen endometriosis data [2] | PRS construction, genetic correlation | Large sample sizes, diverse endometriosis subphenotypes |
| Biobank Resources | UK Biobank [6] [2], Danish Twin Registry [6] | PRS-PheWAS, validation studies | Deep phenotyping, genetic data, longitudinal follow-up |
| Analysis Software | PLINK [2], GCTB (for SBayesR) [2], TwoSampleMR [24] [25] | PRS calculation, MR analysis | Efficient processing of large genetic datasets |
| Genetic Instruments | 14-SNP PRS [6], 42-locus PRS [2] | Genetic overlap studies | Established effect sizes, validated associations |
| Pathway Analysis Tools | MAGMA, DEPICT | Biological mechanism elucidation | Gene set enrichment, tissue-specific expression |
The patterns of genetic sharing between endometriosis and its comorbid conditions have profound implications for refining PRS performance across endometriosis subphenotypes. The differential effect sizes of PRS across ovarian, infiltrating, and peritoneal subtypes [6] suggest that subtype-specific PRS optimization may enhance predictive accuracy and clinical utility. Furthermore, the identification of specific genetic overlaps with particular comorbid conditions may enable the development of PRS that predict not only endometriosis risk but also the likelihood of specific symptom profiles or comorbid conditions.
From a therapeutic perspective, the shared genetic architecture between endometriosis and autoimmune diseases offers opportunities for drug repurposing and novel target development. The identification of shared pathways, such as those involving B-cell activation or specific cytokine signaling, may guide the application of existing immunomodulatory therapies to endometriosis treatment [26]. Additionally, the genetic relationship with testosterone levels [2] suggests potential for hormonal interventions that extend beyond traditional estrogen-focused approaches.
Future research directions should include the development of integrated PRS that incorporate variants associated with both endometriosis and its key comorbidities, potentially offering improved stratification of patients based on their likely disease presentation and progression. Furthermore, investigation of the genetic relationships between endometriosis and comorbid conditions across diverse ancestral backgrounds represents a critical priority for addressing health disparities in endometriosis diagnosis and care.
The genetic sharing between endometriosis and comorbid conditions, including pain disorders, osteoarthritis, and autoimmune diseases, reveals complex pleiotropic relationships that extend beyond traditional diagnostic boundaries. Methodological advances in PRS-PheWAS, Mendelian randomization, and cross-phenotype meta-analysis have provided powerful tools for elucidating these relationships and their underlying biological mechanisms. The integration of these genetic insights into endometriosis subphenotype research holds significant promise for improving risk prediction, understanding disease heterogeneity, and developing targeted therapeutic approaches that address the multifaceted nature of this complex condition.
Endometriosis is a complex gynecological disorder with a significant genetic component, characterized by a polygenic architecture where numerous common genetic variants of small effect size collectively contribute to disease susceptibility. This whitepaper synthesizes current understanding of how genome-wide association studies (GWAS) have identified endometriosis risk loci and how their cumulative effects are quantified through polygenic risk scores (PRS). We examine the performance of PRS across different endometriosis subphenotypes, highlighting the increased genetic burden associated with moderate-to-severe disease stages. The functional characterization of risk variants through expression quantitative trait loci (eQTL) analysis reveals tissue-specific regulatory effects, providing insights into biological mechanisms. Emerging evidence suggests that genetic liability to endometriosis has pleiotropic effects on other traits, including hormonal factors. While current PRS models show promising discriminative ability, they have not yet reached clinical utility as stand-alone tools. Integration with clinical risk factors and symptoms may enable development of risk stratification tools to reduce diagnostic delays and improve patient outcomes.
Endometriosis is a common, estrogen-dependent inflammatory condition affecting approximately 10% of women of reproductive age, with familial clustering indicating a strong genetic component. Twin studies estimate the heritability of endometriosis at approximately 51% [11], while common single nucleotide polymorphisms (SNPs) account for approximately 26% of the disease variance [28]. The condition demonstrates complex inheritance patterns consistent with a polygenic architecture, where multiple genetic variants interact with environmental factors to influence disease risk.
The polygenic nature of endometriosis presents both challenges and opportunities for understanding its pathophysiology. Genome-wide association studies (GWAS) have successfully identified multiple risk loci, though individual variants confer only modest increases in risk. The cumulative effect of these variants can be quantified through polygenic risk scores (PRS), which aggregate the effects of many risk alleles into a single metric. These scores show particular utility for stratifying patients by disease subphenotypes, with stronger genetic effects observed in more severe disease forms [29].
This technical review examines the current state of knowledge regarding the polygenic architecture of endometriosis, with particular focus on PRS performance across disease subphenotypes. We provide detailed methodological frameworks for GWAS and PRS construction, analyze the functional characterization of risk variants, and discuss applications in both research and clinical contexts.
Initial genetic investigations of endometriosis employed candidate gene approaches, which were largely unsuccessful due to limited genomic coverage and inadequate sample sizes [11]. The advent of genome-wide association studies (GWAS) enabled hypothesis-free identification of common variants, revealing the highly polygenic nature of the condition. Early GWAS in Japanese and European populations identified the first robust associations, including variants in CDKN2B-AS1 and an intergenic region on 7p15.2 [11]. Subsequent meta-analyses substantially increased discovery power, identifying additional loci and highlighting the genetic correlation between European and Asian populations [30].
Table 1: Key GWAS and Meta-Analyses in Endometriosis Genetics
| Study | Sample Size (Cases/Controls) | Ancestry | Novel Loci Identified | Key Findings |
|---|---|---|---|---|
| Uno et al. 2010 [11] | 1,907/5,292 | Japanese | CDKN2B-AS1 | First GWAS in Japanese population |
| Painter et al. 2011 [11] | 3,194/7,060 | European | 7p15.2 | First GWAS in European ancestry |
| Nyholt et al. 2012 [30] | 4,604/9,393 | Trans-ancestry | 6 novel loci | Demonstrated genetic correlation between populations |
| Sapkota et al. 2017 [28] | 17,045/191,596 | Trans-ancestry | 5 novel loci | Implicated genes in sex steroid hormone pathways |
| Recent meta-analysis [7] | 17,045+/191,596+ | Trans-ancestry | 42 loci | Identified tissue-specific regulatory effects |
Current GWAS have identified 42 genetic loci associated with endometriosis risk [2]. The majority of these variants reside in non-coding regions, suggesting they exert their effects through gene regulation rather than protein coding changes. Several key biological pathways are enriched among the implicated genes:
Notably, many endometriosis risk loci show pleiotropic effects with other reproductive traits and hormonal cancers, suggesting shared biological mechanisms [2].
Polygenic risk scores are calculated as the weighted sum of risk alleles an individual carries, with weights typically derived from GWAS effect sizes. The standard approach involves:
The PRS for an individual is calculated as:
\[ \mathrm{PRS}_i = \sum_{j=1}^{M} w_j \times G_{ij} \]
Where \(w_j\) is the weight for SNP \(j\) (typically the log odds ratio from GWAS), \(G_{ij}\) is the genotype of individual \(i\) for SNP \(j\) (coded as 0, 1, or 2 copies of the effect allele), and \(M\) is the number of SNPs included in the score [6].
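The weighted-sum definition above translates directly into code. The following is a minimal sketch, not the pipeline used in [6]: the weights and genotype dosages are invented, and the helper names are hypothetical. Scores are standardized across the cohort so that downstream odds ratios can be read per standard deviation.

```python
from statistics import mean, pstdev


def polygenic_risk_score(weights, dosages):
    """Weighted sum of effect-allele dosages for one individual."""
    if len(weights) != len(dosages):
        raise ValueError("weights and dosages must cover the same SNPs")
    return sum(w * g for w, g in zip(weights, dosages))


def standardize(scores):
    """Convert raw scores to z-scores (mean 0, SD 1) within a cohort."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Invented log-odds-ratio weights for three SNPs (not real GWAS estimates).
snp_weights = [0.10, 0.20, -0.05]
# Invented effect-allele counts (0, 1, or 2) for four individuals.
cohort_dosages = [[2, 1, 0], [0, 0, 2], [1, 2, 1], [1, 1, 1]]

raw_scores = [polygenic_risk_score(snp_weights, g) for g in cohort_dosages]
z_scores = standardize(raw_scores)
```

With imputed data, the dosages would be fractional expected allele counts rather than integers; the same sum applies unchanged.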
Multiple studies have validated PRS for endometriosis across diverse cohorts. A study using a 14-variant PRS demonstrated significant association with endometriosis risk in both Danish cohorts (OR = 1.57 per standard deviation increase, p = 2.5×10^-11) and the UK Biobank (OR = 1.28, p < 2.2×10^-16) [6]. The PRS was associated with all major subtypes of endometriosis, with the strongest effect for ovarian endometriosis (OR = 1.72) [6].
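As a worked illustration of what a per-SD odds ratio implies, assume the log-odds of disease scale linearly with the standardized PRS (the usual logistic regression assumption, not something reported separately in [6]). An individual whose score lies \(k\) standard deviations above the cohort mean then has an odds ratio of \(\mathrm{OR}_{\mathrm{SD}}^{k}\) relative to an individual at the mean:

```latex
\[
\mathrm{OR}(k) = \mathrm{OR}_{\mathrm{SD}}^{\,k}, \qquad
\mathrm{OR}(2) = 1.57^{2} \approx 2.46 \ \text{(Danish cohorts)}, \qquad
\mathrm{OR}(2) = 1.28^{2} \approx 1.64 \ \text{(UK Biobank)}
\]
```

Even at two standard deviations above the mean, the implied odds ratios remain modest, which is consistent with a score that stratifies risk at the population level without being diagnostic on its own.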
Table 2: Polygenic Risk Score Performance Across Endometriosis Subphenotypes
| Subphenotype | Cohort | Odds Ratio per SD | P-value | Sample Size (Cases/Controls) |
|---|---|---|---|---|
| All endometriosis | Combined Danish | 1.57 | 2.5×10^-11 | 389/664 |
| All endometriosis | UK Biobank | 1.28 | <2.2×10^-16 | 2,967/256,222 |
| Ovarian (N80.1) | Combined Danish | 1.72 | 6.7×10^-5 | 75/664 |
| Infiltrating (N80.4-5) | Combined Danish | 1.66 | 2.7×10^-9 | 210/664 |
| Peritoneal (N80.2-3) | Combined Danish | 1.51 | 2.6×10^-3 | 60/664 |
| Stage B (rAFS III-IV) | Australian/UK | 1.38* | 5.8×10^-12 | 1,357/8,075 |
| Stage A (rAFS I-II) | Australian/UK | 1.15* | 0.015 | 1,680/8,075 |
*Genetic risk score based on increasing number of SNPs at p-value threshold <0.5 [29]
Notably, PRS was not associated with adenomyosis (N80.0), suggesting distinct genetic architecture despite clinical similarities [6]. This specificity supports the hypothesis that PRS captures endometriosis-specific risk rather than general susceptibility to gynecological disorders.
A key finding in endometriosis genetics is the differential genetic loading across disease stages. Multiple studies have demonstrated that common genetic variants contribute more substantially to moderate-severe (rAFS Stage III-IV) endometriosis compared to minimal-mild disease (rAFS Stage I-II) [29]. The common SNP-based heritability is significantly higher for Stage B endometriosis (0.35) than for Stage A disease (0.15) [29].
Further analysis refining the staging to four categories (minimal, mild, moderate, and severe) revealed a gradient of genetic burden, with increasing contribution of common genetic variation from minimal to severe disease [29]. This gradient effect suggests that more severe forms of endometriosis may represent a more genetically determined subset of the condition.
Functional characterization of endometriosis risk variants through expression quantitative trait loci (eQTL) analysis reveals tissue-specific regulatory patterns. A recent study investigating 465 endometriosis-associated variants across six relevant tissues (uterus, ovary, vagina, sigmoid colon, ileum, and peripheral blood) found distinct regulatory profiles in each tissue [7].
Key regulated genes include MICB (immune evasion), CLDN23 (barrier function), and GATA4 (proliferative signaling) [7]. This tissue-specific functional annotation provides mechanistic insights into how genetic variants might contribute to different disease manifestations.
PRS-phenome wide association studies (PheWAS) demonstrate that genetic liability to endometriosis has pleiotropic effects on numerous other traits, even in individuals without diagnosed endometriosis [2]. Notable associations include:
Mendelian randomization analyses suggest that lower testosterone levels may be causal for endometriosis risk, revealing a potentially modifiable hormonal risk factor [2].
The translation of GWAS findings into biological insights requires comprehensive functional characterization. Most endometriosis-associated variants reside in non-coding regions, suggesting they influence gene regulation rather than protein function [7]. Integration with multi-omics data provides a powerful approach to understanding variant function:
A systematic eQTL analysis of endometriosis risk variants found that 35% showed significant regulatory effects in at least one tissue, with the strongest effects observed in uterus and ovary [7].
Several novel endometriosis risk loci implicate genes with established roles in sex steroid hormone pathways, including FN1, CCDC170, ESR1, SYNE1, and FSHB [28]. Hormonal regulation appears to be a central mechanism in endometriosis genetics:
These findings not only illuminate disease mechanisms but also highlight potential targets for therapeutic intervention, particularly for patients with specific genetic profiles.
Table 3: Essential Research Tools for Endometriosis Genetic Studies
| Reagent/Resource | Function | Example Use | Key Features |
|---|---|---|---|
| GTEx Database v8 | Tissue-specific eQTL reference | Mapping regulatory consequences of risk variants | 6 endometriosis-relevant tissues; significance threshold FDR <0.05 [7] |
| MSigDB Hallmark Gene Sets | Curated biological pathway database | Functional interpretation of regulated genes | 50 well-defined biological states; Cancer Hallmarks collections [7] |
| GWAS Catalog (EFO_0001065) | Repository of published GWAS results | Variant selection and functional annotation | 465 unique endometriosis-associated variants with p<5×10^-8 [7] |
| UK Biobank | Population-based cohort with genetic data | PRS validation and phenome-wide association | ~500,000 participants; extensive phenotype data [2] |
| SBayesR | Bayesian method for PRS calculation | Effect size adjustment for PRS weighting | Accounts for genetic architecture; improves prediction accuracy [2] |
| Ensembl VEP | Variant effect predictor | Functional annotation of risk variants | Genomic location, functional region, associated gene [7] |
Objective: To develop and validate a polygenic risk score for endometriosis using GWAS summary statistics and independent target cohorts.
Materials:
Procedure:
Validation: Assess discriminative accuracy via odds ratios per standard deviation increase in PRS and area under the receiver operating characteristic curve (AUC-ROC)
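The AUC-ROC mentioned above can be read as the probability that a randomly chosen case has a higher PRS than a randomly chosen control. The following is a minimal sketch of that rank-based definition in plain Python; the score values are hypothetical and serve only to illustrate the calculation.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical standardized PRS values for four cases and four controls.
cases = [0.9, 0.4, 1.3, 0.1]
controls = [-0.2, 0.4, -1.0, 0.0]
auc = auc_from_scores(cases, controls)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in sample size, so a biobank-scale analysis would normally use a rank-sum formulation instead; the definition is the same.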
Objective: To characterize the regulatory effects of endometriosis-associated variants across relevant tissues.
Materials:
Procedure:
Analysis: Compare regulatory profiles across tissues and identify tissue-specific enriched pathways.
The polygenic architecture of endometriosis encompasses hundreds of common variants with small effect sizes that collectively contribute to disease risk. Polygenic risk scores effectively capture this cumulative genetic burden and demonstrate utility for subphenotype stratification, with stronger effects observed in moderate-to-severe disease. Functional characterization of risk variants reveals tissue-specific regulatory mechanisms, particularly in hormonal pathways. While current PRS models show significant associations with endometriosis risk, their discriminative accuracy remains insufficient for standalone clinical application. Future research directions should include: (1) development of trans-ancestry PRS to improve equity in genetic risk prediction, (2) integration of rare variants from whole-exome sequencing studies, (3) multi-omics approaches combining genomic, transcriptomic, and epigenomic data, and (4) implementation in longitudinal cohorts to assess utility for risk prediction and early intervention. The ongoing expansion of GWAS sample sizes and functional annotation resources will continue to enhance our understanding of endometriosis genetics and move the field toward precision medicine approaches.
Polygenic risk scores (PRS) have emerged as a powerful tool for quantifying an individual's genetic liability to complex diseases. For endometriosis, a condition with an estimated heritability of 47-51%, PRS offers promising avenues for risk prediction, patient stratification, and understanding of shared disease aetiology [31] [10]. The performance and clinical utility of PRS for endometriosis depend on two core technical choices: which single nucleotide polymorphisms (SNPs) are included in the score and which statistical methods are used to weight their individual effects. This technical guide examines current SNP selection and weighting strategies within the broader context of endometriosis subphenotype research, addressing the need for standardized methodologies in genetic risk prediction for this complex gynecological disorder.
The selection of SNPs for inclusion in endometriosis PRS has traditionally followed two primary pathways: genome-wide significant SNP selection and clumping and thresholding methods.
Genome-wide Significant SNP Selection involves curating variants that surpass the conventional genome-wide significance threshold (P < 5 × 10⁻⁸) from large-scale genome-wide association studies (GWAS). For endometriosis, multiple studies have utilized this approach with a focused set of SNPs. A 2021 study derived a PRS based on 14 genome-wide significant lead SNPs identified from a published GWAS meta-analysis comprising over 17,000 endometriosis cases [6]. Similarly, a 2022 clinical presentation study calculated PRS using 13 SNPs with p-values < 5 × 10⁻⁸ that were present in their dataset [10]. While this approach ensures the inclusion of robustly associated variants, it potentially omits SNPs with smaller effect sizes that collectively contribute to disease risk.
Clumping and Thresholding (C+T) methods represent a more inclusive approach by incorporating SNPs below the genome-wide significance threshold. This method involves clumping SNPs to account for linkage disequilibrium (LD) and setting p-value thresholds for inclusion. The standard clumping parameters typically include an LD r² threshold of 0.1-0.2 within a specified genomic distance (e.g., 250-500 kb) [32] [16]. PRSice-2, a widely used software for PRS analysis, automates this process and allows for optimization of p-value thresholds using the target dataset [32]. This method enables the capture of a broader polygenic signal beyond genome-wide significant hits, potentially improving predictive power.
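The clumping and thresholding logic can be sketched as a greedy procedure: rank SNPs by p-value, keep the most significant one, and drop any later SNP that lies within the window and is in LD with an already retained SNP. This is a simplified illustration and not PRSice-2's implementation; the SNP identifiers, positions, and the pairwise r² lookup are hypothetical inputs.

```python
def clump_and_threshold(snps, r2, p_threshold=0.05, r2_threshold=0.1, window_kb=250):
    """Greedy C+T.

    snps: list of dicts with keys 'id', 'chrom', 'pos', 'p'.
    r2:   dict mapping frozenset({id_a, id_b}) to pairwise LD r^2
          (pairs that are absent are treated as r^2 = 0).
    Returns the ids of retained index SNPs, most significant first.
    """
    candidates = sorted((s for s in snps if s["p"] <= p_threshold),
                        key=lambda s: s["p"])
    kept = []
    for snp in candidates:
        clumped = False
        for index in kept:
            same_window = (snp["chrom"] == index["chrom"]
                           and abs(snp["pos"] - index["pos"]) <= window_kb * 1000)
            pair_r2 = r2.get(frozenset((snp["id"], index["id"])), 0.0)
            if same_window and pair_r2 >= r2_threshold:
                clumped = True  # tagged by a more significant SNP
                break
        if not clumped:
            kept.append(snp)
    return [s["id"] for s in kept]


# Hypothetical summary statistics and LD values.
snps = [
    {"id": "snpA", "chrom": 1, "pos": 100_000, "p": 1e-9},
    {"id": "snpB", "chrom": 1, "pos": 150_000, "p": 1e-4},
    {"id": "snpC", "chrom": 1, "pos": 900_000, "p": 0.01},
    {"id": "snpD", "chrom": 2, "pos": 50_000, "p": 0.30},
]
ld = {frozenset(("snpA", "snpB")): 0.8}
selected = clump_and_threshold(snps, ld)
print(selected)
```

Here `snpB` is removed because it is in LD with the more significant `snpA`, and `snpD` fails the p-value threshold.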
Table 1: Standard Quality Control Parameters for PRS Analysis
| Data Type | QC Parameter | Threshold | Rationale |
|---|---|---|---|
| Base Data (GWAS Summary Statistics) | Heritability (h²snp) | > 0.05 | Ensures sufficient genetic signal for meaningful PRS [16] |
| | Effect allele identification | Must be specified | Prevents spurious results from strand mismatches [16] |
| | INFO score | > 0.8 | Ensures high imputation quality [10] |
| Target Data (Genotypes) | Sample missingness | < 0.02 | Removes poor-quality samples [16] |
| | Minor allele frequency | > 0.01 | Reduces noise from rare variants [16] |
| | Hardy-Weinberg equilibrium | P > 1×10^-5 | Excludes variants with genotyping errors [10] |
| | Heterozygosity rate | ±3 SD from mean | Removes contaminated samples [10] |
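Two of the target-data filters in the table (sample missingness and minor allele frequency) can be sketched in a few lines. This is a toy illustration on an in-memory genotype matrix, assuming genotypes are coded 0/1/2 with `None` for missing calls; real pipelines would apply these filters with dedicated tools such as PLINK.

```python
def filter_samples(genotypes, max_missing=0.02):
    """Return indices of samples whose missing-call rate is below max_missing."""
    keep = []
    for idx, row in enumerate(genotypes):
        missing_rate = sum(g is None for g in row) / len(row)
        if missing_rate < max_missing:
            keep.append(idx)
    return keep


def filter_variants(genotypes, sample_idx, min_maf=0.01):
    """Return indices of variants whose minor allele frequency exceeds min_maf."""
    keep = []
    for j in range(len(genotypes[0])):
        calls = [genotypes[i][j] for i in sample_idx if genotypes[i][j] is not None]
        if not calls:
            continue  # no observed genotypes for this variant
        freq = sum(calls) / (2 * len(calls))  # effect-allele frequency
        if min(freq, 1 - freq) > min_maf:
            keep.append(j)
    return keep


# Hypothetical matrix: rows are samples, columns are variants.
genotypes = [
    [0, 1, 2],
    [1, None, 2],
    [0, 0, 2],
    [2, 1, 2],
]
kept_samples = filter_samples(genotypes)
kept_variants = filter_variants(genotypes, kept_samples)
print(kept_samples, kept_variants)
```

In this example the second sample is removed for missingness and the third variant is removed because it is monomorphic.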
The weighting of SNP effects in PRS construction has evolved from simple to increasingly sophisticated methods.
Unadjusted Effect Size Weighting represents the most straightforward approach, where SNP effect sizes (beta coefficients or log odds ratios) from GWAS summary statistics are applied directly as weights in the PRS calculation. The basic PRS formula is expressed as:
\[ PRS_j = \sum_{i=1}^{n} w_i \times G_{ij} \]
where \( PRS_j \) is the polygenic risk score for individual \( j \), \( w_i \) is the weight of SNP \( i \) derived from GWAS summary statistics, \( G_{ij} \) is the genotype of SNP \( i \) for individual \( j \) (coded as 0, 1, or 2 copies of the effect allele), and \( n \) is the total number of SNPs in the score [32].
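The weighted sum above translates directly into code. The sketch below computes raw scores and then standardizes them to mean 0 and standard deviation 1, as is common before reporting odds ratios per SD; the weights and genotypes are hypothetical values chosen only for illustration.

```python
from statistics import mean, pstdev


def polygenic_score(genotypes, weights):
    """Weighted sum of effect-allele counts for one individual."""
    return sum(w * g for w, g in zip(weights, genotypes))


def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 within the sample."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical log odds ratio weights for three SNPs.
weights = [0.10, -0.05, 0.20]
# Hypothetical effect-allele counts (0, 1, or 2) for three individuals.
individuals = [
    [2, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
]
raw_scores = [polygenic_score(g, weights) for g in individuals]
z_scores = standardize(raw_scores)
print(raw_scores, z_scores)
```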
Alternative Genetic Models can be implemented in PRS calculation software such as PRSice-2, which allows for different genetic models including additive (standard), dominant, recessive, and heterozygous models [32]. The coding of genotypes varies according to the selected model, affecting how the weighted scores are computed.
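As an illustration of how the genotype coding changes with the genetic model, the sketch below recodes an effect-allele count under the conventional definitions of each model. The function name and model labels are illustrative and are not PRSice-2 identifiers.

```python
def code_genotype(effect_allele_count, model="additive"):
    """Recode a 0/1/2 effect-allele count under a given genetic model."""
    if effect_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    if model == "additive":
        return effect_allele_count            # 0 / 1 / 2
    if model == "dominant":
        return int(effect_allele_count >= 1)  # 0 / 1 / 1
    if model == "recessive":
        return int(effect_allele_count == 2)  # 0 / 0 / 1
    if model == "heterozygous":
        return int(effect_allele_count == 1)  # 0 / 1 / 0
    raise ValueError(f"unknown model: {model}")


models = ("additive", "dominant", "recessive", "heterozygous")
coding = {m: [code_genotype(g, m) for g in (0, 1, 2)] for m in models}
print(coding)
```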
Scoring Methods include options beyond simple summation. PRSice-2 implements multiple scoring approaches: --score sum for the standard weighted sum, --score avg which divides the sum by the number of alleles included, --score std for standardized scores, and --score con-std for conditional standardization [32].
Advanced Bayesian methods have demonstrated improved performance for endometriosis PRS by applying shrinkage to SNP effect sizes to account for linkage disequilibrium and varying genetic architectures.
SBayesR Approach was employed in a 2023 PRS-PheWAS study of endometriosis, where summary statistics from multiple European cohorts were meta-analyzed and subsequently adjusted using SBayesR implemented in GCTB 2.02 [31]. This method uses a mixture of normal distributions with different variances to shrink SNP effects, effectively assigning more weight to SNPs with stronger evidence of association while shrinking others toward zero. The study excluded the MHC region due to its complex LD structure, a common practice in PRS analysis to avoid spurious associations [31].
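For orientation, the mixture prior underlying this type of shrinkage can be written schematically as below. This is a sketch based on the general description of the method rather than on the cited endometriosis study; the number of components \( C \) and the variance scaling factors \( \gamma_c \) are settings of the software and are treated here as assumptions.

\[ \beta_i \mid \boldsymbol{\pi}, \sigma_\beta^2 \sim \pi_1 \delta_0 + \sum_{c=2}^{C} \pi_c \, \mathcal{N}\!\left(0, \gamma_c \sigma_\beta^2\right), \qquad \sum_{c=1}^{C} \pi_c = 1 \]

Here \( \beta_i \) is the effect of SNP \( i \), \( \delta_0 \) is a point mass at zero, \( \pi_c \) are the mixing proportions, and \( \sigma_\beta^2 \) is a common effect-size variance. SNPs assigned mostly to the zero or small-variance components are shrunk strongly toward zero, while SNPs with strong evidence of association retain more of their estimated effect.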
LDpred is another Bayesian method that infers the posterior mean effect size of each SNP by using a prior that reflects assumptions about the distribution of effect sizes across the genome, leveraging LD information from a reference panel. Although it was not used in the endometriosis studies reviewed here, it is a widely used approach in the PRS methodology toolkit that could be applied to endometriosis risk prediction [16].
Emerging machine learning techniques show promise for enhancing genomic prediction of endometriosis beyond traditional methods.
Deep Neural Network Approaches are being explored to capture complex, non-additive genetic effects in endometriosis. A 2025 study described "an extensive multi-variant deep neural network approach to enhance genomic prediction of endometriosis," suggesting the potential for machine learning to improve prediction accuracy by modeling higher-order interactions between genetic variants [33]. These methods can incorporate thousands of genetic variants without relying on strict p-value thresholds, potentially capturing a more comprehensive genetic signal.
Table 2: Comparison of PRS Performance Across Endometriosis Studies
| Study | SNP Selection Method | Weighting Approach | Sample Size | Performance (OR per SD) |
|---|---|---|---|---|
| Søgaard et al. (2021) [6] | 14 genome-wide significant SNPs | Unadjusted effect sizes | 249 cases, 348 controls (clinical cohort) | OR = 1.59, p = 2.57×10^-7 |
| Søgaard et al. (2021) [6] | Same 14 SNPs | Unadjusted effect sizes | 2,967 cases, 256,222 controls (UK Biobank) | OR = 1.28, p < 2.2×10^-16 |
| Law et al. (2023) [31] | Multi-threshold | SBayesR | 159,855 males, 188,221 females (UK Biobank) | Significant associations with multiple health conditions |
| León et al. (2022) [10] | 13 genome-wide significant SNPs | Weighted and unweighted | 172 patients | Inverse associations with disease spread |
Endometriosis exhibits considerable heterogeneity in clinical presentation, and emerging evidence suggests that genetic burden varies across disease stages and subtypes, necessitating tailored PRS approaches.
Stage-Specific Genetic Burden was demonstrated in a genetic burden analysis that found increasing polygenic contribution from minimal to severe endometriosis [29]. The study revealed that moderate and severe endometriosis (rAFS Stage III/IV) showed greater genetic burden than minimal or mild disease (rAFS Stage I/II), suggesting that PRS constructed from GWAS of advanced disease may have better predictive power for severe forms [29].
Subtype-Specific PRS Performance was evaluated in a 2021 study that tested PRS association across endometriosis subtypes, finding that the PRS was associated with ovarian (OR = 1.72), infiltrating (OR = 1.66), and peritoneal (OR = 1.51) endometriosis [6]. This indicates that genetic risk factors contribute to all major subtypes rather than being specific to certain locations, supporting the use of a unified PRS for general endometriosis risk prediction.
Differential PRS Associations with clinical presentations were explored in a 2022 study that identified inverse associations between endometriosis PRS and spread of endometriosis, involvement of the gastrointestinal tract, and hormone treatment, though with limited specificity and sensitivity [10]. This suggests that specific PRS may need to be developed to predict clinical presentations in patients with endometriosis.
Implementation of robust PRS analysis requires specific computational tools and quality-controlled datasets.
Table 3: Research Reagent Solutions for Endometriosis PRS Analysis
| Tool/Reagent | Function | Implementation Example |
|---|---|---|
| PRSice-2 [32] | PRS calculation and clumping | Command-line tool for automated C+T analysis with p-value threshold optimization |
| PLINK 1.9/2.0 [31] [10] | Genotype data management and PRS calculation | --score function for applying SNP weights to target genotypes |
| GCTB [31] | Bayesian PRS modeling | Implementation of SBayesR for effect size shrinkage using summary statistics |
| Illumina Global Screening Array [10] | Genotyping platform | Used in clinical studies for generating target genotype data |
| TOPMed Imputation Server [10] | Genotype imputation | Reference-based imputation to increase SNP coverage using TOPMed panel |
| FlashPCA2 [10] | Principal component analysis | Population structure correction in target datasets |
Validation Procedures for endometriosis PRS must include association testing in independent cohorts, with particular attention to subtype-specific performance. The 2021 study by Søgaard et al. demonstrated a rigorous validation approach, testing PRS performance in three different cohorts: surgically confirmed cases from a specialized endometriosis center, cases from a twin registry based on ICD-10 codes, and a large replication analysis in the UK Biobank [6]. This multi-cohort approach strengthens the evidence for PRS validity across different ascertainment methods.
Pleiotropy Assessment through PRS-PheWAS represents an advanced application, as demonstrated in a 2023 study that investigated associations between endometriosis PRS and numerous health conditions, biomarkers, and reproductive factors across males, females, and females without endometriosis diagnoses [31]. This approach helps elucidate the broader phenomic impact of endometriosis genetic risk factors and reveals potential shared biological pathways with comorbid conditions.
SNP selection and weighting strategies for endometriosis PRS have evolved from simple approaches using a handful of genome-wide significant SNPs to sophisticated methods incorporating thousands of variants with Bayesian shrinkage or machine learning algorithms. The genetic architecture of endometriosis, with its subtype-specific burden and varying heritability across disease stages, necessitates careful consideration of both SNP selection parameters and weighting schemes. Optimal PRS construction for endometriosis research should account for the specific research question—whether predicting general risk, specific subphenotypes, or exploring genetic correlations with comorbid conditions. As GWAS sample sizes continue to grow and methods become more refined, PRS is poised to play an increasingly important role in endometriosis research, from elucidating biological mechanisms to potentially informing clinical stratification in the future.
The accuracy of endpoint ascertainment is a fundamental methodological consideration in endometriosis research, particularly for studies evaluating polygenic risk score (PRS) performance across disease subphenotypes. The diagnostic gold standard for endometriosis remains surgical visualization with histological confirmation, yet research practicality often necessitates using registry-based diagnoses from administrative health data or self-reporting [22] [34]. This technical guide examines the operational characteristics, validation evidence, and methodological implications of these divergent ascertainment approaches for genetic epidemiological studies.
The prolonged diagnostic delay of 7-11 years from symptom onset to surgical diagnosis exacerbates ascertainment challenges, as many cases remain undetected in population-based registries [22] [3]. Furthermore, endometriosis manifests as heterogeneous subphenotypes—superficial peritoneal endometriosis (SPE), ovarian endometriomas (OMA), and deep infiltrating endometriosis (DIE)—each with distinct clinical presentations and potentially different genetic architectures [22]. Understanding how diagnostic ascertainment methods capture this heterogeneity is crucial for interpreting PRS performance across subtypes.
Surgical confirmation represents the diagnostic reference standard, characterized by direct visualization of lesions during laparoscopy or laparotomy, often accompanied by histological examination. The procedural methodology typically involves:
This method allows for precise subphenotype classification according to established systems including rASRM, ENZIAN, and AAGL classifications [22]. However, surgical confirmation introduces selection biases as it typically captures patients with more severe symptoms, infertility, or those who have failed conservative management.
Registry-based diagnoses utilize International Classification of Diseases (ICD) codes from administrative health databases, typically coded as N80.0-N80.9 for endometriosis and its subtypes. The methodological framework involves:
This approach enables large sample sizes and population-based sampling but is subject to coding inaccuracies, healthcare access biases, and variability in clinical diagnostic practices preceding code assignment.
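A registry-based case definition of this kind can be sketched as a lookup from ICD-10 subcodes to subphenotypes. The grouping below follows the code ranges used elsewhere in this article (N80.1 ovarian, N80.2-3 peritoneal, N80.4-5 infiltrating, N80.0 adenomyosis); the function name and the handling of other N80 codes are illustrative assumptions, not a validated algorithm.

```python
SUBTYPE_BY_ICD10 = {
    "N80.0": "adenomyosis",
    "N80.1": "ovarian",
    "N80.2": "peritoneal",
    "N80.3": "peritoneal",
    "N80.4": "infiltrating",
    "N80.5": "infiltrating",
}


def classify_subtypes(icd10_codes):
    """Map a patient's ICD-10 codes to a set of endometriosis subtype labels."""
    subtypes = set()
    for code in icd10_codes:
        code = code.strip().upper()
        if "." not in code and len(code) > 3:
            code = code[:3] + "." + code[3:]  # normalize 'N801' to 'N80.1'
        if not code.startswith("N80"):
            continue  # not an endometriosis code
        subtypes.add(SUBTYPE_BY_ICD10.get(code, "other/unspecified"))
    return subtypes


# Hypothetical registry record with mixed code formats.
record = ["N80.1", "n804", "K35.8", "N80.9"]
labels = classify_subtypes(record)
print(sorted(labels))
```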
Recent validation studies provide quantitative metrics for interpreting registry-based diagnoses against the surgical gold standard. A 2024 analysis of the ENDO Study cohort (n=412) linked with the Utah Population Database offers key validation statistics [34]:
Table 1: Validation Metrics for Endometriosis Diagnoses in Administrative Health Data Versus Surgical Confirmation
| Endometriosis Category | Sensitivity | Specificity | Agreement (Kappa) | Sample Size |
|---|---|---|---|---|
| Overall Endometriosis | 0.88 | 0.87 | 0.74 (Substantial) | 173 |
| Superficial Endometriosis | 0.86 | 0.83 | 0.65 (Substantial) | 143 |
| Ovarian Endometriomas | 0.82 | 0.92 | 0.58 (Moderate) | 38 |
| Deep Infiltrating Endometriosis | 0.12 | 0.99 | 0.17 (Slight) | 58 |
These data reveal critical patterns: while overall endometriosis diagnosis shows substantial agreement between administrative data and surgical confirmation, deep infiltrating endometriosis is markedly under-ascertained in registry data [34]. This has profound implications for genetic studies targeting this specific subphenotype.
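The three metrics in the table can all be derived from a 2×2 cross-tabulation of registry diagnosis against surgical confirmation. The sketch below shows the calculation, with Cohen's kappa as the agreement statistic; the counts are hypothetical and are not taken from the cited validation study.

```python
def validation_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 table.

    tp: registry positive, surgery positive   fp: registry positive, surgery negative
    fn: registry negative, surgery positive   tn: registry negative, surgery negative
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for illustration only.
sens, spec, kappa = validation_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```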
Emerging research also addresses the validity of self-reported endometriosis. A 2025 validation study within the Australian Longitudinal Study on Women's Health (ALSWH) found high agreement between self-report and clinical diagnosis [35]. Previous literature cited in the validation studies suggests confirmation rates of 84-95% for self-reported endometriosis when verified against surgical records [34].
The choice of ascertainment method significantly influences PRS performance metrics and downstream analyses. Evidence from recent studies demonstrates:
Table 2: PRS Performance Across Diagnostic Ascertainment Methods in Endometriosis
| Study Cohort | Ascertainment Method | Sample Size | PRS Odds Ratio per SD | P-value | Subtype Information |
|---|---|---|---|---|---|
| Clinical Cohort [6] | Surgical confirmation | 249 cases, 348 controls | 1.59 | 2.57×10⁻⁷ | Complete subphenotyping |
| Danish Twin Registry [6] | ICD-10 codes from patient registry | 140 cases, 316 controls | 1.50 | 0.0001 | Limited to ICD subcodes |
| UK Biobank [6] | ICD-10 codes + self-report | 2,967 cases, 256,222 controls | 1.28 | <2.2×10⁻¹⁶ | Basic subtype differentiation |
The odds ratios decrease as sample size grows and ascertainment becomes less stringent, a pattern consistent with dilution from misclassified cases and etiologically heterogeneous phenotypes. Notably, the PRS showed association with all endometriosis subtypes in surgically confirmed cases (ovarian: OR=1.72, infiltrating: OR=1.66, peritoneal: OR=1.51) [6], highlighting the value of precise phenotyping for elucidating subtype-specific genetic architectures.
The validation evidence demonstrates that misclassification varies substantially across endometriosis subphenotypes. Deep infiltrating endometriosis shows particularly poor sensitivity in administrative data (12%) despite high specificity (99%) [34]. This differential misclassification introduces substantial bias in genetic association studies.
Diagram 1: Differential ascertainment of endometriosis subphenotypes impacts PRS performance. Registry-based diagnoses show markedly lower sensitivity for deep infiltrating disease (12%) compared to superficial (86%) and ovarian (82%) forms [34].
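One way to reason about this dilution is to ask how much of the true case-control difference in mean PRS survives after misclassification. Under the simplifying assumption that misclassification does not depend on the PRS itself, the observed mean difference equals the true difference multiplied by PPV + NPV - 1. The sketch below computes that factor from sensitivity, specificity, and prevalence; the prevalence value is hypothetical, and the factor applies to mean differences, so it is only an approximation for odds ratios.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values of a diagnostic label."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


def attenuation_factor(sensitivity, specificity, prevalence):
    """Fraction of the true case-control mean PRS difference that is retained.

    Labeled cases have mean PPV * d and labeled controls (1 - NPV) * d,
    where d is the true difference, giving (PPV + NPV - 1) * d overall.
    """
    ppv, npv = predictive_values(sensitivity, specificity, prevalence)
    return ppv + npv - 1


# Sensitivity and specificity for deep infiltrating disease as reported above [34];
# the 10% prevalence is a hypothetical value chosen for illustration.
factor = attenuation_factor(sensitivity=0.12, specificity=0.99, prevalence=0.10)
print(f"retained fraction of the true mean difference: {factor:.2f}")
```

Because the factor depends on prevalence as well as on sensitivity and specificity, the same registry algorithm can produce different degrees of attenuation in an operative cohort and in a population biobank.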
For optimal PRS development and validation, a hybrid ascertainment approach leveraging the complementary strengths of both methods is recommended:
To maximize data quality within each ascertainment framework, implement standardized protocols:
Surgical confirmation protocols:
Registry-based ascertainment protocols:
Table 3: Key Research Reagents and Resources for Endometriosis Cohort Studies
| Resource Category | Specific Examples | Research Application | Technical Considerations |
|---|---|---|---|
| Validation Cohorts | ENDO Study (Utah operative cohort) [34] | Provides gold-standard phenotyping for algorithm validation | Limited diversity; surgical population |
| Biobanks | UK Biobank [6] [2], Danish Twin Registry [6] | Large-scale genetic studies with health record linkage | Heterogeneous phenotyping across sites |
| Genetic Arrays | Illumina Infinium MethylationEPIC BeadChip [36] | Epigenomic profiling of endometrial tissue | Cellular heterogeneity impacts interpretation |
| PRS Methods | SBayesR [2], LDpred | Polygenic risk score calculation | Sensitivity to ancestral background |
| Phenotyping Tools | rASRM operative forms [34], ENZIAN classification [22] | Standardized surgical documentation | Inter-rater variability requires training |
| Data Linkage Systems | Utah Population Database [34] | Links surgical data with longitudinal health records | Privacy protections limit granular data |
The choice between surgical confirmation and registry-based diagnoses represents a fundamental trade-off between phenotyping precision and sample size in endometriosis genetic research. Surgical confirmation enables precise subphenotype characterization essential for elucidating subtype-specific genetic architectures but introduces selection biases and limits sample size. Registry-based diagnoses facilitate large-scale genetic studies but suffer from differential misclassification across subphenotypes, particularly for deep infiltrating disease.
For PRS studies targeting specific endometriosis subphenotypes, surgical confirmation remains preferable despite practical limitations. In registry-based studies, researchers should implement validated case definitions, acknowledge differential misclassification, and conduct sensitivity analyses to assess robustness of findings. The integration of novel data sources, including molecular markers and advanced imaging, promises to enhance future phenotyping approaches beyond this traditional dichotomy.
As endometriosis research advances toward personalized risk prediction and targeted interventions, precise phenotype ascertainment will remain the foundation upon which valid genetic discoveries are built.
This whitepaper provides a comprehensive technical analysis of performance metrics, specifically odds ratios, associated with various subtypes of endometriosis and their corresponding risks for ovarian cancer. Framed within a broader thesis on polygenic risk score performance across endometriosis subphenotypes, this guide synthesizes cutting-edge genetic epidemiology and clinical cohort studies to elucidate distinct risk profiles. Endometriosis, a complex inflammatory condition affecting approximately 6.3% to 11% of reproductive-aged women, is now recognized not as a single entity but as a spectrum of diseases with potentially divergent etiologies and oncogenic potentials [37]. Understanding these subtype-specific risk profiles is critical for refining genetic risk models and developing targeted surveillance and prevention strategies for at-risk populations. This document serves as a critical resource for researchers, scientists, and drug development professionals working to translate genetic discoveries into clinically actionable insights.
Table 1 summarizes the causal relationships between genetically proxied endometriosis and major ovarian cancer histotypes, as determined by two-sample Mendelian randomization analysis. These data establish that endometriosis significantly increases risk for specific, but not all, ovarian cancer subtypes [38].
Table 1: Causal Effects of Endometriosis on Ovarian Cancer Histotypes via Mendelian Randomization
| Ovarian Cancer Histotype | Odds Ratio (OR) | 95% Confidence Interval | P-value |
|---|---|---|---|
| Overall Ovarian Cancer | 1.18 | 1.10-1.28 | < 0.001 |
| High-Grade Serous | 1.12 | 1.01-1.23 | 0.03 |
| Clear Cell Carcinoma | 1.87 | 1.44-2.43 | < 0.001 |
| Endometrioid Carcinoma | 1.48 | 1.30-1.69 | < 0.001 |
| Low-Grade Serous | Not Significant | - | - |
| Invasive Mucinous | Not Significant | - | - |
Different anatomic subtypes of endometriosis demonstrate differential oncogenic potential. Table 2 presents odds ratios for ovarian cancer histotypes based on specific endometriosis locations, revealing distinct patterns of association that underscore their etiological heterogeneity [38].
Table 2: Anatomic Subtype-Specific Causal Effects on Ovarian Cancer Histotypes
| Endometriosis Subtype | High-Grade Serous OR (95% CI) | Clear Cell OR (95% CI) | Endometrioid OR (95% CI) |
|---|---|---|---|
| Pelvic Peritoneal | Not Significant | 1.81 (1.52-2.16) | Not Significant |
| Deep Infiltrating | 1.10 (1.04-1.17) | Not Significant | 1.25 (1.13-1.40) |
| Ovarian | 1.09 (1.02-1.15) | 1.65 (1.46-1.86) | 1.48 (1.30-1.69) |
| Rectovaginal | Not Significant | Not Significant | 1.25 (1.04-1.51) |
Table 3 illustrates the association of a 14-SNP polygenic risk score with endometriosis and its major subtypes across multiple cohorts. These findings demonstrate that PRS captures increased risk for all types of endometriosis rather than site-specific susceptibility [6] [39] [40].
Table 3: Polygenic Risk Score Associations Across Endometriosis Subtypes
| Study Cohort | Overall Endometriosis OR (p-value) | Ovarian Endometriosis OR (p-value) | Infiltrating Endometriosis OR (p-value) | Peritoneal Endometriosis OR (p-value) |
|---|---|---|---|---|
| Combined Danish Cohorts | 1.57 (p = 2.5×10⁻¹¹) | 1.72 (p = 6.7×10⁻⁵) | 1.66 (p = 2.7×10⁻⁹) | 1.51 (p = 2.6×10⁻³) |
| UK Biobank | 1.28 (p < 2.2×10⁻¹⁶) | - | - | - |
The most robust evidence for subtype-specific risks comes from Mendelian randomization (MR) studies, which utilize genetic variants as instrumental variables to infer causality while minimizing the confounding bias inherent in observational studies [38]. The MR approach relies on three core assumptions: (1) the genetic variants must be strongly associated with the exposure (endometriosis subtypes), (2) the genetic variants must not be associated with confounders, and (3) the genetic variants must affect the outcome (ovarian cancer) only through the exposure [38].
Instrumental Variable Selection: Genome-wide association study (GWAS) summary data for endometriosis subtypes were obtained from the FinnGen Consortium (20,190 cases, 130,160 controls of European ancestry) [38]. Ovarian cancer GWAS data came from the Ovarian Cancer Association Consortium (25,509 cases, 40,941 controls) [38]. Single nucleotide polymorphisms (SNPs) significantly associated with endometriosis (p < 5 × 10⁻⁸) were selected as instrumental variables, with linkage disequilibrium clumping (r² < 0.001) to ensure independence [38]. Weak instrument bias was assessed via F-statistics (range: 30.01-228.09), with all values >10 indicating robust instruments [38].
Statistical Analysis: The primary analysis used inverse variance weighted (IVW) meta-analysis to combine SNP-specific causal estimates [38]. Sensitivity analyses included MR-Egger regression (to assess directional pleiotropy), weighted median method (providing consistent estimates when up to 50% of information comes from invalid instruments), and MR-PRESSO (to identify and correct for outliers) [38]. Heterogeneity was assessed using Cochran's Q statistic, with random-effects models applied when significant heterogeneity was detected [38].
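The fixed-effect IVW estimate and the per-SNP F-statistic described above can be sketched as follows. This is a minimal illustration that uses the common approximation F ≈ (β/SE)² and ignores uncertainty in the SNP-exposure estimates; the summary statistics are hypothetical, and the sensitivity analyses (MR-Egger, weighted median, MR-PRESSO) are not covered.

```python
import math


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP instrument strength."""
    return (beta_exposure / se_exposure) ** 2


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted causal estimate and its SE."""
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1 / denominator)


# Hypothetical SNP-exposure and SNP-outcome associations (log odds scale).
beta_x = [0.10, 0.20]
se_x = [0.01, 0.02]
beta_y = [0.05, 0.10]
se_y = [0.01, 0.02]

strong = [f_statistic(b, s) > 10 for b, s in zip(beta_x, se_x)]
estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
odds_ratio = math.exp(estimate)
print(f"IVW log OR = {estimate:.3f} (SE {std_error:.3f}), OR = {odds_ratio:.2f}")
```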
Polygenic risk scores aggregate the effects of multiple genetic risk variants into a single measure of genetic susceptibility [6] [40]. The PRS methodology employed in the cited studies followed this protocol:
Variant Selection: A 14-variant PRS was derived from the largest endometriosis GWAS meta-analysis published at the time, comprising over 17,000 cases [6] [40]. These SNPs represented genome-wide significant lead variants from the discovery GWAS.
Score Calculation: The PRS was calculated as the weighted sum of risk alleles: PRS = β₁SNP₁ + β₂SNP₂ + ... + β₁₄SNP₁₄, where β represents the effect size (log odds ratio) of each SNP from the original GWAS [6]. Scores were standardized to a mean of 0 and standard deviation of 1 for analysis.
Validation Cohorts: The PRS was validated across three independent cohorts: (1) surgically confirmed cases from a Western Danish endometriosis referral center (249 cases, 348 controls), (2) cases identified from the Danish Twin Registry based on ICD-10 codes (140 cases, 316 controls), and (3) replication in the UK Biobank (2,967 cases, 256,222 controls) [6] [40]. Association analyses used logistic regression with PRS as predictor and endometriosis status as outcome, adjusting for principal components to account for population stratification.
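The odds ratio per standard deviation reported in these analyses is the exponentiated coefficient of a logistic regression on the standardized PRS. The sketch below fits a single-predictor model with Newton-Raphson in plain Python. It is a simplified illustration: it omits the principal component covariates used in the cited analyses, and the scores are hypothetical.

```python
import math
from statistics import mean, pstdev


def odds_ratio_per_sd(scores, labels, iterations=25):
    """Fit logit(P(case)) = b0 + b1 * z, with z the standardized score; return exp(b1)."""
    mu, sd = mean(scores), pstdev(scores)
    z = [(s - mu) / sd for s in scores]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(z, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)


# Hypothetical raw PRS values: six cases (label 1) and six controls (label 0).
scores = [0.5, 1.2, -0.3, 0.8, 1.5, 0.1, -0.7, 0.2, -1.1, 0.4, -0.2, 0.9]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
or_per_sd = odds_ratio_per_sd(scores, labels)
print(f"OR per SD = {or_per_sd:.2f}")
```

With real data, a statistics package that reports standard errors and supports covariates would be the practical choice; the manual fit is shown only to make the quantity explicit.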
Long-term hospital-based cohort studies provide critical insights into the natural history and recurrence patterns of different endometriosis subtypes [41]. The methodology for these studies typically includes:
Patient Recruitment: Medical records of all patients undergoing surgery for endometriosis during a defined period (e.g., 1997-2018) are reviewed [41]. Inclusion criteria typically require surgically confirmed endometriosis recurrence, defined as subsequent surgery for endometriosis after previous complete surgical excision [41].
Subtype Classification: Three primary subtypes are defined based on surgical and histopathological findings: superficial peritoneal endometriosis (SUP), ovarian endometrioma (OMA), and deep infiltrating endometriosis (DIE) [41]. Each subtype is confirmed through visual inspection during laparoscopy and histopathological examination of excised tissue.
Outcome Measures: The primary outcomes are time to recurrence and variation in endometriosis subtype between first and recurrent surgeries [41]. Statistical analyses include Kaplan-Meier survival curves for recurrence-free survival, Cox proportional hazards models for time-to-event data, and logistic regression for subtype transitions.
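To make the recurrence-free survival analysis concrete, the following Python sketch implements the Kaplan-Meier product-limit estimator on a handful of invented follow-up times. The function name and the data are hypothetical. Cox regression and the subtype-transition models mentioned above are not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of recurrence-free survival.

    times  : follow-up time for each patient
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        n_events = sum(1 for tt, e in data if tt == t and e)
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_at_t          # events and censored both leave the risk set
        i += n_at_t
    return curve

# Hypothetical follow-up in years; 0 marks a censored observation
follow_up = [2, 3, 3, 5, 8]
recurred = [1, 1, 0, 1, 0]
curve = kaplan_meier(follow_up, recurred)
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why the curve only steps down at observed recurrences.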
The association between specific endometriosis subtypes and distinct ovarian cancer histotypes suggests underlying biological mechanisms that may drive malignant transformation. The following diagram illustrates the proposed signaling pathways linking endometriosis subtypes to ovarian cancer development:
Figure 1: Proposed Pathophysiological Pathways from Endometriosis Subtypes to Ovarian Cancer
This mechanistic model illustrates how different endometriosis subtypes create distinct microenvironments conducive to specific ovarian cancer histotypes. Pelvic peritoneal endometriosis is strongly linked to chronic inflammation and shows particular specificity for clear cell carcinoma development [38] [42]. Ovarian endometriomas involve iron-induced oxidative stress from recurrent hemorrhage, creating conditions favorable for both clear cell and endometrioid carcinomas [38]. Deep infiltrating endometriosis promotes tissue remodeling and fibrosis, with associations spanning multiple histotypes including endometrioid and high-grade serous carcinomas [38].
Table 4 catalogs key reagents and methodologies essential for investigating endometriosis subtypes and their associated ovarian cancer risks.
Table 4: Research Reagent Solutions for Endometriosis Subtype Studies
| Reagent/Methodology | Function/Application | Example Implementation |
|---|---|---|
| FinnGen Consortium GWAS Data | Provides genetic association summary statistics for endometriosis subtypes | Source of genetic instruments for Mendelian randomization studies [38] |
| OCAC Ovarian Cancer GWAS | Offers genomic data for ovarian cancer histotype analysis | Outcome data for causal inference analyses [38] |
| 14-SNP Polygenic Risk Score | Quantifies aggregated genetic susceptibility to endometriosis | PRS construction using effect sizes from largest endometriosis GWAS meta-analysis [6] |
| CD138/Syndecan-1 Immunohistochemistry | Identifies plasma cells for diagnosis of chronic endometritis | Marker for endometrial inflammatory profile in peritoneal endometriosis [42] |
| Laparoscopic Visualization & Staging | Gold standard for endometriosis diagnosis and subtyping | Surgical confirmation of SUP, OMA, and DIE subtypes according to rASRM classification [41] [42] |
| MR-PRESSO Statistical Package | Detects and corrects for horizontal pleiotropy in MR studies | Outlier removal and distortion testing in causal inference analyses [38] |
| Utah Population Database | Population-based resource linking pedigrees with health data | Retrospective cohort studies of endometriosis-ovarian cancer associations [37] |
The comprehensive analysis of performance metrics across endometriosis subtypes reveals a complex landscape of subtype-specific ovarian cancer risks. The data demonstrate that pelvic peritoneal lesions show particular specificity for clear cell carcinoma, while deep infiltrating endometriosis exhibits broader associations across multiple histotypes. The polygenic risk scores currently available capture general endometriosis susceptibility rather than subtype-specific risk, highlighting an important limitation in current genetic prediction models. These findings underscore the necessity for refined classification systems that integrate anatomical, molecular, and genetic data to improve risk stratification. For drug development professionals, these insights suggest potential opportunities for subtype-targeted therapeutic strategies and prevention protocols. Future research should focus on elucidating the precise molecular mechanisms driving the subtype-specific malignant transformation and developing more sophisticated polygenic risk models that can accurately predict not just overall endometriosis risk, but specifically the high-risk subtypes associated with ovarian cancer development.
Endometriosis is a complex, chronic inflammatory condition affecting approximately 10% of women of reproductive age, characterized by the growth of endometrial-like tissue outside the uterus [31] [43] [44]. It presents a substantial diagnostic challenge, with an average delay of 7-10 years from symptom onset to definitive diagnosis, primarily because the current gold standard requires invasive laparoscopic surgery [31] [44]. The disease demonstrates significant heterogeneity in its clinical presentation, anatomical location, and treatment response, creating a pressing need for better stratification tools and early detection methods [45] [44].
The genetic component of endometriosis is substantial, with heritability estimates ranging from 47% to 51% [31] [10]. This strong genetic basis has motivated the development of polygenic risk scores (PRS) which aggregate the effects of numerous genetic variants into a single measure of genetic liability [6] [43]. Large-scale biobanks, particularly the UK Biobank and Danish health registries, have become invaluable resources for developing and validating these PRS, providing the extensive genotyped populations necessary for robust statistical analysis [31] [6] [43].
This technical guide examines the application of PRS for endometriosis within biobank populations, focusing specifically on methodologies and findings from the UK Biobank and Danish registry studies. Framed within broader research on PRS performance across endometriosis subphenotypes, this review provides researchers and drug development professionals with a comprehensive analysis of current capabilities, methodological considerations, and clinical translational potential.
Studies conducted in both Danish and UK Biobank populations have consistently demonstrated the association between endometriosis PRS and disease risk, though effect sizes vary across cohorts and endometriosis subtypes.
Table 1: Performance Metrics of Endometriosis PRS Across Biobank Studies
| Cohort | Case Definition | Sample Size (Cases/Controls) | Odds Ratio per SD | P-value | Subtypes Analyzed |
|---|---|---|---|---|---|
| Danish Clinical Cohort | Surgically confirmed | 249/348 | 1.59 | 2.57×10^-7 | Ovarian, Infiltrating, Peritoneal |
| Danish Twin Registry | ICD-10 codes | 140/316 | 1.50 | 0.0001 | Ovarian, Infiltrating, Peritoneal |
| Combined Danish Cohorts | Mixed | 389/664 | 1.57 | 2.5×10^-11 | All major subtypes |
| UK Biobank | ICD-10 codes | 2,967/256,222 | 1.28 | <2.2×10^-16 | All major subtypes |
The Danish cohorts, particularly those with surgically confirmed cases, demonstrated higher effect sizes compared to the UK Biobank [6] [43]. This difference may reflect variations in case ascertainment, with surgical confirmation potentially identifying more severe cases. In subtype-specific analyses of the combined Danish cohorts, ovarian endometriosis showed the largest effect size (OR = 1.72), followed by infiltrating (OR = 1.66) and peritoneal (OR = 1.51) endometriosis [43]. Importantly, the PRS was not associated with adenomyosis (N80.0), suggesting distinct genetic architectures between these related conditions [6] [43].
PRS performance shows significant variation across the genetic ancestry continuum, an important consideration for equitable application [46]. A comprehensive evaluation of the UK Biobank PRS Release demonstrated that accuracy decreases individual-to-individual along the continuum of genetic distances from the training data, with a Pearson correlation of -0.95 between genetic distance and PRS accuracy averaged across 84 traits [46].
Table 2: PRS Performance Across Genetic Ancestries in UK Biobank Testing Subgroup
| Genetic Ancestry | Sample Size in Testing Subgroup | Relative Performance* | Key Considerations |
|---|---|---|---|
| European | 97,608 | Reference | Best performance due to match with training population |
| South Asian | 9,542 | Moderate decrease | Portability affected by genetic distance |
| East Asian | 2,864 | Significant decrease | Substantial portability gap |
| African | 9,476 | Largest decrease | Greatest need for diverse reference data |
*Relative performance compared to European ancestry based on multiple traits [47]
This ancestry-based performance decay highlights the critical need for diverse training populations and careful interpretation of PRS across different genetic backgrounds [46] [47]. When PRS models trained on individuals labelled as white British in the UK Biobank are applied to individuals with European ancestries in external cohorts, individuals in the furthest genetic distance decile have 14% lower accuracy relative to the closest decile [46].
The development of polygenic risk scores for endometriosis follows a structured pipeline from genotyping to clinical application.
Robust quality control procedures are essential for reliable PRS calculation. The standard pipeline includes:
Sample Quality Control: Samples with ≥15% missing genotypes are excluded first, followed by samples with ≥5% missing genotypes after marker QC [10]. Related samples (PI_HAT > 0.1875) are excluded, along with samples whose genotyped sex cannot be determined and those with high heterozygosity rates (exceeding three standard deviations from the mean) [10].
Marker Quality Control: Removal of markers with non-called alleles, missing call rates > 0.05, Hardy-Weinberg equilibrium P-value < 1×10^−5, and those showing significant differential missingness between cases and controls (P < 1×10^−5) [10]. Only autosomal SNPs are retained for analysis.
Population Stratification: Principal components are calculated using pruned SNP sets without linkage disequilibrium, with outliers excluded (deviation >6 times interquartile range) [10]. These components are included as covariates in association analyses to control for population stratification.
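The three QC stages above can be expressed as simple filters. The Python sketch below is a simplified illustration using the thresholds quoted in the text; the record fields, function names, and toy data are hypothetical. It applies only the final 5% sample-missingness cut, drops every sample flagged as related instead of one per pair, and assumes that the 6×IQR rule is measured from the first and third quartiles.

```python
from statistics import mean, stdev, quantiles

AUTOSOMES = {str(c) for c in range(1, 23)}

def sample_passes(sample, het_mean, het_sd):
    """Sample QC: missingness, relatedness, sex check, heterozygosity."""
    return (
        sample["missing_rate"] < 0.05
        and sample["max_pi_hat"] <= 0.1875
        and sample["sex"] is not None
        and abs(sample["het_rate"] - het_mean) <= 3 * het_sd
    )

def marker_passes(marker):
    """Marker QC: autosomal, call rate, HWE, differential missingness."""
    return (
        marker["chrom"] in AUTOSOMES
        and marker["missing_rate"] <= 0.05
        and marker["hwe_p"] >= 1e-5
        and marker["diff_missing_p"] >= 1e-5
    )

def pc_outliers(pc_values, k=6.0):
    """Indices of samples lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(pc_values, n=4)
    spread = k * (q3 - q1)
    return [i for i, v in enumerate(pc_values)
            if v < q1 - spread or v > q3 + spread]

# Toy samples: s0-s3 each fail exactly one criterion
samples = [{"id": f"s{i}", "missing_rate": 0.01, "max_pi_hat": 0.02,
            "sex": "F", "het_rate": 0.30 + 0.001 * (i % 3)}
           for i in range(20)]
samples[0]["missing_rate"] = 0.08   # too many missing genotypes
samples[1]["max_pi_hat"] = 0.25     # related to another sample
samples[2]["sex"] = None            # sex could not be determined
samples[3]["het_rate"] = 0.45       # heterozygosity outlier

het = [s["het_rate"] for s in samples]
het_mean, het_sd = mean(het), stdev(het)
kept_samples = [s["id"] for s in samples if sample_passes(s, het_mean, het_sd)]

markers = [
    {"id": "m1", "chrom": "1", "missing_rate": 0.01, "hwe_p": 0.4, "diff_missing_p": 0.5},
    {"id": "m2", "chrom": "X", "missing_rate": 0.01, "hwe_p": 0.4, "diff_missing_p": 0.5},
    {"id": "m3", "chrom": "2", "missing_rate": 0.09, "hwe_p": 0.4, "diff_missing_p": 0.5},
    {"id": "m4", "chrom": "3", "missing_rate": 0.01, "hwe_p": 1e-7, "diff_missing_p": 0.5},
]
kept_markers = [m["id"] for m in markers if marker_passes(m)]

pc1 = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 5.0]
outlier_indices = pc_outliers(pc1)
```

With very small cohorts a three-standard-deviation heterozygosity cut cannot flag anything, which is one reason the toy example uses 20 samples.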
Two primary approaches have been employed in endometriosis PRS studies:
SBayesR Method: A Bayesian method implemented in GCTB 2.02 for adjusting GWAS summary statistics effect sizes, performed with default settings while excluding the MHC region and imputing sample size [31]. This method was used in the PRS-PheWAS analysis of UK Biobank data.
Clumping and Thresholding: A more straightforward method implemented in PLINK software, calculating both unweighted (counting risk alleles) and weighted scores (using beta values of effect sizes) [10]. The Danish registry studies utilized a 14-SNP PRS derived from lead SNPs identified in a large endometriosis GWAS meta-analysis [6] [43].
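The clumping-and-thresholding idea can be illustrated with a simplified greedy procedure: variants are ranked by p-value, and a variant is retained only if it passes the significance threshold and is not in linkage disequilibrium with an already retained variant. The Python sketch below is an assumption-laden illustration and not PLINK's implementation. It ignores physical distance windows, the variant identifiers and r² values are invented, and the default thresholds are placeholders that would be set per study.

```python
def clump(variants, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy LD clumping.

    variants : list of (variant_id, p_value)
    r2       : dict mapping frozenset({id_a, id_b}) to pairwise r^2
               (pairs that are absent are treated as r^2 = 0)
    Returns the ids of retained index variants, most significant first.
    """
    kept = []
    for variant_id, p_value in sorted(variants, key=lambda v: v[1]):
        if p_value >= p_threshold:
            break  # all remaining variants are less significant
        independent = all(
            r2.get(frozenset((variant_id, other)), 0.0) < r2_threshold
            for other in kept
        )
        if independent:
            kept.append(variant_id)
    return kept

# Hypothetical association results and LD
variants = [("var_a", 1e-12), ("var_b", 1e-10), ("var_c", 1e-9), ("var_d", 1e-3)]
ld = {frozenset(("var_a", "var_b")): 0.8}

index_variants = clump(variants, ld)
```

Here var_b is dropped because it is in strong LD with the more significant var_a, and var_d is dropped because it does not reach the significance threshold. The retained variants would then enter either an unweighted allele count or a beta-weighted score.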
A polygenic risk score phenome-wide association study (PRS-PheWAS) was conducted in the UK Biobank to investigate the pleiotropic effects of genetic liability to endometriosis [31]. This approach tested associations between the endometriosis PRS and numerous health conditions, biomarkers, and reproductive factors across females, males, and females without an endometriosis diagnosis.
The workflow for this analysis proceeded systematically:
Key findings from this PRS-PheWAS included:
Multiple health conditions, biomarkers, and reproductive factors were associated with genetic liability to endometriosis across all groups, including males and females without diagnosed endometriosis [31].
Differences in associated traits between males and females highlighted the importance of sex-specific pathways in the overlap of endometriosis with many other traits [31].
A particularly significant association was identified between genetic liability to endometriosis and lower testosterone levels, with Mendelian randomization analyses suggesting that lower testosterone may be causal for both endometriosis and clear cell ovarian cancer [31].
Transcriptomic analyses have revealed distinct molecular subtypes of endometriosis with implications for treatment response:
Stroma-Enriched Subtype (S1): Characterized by fibroblast activation and extracellular matrix remodeling in the ectopic milieu [45].
Immune-Enriched Subtype (S2): Marked by upregulation of immune pathways and higher positive correlation with immunotherapy response, strongly associated with failure of/intolerance to hormone therapy [45].
These molecular subtypes suggest that PRS, when combined with transcriptomic subtyping, could inform not only disease risk but also therapeutic strategy, particularly given the association between the S2 subtype and hormone therapy resistance [45].
Table 3: Essential Research Reagents and Resources for Endometriosis PRS Studies
| Resource Category | Specific Examples | Application in Endometriosis PRS Research |
|---|---|---|
| Genotyping Arrays | Illumina Global Screening Array [10] | Initial genotyping of samples for PRS calculation |
| Imputation Reference Panels | TOPMed Version R2 on GRCh38 [10] | Genotype imputation to increase SNP coverage |
| Analysis Software | PLINK 1.9/2.0 [31] [10], GCTB 2.02 [31], FlashPCA [10] | PRS calculation, quality control, population stratification |
| Biobank Resources | UK Biobank PRS Release [47], Danish National Patient Register [6] [43] | Validation cohorts with extensive phenotype data |
| Laboratory Assays | Proseek Multiplex Inflammation I kit [10], ELISA for AXIN1 [10] | Analysis of inflammatory proteins associated with endometriosis |
| Statistical Analysis Tools | R statistical environment, SPSS [10], ConsensusClusterPlus [45] | Statistical analysis and subtype identification |
The application of polygenic risk scores for endometriosis in biobank populations has substantially advanced our understanding of the genetic architecture of this complex condition. Research utilizing the UK Biobank and Danish health registries has demonstrated that PRS can effectively stratify endometriosis risk across different subtypes, with particularly strong performance for infiltrating and ovarian forms of the disease.
However, important limitations remain. The current discriminative accuracy of endometriosis PRS is not yet sufficient for standalone clinical utility [43]. Furthermore, performance varies significantly across the genetic ancestry continuum, raising equity concerns that must be addressed through more diverse training populations [46] [47]. Additionally, the association between PRS and specific clinical presentations or symptoms remains unclear, with one study finding no correlation between PRS and inflammatory proteins or TSH receptor antibodies [10].
Future research directions should focus on developing more sophisticated PRS that incorporate rare variants, epigenetic markers, and clinical risk factors [44]. Additionally, increasing diversity in genetic studies is imperative to ensure equitable benefits across all populations [46] [47]. As these tools evolve, integration of PRS with transcriptomic subtyping and clinical biomarkers promises to enable truly personalized approaches to endometriosis risk prediction, prevention, and treatment.
The integration of machine learning (ML) with polygenic risk score (PRS) modeling represents a transformative frontier in endometriosis research. This technical guide details how ML methodologies are addressing the limitations of traditional PRS by enhancing predictive accuracy, elucidating subtype-specific risk architectures, and integrating multifactorial data streams. Deploying these advanced models requires meticulously curated genomic data, robust computational frameworks, and specialized analytical pipelines. The ensuing protocols and resources provide a foundational toolkit for researchers and drug development professionals aiming to translate genetic discoveries into refined stratification tools and targeted therapeutic strategies.
Endometriosis is a complex gynecological disorder with a significant genetic component, exhibiting a heritability estimated between 47% and 51% [48] [31]. Polygenic risk scores, which aggregate the effects of many genetic variants into a single measure of genetic liability, have become a standard tool for quantifying this risk. However, traditional PRS models for endometriosis face several critical challenges that limit their clinical utility and biological insight.
Table 1: Performance of Traditional Endometriosis PRS Across Cohorts
| Cohort | Cases/Controls | Odds Ratio (OR) per SD increase in PRS | p-value | Key Finding |
|---|---|---|---|---|
| Surgically Confirmed (Danish) | 249 / 348 | 1.59 | 2.57 × 10⁻⁷ | Validates PRS in a clinical cohort [6] |
| Danish Twin Registry | 140 / 316 | 1.50 | 0.0001 | Confirms association in registry data [6] |
| UK Biobank (Replication) | 2,967 / 256,222 | 1.28 | < 2.2 × 10⁻¹⁶ | Replicates in a large, independent biobank [6] |
| Combined Danish Cohorts | 389 / 664 | 1.57 | 2.5 × 10⁻¹¹ | Demonstrates consistent effect [6] |
A primary limitation is the modest predictive power of existing scores. As shown in Table 1, while PRS consistently shows a significant association with endometriosis risk, the discriminative accuracy is not yet sufficient for standalone clinical diagnosis [6] [10]. Furthermore, traditional PRS often fails to capture the heterogeneity of the disease. For instance, a PRS based on 14 genome-wide significant SNPs was associated with all major subtypes of endometriosis (ovarian, infiltrating, peritoneal) but was not associated with adenomyosis, suggesting a distinct genetic etiology for this related condition [6]. This underscores the need for models that can differentiate between disease subphenotypes.
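One way to see why odds ratios of this size translate into modest discrimination is to convert the OR per standard deviation into an approximate area under the ROC curve. The sketch below assumes the standardized PRS is normally distributed with equal variance in cases and controls, in which case the case-control mean difference equals ln(OR) and AUC = Φ(ln(OR)/√2). This is a textbook approximation offered for intuition, not a result reported in the cited studies, and the function name is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def approx_auc(or_per_sd):
    """Approximate AUC implied by an odds ratio per 1-SD of a normal score.

    Assumes equal-variance normal score distributions in cases and controls.
    """
    return NormalDist().cdf(log(or_per_sd) / sqrt(2))

# Odds ratios per SD taken from Table 1
auc_uk_biobank = approx_auc(1.28)
auc_danish_surgical = approx_auc(1.59)
```

Under these assumptions the implied AUC is about 0.57 for the UK Biobank estimate and about 0.63 for the Danish surgical cohort, well short of what a standalone diagnostic test would need.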
Another layer of complexity arises from the interaction between genetic risk and comorbid conditions. A recent study using the UK and Estonian Biobanks found that the comorbidity burden was positively correlated with endometriosis PRS in women without endometriosis, but negatively correlated in women with the disease [49]. This indicates a complex interplay where the clinical manifestation of genetic risk is modified by other physiological factors. ML approaches are uniquely positioned to model these non-linear interactions and integrate diverse data types, paving the way for more powerful, personalized risk assessment.
Machine learning algorithms move beyond the linear assumptions of traditional PRS by identifying complex, non-additive interactions between genetic variants and integrating genetic data with clinical and molecular phenotypes. Below are detailed methodologies for key experimental approaches.
Objective: To develop a unified predictive model for endometriosis by integrating PRS with a wide array of clinical diagnoses, lifestyle factors, and female health-relevant data. Experimental Workflow:
Diagram 1: Multimodal ML workflow for endometriosis risk prediction.
Objective: To systematically identify the pleiotropic effects of genetic liability for endometriosis on other diagnoses, biomarkers, and reproductive factors, independent of disease diagnosis. Experimental Workflow:
Objective: To develop a rapid, cost-effective pre-screening tool using non-genomic data to identify individuals at high risk, who could then be prioritized for genetic testing or more invasive diagnostics. Experimental Workflow:
Understanding the functional mechanisms through which endometriosis-risk variants operate is critical for refining PRS and identifying druggable pathways. Integrating GWAS findings with functional genomic data reveals the tissue-specific regulatory architecture of genetic risk.
Table 2: Tissue-Specific Regulatory Profiles of Endometriosis Risk Loci
| Tissue | Prominent Biological Hallmarks | Key Regulator Genes |
|---|---|---|
| Sigmoid Colon & Ileum | Immune response, epithelial signaling | MICB, CLDN23 |
| Ovary, Uterus, Vagina | Hormonal response, tissue remodeling, cell adhesion | GATA4 |
| Peripheral Blood | Systemic immune and inflammatory signals | - |
A systematic analysis of endometriosis-associated GWAS variants against the GTEx eQTL database reveals distinct tissue-specific regulatory profiles [7]. As summarized in Table 2, risk variants exert their effects through different biological processes in reproductive versus intestinal tissues. This suggests that a more powerful PRS could be constructed by prioritizing variants based on their functional activity in disease-relevant tissues.
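A minimal sketch of what such functional prioritization might look like is given below. It restricts a weighted score to variants annotated as eQTLs in a chosen set of tissues. Everything here is hypothetical: the variant identifiers, effect sizes, and variant-to-tissue annotations are invented for illustration, and a real analysis would draw the annotations from the GTEx eQTL results and validate the resulting score independently.

```python
def tissue_prioritized_prs(dosages, betas, eqtl_tissues, target_tissues):
    """Weighted PRS restricted to variants with an eQTL in a target tissue.

    dosages        : dict variant_id -> risk-allele dosage (0, 1 or 2)
    betas          : dict variant_id -> GWAS log odds ratio
    eqtl_tissues   : dict variant_id -> set of tissues where it is an eQTL
    target_tissues : set of tissues considered disease-relevant
    """
    score = 0.0
    for variant_id, dosage in dosages.items():
        if eqtl_tissues.get(variant_id, set()) & target_tissues:
            score += betas[variant_id] * dosage
    return score

# Hypothetical annotations for three variants
betas = {"var_1": 0.10, "var_2": 0.30, "var_3": 0.05}
eqtl_tissues = {
    "var_1": {"Uterus", "Ovary"},
    "var_2": {"Colon_Sigmoid"},
    "var_3": set(),               # no eQTL evidence in any tissue
}
dosages = {"var_1": 2, "var_2": 1, "var_3": 1}

reproductive = {"Ovary", "Uterus", "Vagina"}
intestinal = {"Colon_Sigmoid", "Small_Intestine_Terminal_Ileum"}

reproductive_score = tissue_prioritized_prs(dosages, betas, eqtl_tissues, reproductive)
intestinal_score = tissue_prioritized_prs(dosages, betas, eqtl_tissues, intestinal)
```

Hard filtering is the simplest choice. Down-weighting variants without functional support, or scaling by eQTL effect size, would be natural alternatives to evaluate.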
Diagram 2: Tissue-specific functional characterization of risk variants.
The pathways identified through these integrative analyses point toward specific therapeutic opportunities. Drug-repurposing analyses based on the implicated biological systems have highlighted potential interventions currently used for breast cancer and preterm birth prevention [52]. Furthermore, the finding that genetic liability to lower testosterone may be causal for endometriosis opens up novel avenues for therapeutic targeting [31].
Table 3: Essential Research Reagents and Resources for ML-PRS Studies
| Resource Category | Specific Tool / Assay | Function in Experimental Pipeline |
|---|---|---|
| Genotyping & Biobanks | UK Biobank, Estonian Biobank, FinnGen | Provide large-scale genomic and phenotypic data for model training and validation [49] [50] [31]. |
| Genotyping Technology | Illumina Global Screening Array | High-throughput genotyping of research samples to generate PRS input data [10]. |
| Functional Genomics | GTEx eQTL Database (v8) | Annotates GWAS variants with tissue-specific gene regulatory information [7]. |
| Phenotype Processing | Phecode Map (v1.2) | Standardizes ICD-10 codes into analyzable phenotype groups for PheWAS [31]. |
| Proteomic Analysis | Proseek Multiplex Inflammation I Kit (Olink) | Quantifies 92 inflammatory serum proteins for integration with PRS models [10]. |
| Spectroscopic Analysis | ATR-FTIR Spectrometer (e.g., Bruker ALPHA II) | Generates biochemical profile spectra from urine for non-genomic ML models [51]. |
| ML & Statistical Software | PLINK, CatBoost, XGBoost, SHAP, GCTB (for SBayesR) | Software packages for core genetics, machine learning, and model interpretation [50] [31]. |
The application of machine learning to polygenic risk scoring is fundamentally advancing the research landscape for endometriosis. By moving beyond simple, additive genetic models, ML enables the development of integrated, multifactorial risk stratification systems that account for disease heterogeneity, complex comorbidities, and tissue-specific biology. The experimental frameworks and tools detailed in this guide provide a roadmap for building more predictive and biologically interpretable models. The continued growth of large, diverse biobanks, coupled with advances in functional genomics and explainable AI, will be crucial for translating these sophisticated models into tangible benefits for patient stratification and the development of novel therapeutics.
Endometriosis is a complex, chronic inflammatory condition affecting approximately 10% of women of reproductive age, characterized by the presence of endometrial-like tissue outside the uterine cavity [6] [10]. Despite significant advances in understanding its genetic architecture, the clinical translation of polygenic risk scores (PRS) remains challenged by the disease's substantial heterogeneity. Current PRS models, derived from genome-wide association studies (GWAS), demonstrate promising but incomplete discriminative ability across diverse clinical presentations [6] [10]. This technical analysis examines the performance gaps of endometriosis PRS across different subphenotypes, exploring the molecular foundations of this heterogeneity and proposing methodological frameworks for enhanced risk stratification.
The fundamental limitation of current PRS approaches lies in their predominantly generalized construction, which often fails to capture the spectrum of molecular mechanisms driving distinct clinical manifestations. While GWAS have identified numerous susceptibility loci, these variants are primarily non-coding and likely exert tissue-specific regulatory effects that remain poorly characterized in the context of different endometriosis subphenotypes [7]. This gap is particularly problematic for drug development, where targeted therapeutic strategies require precise patient stratification based on underlying disease drivers rather than blanket genetic risk assessment.
Table 1: Performance of a 14-SNP PRS for Endometriosis Across Cohorts
| Cohort | Cases/Controls | Odds Ratio (per SD) | P-value | Subtype Analysis |
|---|---|---|---|---|
| Danish Surgical Cohort | 249/348 | 1.59 | 2.57×10^-7 | Surgically confirmed (ASRM II-IV) |
| Danish Twin Registry | 140/316 | 1.50 | 0.0001 | ICD-10 coded cases |
| Combined Danish Cohorts | 389/664 | 1.57 | 2.5×10^-11 | All major subtypes |
| - Ovarian (N80.1) | - | 1.72 | 6.7×10^-5 | Specific subtype |
| - Infiltrating (N80.4-N80.5) | - | 1.66 | 2.7×10^-9 | Specific subtype |
| - Peritoneal (N80.2-N80.3) | - | 1.51 | 2.6×10^-3 | Specific subtype |
| UK Biobank Replication | 2,967/256,222 | 1.28 | <2.2×10^-16 | Large-scale validation |
Data adapted from [6]
As demonstrated in Table 1, while PRS consistently shows association with endometriosis risk across diverse cohorts, the effect sizes vary considerably. The reduction in odds ratio observed in the larger UK Biobank cohort (OR=1.28) compared to the smaller Danish surgical cohort (OR=1.59) suggests potential spectrum bias or differences in case ascertainment methods [6]. Importantly, the PRS showed no significant association with adenomyosis (N80.0), indicating some specificity to endometriosis pathogenesis mechanisms [6] [39].
Table 2: PRS Associations with Clinical Presentation Features
| Clinical Feature | Association Direction | Statistical Significance | Cohort Details |
|---|---|---|---|
| Disease Spread | Inverse | Lost significance (p-trend) | 172 patients, surgical confirmation [10] |
| Gastrointestinal Involvement | Inverse | Lost significance (p-trend) | 172 patients, surgical confirmation [10] |
| Hormone Treatment Response | Inverse | Lost significance (p-trend) | 172 patients, surgical confirmation [10] |
| Inflammatory Proteins (AXIN1, ST1A1, CXCL9) | No correlation | Non-significant | Multiplex immunoassay [10] |
| TSH Receptor Antibodies (TRAb) | No correlation | Non-significant | Electrochemiluminescence immunoassay [10] |
Data adapted from [10]
A critical limitation emerges when examining specific clinical presentations. As summarized in Table 2, a dedicated study of 172 surgically confirmed endometriosis patients found that PRS showed inverse associations with disease spread, gastrointestinal involvement, and hormone treatment that failed to maintain statistical significance when calculated as p for trend [10]. This indicates that current PRS models lack the sensitivity to predict disease severity or specific phenotypic manifestations, severely limiting their clinical utility for personalized treatment approaches.
Recent transcriptomic profiling has revealed fundamental molecular heterogeneity in endometriosis that likely explains the limitations of current PRS approaches. Unsupervised clustering of 198 ectopic endometriosis lesions identified two distinct subtypes: a stroma-enriched subtype (S1) and an immune-enriched subtype (S2) [53].
These subtypes demonstrate significant clinical relevance, with the S2 subtype strongly associated with failure of or intolerance to hormone therapy [53]. This stratification provides a biological basis for the observed poor correlation between PRS and treatment response noted in Table 2.
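The functional characterization of endometriosis-associated genetic variants reveals additional complexity. An analysis of 465 genome-wide significant variants found that they exhibit tissue-specific regulatory effects as expression quantitative trait loci (eQTLs), with immune and epithelial signaling profiles in intestinal tissues and hormonal-response, tissue-remodeling, and cell-adhesion profiles in reproductive tissues [7].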
This tissue-specific regulatory pattern suggests that current PRS models, which typically aggregate genetic effects across tissues, may obscure important subtype-specific risk mechanisms.
Molecular Heterogeneity in Endometriosis Pathogenesis
Protocol: Identification of Molecular Subtypes via Unsupervised Clustering
Data Acquisition and Preprocessing
Consensus Clustering
Biological Characterization
Protocol: Development of Subtype-Informed PRS
Variant Selection and Functional Annotation
Subtype-Stratified PRS Calculation
Validation in Phenotyped Cohorts
Subtype-Informed PRS Development Workflow
Table 3: Essential Research Reagents for Endometriosis Subtype Studies
| Reagent/Technology | Application | Key Features | Representative Use |
|---|---|---|---|
| Olink Target 96 (Inflammation) | Multiplex protein quantification | 92 inflammatory proteins, proximity extension assay | Serum protein analysis in PRS-clinical correlation studies [10] [54] |
| Illumina Global Screening Array | Genotyping | High-throughput SNP array | PRS calculation in cohort studies [10] |
| Proseek Multiplex Inflammation I kit | Inflammation biomarker analysis | 92-protein panel, normalized protein expression (NPX) | Correlation of inflammatory proteins with clinical symptoms [10] |
| TOPMed Imputation Server | Genotype imputation | Reference panel: TOPMed Version R2 on GRCh38 | Imputation of missing genotypes for PRS calculation [10] |
| GTEx v8 Database | Tissue-specific eQTL analysis | Normalized effect sizes (slope) across 54 tissues | Functional annotation of endometriosis risk variants [7] |
| xCell & CIBERSORT | Cell type decomposition | Tissue infiltration scores from transcriptomic data | Immune-stromal characterization of endometriosis subtypes [53] |
The current performance gaps in endometriosis PRS stem fundamentally from the molecular heterogeneity of the disease and the limitations of one-size-fits-all genetic risk models. The identification of distinct molecular subtypes (stroma-enriched and immune-enriched) with differential treatment responses provides both an explanation for these limitations and a pathway forward [53]. Future PRS development must incorporate tissue-specific regulatory information [7] and stratify by molecular subtypes to achieve the precision required for meaningful clinical application, particularly in drug development contexts where targeting specific pathogenic mechanisms is paramount. The integration of transcriptomic subtyping with genetic risk assessment represents the most promising approach for developing predictive models that can genuinely inform personalized therapeutic strategies for endometriosis patients.
Emerging research reveals a paradoxical relationship in endometriosis wherein a higher genetic predisposition, quantified by polygenic risk scores (PRS), is inversely associated with the spread of the disease and its involvement of the gastrointestinal (GI) tract. This whitepaper synthesizes evidence from recent clinical and genetic studies, detailing the quantitative data, experimental methodologies, and putative biological mechanisms underlying this counterintuitive phenomenon. Framed within a broader thesis on PRS performance across endometriosis subphenotypes, this review provides researchers and drug development professionals with a technical guide to the current state of the art, highlighting the potential for genetic profiling to refine patient stratification and uncover novel pathophysiology.
Endometriosis is a common, estrogen-dependent chronic inflammatory gynecological disorder, characterized by the presence of endometrial-like tissue outside the uterine cavity [55] [3]. It affects approximately 5–15% of women of reproductive age, with a heritability estimated at 47–51% [2]. The clinical presentation is profoundly heterogeneous, ranging from superficial peritoneal lesions to deep infiltrating disease that can involve the ovaries, pelvic peritoneum, and gastrointestinal tract [56].
The development of polygenic risk scores (PRS)—a weighted sum of an individual's risk alleles derived from genome-wide association studies (GWAS)—has provided a powerful tool to quantify genetic susceptibility to complex diseases like endometriosis [6]. A compelling and counterintuitive finding is emerging from PRS research: a higher genetic load for endometriosis is associated with less severe disease manifestations in specific anatomical contexts, particularly concerning disease spread and GI tract involvement [57]. This inverse association challenges simple linear models of genetic risk and suggests the existence of distinct genetic architectures underlying different disease subphenotypes. Understanding this relationship is critical for refining predictive models and developing targeted therapies.
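As a minimal illustration of the weighted-sum definition above, the following Python sketch computes a PRS for a single individual. The SNP identifiers, odds ratios, and genotype are hypothetical placeholders rather than values from the cited studies, and the choice to skip ungenotyped SNPs is an assumption of this sketch.

```python
import math


def polygenic_risk_score(dosages, weights):
    """Return the sum over SNPs of (weight x risk-allele dosage).

    dosages: dict mapping SNP ID -> risk-allele count (0, 1, or 2)
    weights: dict mapping SNP ID -> per-allele log odds ratio
    SNPs absent from `dosages` are skipped (an assumption of this sketch).
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Hypothetical per-allele odds ratios for three SNPs (illustrative only).
odds_ratios = {"snp_a": 1.10, "snp_b": 1.05, "snp_c": 1.20}

# GWAS weights are conventionally applied on the log-odds scale.
weights = {snp: math.log(value) for snp, value in odds_ratios.items()}

# Hypothetical genotype: risk-allele counts for one individual.
dosages = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

prs = polygenic_risk_score(dosages, weights)
```

In practice the raw score is usually standardized against a reference cohort so that effects can be reported per standard deviation, as in the odds ratios quoted in this review.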
Key studies have systematically investigated the association between PRS and specific clinical presentations of endometriosis, quantifying the relationship with disease spread and GI involvement.
Table 1: Summary of Key Studies on Inverse Associations with Endometriosis Subphenotypes
| Study Cohort | Sample Size (Cases) | PRS Construction | Association with Disease Spread | Association with GI Involvement | Key Findings |
|---|---|---|---|---|---|
| Clinical Cohort (2022) [57] | 172 | Based on previous GWAS | Inverse association identified with the spread of endometriosis. | Inverse association identified with involvement of the GI tract. | Significance was lost when calculated as p for trend; specificity and sensitivity were low. |
| Danish & UK Biobank (2021) [6] | 249 (Surgically confirmed) | 14 SNPs from a large meta-GWAS | PRS was associated with all major subtypes (Ovarian, Infiltrating, Peritoneal). | Not explicitly studied for GI tract. | PRS was not associated with adenomyosis, suggesting different genetic drivers. |
| UK Biobank PRS-PheWAS (2023) [2] | 2,967 cases (UK Biobank) | Bayesian method (SBayesR) on meta-analysis | Pleiotropic effects were found irrespective of diagnosis. | Not explicitly studied. | Genetic liability to endometriosis was associated with lower testosterone levels; follow-up Mendelian randomization suggested that lower testosterone may causally increase endometriosis risk. |
The most direct evidence comes from a 2022 study that explicitly tested the association between a PRS and clinical presentation in 172 endometriosis patients [57]. The study reported inverse associations between the PRS and both the spread of endometriosis and the specific involvement of the gastrointestinal tract. However, the authors noted that the statistical significance for these associations was lost when a p for trend was calculated, and the overall specificity and sensitivity of the PRS for predicting these subphenotypes were low [57]. This indicates that while the inverse relationship is observable, the current PRS models lack the discriminatory power for standalone clinical prediction of subphenotypes.
In a larger, multi-cohort study, a 14-SNP PRS was significantly associated with all major subtypes of endometriosis, including ovarian (OR = 1.72), infiltrating (OR = 1.66), and peritoneal (OR = 1.51) [6]. This suggests the PRS captures a general genetic risk for developing endometriosis across anatomical locations, rather than a risk skewed towards a specific subtype. The critical finding that this PRS was not associated with adenomyosis reinforces the notion that distinct pathological entities within the spectrum of endometriosis-related diseases have unique genetic drivers [6].
To enable replication and critical evaluation, this section details the experimental methodologies from the pivotal studies cited.
1. Patient Cohort Identification and Phenotyping:
2. Genotyping and PRS Calculation:
3. Statistical Analysis:
1. Cohort Descriptions and Definition of Cases/Controls:
2. Assay Design and Genotyping:
3. PRS Calculation and Statistical Analysis:
The observed inverse associations suggest that a higher genetic load for endometriosis might trigger compensatory biological pathways or that distinct genetic variants are linked to localized versus widespread disease. Two primary, non-mutually exclusive mechanisms are supported by the literature.
A landmark PRS-phenome-wide association study (PheWAS) revealed that genetic liability to endometriosis is associated with lower levels of testosterone [2]. Follow-up Mendelian randomization analyses suggested that lower testosterone may have a causal effect on increasing endometriosis risk.
Figure 1: Hormonal Pathway Linking PRS and Disease Spread
This pathway posits that a specific genetic profile (high PRS) predisposes individuals to lower circulating testosterone. Since testosterone may have a protective effect against the establishment and growth of ectopic lesions, individuals with a high PRS (and thus lower testosterone) might be more likely to develop initial endometriosis. However, this same hormonal milieu could be less conducive to the specific processes required for deep infiltration and GI tract involvement, potentially through modulation of immune cell function or fibrosis, leading to the observed inverse association with severe spread [2] [3].
The gut microbiota plays a significant role in regulating systemic inflammation and estrogen metabolism (the "estrobolome") [58] [55] [59]. Dysbiosis, characterized by a shift in microbial communities, is frequently reported in endometriosis patients.
Figure 2: Gut-Microbiota-Immune Axis in Endometriosis
A high-PRS genetic background might be linked to a gut microbiome configuration that, while permissive for the initial establishment of endometriosis (via elevated systemic inflammation and estrogen levels), simultaneously creates an environment that is resistant to the deep infiltration of the intestinal wall. For instance, specific microbial communities could influence the local immune landscape in the peritoneal cavity or the integrity of the gastrointestinal mucosal barrier, thereby limiting the ability of lesions to penetrate the GI tract [58] [55] [59]. This would manifest as an inverse association between PRS and GI involvement.
Table 2: Essential Research Reagents for Endometriosis PRS and Mechanistic Studies
| Reagent / Material | Function / Application | Example / Note |
|---|---|---|
| Genotyping Arrays | Genome-wide genotyping of DNA samples to determine individual genotypes for PRS calculation. | Illumina Global Screening Array, Infinium arrays. |
| GWAS Summary Statistics | Source of SNP effect sizes (odds ratios) and p-values used as weights for PRS calculation. | Data from large consortia like Sapkota et al. (2017) or FinnGen [6] [2]. |
| PRS Calculation Software | Software tools to compute polygenic risk scores for individuals in a cohort. | PLINK 1.9 --score function, PRSice, LDpred [6] [2]. |
| ELISA Kits / Multiplex Immunoassays | Quantification of protein biomarkers in serum/plasma (e.g., inflammatory cytokines, hormones). | For measuring IL-1β, IL-18, TGF-β, TNF-α, Testosterone, Estradiol. |
| 16S rRNA Sequencing Reagents | Profiling the composition of the gut microbiota to identify dysbiosis. | Kits for amplification and sequencing of the 16S rRNA gene (e.g., targeting V4 region). |
| TLR4/NF-κB Pathway Inhibitors/Agonists | Mechanistic studies to validate the role of microbial components in inflammation. | Lipopolysaccharides (LPS) as TLR4 agonists; TAK-242 as a TLR4 signaling inhibitor. |
| Cell Culture Models | In vitro studies of endometriotic epithelial and stromal cell behavior. | Immortalized human endometriotic stromal cells (e.g., 12Z cell line). |
The inverse association between a high polygenic risk score for endometriosis and the spread of disease or gastrointestinal involvement presents a paradox that underscores the complexity of the disease's genetic architecture. The evidence is suggestive rather than conclusive: in the one study that tested the association directly, significance was lost under a test for trend, and standalone PRS models lack the sensitivity and specificity needed for clinical subphenotype prediction [57]. Integrating PRS with other data layers, such as hormone levels (e.g., testosterone), gut microbiome profiles, and inflammatory biomarkers, is a promising avenue for building more powerful predictive models.
For drug development, these findings highlight the need for therapies that target specific biological pathways (e.g., testosterone-mediated effects or TLR4/NF-κB signaling) which may be more relevant for patients with certain genetic backgrounds and disease manifestations. Future research must prioritize large-scale, deeply phenotyped cohorts with genomic, microbiomic, and hormonal data to disentangle these complex relationships and fully realize the potential of polygenic risk scoring in endometriosis patient stratification and personalized treatment.
Endometriosis, a complex gynecological disorder affecting approximately 10% of reproductive-aged women, demonstrates a substantial heritable component, with genetic factors accounting for an estimated 50% of disease susceptibility [5] [60]. While polygenic risk scores (PRS) have emerged as valuable tools for aggregating the effects of numerous genetic variants, their predictive power remains limited for clinical implementation, with studies reporting area under the curve (AUC) values typically ranging from 0.546 to 0.636 [6] [61]. This limitation stems from the modest effect sizes of individual risk variants and the inability of PRS to capture the significant environmental contributions to disease pathogenesis [6] [61].
The integration of epigenetic data, particularly DNA methylation, offers a promising approach to enhance risk prediction models. DNA methylation represents a dynamic interface between genetic predisposition and environmental exposures, potentially capturing both inherited and acquired risk factors [5] [62]. Recent evidence indicates that DNA methylation profiles in endometrial tissue can capture approximately 15.4-24.2% of the variance in endometriosis status, with a significant portion (12-16.1%) remaining after accounting for genetic variation [5] [36]. This independent contribution highlights the potential of methylation risk scores (MRS) to complement traditional PRS and improve risk stratification across diverse endometriosis subphenotypes.
Table 1: Performance Comparison of Risk Prediction Models in Endometriosis
| Model Type | Key Components | AUC/Performance | Variance Explained | Sample Size | Reference |
|---|---|---|---|---|---|
| PRS (14-SNP) | 14 genetic variants from GWAS | OR = 1.57-1.59 per SD | ~26.2% (SNP heritability) | 249 cases, 348 controls | [6] |
| MRS | 746 DNAm sites | AUC = 0.6748 | 12-16.1% (independent of genetics) | 908 samples | [5] |
| Combined PRS+MRS | Genetic + epigenetic markers | Consistently higher than PRS alone | 37% combined (20.9% genetics + 16.1% DNAm) | 984 participants | [5] [36] |
| Multi-PRS Model | 40 PRSs across multiple traits | AUC = 0.636 | N/R | 1,996 women | [61] |
| Phenotype-Only Questionnaire | CA125, fatigue, gynecological symptoms | AUC = 0.904 | N/R | 506 participants | [61] |
Table 2: DNA Methylation Variance Components in Endometrial Tissue
| Variance Component | Proportion Explained | Biological Interpretation | Study Details |
|---|---|---|---|
| Total DNAm Variance | 24.2% | Combined genetic and environmental influences | Analysis of 759,345 DNAm sites in 984 samples [36] |
| DNAm Variance (independent of genetics) | 16.1% | Pure epigenetic contribution after controlling for SNPs | OREML models including GRM and ORM [5] [36] |
| Genetic Variance | 20.9% | Common SNP-based heritability | Simultaneous modeling with DNAm [36] |
| Combined Genetic + Epigenetic | 37% | Total variance captured by integrated model | [36] |
| Menstrual Cycle Phase | 4.30% | Hormonal influence on methylation patterns | After SVA correction [36] |
The development of robust MRS models requires stringent quality control protocols across multiple processing stages. For endometrial tissue studies, the initial sample collection phase should incorporate standardized surgical techniques and precise menstrual cycle dating through histological assessment according to Noyes' criteria [36]. Following tissue acquisition, DNA extraction should be performed using standardized kits such as the DNeasy Blood & Tissue Kit, with DNA quality verification through spectrophotometry or fluorometry [63].
For methylation analysis, the Illumina Infinium MethylationEPIC BeadChip platform provides comprehensive genome-wide coverage of over 850,000 CpG sites [36]. Quality control should include:
Batch effects from technical variables (array processing date, position) and biological covariates (age, institution) must be addressed through surrogate variable analysis (SVA), which has been shown to effectively reduce false positives while preserving biological signals [5] [36].
The construction of MRS follows a multi-step analytical pipeline with specific considerations for endometriosis applications:
Differential Methylation Analysis: Identify significantly associated CpG sites using linear models adjusted for key covariates including age, menstrual cycle phase, genetic ancestry, and technical batch effects. The model typically takes the form:
M-value ~ Endometriosis_status + Age + Cycle_phase + Genetic_PCs + SV1...SVk
where M-values represent logit-transformed beta values for improved statistical properties [36].
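The logit transform mentioned above can be sketched as follows. This assumes the base-2 logit commonly used for array methylation data, M = log2(β / (1 − β)); the clipping offset is an arbitrary choice for this sketch and is not taken from the cited studies.

```python
import math


def beta_to_m_value(beta, offset=1e-6):
    """Convert a methylation beta value in [0, 1] to an M-value.

    Betas are clipped to [offset, 1 - offset] so that fully methylated or
    fully unmethylated sites do not yield infinite values. The offset is an
    illustrative assumption, not a recommended setting.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    clipped = min(max(beta, offset), 1.0 - offset)
    return math.log2(clipped / (1.0 - clipped))


# Example beta values: low, intermediate, and high methylation.
m_values = [beta_to_m_value(b) for b in (0.1, 0.5, 0.8)]
```

A beta of 0.5 maps to an M-value of 0, and the transform stretches values near 0 and 1, which is why M-values tend to behave better than betas in linear models.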
Feature Selection: Apply genome-wide significance thresholds (Bonferroni-corrected p < 6.58×10⁻⁸ for EPIC array) to identify robustly associated CpG sites. In endometriosis, studies have identified significant signals in genes including ELAVL4 and TNPO2 in advanced stage disease [36].
Weighted Score Calculation: Generate MRS using effect size-weighted sums of methylation values:
MRS = Σ(β_i × DNAm_i)
where β_i represents the effect size estimate for each CpG site i from the discovery analysis [5].
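The feature selection and weighted score steps can be sketched together. The threshold below divides 0.05 by the 759,345 DNAm sites reported for the endometrial dataset [36], which reproduces the quoted 6.58×10⁻⁸; treating that as the origin of the threshold is an inference from the arithmetic. The CpG identifiers, effect sizes, p-values, and methylation values are hypothetical, and the selection rule used in the cited MRS may differ.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def methylation_risk_score(methylation, effects, p_values, threshold):
    """Return (score, selected CpGs).

    The score is the sum of effect x methylation over CpG sites whose
    discovery p-value falls below `threshold`.
    """
    selected = [cpg for cpg, p in p_values.items() if p < threshold]
    score = sum(effects[cpg] * methylation[cpg] for cpg in selected)
    return score, selected


threshold = bonferroni_threshold(0.05, 759_345)

# Hypothetical discovery results for three CpG sites (illustrative only).
effects = {"cg_a": 0.8, "cg_b": -0.5, "cg_c": 0.3}
p_values = {"cg_a": 1e-9, "cg_b": 2e-8, "cg_c": 1e-4}

# Hypothetical methylation M-values for one sample.
methylation = {"cg_a": 1.5, "cg_b": -2.0, "cg_c": 0.7}

mrs, selected = methylation_risk_score(methylation, effects, p_values, threshold)
```

Here only `cg_a` and `cg_b` pass the threshold, so the third site contributes nothing to the score regardless of its effect size.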
Model Validation: Implement rigorous train-test validation splits, ideally separating samples by recruitment institution to ensure independence. Performance evaluation should include AUC calculations, sensitivity analyses across disease stages, and assessment of subtype-specific predictive ability [5].
The integration of MRS with PRS requires careful consideration of genetic and epigenetic relationships. Critical steps include:
mQTL Analysis: Identify methylation quantitative trait loci (mQTLs) where genetic variants influence DNA methylation levels. Recent large-scale studies have detected 118,185 independent cis-mQTLs in endometrial tissue, including 51 associated with endometriosis risk [36]. These represent prime candidates for integrated risk modeling.
Variance Partitioning: Use omics residual maximum likelihood (OREML) analyses to quantify the proportion of disease variance captured by genetic (GRM) and methylation (ORM) relationship matrices [5] [36]. This approach demonstrated that combining both matrices captured 37% of endometriosis variance, significantly exceeding either component alone.
Clinical Subphenotype Stratification: Evaluate model performance across endometriosis subtypes, including rASRM stages (I-IV), lesion characteristics (ovarian, peritoneal, deeply infiltrating), and infertility associations. Current evidence indicates stronger epigenetic effects in advanced stage (III/IV) disease [36].
DNA methylation alterations in endometriosis converge on several key pathological pathways that may inform both risk prediction and therapeutic targeting:
Hormonal Response Pathways: Key genes including ESR1 (estrogen receptor), PGR (progesterone receptor), and HOXA10 exhibit disease-specific methylation patterns associated with progesterone resistance and estrogen dominance [64] [65] [63]. Hyperestrogenism resulting from CYP19/aromatase hypomethylation creates a permissive environment for ectopic lesion growth [62] [63].
Immune-Inflammatory Regulation: Methylation changes in genes encoding inflammatory mediators (COX-2, IL-12B, TNF-α) contribute to the characteristic inflammatory microenvironment of endometriosis [64] [62]. Genome-wide analyses identify enrichment in HTLV infection, PI3K-Akt, and oxytocin signaling pathways [63].
Tissue Remodeling and Cell Adhesion: Aberrant methylation in extracellular matrix (ECM) interaction pathways, including adherens junctions, focal adhesion, and regulation of actin cytoskeleton, facilitates ectopic implantation and survival [36] [60].
Oxidative Stress Response: The interplay between oxidative stress and epigenetic modifications creates a feed-forward loop that promotes disease progression, with oxidative stress both influencing and being influenced by DNA methylation patterns [62].
Table 3: Essential Research Resources for Endometriosis Epigenetic Studies
| Category | Specific Product/Platform | Key Applications | Performance Considerations |
|---|---|---|---|
| DNA Methylation Profiling | Illumina Infinium MethylationEPIC BeadChip | Genome-wide CpG methylation analysis (850,000+ sites) | Coverage includes enhancers, gene bodies, promoters; suitable for formalin-fixed samples [36] |
| Targeted Methylation Analysis | Zymo Research EZ DNA Methylation Kit | Bisulfite conversion for targeted sequencing | High conversion efficiency (>99%); compatible with multiple sample types [63] |
| DNA Extraction | DNeasy Blood & Tissue Kit (Qiagen) | High-quality DNA from endometrial tissues | Effective for difficult tissues; minimal contaminant carryover [63] |
| Bioinformatic Analysis | R packages: minfi, sva, DMRcate | Preprocessing, batch correction, DMR identification | Integration with Bioconductor; comprehensive QC metrics [5] [36] |
| Multi-omic Integration | OREML, MOA, METASOFT | Variance partitioning, cross-omics analysis | Accounts for relatedness; handles mixed models [5] [36] |
| Validation Platforms | Pyrosequencing, bisulfite sequencing | Targeted validation of significant CpG sites | Quantitative results; high sensitivity and reproducibility [60] |
The integration of methylation risk scores with traditional polygenic risk scores represents a paradigm shift in endometriosis risk prediction, moving beyond static genetic assessment to incorporate dynamic molecular measures that reflect both genetic predisposition and environmental influences. Current evidence demonstrates that MRS captures significant disease variance independent of PRS, with combined genetic and methylation models capturing approximately 37% of the variance in endometriosis status [5] [36].
Future research directions should prioritize several key areas:
For drug development applications, MRS may facilitate patient stratification for clinical trials, particularly for therapies targeting specific molecular subtypes. The identified methylation signatures highlight potential therapeutic targets, including chromatin-modifying enzymes and methylation-sensitive signaling pathways [64] [62]. As epigenetic therapies advance, MRS could guide personalized treatment approaches based on individual methylation profiles, ultimately improving outcomes for women across the endometriosis disease spectrum.
Endometriosis is a complex, chronic inflammatory gynecological disease characterized by the presence of endometrial-like tissue outside the uterine cavity, affecting approximately 10% of women of reproductive age [66] [3]. The diagnostic journey for endometriosis remains challenging, with an average delay of 7-11 years from symptom onset to surgical confirmation, underscoring the critical need for non-invasive diagnostic strategies [31] [3]. This whitepaper explores the integration of polygenic risk scores (PRS) with inflammatory biomarkers and hormonal profiles to enhance risk prediction, disease stratification, and understanding of endometriosis subphenotypes.
The heterogeneous nature of endometriosis manifests in varied clinical presentations, treatment responses, and molecular profiles. Current classification systems based solely on surgical findings fail to predict therapeutic outcomes or correlate well with symptom severity [66] [45]. Emerging evidence suggests that molecular subtyping may provide superior stratification, with recent transcriptomic analyses identifying distinct stroma-enriched and immune-enriched subtypes that demonstrate varying responses to hormone therapy [45]. Within this context, the integration of PRS with inflammatory and hormonal biomarkers offers a promising multidimensional approach to deciphering endometriosis heterogeneity and advancing personalized medicine approaches.
Polygenic risk scores aggregate the effects of numerous genetic variants to quantify an individual's inherited susceptibility to endometriosis. The heritability of endometriosis is estimated at 47-51%, with genome-wide association studies (GWAS) identifying multiple risk loci including genes involved in the development and regulation of the female reproductive tract [31] [10]. PRS derived from these studies has shown consistent associations with endometriosis across the Danish and UK cohorts summarized in Table 1, although the odds ratio per standard deviation varies between cohorts (1.28 to 1.59).
Table 1: Performance Characteristics of Endometriosis PRS Across Cohorts
| Cohort | Cases/Controls | Odds Ratio per SD | P-value | Subtypes Assessed |
|---|---|---|---|---|
| Danish Surgical Cohort | 249/348 | 1.59 | 2.57×10⁻⁷ | Ovarian, Infiltrating, Peritoneal |
| Danish Twin Registry | 140/316 | 1.50 | 0.0001 | Population-based |
| Combined Danish Cohorts | 389/664 | 1.57 | 2.5×10⁻¹¹ | All major subtypes |
| UK Biobank | 2,967/256,222 | 1.28 | <2.2×10⁻¹⁶ | Large-scale validation |
PRS demonstrates differential performance across endometriosis subtypes, with the strongest association observed for ovarian endometriosis (OR=1.72) followed by infiltrating (OR=1.66) and peritoneal (OR=1.51) subtypes [39] [67]. Notably, PRS shows no significant association with adenomyosis, suggesting distinct genetic architectures between these related conditions [39]. This specificity underscores the value of PRS in elucidating biological mechanisms underlying different endometriosis subphenotypes.
Endometriosis is characterized by a chronic inflammatory state that drives disease progression and symptom manifestation. The inflammatory microenvironment involves complex interactions between immune cells, cytokines, chemokines, and growth factors.
Table 2: Key Inflammatory Biomarkers in Endometriosis
| Biomarker Category | Specific Analytes | Alteration in Endometriosis | Functional Role |
|---|---|---|---|
| Macrophage Factors | MIF, MCP-1 | Increased in peritoneal fluid | Recruitment of macrophages, promotion of angiogenesis and cell survival |
| Cytokines | IL-1, TNF-α | Elevated in ectopic lesions | Pro-inflammatory signaling, pain mediation |
| Chemokines | CXCL9 | Altered expression | Immune cell recruitment and activation |
| Growth Factors | VEGF, FGF | Increased in ectopic sites | Angiogenesis, lesion establishment |
| Nuclear Factors | NF-κB | Activated | Master regulator of inflammatory response |
Macrophage migration inhibitory factor (MIF) deserves particular attention as it enhances levels of anti-apoptotic proteins during retrograde menstruation, promotes cell survival, and activates pathways involving migration, invasion, and angiogenesis [68]. The nuclear factor kappa B (NF-κB) pathway serves as a central regulator of inflammation in endometriosis, controlling transcription of pro-inflammatory cytokines, cell adhesion molecules, and survival factors [68].
Endometriosis is an estrogen-dependent disorder characterized by hormonal imbalances that create a permissive environment for lesion establishment and growth. Key hormonal alterations include estrogen dominance, progesterone resistance, and perturbations in androgen signaling.
Recent Mendelian randomization studies have identified a causal relationship between lower testosterone levels and endometriosis risk, suggesting that reduced androgen signaling may contribute to disease pathogenesis [31]. This finding is further supported by observations of reduced testosterone concentrations in follicular fluid of endometriosis patients undergoing assisted reproductive technologies [3].
Progesterone resistance represents another hallmark of endometriosis, manifested through reduced expression of progesterone receptors (particularly PR-B), disrupted signaling pathways, and altered regulation of downstream targets [3]. The enzyme aromatase (CYP19A1), responsible for converting androgens to estrogens, shows increased expression in endometrial tissues of endometriosis patients and demonstrates promising diagnostic accuracy with 79% sensitivity and 89% specificity [3].
Integrating PRS with inflammatory biomarkers and hormonal profiles requires sophisticated computational approaches that can handle the multidimensional nature of the data. Machine learning techniques, particularly deep neural networks, have shown promise in enhancing genomic prediction of endometriosis by capturing complex non-linear relationships between genetic variants and disease phenotypes [33].
Statistical methods for data integration include:
The integration of these disparate data types requires careful normalization, dimensionality reduction, and validation in independent cohorts to ensure robustness and generalizability.
A standardized workflow for collecting and analyzing multimodal data is essential for generating comparable results across research studies. The following diagram illustrates an integrated experimental pipeline:
This integrated workflow enables researchers to capture the complex interactions between genetic predisposition, inflammatory processes, and endocrine dysregulation that collectively drive endometriosis pathogenesis and heterogeneity.
Recent transcriptomic analyses have revealed distinct molecular subtypes of endometriosis that transcend traditional anatomical classification systems. Unsupervised clustering of ectopic lesion gene expression profiles identifies two main subtypes: stroma-enriched (S1) and immune-enriched (S2) [45].
The stroma-enriched subtype (S1) is characterized by:
The immune-enriched subtype (S2) demonstrates:
These molecular subtypes show distinct clinical behaviors, particularly in their response to hormone therapy. The immune-enriched subtype is significantly associated with failure or intolerance to conventional hormone treatments, suggesting the potential for alternative therapeutic approaches targeting immune pathways [45].
The integration of PRS, inflammatory markers, and hormonal profiles reveals intricate signaling networks that drive different endometriosis subphenotypes. The following diagram illustrates key pathways and their interactions:
The NF-κB pathway serves as a central integrator of genetic, inflammatory, and hormonal signals in endometriosis. Activation of this pathway promotes cytokine production, cell survival, angiogenesis, and contributes to progesterone resistance—collectively driving disease progression and therapeutic challenges [68].
Sample Preparation and Genotyping:
PRS Calculation:
plink --score prs_weights.txt 1 2 4 header
Validation Approaches:
Multiplex Immunoassays:
Specific Inflammatory Assays:
Sex Steroid Quantification:
Functional Hormonal Assays:
Table 3: Essential Research Reagents and Platforms
| Category | Specific Product/Platform | Application in Endometriosis Research |
|---|---|---|
| Genotyping | Illumina Global Screening Array | Genome-wide SNP profiling for PRS calculation |
| Genotyping | TOPMed Imputation Server | Phasing and imputation of missing genotypes |
| PRS Calculation | PLINK (v1.9+) | PRS generation and basic genetic association analysis |
| PRS Calculation | GCTB (v2.02) | Bayesian methods for SNP weighting (SBayesR) |
| Inflammatory Profiling | Olink Proseek Multiplex Inflammation I | Simultaneous quantification of 92 inflammatory proteins |
| Inflammatory Profiling | RBM InflammationMAP v1.1 | 54-analyte multi-analyte profile for inflammatory patterns |
| Hormonal Assays | LC-MS/MS platforms | Gold standard for steroid hormone quantification |
| Hormonal Assays | Electrochemiluminescence Immunoassays | High-sensitivity measurement of reproductive hormones |
| Cell Analysis | xCell Analysis Package | Estimation of immune cell type enrichment from transcriptomic data |
| Cell Analysis | CIBERSORT | Digital cytometry for estimating immune cell fractions |
| Data Integration | ConsensusClusterPlus | Unsupervised molecular subtyping through consensus clustering |
| Data Integration | WGCNA Package | Weighted gene co-expression network analysis |
The integration of polygenic risk scores with inflammatory biomarkers and hormonal profiles represents a paradigm shift in endometriosis research, moving beyond singular approaches to embrace the multidimensional nature of the disease. This integrated framework enables refined subphenotyping, improved risk prediction, and insights into the biological mechanisms underlying different disease manifestations.
Future research directions should focus on:
As these integrated approaches mature, they hold significant promise for transforming endometriosis from a surgically diagnosed disease to one characterized through molecular signatures, ultimately enabling earlier intervention, personalized treatment strategies, and improved quality of life for affected individuals.
Polygenic risk scores (PRS) have emerged as powerful tools for quantifying an individual's genetic predisposition to complex diseases. However, their clinical utility is severely limited by a critical challenge: population stratification and ancestry-specific effects. The performance of PRS developed primarily in European-ancestry populations deteriorates substantially when applied to individuals of diverse genetic backgrounds [69]. This transferability problem stems from fundamental differences in allele frequencies, linkage disequilibrium (LD) patterns, and population-specific causal variants across ancestrally diverse populations [70].
Within endometriosis research, where subphenotype characterization is crucial for understanding disease mechanisms and progression, these challenges are particularly acute. Endometriosis exhibits a heritability of 47-51% [11] [2], making genetic risk prediction highly promising, yet current PRS explain only a fraction of this heritability and perform suboptimally across global populations. This technical guide provides comprehensive methodologies for addressing population stratification and developing ancestry-aware PRS, with specific application to endometriosis subphenotype research.
The attenuation of PRS performance across diverse populations arises from several interconnected factors:
The ability to correct for population stratification depends critically on demographic history. Recent population structure (originating within the past 100 generations) presents particular challenges:
Figure 1: Differential impact of demographic history on genetic variant informativeness for population structure correction. Recent structure is better captured by rare variants, while perpetual structure is captured by both common and rare variants [71].
Table 1: Characteristics of Population Structure Types
| Structure Type | Time Depth | Common Variant Informativeness | Rare Variant Informativeness | Recommended Correction Approach |
|---|---|---|---|---|
| Recent Structure | ~100 generations | Low (explains ~3% of spatial variance) | High | Rare-variant PCA or IBD-based methods |
| Perpetual Structure | Infinite horizon | High (explains ~50% of spatial variance) | High | Common-variant PCA or LMMs |
| Admixed Populations | Variable | Moderate (depends on admixture timing) | Moderate to High | Local ancestry-aware methods |
Developing robust PRS begins with the GWAS stage, where several strategies can improve cross-ancestry portability:
Several advanced statistical methods have been developed specifically for cross-ancestry PRS construction:
Table 2: Performance Comparison of PRS Methods Across Ancestries in Coronary Artery Disease (CAD) Risk Prediction
| PRS Method | European Ancestry OR/SD | African Ancestry OR/SD | East Asian Ancestry OR/SD | Admixed Calibration (Brier Score) |
|---|---|---|---|---|
| Multi-ancestry PRS-CSx | 1.63 (1.52-1.75) | 1.53 (1.15-2.05) | 1.54 (1.28-1.86) | 0.06085 |
| GPS_CAD (European-centric) | 1.49 (1.40-1.59) | 1.04 (0.79-1.39) | 1.26 (1.05-1.52) | 0.06089 |
| AllelicaCADEUR_2020 | 1.56 (1.46-1.67) | 1.24 (0.92-1.67) | 1.41 (1.17-1.69) | 0.06095 |
| multiGRS_CAD | 1.51 (1.41-1.61) | 1.30 (0.96-1.76) | 1.38 (1.15-1.66) | 0.06107 |
Data adapted from [69]. The multi-ancestry score has the highest point estimate in every ancestry group, the lowest Brier score, and is the only score whose African-ancestry confidence interval excludes 1. OR/SD = odds ratio per standard deviation.
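The calibration column in Table 2 reports Brier scores, the mean squared difference between a predicted risk and the observed binary outcome (lower is better). The following minimal sketch shows how the metric is computed; the predicted risks and outcomes are invented for illustration and are not taken from [69].

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted risk and observed 0/1 outcome."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(outcomes)


# Hypothetical predicted risks and outcomes (1 = case, 0 = control).
predicted = [0.10, 0.20, 0.05, 0.60]
observed = [0, 0, 0, 1]

score = brier_score(predicted, observed)
print(f"Brier score: {score:.6f}")
```

Because the Brier score depends heavily on outcome prevalence, the values in Table 2 differ only in the fourth decimal place, so it is best read alongside the odds ratios rather than on its own.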
Endometriosis research presents unique challenges due to disease heterogeneity and subphenotypes:
Robust QC procedures are essential prior to PRS development:
Figure 2: Comprehensive quality control workflow for multi-ancestry PRS development, incorporating sample- and variant-level QC with robust ancestry inference [70].
A standardized protocol for endometriosis GWAS in diverse populations:
Cohort Preparation: Harmonize endometriosis subphenotype definitions across studies using revised American Fertility Society (rAFS) staging where available [11] [28].
Stratified Analysis: Perform GWAS separately in each ancestry group (European, African, East Asian, etc.) with ancestry-appropriate covariates:
Meta-Analysis: Combine ancestry-specific results using sample-size weighted meta-analysis or heterogeneous effects models (e.g., RE2) for loci showing heterogeneity [28].
Fine-mapping: Apply statistical fine-mapping methods (e.g., SuSiE, FINEMAP) within each ancestry to identify putative causal variants.
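The meta-analysis step in the protocol above mentions sample-size weighted combination of ancestry-specific results. A minimal sketch of that scheme follows, assuming per-ancestry z-scores for a single variant that are already aligned to the same effect allele. The z-scores and sample sizes are hypothetical, and for case-control data an effective sample size is often used in place of the total N.

```python
import math


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-ancestry z-scores using weights w_i = sqrt(N_i)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical results for one variant in three ancestry groups.
z_by_ancestry = [3.0, 1.5, 2.0]
n_by_ancestry = [40000, 10000, 10000]

z_meta = sample_size_weighted_z(z_by_ancestry, n_by_ancestry)
p_meta = two_sided_p(z_meta)
print(f"combined z = {z_meta:.3f}, p = {p_meta:.2e}")
```

This scheme assumes a shared direction of effect across groups; loci with evidence of heterogeneity would need the heterogeneous-effects models named in the protocol instead.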
Robust validation of ancestry-aware endometriosis PRS requires:
Table 3: Research Reagent Solutions for Ancestry-Aware Endometriosis PRS Development
| Resource Category | Specific Tools | Application in Endometriosis PRS | Key Considerations |
|---|---|---|---|
| GWAS Processing | REGENIE, SAIGE, PLINK | Case-control association testing for endometriosis and subphenotypes | Account for binary and quantitative subphenotypes; use Firth correction for rare variants |
| Ancestry Inference | PCA, ADMIXTURE, GRAF | Genetic ancestry determination in multi-ethnic cohorts | Use reference panels (1000 Genomes, gnomAD) for projection; assess admixture proportions |
| Fine-mapping | SuSiE, FINEMAP, PolyFun | Identify causal variants at endometriosis risk loci | Leverage cross-ancestry information to improve resolution; incorporate functional annotations |
| PRS Methods | PRS-CSx, LDpred2, CT-SLEB | Construction of ancestry-aware risk scores | Tune hyperparameters within each ancestry; assess portability metrics |
| Validation | PheWAS, ROC analysis, NRI | Clinical utility assessment for endometriosis risk prediction | Evaluate across subphenotypes; assess improvement over clinical factors alone |
The genetic architecture of endometriosis informs PRS development strategies:
Endometriosis PRS development must account for its hormonal etiology and pleiotropic effects:
Addressing population stratification and ancestry-specific effects is not merely a statistical challenge but an essential requirement for equitable implementation of PRS in endometriosis research and clinical care. The methodologies outlined in this guide provide a framework for developing ancestry-aware PRS that perform robustly across diverse populations.
Future efforts should focus on: (1) expanding GWAS diversity to include currently underrepresented populations; (2) developing methods that efficiently leverage admixed individuals as biological bridges between ancestry groups; (3) integrating functional genomics data to improve fine-mapping and biological interpretation; and (4) validating PRS in clinical settings for endometriosis risk stratification and early intervention.
As sample sizes continue to grow through initiatives like All of Us, Biobank Japan, and H3Africa, the opportunities for developing clinically useful, ancestry-informed PRS for endometriosis subphenotypes will expand, ultimately enabling more personalized approaches to diagnosis, prevention, and treatment.
Endometriosis is a complex gynecological disorder affecting approximately 10% of reproductive-aged women, characterized by the presence of endometrial-like tissue outside the uterus. The disease presents substantial diagnostic challenges, with average delays of 7-11 years between symptom onset and definitive diagnosis via laparoscopy. Polygenic risk scores (PRS) have emerged as promising tools for quantifying genetic susceptibility by aggregating the effects of multiple genetic variants into a single metric. Understanding how PRS performs across differently ascertained cohorts—from deeply phenotyped clinical samples to large biobank populations—is crucial for developing clinically applicable risk stratification tools. This technical analysis examines the performance characteristics of an endometriosis PRS across Danish clinical and registry-based cohorts compared with replication data from the UK Biobank.
The validation strategy utilized three distinct cohorts to assess PRS performance across different ascertainment methods and population structures.
Table 1: Cohort Characteristics and Endometriosis Definitions
| Cohort | Sample Size (Cases/Controls) | Case Ascertainment Method | Control Definition | Subtype Information |
|---|---|---|---|---|
| Danish Clinical Cohort | 249/348 | Surgical confirmation with histology + ASRM stages II-IV | Age-matched blood donors without ICD-10 N80 diagnosis | Detailed subtype classification available |
| Danish Twin Registry (DTR) | 140/316 | ICD-10 codes (N80.1-N80.9) from Danish National Patient Registry | Age-matched unrelated women without N80 diagnosis | Subtypes derived from ICD-10 codes |
| UK Biobank (Replication) | 2,967/256,222 | ICD-10 codes (N80.1-N80.9) from hospital records + self-report | No N80 diagnosis + no self-reported endometriosis | Limited subtype resolution |
The study employed a standardized approach to classify endometriosis subtypes across cohorts based on ICD-10 codes, with severity ranking from severe to mild:
The polygenic risk score was derived from 14 genome-wide significant lead SNPs identified in a published GWAS meta-analysis comprising 17,045 endometriosis cases and 191,596 controls [6]. One lead SNP (rs760794) failed assay design and was replaced with rs77294520, which showed region-wide association after conditioning on the index SNP in the GREB1 locus.
Genotyping Methods:
PRS Calculation: The PRS was calculated as the sum of risk alleles weighted by their effect sizes (log(odds ratios)) from the discovery GWAS. Each individual's genotype for the 14 SNPs was converted to a dosage of effect alleles (0, 1, or 2) and multiplied by the corresponding weight. The resulting weighted sums were standardized to z-scores for analysis.
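The calculation described above can be sketched in a few lines. This is an illustrative example only: the three odds ratios and the genotypes are invented (the study used 14 SNPs), and standardizing against the population SD of the analysed sample is an assumption, since the reference distribution used for the z-scores is not stated here.

```python
import math
from statistics import mean, pstdev


def weighted_prs(dosages, log_odds_weights):
    """Sum of effect-allele dosages (0, 1, or 2) times per-SNP log(OR) weights."""
    if len(dosages) != len(log_odds_weights):
        raise ValueError("one weight is needed per SNP")
    return sum(d * w for d, w in zip(dosages, log_odds_weights))


def standardize(scores):
    """Convert raw scores to z-scores using the sample mean and SD."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical discovery-GWAS odds ratios for three SNPs.
odds_ratios = [1.10, 1.20, 0.90]
weights = [math.log(o) for o in odds_ratios]

# Hypothetical genotypes: one row per individual, one dosage per SNP.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 2, 1],
    [0, 0, 0],
]

raw = [weighted_prs(g, weights) for g in genotypes]
z = standardize(raw)
for raw_score, z_score in zip(raw, z):
    print(f"raw = {raw_score:+.4f}, z = {z_score:+.3f}")
```

Working on the log-odds scale keeps the score additive across SNPs, and standardizing lets the association be reported as an odds ratio per standard deviation.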
The analytical approach employed consistent statistical methods across cohorts to ensure comparability:
Table 2: PRS Performance Across Validation Cohorts
| Cohort | Odds Ratio (per SD) | 95% Confidence Interval | P-value | Discriminative Accuracy |
|---|---|---|---|---|
| Danish Clinical | 1.59 | 1.33-1.89 | 2.57×10^-7 | Moderate |
| Danish Twin Registry | 1.50 | 1.22-1.84 | 0.0001 | Moderate |
| Combined Danish | 1.57 | 1.37-1.80 | 2.5×10^-11 | Moderate |
| UK Biobank | 1.28 | 1.24-1.33 | <2.2×10^-16 | Limited |
The PRS demonstrated significant association with endometriosis across all cohorts, with the strongest effects observed in the surgically confirmed Danish clinical cohort. The effect size attenuation in the UK Biobank likely reflects differences in case ascertainment, with the Danish clinical cohort representing more severe, surgically confirmed cases.
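As a consistency check on Table 2, the standard error of the log odds ratio can be approximated from a reported 95% confidence interval, and from it a Wald z-statistic and two-sided p-value. The sketch below applies this to the Danish clinical cohort row. It assumes a symmetric Wald interval on the log scale, which may not be exactly how the original interval was computed.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high):
    """Approximate log-OR standard error, z, and two-sided p from a 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values from the Danish clinical cohort row of Table 2.
se, z, p = wald_from_ci(1.59, 1.33, 1.89)
print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

The recovered p-value is of the same order of magnitude as the reported 2.57×10^-7; small differences are expected because the published odds ratio and interval are rounded.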
Table 3: Subtype-Specific Associations in Combined Danish Cohorts
| Endometriosis Subtype | Odds Ratio (per SD) | P-value | Case Count |
|---|---|---|---|
| Ovarian (N80.1) | 1.72 | 6.7×10^-5 | 75 |
| Infiltrating (N80.4, N80.5) | 1.66 | 2.7×10^-9 | 210 |
| Peritoneal (N80.2, N80.3) | 1.51 | 2.6×10^-3 | 60 |
| All Endometriosis | 1.57 | 2.5×10^-11 | 389 |
The PRS showed broadly similar effect sizes across endometriosis subtypes, suggesting that it captures genetic risk for endometriosis in general rather than risk specific to particular disease localizations. The similar effect sizes are consistent with a shared genetic architecture across subtypes.
A critical validation analysis tested the PRS association with adenomyosis (N80.0) to evaluate specificity. The PRS showed no significant association with adenomyosis in either the DTR (25 cases) or UK Biobank (1,883 cases), supporting the hypothesis that adenomyosis is not driven by the same common genetic risk variants as endometriosis [6].
Table 4: Essential Research Reagents and Computational Tools
| Resource Category | Specific Tool/Resource | Application in Study |
|---|---|---|
| Genotyping Platforms | Custom SNP arrays, UK Biobank imputed data | Genotype generation and quality control |
| Statistical Software | R, PLINK, METAL, GCTB, SBayesR | PRS calculation, association testing, meta-analysis |
| Cohort Resources | Danish National Patient Registry, UK Biobank | Case ascertainment and phenotype data |
| Bioinformatics Tools | ConsensusClusterPlus, WGCNA, xCell | Subtype classification and functional analysis |
| Reference Data | GWAS catalog, ENCODE cCREs, Epimap | Functional annotation and interpretation |
The differential performance of the endometriosis PRS across cohorts reveals important considerations for clinical translation:
Recent evidence demonstrates significant interactions between endometriosis PRS and diagnosed comorbidities. The absolute increase in endometriosis prevalence conveyed by uterine fibroids, heavy menstrual bleeding, and dysmenorrhea was greater in individuals with high endometriosis PRS compared to low PRS [73] [49]. This supports a model where PRS could enhance existing clinical risk prediction by identifying women who would benefit most from intensive diagnostic investigation.
The 14-SNP PRS represents an early approach to endometriosis genetic risk prediction. More recent combinatorial analytical approaches have identified 1,709 disease signatures comprising 2,957 unique SNPs, revealing 77 novel gene associations beyond previous GWAS findings [74]. Additionally, transcriptomic analyses have identified distinct endometriosis subtypes (stroma-enriched and immune-enriched) with differential responses to hormone therapy [45], suggesting future PRS development could benefit from subtype-specific approaches.
This multi-cohort validation demonstrates that the endometriosis PRS is associated with disease across different ascertainment methods and populations, although effect sizes were larger in the Danish cohorts (OR 1.50-1.59 per SD) than in the UK Biobank (OR 1.28), and the score remains insufficient for standalone clinical prediction. The Danish clinical cohort provided stronger genetic effects while the UK Biobank enabled powerful replication, illustrating the complementary value of both deeply phenotyped clinical samples and large biobanks. Future research should focus on integrating PRS with clinical risk factors, exploring subtype-specific genetic architectures, and leveraging more powerful PRS methods to improve discriminative accuracy for clinical application.
This whitepaper provides a technical comparison of the discriminatory accuracy of three biomarker classes—polygenic risk scores (PRS), circulating microRNAs (miRNAs), and protein biomarkers—within endometriosis research. Endometriosis presents significant diagnostic challenges, with an average latency of 7-11 years from symptom onset to surgical diagnosis, creating an urgent need for non-invasive diagnostic solutions [66] [75]. Current research is increasingly focused on molecular subtyping to predict treatment responses, particularly given that first-line hormone therapy is effective in only approximately 40% of patients [45]. This analysis synthesizes current evidence on biomarker performance, highlighting that while PRS establishes genetic predisposition, circulating miRNAs demonstrate superior diagnostic accuracy, and multi-analyte approaches combining miRNAs with classical proteins show the most promising results for clinical application. The integration of these biomarkers holds potential for transforming endometriosis management through early detection, subphenotype classification, and personalized treatment strategies.
Table 1: Comparative Diagnostic Performance of Biomarker Classes in Endometriosis
| Biomarker Class | Specific Biomarkers | Sensitivity (%) | Specificity (%) | AUC/Other Metrics | Evidence Level |
|---|---|---|---|---|---|
| Polygenic Risk Score (PRS) | 13-SNP weighted score [10] | Not reported | Not reported | Inverse association with disease spread & hormone treatment; Low sensitivity/specificity [10] | Single study (N=140) |
| Circulating miRNAs (Single) | let-7b [76] | ~69.1 | ~69.1 | AUC: 0.691 [76] | Multiple studies |
| Circulating miRNAs (Single) | miR-451a, miR-20a-5p [77] | Significant differential expression | Significant differential expression | Promising ROC analysis (specific values NR) [77] | Single validation study |
| Circulating miRNAs (Panels) | let-7b, let-7d, let-7f (proliferative phase) [76] | High | High | AUC: 0.929 [76] | Single study (N=48) |
| Circulating miRNAs (Panels) | miR-200 family, miR-141, others [78] | 83.92 | 89.82 | Bivariate model [78] | Meta-analysis (50 articles) |
| Protein Biomarkers | CA-125 [78] [76] | Limited alone | Limited alone | Elevated in other conditions [78] | Established use |
| Multi-Analyte | miRNA panels + CA-125/HE4 [78] | 93.39 | 92.71 | Bivariate model [78] | Meta-analysis |
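
Sensitivity and specificity alone do not show how a test behaves at a given disease prevalence. The sketch below converts the pooled multi-analyte estimates from Table 1 into likelihood ratios and a positive predictive value. The 10% prevalence is an assumption based on the general-population figure quoted in this article; prevalence in a symptomatic referral population would be higher and would raise the predictive value.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test, by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Pooled multi-analyte estimates from Table 1, expressed as proportions.
sens, spec = 0.9339, 0.9271
lr_pos, lr_neg = likelihood_ratios(sens, spec)

# Assumed prevalence of 10% (illustrative choice, not a study result).
ppv = positive_predictive_value(sens, spec, 0.10)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}, PPV = {ppv:.2f}")
```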
Table 2: Functional Characteristics and Clinical Applicability
| Characteristic | Polygenic Risk Score (PRS) | Circulating miRNAs | Protein Biomarkers |
|---|---|---|---|
| Primary Role | Determines genetic predisposition [10] | Detects active disease; monitors treatment response [79] | Indicates inflammation/disease presence [78] |
| Temporal Dynamics | Static (lifetime risk) | Dynamic (reflects current status) [79] | Dynamic (reflects current status) |
| Stage Detection | Not established | Early and late-stage detection possible [78] | Limited early-stage detection |
| Therapy Guidance | Limited association with hormone treatment found [10] | Predicts progesterone resistance [79] | Limited |
| Key Advantages | Lifetime risk assessment | High stability, minimally invasive, tissue-specific [80] | Standardized assays |
| Major Limitations | Low predictive power for subphenotypes [10] | Lack of standardization [80] [75] | Low specificity [78] |
Objective: To calculate a PRS for endometriosis and investigate its association with clinical presentations and hormone treatment response [10].
Sample Preparation:
PRS Calculation:
Validation: Assess PRS association with clinical traits (disease spread, gastrointestinal involvement, hormone treatment) using logistic regression, calculating odds ratios (OR) with 95% confidence intervals (CI) and p-for-trend [10].
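The validation step reports odds ratios with 95% confidence intervals from logistic regression. A minimal, dependency-free sketch of a single-predictor logistic regression fitted by Newton-Raphson is given below. The function name, the binary "high versus low PRS" predictor, and the counts are hypothetical. A real analysis would use an established statistics package and adjust for covariates, and the p-for-trend calculation is not shown.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit y ~ intercept + x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g for the 2x2 information matrix H.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)    # slope variance is [H^{-1}]_11
    return b0, b1, se_b1


# Hypothetical data: x = 1 for high PRS, 0 for low PRS; y = 1 for the trait.
x = [0] * 40 + [1] * 40
y = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20

b0, b1, se_b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)
ci_low = math.exp(b1 - 1.96 * se_b1)
ci_high = math.exp(b1 + 1.96 * se_b1)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

With a binary predictor the fitted odds ratio equals the cross-product ratio of the 2x2 table, which gives a simple way to check the fit by hand.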
Objective: To identify and validate differentially expressed circulating miRNAs in endometriosis patients versus controls [76] [79] [77].
Sample Collection and Processing:
RNA Extraction:
miRNA Expression Analysis:
Data Analysis:
Objective: To quantify established and novel protein biomarkers in serum/plasma.
Multiplex Immunoassays:
Enzyme-Linked Immunosorbent Assay (ELISA):
Electro-Chemiluminescence Immunoassay (ECLIA):
Diagram 1: Multi-Analyte Biomarker Integration Logic - This workflow illustrates the parallel processing of different biomarker classes from a single patient sample, culminating in an integrated diagnostic model that enhances predictive power for personalized clinical decisions.
Diagram 2: Circulating miRNA Sequencing Workflow - This end-to-end protocol details the process from sample collection to biomarker identification, highlighting critical quality control steps and the transition from discovery to validation phases.
Table 3: Essential Research Tools for Endometriosis Biomarker Studies
| Category | Product/Technology | Manufacturer | Primary Application | Key Features |
|---|---|---|---|---|
| Sample Collection & Storage | DNA/RNA Shield Safe Collection Kit | Zymo Research | Stabilize nucleic acids in saliva during collection/transport [79] | Preserves miRNA integrity, prevents degradation |
| Nucleic Acid Extraction | miRNeasy Advanced Micro Kit | Qiagen | High-quality total RNA extraction from serum/plasma/saliva [79] | Includes RNA cleanup, high recovery of small RNAs |
| miRNA Library Prep | QIAseq microRNA Library Kit | Qiagen | NGS library preparation for miRNA sequencing [79] | Unique molecular indexes for accurate quantification |
| Multiplex Protein Assay | Proseek Multiplex Inflammation I | Olink Bioscience | Simultaneous measurement of 92 inflammatory proteins [10] | Proximity Extension Assay technology, high specificity |
| Genotyping | Illumina Global Screening Array | Illumina | Genome-wide SNP genotyping for PRS calculation [10] | High-density coverage, optimized for imputation |
| miRNA Detection | mirVana miRNA Isolation Kit | Applied Biosystems | Total RNA isolation including small RNAs [76] | Efficient recovery of miRNA, compatible with multiple platforms |
| miRNA Detection | NCode miRNA First-Strand cDNA Synthesis | Life Technologies | Reverse transcription for miRNA qPCR analysis [76] | Poly(A) tailing-based method, high sensitivity |
The comparative analysis presented in this whitepaper reveals a clear hierarchy in discriminatory accuracy among biomarker classes for endometriosis. PRS currently demonstrates limited utility for subphenotype stratification or predicting therapeutic response, though it establishes genetic predisposition [10]. Circulating miRNAs show significantly greater promise, with panels achieving a pooled sensitivity of 83.92% and specificity of 89.82% in a meta-analysis [78]. The most compelling results emerge from multi-analyte approaches that combine miRNA signatures with classical protein biomarkers like CA-125, achieving a pooled sensitivity of 93.39% and specificity of 92.71% [78].
Future research should prioritize several key areas: First, standardization of pre-analytical and analytical protocols for miRNA quantification is critical to enable cross-study comparisons and clinical translation [80] [75]. Second, larger validation studies in diverse populations are needed to confirm the preliminary findings reported in many miRNA studies [77]. Third, integrated models that combine PRS for risk stratification with dynamic biomarkers (miRNAs and proteins) for active disease detection and monitoring represent the most promising path forward [45] [80]. Finally, biomarker discovery must be linked to therapeutic implications, particularly for predicting progesterone resistance, which affects approximately one-third of patients [79].
The emerging paradigm of molecular subtyping in endometriosis, such as the identification of stroma-enriched and immune-enriched subtypes, offers a framework for developing truly personalized treatment approaches [45]. By leveraging the complementary strengths of different biomarker classes, researchers and drug developers can advance both the understanding of endometriosis pathophysiology and the clinical management of this complex condition.
Endometriosis and adenomyosis are prevalent gynecological disorders that significantly impact women's health, causing symptoms such as pelvic pain, abnormal uterine bleeding, and infertility. While both conditions involve the presence of endometrial-like tissue outside its normal location, they represent distinct clinical entities with different pathological features and clinical management implications. Endometriosis is characterized by the growth of endometrial tissue outside the uterine cavity, predominantly affecting pelvic structures such as the ovaries, uterosacral ligaments, and pelvic peritoneum [81]. In contrast, adenomyosis involves the invasion of endometrial tissue into the myometrial wall of the uterus [82]. Within the broader thesis on polygenic risk score performance across endometriosis subphenotypes, this review examines the genetic evidence distinguishing these two conditions, with implications for targeted drug development and personalized treatment approaches.
The etiology of both conditions remains incompletely understood, though genetic factors play a substantial role. Family and twin studies estimate the heritability of endometriosis at approximately 50%, with common genetic variants accounting for roughly 26% of disease risk [83]. Until recently, the genetic relationship between endometriosis and adenomyosis remained largely unexplored, but emerging evidence from large-scale genetic studies now provides compelling data on their distinct genetic architectures.
Polygenic risk scores (PRS) aggregate the effects of many genetic variants to quantify an individual's genetic susceptibility to a particular condition. A pivotal study investigating the discriminative ability of a 14-variant PRS for endometriosis found a significant association with endometriosis risk across multiple cohorts, including surgically confirmed cases and population-based biobanks [6] [39]. Each standard deviation increase in the PRS was associated with endometriosis (OR = 1.57, p = 2.5×10⁻¹¹) and its major subtypes: ovarian (OR = 1.72), infiltrating (OR = 1.66), and peritoneal (OR = 1.51) [6].
Crucially, this same PRS demonstrated no significant association with adenomyosis, suggesting that adenomyosis is not driven by the same common genetic risk variants as endometriosis [6] [39]. This differential performance was consistent across both the Danish Twin Registry cohort and the UK Biobank replication analysis, providing robust evidence for distinct genetic architectures.
Table 1: Performance of Endometriosis PRS Across Phenotypes
| Phenotype | Cohort | Odds Ratio | P-value | Sample Size (Cases/Controls) |
|---|---|---|---|---|
| Endometriosis (combined) | Danish cohorts | 1.57 | 2.5×10⁻¹¹ | 389/664 |
| Ovarian endometriosis | Danish cohorts | 1.72 | 6.7×10⁻⁵ | 75/664 |
| Infiltrating endometriosis | Danish cohorts | 1.66 | 2.7×10⁻⁹ | 210/664 |
| Peritoneal endometriosis | Danish cohorts | 1.51 | 2.6×10⁻³ | 60/664 |
| Endometriosis | UK Biobank | 1.28 | <2.2×10⁻¹⁶ | 2,967/256,222 |
| Adenomyosis | UK Biobank | Not significant | - | 1,883/256,222 |
The most recent and largest genome-wide association study (GWAS) for endometriosis and adenomyosis, published as a preprint in 2025, provides further evidence of genetic distinction [52]. This multi-ancestry study of approximately 1.4 million women (including 105,869 cases) identified 80 genome-wide significant associations, including 37 novel loci. Notably, the study reported five loci that represent the first genetic variants specifically associated with adenomyosis, marking a significant advancement in understanding the unique genetic architecture of this condition [52].
Fine-mapping and colocalization analyses in this study uncovered causal loci for over 50 endometriosis-related associations, with multi-omics integration revealing that genetic variation influences endometriosis risk through transcriptomic, epigenetic, and proteomic regulation across multiple tissues [52]. These findings converge on pathways involved in immune regulation, tissue remodeling, and cell differentiation, providing insights into the distinct pathogenic mechanisms underlying these disorders.
A 2025 Mendelian randomization study identified 24 novel protein-coding genes potentially causally linked to adenomyosis, none of which had been previously reported in the context of this disease [82]. The most relevant candidate genes identified were ARHGEF35, AMT, RCVRN, GMPPB, and INTS1. Bioinformatics analysis indicated that these genes play critical roles in essential biological functions, including base-excision repair, negative regulation of various cell cycle processes, and metabolism-related pathways in adenomyosis [82].
Differential gene expression analysis comparing adenomyosis patients and healthy controls revealed that DNA2 and INTS1 displayed high expression levels, whereas EFCAB2, HLA-DQA2, and RPS26 exhibited low expression levels. The receiver operating characteristic curve analysis for the Predictive Diagnostic Index revealed an area under the curve of 0.8 for the combined analysis of the five risk genes, suggesting promise as therapeutic targets and biomarkers for early diagnosis [82].
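The area under the ROC curve quoted above can be read as the probability that a randomly chosen case has a higher index value than a randomly chosen control. A minimal sketch of that rank-based calculation follows; the index values are invented toy numbers chosen to give an AUC of 0.8 and are not data from [82].

```python
def roc_auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical diagnostic index values.
cases = [0.9, 0.8, 0.6, 0.25]
controls = [0.7, 0.5, 0.3, 0.2, 0.1]

auc = roc_auc(cases, controls)
print(f"AUC = {auc:.2f}")
```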
Table 2: Key Genetic Studies Differentiating Endometriosis and Adenomyosis
| Study Type | Key Findings | Implications |
|---|---|---|
| PRS Analysis | Endometriosis PRS not associated with adenomyosis [6] [39] | Distinct genetic architectures; different underlying biology |
| Multi-ancestry GWAS | Five novel loci specific to adenomyosis identified [52] | First genetic variants specifically linked to adenomyosis |
| Mendelian Randomization | 24 novel genes causally linked to adenomyosis [82] | Potential new therapeutic targets and biomarkers |
| Genetic Correlation | Endometriosis shares genetic architecture with pain conditions [2] [83] | Explains comorbidity patterns and symptom overlap |
The studies cited employed rigorous methodological approaches to ensure reliable differentiation between endometriosis and adenomyosis. The PRS analysis utilized three distinct cohorts: surgically confirmed endometriosis cases from a specialized referral center, cases identified from registry data using ICD-10 codes, and a large replication cohort from the UK Biobank [6]. This multi-cohort approach enhanced the generalizability and robustness of the findings.
Adenomyosis cases were carefully defined to exclude women with coexisting endometriosis, using the ICD-10 code N80.0 (endometriosis of uterus) without other endometriosis diagnoses (N80.1-N80.9) [2]. This precise phenotyping was crucial for ensuring the genetic specificity observed.
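The case definition above translates directly into a code-set rule. The sketch below is a hypothetical helper, not the study's pipeline, and it assumes diagnosis codes are stored in dotted ICD-10 form such as "N80.0"; registry exports that use other formats would need normalizing first.

```python
ADENOMYOSIS_CODE = "N80.0"
# N80.1 through N80.9: endometriosis outside the uterus.
OTHER_ENDOMETRIOSIS_CODES = {f"N80.{i}" for i in range(1, 10)}


def is_endometriosis_case(diagnosis_codes):
    """True if any N80.1-N80.9 code is present."""
    return bool(set(diagnosis_codes) & OTHER_ENDOMETRIOSIS_CODES)


def is_adenomyosis_only(diagnosis_codes):
    """True if N80.0 is present and no N80.1-N80.9 code is present."""
    codes = set(diagnosis_codes)
    return ADENOMYOSIS_CODE in codes and not is_endometriosis_case(codes)


# Hypothetical patient records.
print(is_adenomyosis_only(["N80.0", "N92.0"]))   # adenomyosis without endometriosis
print(is_adenomyosis_only(["N80.0", "N80.1"]))   # excluded: coexisting endometriosis
```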
The PRS was derived from 14 genetic variants identified in a published GWAS meta-analysis with more than 17,000 endometriosis cases [6]. When one lead SNP (rs760794) failed assay design, researchers included rs77294520, which was region-wide associated after conditioning on the index SNP in the GREB1 locus, demonstrating appropriate methodological adaptation.
For the PRS-PheWAS study, endometriosis PRS weightings were developed using summary statistics from seven European cohorts included in the Sapkota et al. 2017 meta-analysis (14,926 cases; 189,715 controls), meta-analyzed alongside endometriosis GWAS summary statistics from FinnGen Release 8 (13,456 cases, 100,663 controls) [2]. A Bayesian method (SBayesR) was used for adjusting the GWAS summary statistics effect sizes, performed with default settings while excluding the MHC region and imputing sample size [2].
The adenomyosis Mendelian randomization study employed summary data-based MR (SMR) analysis using single nucleotide polymorphisms as instrumental variables, along with expression quantitative trait loci (eQTL) data from whole blood and uterus as exposures and adenomyosis as the outcome [82]. The top cis-eQTL within the cis-region of a probe having the most potent effect on the gene's expression was selected as the instrumental variable. Multi-SNP SMR method was used as a sensitivity analysis to mitigate potential bias from using a single variant [82].
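For a single instrument, the SMR method estimates the effect of gene expression on the outcome as the ratio of the SNP-outcome effect to the SNP-expression effect, and tests it with an approximate one-degree-of-freedom chi-square statistic. The sketch below follows the formulation in the original SMR method description. The effect sizes are hypothetical, and it does not reproduce the multi-SNP sensitivity analysis or any settings specific to [82].

```python
import math


def smr_estimate(b_zx, se_zx, b_zy, se_zy):
    """Single-instrument SMR.

    b_zx, se_zx: SNP effect on expression (eQTL) and its standard error.
    b_zy, se_zy: SNP effect on the outcome (GWAS) and its standard error.
    Returns (b_xy, t_smr, p_smr).
    """
    b_xy = b_zy / b_zx
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_smr = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p_smr


# Hypothetical top cis-eQTL: strong effect on expression, modest effect on disease.
b_xy, t_smr, p_smr = smr_estimate(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.025)
print(f"b_xy = {b_xy:.2f}, T_SMR = {t_smr:.2f}, p = {p_smr:.2e}")
```

The statistic is always smaller than the weaker of the two squared z-scores, which is why SMR needs a strong eQTL instrument to have useful power.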
Diagram 1: Mendelian Randomization Approach for Identifying Causal Genes. Solid arrows represent established pathways for endometriosis; dashed arrows represent distinct pathways for adenomyosis.
Multi-omics integration from the large-scale GWAS revealed that genetic variation influences endometriosis risk through transcriptomic, epigenetic, and proteomic regulation across multiple tissues, converging on pathways involved in immune regulation, tissue remodeling, and cell differentiation [52]. Drug-repurposing analyses highlighted potential therapeutic interventions currently used for breast cancer and preterm birth prevention [52].
For adenomyosis, bioinformatics analysis indicates that identified risk genes play critical roles in essential biological functions, including base-excision repair, negative regulation of various cell cycle processes, and metabolism-related pathways [82]. These findings suggest fundamentally different pathogenic mechanisms despite some overlapping symptoms.
A PRS phenome-wide association study revealed an association between genetic liability to endometriosis and lower testosterone levels [2]. Follow-up analysis using Mendelian randomization approaches suggested lower testosterone may be causal for both endometriosis and clear cell ovarian cancer [2]. This finding provides important insights into the hormonal basis of endometriosis and potential avenues for therapeutic intervention.
The association with testosterone levels was not observed for adenomyosis, further supporting distinct endocrine pathways in these two conditions. This differential hormone sensitivity may explain their different response profiles to various hormonal treatments.
Diagram 2: Biological Pathways in Endometriosis and Adenomyosis. Solid arrows represent stronger established pathways for endometriosis; dashed arrows represent modified pathways for adenomyosis.
The differential genetic profiles between endometriosis and adenomyosis have significant implications for diagnostic development. The distinct genetic signatures suggest potential for genetic tests to supplement current diagnostic approaches, which primarily rely on imaging and invasive procedures [84] [81].
Transvaginal ultrasonography remains the first-line imaging modality for assessing adnexal masses and suspected endometriosis, while MRI is used as a secondary diagnostic tool to better characterize these lesions [84]. However, genetic biomarkers could potentially help differentiate between these conditions earlier in the diagnostic process, reducing the current diagnostic delay for endometriosis, which averages 7-11 years [2].
The identification of distinct genetic risk factors and biological pathways for endometriosis and adenomyosis opens new avenues for therapeutic development. Drug-repurposing analyses from the large-scale GWAS highlighted potential therapeutic interventions currently used for breast cancer and preterm birth prevention [52]. The identification of specific genes and pathways for adenomyosis provides new potential targets for this historically understudied condition [82].
The finding that lower testosterone levels may be causal for endometriosis suggests novel endocrine treatment approaches [2]. Furthermore, the shared genetic architecture between endometriosis and pain conditions indicates that treatments targeting pain pathways may be particularly beneficial for endometriosis patients, independent of disease modification [2] [83].
Table 3: Research Reagent Solutions for Genetic Studies
| Research Tool | Application | Key Features |
|---|---|---|
| GWAS Summary Statistics | Genetic association discovery | Large sample sizes (>100,000 participants) |
| eQTL Data (GTEx) | Mapping genetic variants to gene expression | Tissue-specific expression data (uterus, whole blood) |
| SBayesR | PRS effect size adjustment | Bayesian method for improved prediction accuracy |
| SMR Analysis | Causal inference between gene expression and disease | Integrates GWAS and eQTL data; identifies causal genes |
| ICD-10 Code Mapping | Phenotype definition in biobanks | Enables precise case identification (N80.0 vs N80.1-N80.9) |
The accumulating genetic evidence clearly demonstrates that endometriosis and adenomyosis are distinct entities with different genetic risk profiles, despite some overlapping clinical features. Polygenic risk scores developed for endometriosis show no significant association with adenomyosis, and recent large-scale genetic studies have identified variants specific to each condition. These findings have important implications for the development of targeted therapies and diagnostic tools, moving beyond the historical tendency to group these conditions together. Future research should focus on further elucidating the specific biological pathways driving each condition and translating these genetic findings into clinical applications that improve patient care and outcomes.
For researchers in this field, the methodological approaches outlined here (precise phenotyping, advanced PRS calculation methods, and integrative multi-omics analyses) provide a roadmap for continuing to unravel the genetic complexity of these conditions. As genetic datasets continue to expand and diversify, our understanding of the distinct genetic architectures of endometriosis and adenomyosis will be further refined, enabling more personalized approaches to diagnosis and treatment.
The integration of polygenic risk scores (PRS) with established clinical risk factors and patient-reported symptoms represents a transformative approach for endometriosis risk stratification. This technical review synthesizes current evidence on the performance of such combined models, detailing their enhanced discriminative ability over PRS alone. We provide comprehensive quantitative comparisons, detailed experimental methodologies for model development, and resource guidance to facilitate implementation in research and clinical trial settings. For drug development professionals, these integrated models offer promising avenues for improved patient cohort stratification in clinical trials and development of personalized therapeutic strategies.
Endometriosis, affecting approximately 10% of women of reproductive age, presents significant diagnostic challenges with current delays averaging 7-10 years [85]. The complex pathogenesis involving genetic, inflammatory, and hormonal factors necessitates multifactorial risk assessment approaches. Polygenic risk scores, which aggregate the effects of multiple genetic variants into a single metric, provide a foundational genetic risk component but demonstrate limited discriminatory power as standalone tools [6] [10]. The integration of PRS with clinical manifestations and symptom profiles creates synergistic models that significantly enhance predictive accuracy and clinical utility across endometriosis subtypes.
Table 1: Performance Metrics of PRS-Clinical Combined Models
| Model Configuration | Cohort | Sample Size (Cases/Controls) | Key Predictive Features | Performance (OR per SD or ROC-AUC) |
|---|---|---|---|---|
| PRS Only (14-SNP) | Danish Surgical Cohort | 249/348 | 14 GWAS-identified SNPs | OR=1.59 per SD [6] |
| PRS Only (14-SNP) | UK Biobank | 2,967/256,222 | 14 GWAS-identified SNPs | OR=1.28 per SD [6] |
| Machine Learning Combined | UK Biobank | 5,924/142,723 | Genetic variants + ICD-10 history + female health data + lifestyle factors | 0.81 [50] |
| PRS + Subtype Analysis | Combined Danish | 389/664 | Infiltrating, ovarian, and peritoneal subtypes | OR=1.51-1.72 per SD [6] |
Table 2: Relative Contribution of Clinical Features in Combined Models
| Feature Category | Specific Features | Impact Measurement | Model Context |
|---|---|---|---|
| Comorbidity Profiles | Irritable bowel syndrome (IBS) | High SHAP value [50] | Machine Learning Model |
| Reproductive History | Menstrual cycle length | High SHAP value [50] | Machine Learning Model |
| Symptom Patterns | Chronic pelvic pain, dysmenorrhea | Clinical assessment correlation [10] | PRS + Clinical Assessment |
| Endometriosis Subtypes | Ovarian, infiltrating, peritoneal | OR=1.72, 1.66, 1.51 respectively [6] | PRS Subtype Stratification |
| Previous Diagnoses | ICD-10 history prior to endometriosis diagnosis | Significantly more diagnoses in cases [50] | Retrospective Model |
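The SHAP values in the table above are estimates of Shapley values, which attribute a prediction to individual features by averaging each feature's marginal contribution over all orderings. The sketch below computes exact Shapley values by brute force for a two-feature toy model. It does not use the SHAP library or CatBoost, and the model, its coefficients, and the feature names `prs_z` and `ibs` are made-up assumptions for illustration only.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features are switched from their baseline value to the instance value
    one at a time, in every possible order, and each feature's marginal
    change in the prediction is averaged over the orderings.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: totals[name] / len(orderings) for name in features}


def toy_risk(features):
    """Hypothetical risk score with an interaction term (not a fitted model)."""
    return (0.5 * features["prs_z"]
            + 1.0 * features["ibs"]
            + 0.5 * features["prs_z"] * features["ibs"])


instance = {"prs_z": 1.0, "ibs": 1.0}   # PRS one SD above the mean, IBS present
baseline = {"prs_z": 0.0, "ibs": 0.0}   # average PRS, no IBS
attributions = shapley_values(toy_risk, instance, baseline)
```

The attributions always sum to the difference between the instance prediction and the baseline prediction, and the interaction term is shared equally between the two features. Exact enumeration grows factorially with the number of features, which is why practical tools rely on approximations or tree-specific algorithms.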
Sample Preparation and Quality Control
Imputation and SNP Selection
Clinical Phenotype Assessment
Machine Learning Integration Protocol
The biological plausibility of combined PRS-clinical models is reinforced by emerging research on endometriosis pathogenesis. Recent studies utilizing Mendelian randomization have identified potential causal relationships between specific plasma proteins and endometriosis development.
Key Pathway Insights:
Table 3: Essential Research Materials for PRS-Clinical Model Implementation
| Category | Specific Product/Platform | Application Context | Function |
|---|---|---|---|
| Genotyping | Illumina Global Screening Array | PRS Calculation | Genome-wide SNP profiling [10] |
| Genotyping Platform | Illumina iScan System | PRS Calculation | High-throughput screening system [10] |
| Imputation Server | TOPMed Version R2 | PRS Refinement | Missing genotype imputation [10] |
| Statistical Package | PLINK 1.9 | PRS Calculation | Genetic association analysis [10] |
| Protein Analysis | Proseek Multiplex Inflammation 1 kit | Biomarker Validation | Inflammatory protein profiling [10] |
| Immunoassay | Human R-Spondin3 ELISA Kit | Mechanistic Validation | RSPO3 protein quantification [86] |
| Machine Learning | CatBoost Gradient Boosting | Combined Model Development | Integrated model training [50] |
| Model Interpretation | SHAP (SHapley Additive exPlanations) | Feature Importance Analysis | Model explainability [50] |
The integration of PRS with clinical risk factors and symptoms significantly advances endometriosis risk prediction beyond the limitations of either approach alone. While current PRS alone demonstrates modest effect sizes (OR=1.28-1.59 per standard deviation) [6], combined models achieve substantially improved discriminative performance (ROC-AUC=0.81) [50]. This enhanced accuracy enables meaningful applications in both clinical practice and therapeutic development.
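The odds ratios and the ROC-AUC quoted above sit on different scales, so a rough conversion helps when comparing them. The sketch below assumes the PRS is normally distributed with equal variance in cases and controls, in which case an odds ratio per SD corresponds to a mean shift of ln(OR) SD units and an AUC of Φ(ln(OR)/√2). This is a textbook approximation under stated assumptions, not a calculation reported in the cited studies, and the function name is hypothetical.

```python
import math


def auc_from_or_per_sd(odds_ratio_per_sd):
    """Approximate AUC implied by an odds ratio per SD of a risk score.

    Assumes the score is normal with equal variance in cases and controls,
    so the case-control mean difference in SD units equals ln(OR).
    """
    shift = math.log(odds_ratio_per_sd)
    # Phi(shift / sqrt(2)) written with the error function:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), hence the argument shift / 2.
    return 0.5 * (1.0 + math.erf(shift / 2.0))


auc_surgical = auc_from_or_per_sd(1.59)   # OR reported for the surgical cohort
auc_biobank = auc_from_or_per_sd(1.28)    # OR reported for UK Biobank
```

Under these assumptions the two PRS-only odds ratios map to AUC values of roughly 0.63 and 0.57, which puts the gap to the combined model's 0.81 on a common scale. The combined model was evaluated in a different sample with different features, so the comparison is indicative only.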
For drug development professionals, these combined models offer particularly valuable applications:
Future development should focus on refining subtype-specific models, expanding diverse ancestral representation in training datasets, and validating combined models in prospective clinical settings. The integration of additional data modalities, including imaging findings and novel circulating biomarkers, will further enhance model performance and clinical utility across the endometriosis spectrum.
Within the complex landscape of endometriosis research, the development of accurate diagnostic and prognostic tools remains a paramount challenge. This technical guide examines two powerful genomic approaches transforming our capabilities: transcriptomic biomarkers identified via machine learning and polygenic risk scores (PRS). Framed within broader thesis research on PRS performance across endometriosis subphenotypes, we provide an in-depth comparison of their methodological foundations, performance metrics, and clinical applicability. The integration of machine learning with high-throughput genomic data is advancing non-invasive diagnostic solutions, potentially overcoming the limitations of current invasive diagnostic standards and the modest predictive power of existing PRS models for this heterogeneous condition [88] [6] [89].
Transcriptomic biomarkers are genes or non-coding RNAs whose expression patterns, measured via technologies like RNA sequencing (RNA-Seq), are characteristic of a disease state. In endometriosis, these biomarkers reflect the active molecular pathways in diseased tissues and can be identified through machine learning classification of transcriptomic data [88] [90].
Polygenic Risk Scores (PRS) aggregate the cumulative effect of many genetic variants (often single nucleotide polymorphisms - SNPs) across the genome, each with small effect size, to quantify an individual's genetic susceptibility to a disease. For endometriosis, PRS are typically derived from genome-wide association study (GWAS) summary statistics [6].
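As a minimal sketch of the aggregation just described, the score is a sum of GWAS effect sizes weighted by the number of effect alleles an individual carries, usually standardised against a reference group so that results can be quoted per SD. The SNP identifiers and effect sizes below are made-up placeholders, and the function names are hypothetical.

```python
# Effect sizes and SNP identifiers are placeholders for illustration,
# not values from any endometriosis GWAS.

def polygenic_risk_score(effect_sizes, allele_counts):
    """Sum of (GWAS effect size x effect-allele count) over scored SNPs.

    effect_sizes: dict mapping SNP id -> log odds ratio from the GWAS
    allele_counts: dict mapping SNP id -> 0, 1 or 2 effect alleles
    SNPs missing from the individual's genotypes are skipped.
    """
    score = 0.0
    for snp, beta in effect_sizes.items():
        count = allele_counts.get(snp)
        if count is None:
            continue  # ungenotyped SNP contributes nothing in this sketch
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1 or 2")
        score += beta * count
    return score


def standardise(scores, reference_scores):
    """Express scores in SD units of a reference group (e.g. controls)."""
    n = len(reference_scores)
    mean = sum(reference_scores) / n
    variance = sum((s - mean) ** 2 for s in reference_scores) / (n - 1)
    sd = variance ** 0.5
    return [(s - mean) / sd for s in scores]


weights = {"rs_hypothetical_1": 0.10, "rs_hypothetical_2": -0.05, "rs_hypothetical_3": 0.20}
person = {"rs_hypothetical_1": 2, "rs_hypothetical_2": 1, "rs_hypothetical_3": 0}
prs = polygenic_risk_score(weights, person)  # 0.10*2 + (-0.05)*1 + 0.20*0
```

Skipping ungenotyped SNPs is a simplification. Real pipelines typically impute missing genotypes first, and imputed dosages can be fractional rather than whole allele counts.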
Table 1: Fundamental Characteristics of Transcriptomic Biomarkers and PRS
| Feature | Transcriptomic Biomarkers | Polygenic Risk Score (PRS) |
|---|---|---|
| Basis | Gene expression levels (dynamic) | Genetic sequence variants (static) |
| Data Source | RNA-Sequencing, Microarrays | GWAS summary statistics, Genotyping arrays |
| Temporal Dynamics | Can change with disease state, environment | Fixed at birth, lifelong risk indicator |
| Primary Tissue | Often disease-relevant tissue (e.g., endometrium) or blood | Typically calculated from blood or saliva DNA |
| Machine Learning Role | Core to feature selection and classification model development | Can be integrated as one feature within larger predictive models |
The discovery of transcriptomic biomarkers follows a structured pipeline that integrates bioinformatics with machine learning [88].
Figure 1: Workflow for transcriptomic biomarker discovery using machine learning.
Protocol 1: Transcriptomic Biomarker Discovery [88]
n samples, where n is the smallest group size).
The PRS development pipeline relies on large-scale genetic association data [6].
Figure 2: Standard workflow for polygenic risk score development and validation.
Protocol 2: Polygenic Risk Score Construction and Validation [6]
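The score for each individual is \( \text{PRS} = \sum_i \beta_i \times \text{Count}_i \), where \( \beta_i \) is the effect size of SNP i from the GWAS, and \( \text{Count}_i \) is the number of effect alleles (0, 1, 2) the individual carries for SNP i.
Direct comparison of performance metrics reveals the distinct strengths of each approach.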
Table 2: Performance Comparison of Transcriptomic Biomarkers vs. PRS in Endometriosis
| Metric | Transcriptomic Biomarkers (ML Model) | Polygenic Risk Score (PRS) |
|---|---|---|
| Primary Use Case | Disease classification & diagnosis | Genetic risk stratification |
| Reported Accuracy | 85.7% (Bagged CART) [88] | N/A (Provides odds ratio) |
| Sensitivity/Specificity | 100% / 75% [88] | N/A |
| Key Performance Indicator | | Odds Ratio (OR) per SD increase: 1.51-1.72 (across subtypes) [6] |
| Area Under Curve (AUC) | Not reported in cited studies | ~0.6748 (Methylation Risk Score combined with PRS) [91] |
| Validation Cohort | 16 cases, 22 controls (internal cross-validation) [88] | 249 surgically confirmed cases, 348 controls; replicated in UK Biobank (2,967 cases) [6] |
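The accuracy, sensitivity, and specificity in the table are all derived from one confusion matrix, and a short calculation shows how they fit together. The counts below are an assumption: they are simply the smallest set of whole numbers that reproduces the reported 100% sensitivity, 75% specificity, and 85.7% accuracy, and the actual test-set composition in [88] is not given here.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # share of cases detected
    specificity = tn / (tn + fp)              # share of controls cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical held-out set of 3 cases and 4 controls (assumed counts).
sens, spec, acc = classification_metrics(tp=3, fn=0, tn=3, fp=1)
```

With so few samples, a single reclassified control moves specificity by 25 percentage points, which is worth keeping in mind when comparing these figures with PRS results validated in thousands of participants.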
The outputs of these methodologies are specific genes and genetic variants.
Table 3: Specific Biomarkers and Variants Identified by Transcriptomic and PRS Approaches
| Approach | Identified Biomarkers / Key Findings | Notes / Subtype Associations |
|---|---|---|
| Transcriptomic ML | CUX2, CLMP, CEP131, EHD4, CDH24, ILRUN, LINC01709, HOTAIR, SLC30A2, NKG7 [88] | Biomarkers identified from variable importance in Bagged CART model. |
| PRS (14-SNP) | SNPs from top GWAS loci (e.g., in GREB1 region) [6] | Associated with all subtypes: Ovarian (OR=1.72), Infiltrating (OR=1.66), Peritoneal (OR=1.51). Not associated with adenomyosis. |
| Integrated ML & Genetics | Adenosine kinase, Enoyl-CoA hydratase, CCR4-NOT subunit 7 [92] | Three core biomarkers identified by combining GWAS data with machine learning. |
Successful implementation of these methodologies requires a suite of specialized reagents, software, and analytical tools.
Table 4: Key Research Reagent Solutions for Transcriptomic and PRS Studies
| Category | Item | Function / Application |
|---|---|---|
| Wet-Lab Reagents | Illumina NextSeq RNA-Seq Platform | High-throughput mRNA sequencing for transcriptomic data generation [88]. |
| Wet-Lab Reagents | Tissue Biopsy Kits | Minimally invasive collection of endometrial tissue samples [88]. |
| Wet-Lab Reagents | DNA Genotyping Arrays | Genome-wide genotyping of single nucleotide polymorphisms (SNPs) for PRS calculation [6]. |
| Bioinformatics Tools | FastQC | Quality control check on raw sequence data [88]. |
| Bioinformatics Tools | Cutadapt | Removes adapter sequences and other contaminant sequences from reads [88]. |
| Bioinformatics Tools | Bowtie2 / TopHat | Alignment of sequencing reads to a reference genome (e.g., hg38) [88]. |
| Bioinformatics Tools | HTSeq | Generation of read count data for each gene [88]. |
| Analytical Software | R / Python with scikit-learn | Machine learning model development and classification (e.g., Bagged CART, XGBoost) [88] [93]. |
| Analytical Software | PLINK | Whole-genome association analysis toolset, used for PRS calculation [6]. |
| Analytical Software | GCTB (SBayesR) | Bayesian tool for adjusting GWAS summary statistics for improved PRS [6]. |
| Analytical Software | Glmnet (R) | Implementation of LASSO regression for feature selection in high-dimensional data [93]. |
The field is moving beyond standalone applications towards integrated models. Evidence shows that combining a Methylation Risk Score (MRS)—capturing non-genetic, environmental influences on DNA—with a PRS yields a classification performance (AUC) consistently higher than using the PRS alone [91]. Furthermore, advanced machine learning techniques like neural networks demonstrate that the predictive value of PRS is maximized when combined with rich, structured clinical data from electronic health records (EHRs), rather than used in isolation [94].
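How such a comparison is scored can be sketched with a rank-based AUC: the probability that a randomly chosen case has a higher score than a randomly chosen control. The z-scores below are hand-picked toy values, and the unweighted sum used to combine PRS and MRS is a simplifying assumption (a fitted model would normally estimate the weights), so the resulting numbers carry no evidential weight.

```python
def roc_auc(case_scores, control_scores):
    """ROC-AUC as the probability that a random case outscores a random
    control (Mann-Whitney formulation); ties count as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hand-picked toy z-scores (hypothetical, not study data).
prs_cases, prs_controls = [1.0, 0.25, 0.5, -0.25], [0.375, -0.5, 0.0, -0.75]
mrs_cases, mrs_controls = [0.25, 0.75, 0.0, 0.5], [-0.25, 1.0, -0.5, 0.0]

# Simplest possible combination: an unweighted sum of the two scores.
combined_cases = [p + m for p, m in zip(prs_cases, mrs_cases)]
combined_controls = [p + m for p, m in zip(prs_controls, mrs_controls)]

auc_prs = roc_auc(prs_cases, prs_controls)
auc_combined = roc_auc(combined_cases, combined_controls)
```

On these constructed values the PRS alone gives an AUC of 0.8125 and the combined score 0.90625. The gain appears only because the toy MRS was chosen to carry information the toy PRS lacks, which is the property a real MRS must demonstrate in held-out data.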
Another innovative approach merges transcriptome-wide association studies (TWAS) with patient-derived transcriptomic data and machine learning (e.g., LASSO, Boruta algorithms) to pinpoint a minimal set of predictive genes with high biological interpretability, as demonstrated in venous thromboembolism research [95]. This methodology is directly applicable to endometriosis investigations.
Both transcriptomic biomarkers, powered by machine learning, and polygenic risk scores offer distinct and valuable paths for advancing endometriosis research. Transcriptomic ML models currently show superior performance for direct disease classification, achieving high diagnostic accuracy by capturing active disease state signals. In contrast, PRS provides a static measure of genetic predisposition, effective for risk stratification across subphenotypes but with lower standalone predictive power.
The future of endometriosis diagnostics and risk prediction lies in multi-modal integration. Combining the dynamic molecular snapshot from transcriptomics (and other omics like methylomics) with the foundational genetic risk from PRS, and contextualizing both within a framework of detailed clinical data, promises to unlock the robust, personalized risk models needed to finally shorten the long diagnostic odyssey for millions of patients. For researchers focused on PRS performance across subphenotypes, these integrative approaches are essential for explaining the significant portion of disease variance that remains unaccounted for by current genetic models.
The current evidence demonstrates that polygenic risk scores show significant but subtype-dependent performance for endometriosis risk prediction, with strongest associations for ovarian and infiltrating subtypes. However, stand-alone PRS models currently lack sufficient discriminative accuracy for specific clinical presentations, highlighting the need for integrated approaches combining genetic risk with epigenetic markers, inflammatory profiles, and detailed clinical phenotyping. Future research directions should prioritize developing subtype-specific PRS models, expanding diverse population representation, and leveraging machine learning for multi-omics integration. For drug development, these findings underscore the potential of PRS for patient stratification in clinical trials and identifying shared biological pathways with comorbid conditions that may reveal novel therapeutic targets. The evolving PRS landscape promises to transform endometriosis from a surgically diagnosed condition to one with pre-symptomatic risk stratification capabilities, ultimately reducing diagnostic delays and enabling personalized intervention strategies.