Polygenic Risk Score Performance Across Endometriosis Subphenotypes: A Comprehensive Review for Researchers and Drug Developers

Isabella Reed, Nov 27, 2025

Abstract

This article synthesizes current evidence on the performance of polygenic risk scores (PRS) across diverse endometriosis subphenotypes, addressing a critical gap in precision medicine for this complex disease. We explore the foundational genetic architecture of endometriosis and its subtypes, detail methodological approaches for PRS construction and application, and critically evaluate performance variations across ovarian, peritoneal, infiltrating, and other clinical presentations. The review further examines limitations in current PRS models for predicting specific clinical manifestations and explores integrative approaches combining PRS with epigenetic markers and inflammatory biomarkers. For researchers and drug development professionals, this analysis provides essential insights into the potential of PRS for patient stratification, subtype-specific risk prediction, and guiding targeted therapeutic development.

The Genetic Architecture of Endometriosis: From Heritability to Subtype-Specific Risk Loci

Heritability Estimates and Familial Risk Patterns in Endometriosis

Endometriosis, a chronic gynecological condition affecting approximately 10% of reproductive-aged women, demonstrates a substantial genetic component, with heritability estimates ranging from 47% to 51% based on twin studies [1] [2]. This technical review synthesizes current understanding of heritability patterns, polygenic risk score (PRS) performance across endometriosis subphenotypes, and associated molecular mechanisms. We examine methodologies for estimating genetic contribution, from traditional familial risk assessment to advanced genomic approaches, including methylation risk score (MRS) modeling and expression quantitative trait loci (eQTL) analysis. The integration of polygenic risk scores with epigenetic data demonstrates enhanced predictive power over genetic risk assessment alone, highlighting the complex interplay between inherited variants and regulatory mechanisms in disease pathogenesis. This synthesis provides researchers and drug development professionals with a comprehensive framework for advancing personalized diagnostic and therapeutic strategies in endometriosis.

Endometriosis is characterized by the presence of endometrial-like tissue outside the uterine cavity, leading to chronic pelvic pain, infertility, and reduced quality of life [3]. The diagnostic delay for this condition ranges from 7 to 12 years, contributing to its significant socioeconomic burden and underscoring the urgent need for better understanding of its genetic architecture to enable early detection and intervention [3]. While retrograde menstruation remains a prevailing theory of pathogenesis, this alone cannot explain why only some individuals develop the condition, pointing to substantial genetic predisposition factors [1].

The genetic basis of endometriosis has been elucidated through various study designs, including twin studies, which have established heritability estimates of approximately 50% [1] [2]. More recent genome-wide association studies (GWAS) have identified multiple risk loci, with the largest meta-analysis to date comprising 60,674 cases and 701,926 controls, identifying 42 genome-wide significant loci that explain up to 5.01% of disease variance [4]. This review systematically examines the methodologies for estimating heritability, familial risk patterns, and the performance of polygenic risk scores across endometriosis subphenotypes, providing critical insights for researchers and drug development professionals working to advance precision medicine in this complex disease.

Heritability Estimation Methods

Traditional Familial and Twin Studies

Traditional approaches to estimating endometriosis heritability have relied on familial aggregation and twin studies, which provide the foundation for understanding the disease's genetic component:

  • Twin Studies: The classic twin study design comparing concordance rates between monozygotic and dizygotic twins has provided fundamental heritability estimates ranging from 47% to 51% [1] [2]. These studies establish that genetic factors explain approximately half of the variation in endometriosis risk within populations.

  • Familial Risk Patterns: First-degree relatives of affected women have a 7- to 10-fold increased risk of developing endometriosis compared to the general population [1]. This increased familial risk provides further evidence for a significant genetic component in disease susceptibility.
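The twin-study logic can be sketched with Falconer's formula, which approximates heritability as twice the difference between monozygotic and dizygotic twin correlations. This is a minimal illustration only: the correlations below are hypothetical values chosen to land near the quoted range, not figures from the cited studies, and published estimates typically come from full variance-component models rather than this shortcut.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations in liability (illustration only).
h2 = falconer_heritability(r_mz=0.50, r_dz=0.25)
print(f"Estimated heritability: {h2:.2f}")
```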

Genomic Approaches

Advanced genomic methodologies have refined heritability estimation and enabled dissection of specific genetic contributions:

  • Genome-Wide Complex Trait Analysis (GCTA): This method uses genome-wide SNP data to estimate the proportion of phenotypic variance explained by all common SNPs. For endometriosis, SNP-based heritability estimates are approximately 26%, indicating that common genetic variants account for about half of the overall heritability [4].

  • Omics Residual Maximum Likelihood (OREML): This approach quantifies the variance captured by different relationship matrices. Analyses using OREML have demonstrated that DNA methylation profiles in endometrial tissue capture 19.58% of variance in endometriosis status, while common genetic variants capture 28.83% [5]. When both are included in the model, DNAm accounts for 12.18% of variance independent of genetics [5].
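The statement that common variants account for about half of the overall heritability follows from simple division of the figures quoted above. The short sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
twin_h2_range = (0.47, 0.51)  # twin-study heritability range quoted above
snp_h2 = 0.26                 # SNP-based heritability quoted above

# Share of twin-based heritability captured by common SNPs.
shares = [snp_h2 / h2 for h2 in twin_h2_range]
for h2, share in zip(twin_h2_range, shares):
    print(f"twin h2 = {h2:.2f} -> SNP share = {share:.0%}")
```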

Table 1: Heritability Estimates from Different Methodological Approaches

| Methodology | Heritability Estimate | Sample Characteristics | Reference |
| --- | --- | --- | --- |
| Twin Studies | 47-51% | Australian twin cohort | [1] [2] |
| SNP-based Heritability | ~26% | GWAS meta-analysis (60,674 cases) | [4] |
| DNA Methylation Capture | 19.58% | Endometrial tissue (908 samples) | [5] |
| Common Genetic Variants | 28.83% | Endometrial tissue (908 samples) | [5] |

Polygenic Risk Score Performance Across Subphenotypes

PRS Development and Validation

Polygenic risk scores aggregate the effects of multiple genetic variants to quantify individual disease susceptibility. The development of PRS for endometriosis involves specific methodological considerations:

  • SNP Selection and Weighting: PRS typically incorporates genome-wide significant variants from large-scale GWAS. A 14-SNP PRS derived from a meta-analysis of 17,045 cases and 191,596 controls has been validated across multiple cohorts [6]. More recent approaches utilize Bayesian methods (SBayesR) for effect size adjustment, excluding the MHC region to avoid spurious associations [2].

  • Performance Metrics: In surgically confirmed cases, the 14-SNP PRS demonstrated an odds ratio (OR) of 1.59 per standard deviation increase (p = 2.57×10⁻⁷) [6]. When validated in the UK Biobank, the same PRS showed an OR of 1.28 (p < 2.2×10⁻¹⁶) [6]. The discriminative accuracy, while statistically significant, remains insufficient for standalone clinical utility, highlighting the need for integration with other biomarkers.
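At its core, a PRS of this kind is a weighted sum of risk-allele dosages, usually standardized so that effects can be reported per standard deviation. The sketch below assumes weights equal to the natural log of per-allele odds ratios; the three odds ratios and the toy genotypes are illustrative placeholders, not the published 14-SNP weights.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def standardize(scores):
    """Convert raw scores to z-scores (mean 0, SD 1) within the sample."""
    return (scores - scores.mean()) / scores.std()


# Illustrative per-allele odds ratios; weights are their natural logs.
weights = np.log([1.20, 1.19, 1.17])

# Toy genotype matrix: 4 individuals x 3 SNPs (hypothetical data).
dosages = np.array([[0, 1, 2],
                    [1, 1, 1],
                    [2, 2, 0],
                    [0, 0, 0]])

raw = polygenic_score(dosages, weights)
z = standardize(raw)
print(np.round(z, 2))
```

Standardizing within the analysis sample is what makes an "odds ratio per SD" comparable across cohorts that were scored with the same weights.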

Subphenotype-Specific Performance

Endometriosis encompasses diverse subphenotypes with distinct genetic architectures:

  • Anatomic Subtypes: PRS performance varies across endometriosis locations. The largest point estimates are observed for ovarian endometriosis (OR = 1.72, p = 6.7×10⁻⁵) and infiltrating disease (OR = 1.66, p = 2.7×10⁻⁹), compared to peritoneal endometriosis (OR = 1.51, p = 2.6×10⁻³) [6]. These differences are modest, so they suggest rather than establish subtype-specific genetic architectures.

  • Disease Stages: Genetic correlation analyses reveal that advanced-stage endometriosis has a stronger genetic component than minimal/mild disease [4]. Notably, ovarian endometriosis demonstrates a different genetic basis compared to superficial peritoneal disease [4].

  • Comorbidity Patterns: PRS phenome-wide association studies (PheWAS) reveal shared genetic architecture between endometriosis and other pain conditions, including migraine, back pain, and multi-site pain [2] [4]. This suggests that genetic factors contribute to the central sensitization observed in chronic pain patients with endometriosis.
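One way to judge how different the subtype odds ratios really are is to back-calculate approximate confidence intervals from the reported odds ratios and p-values. The sketch below assumes each p-value comes from a two-sided Wald test on the log odds ratio, which the source does not state; the resulting intervals are therefore rough approximations rather than the published ones.

```python
from math import exp, log
from statistics import NormalDist


def wald_ci_from_p(odds_ratio, p_value, level=0.95):
    """Approximate CI for an OR, assuming a two-sided Wald test produced p."""
    z = NormalDist().inv_cdf(1 - p_value / 2)   # test statistic implied by p
    se = log(odds_ratio) / z                    # implied SE of the log OR
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return exp(log(odds_ratio) - crit * se), exp(log(odds_ratio) + crit * se)


# Odds ratios and p-values as quoted in the text above.
subtypes = {
    "ovarian": (1.72, 6.7e-5),
    "infiltrating": (1.66, 2.7e-9),
    "peritoneal": (1.51, 2.6e-3),
}
intervals = {name: wald_ci_from_p(o, p) for name, (o, p) in subtypes.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name}: {lo:.2f} to {hi:.2f}")
```

Under this assumption the three intervals overlap substantially, which is why the subtype differences are best read as suggestive.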

Table 2: Polygenic Risk Score Performance Across Endometriosis Subphenotypes

| Subphenotype | Odds Ratio per SD PRS | P-value | Cohort |
| --- | --- | --- | --- |
| Overall Endometriosis | 1.59 | 2.57×10⁻⁷ | Surgically confirmed cohort |
| Ovarian Endometriosis | 1.72 | 6.7×10⁻⁵ | Combined Danish cohorts |
| Infiltrating Endometriosis | 1.66 | 2.7×10⁻⁹ | Combined Danish cohorts |
| Peritoneal Endometriosis | 1.51 | 2.6×10⁻³ | Combined Danish cohorts |
| UK Biobank Validation | 1.28 | <2.2×10⁻¹⁶ | UK Biobank (2,967 cases) |

Integration with Epigenetic Markers

The combination of polygenic risk scores with epigenetic markers enhances predictive power:

  • Methylation Risk Scores (MRS): MRS developed from endometrial methylation data can achieve an area under the receiver-operating characteristic curve (AUC) of 0.6748 [5]. The combination of MRS and PRS consistently outperforms PRS alone in classification accuracy [5].

  • Tissue-Specific Effects: Analyses of endometriosis-associated genetic variants acting as expression quantitative trait loci (eQTLs) reveal tissue-specific regulatory effects [7]. In reproductive tissues (uterus, ovary), regulated genes are enriched for hormonal response, tissue remodeling, and adhesion pathways, whereas in intestinal tissues and blood, immune and epithelial signaling genes predominate [7].
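Why adding a methylation score to a PRS can raise classification accuracy is easy to see in simulation: two partly independent predictors carry more information together than either does alone. The sketch below uses entirely simulated data with hypothetical effect sizes, and it combines the scores with the simulation's true weights, whereas a real analysis would estimate them in a training set.

```python
import numpy as np


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; assumes continuous scores without ties."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_case = labels.sum()
    n_ctrl = len(labels) - n_case
    return (ranks[labels].sum() - n_case * (n_case + 1) / 2) / (n_case * n_ctrl)


rng = np.random.default_rng(0)
n = 20_000
prs = rng.standard_normal(n)   # simulated standardized PRS
mrs = rng.standard_normal(n)   # simulated standardized MRS

# Hypothetical effect sizes on the log-odds scale.
logit = -2.0 + 0.4 * prs + 0.5 * mrs
labels = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

auc_prs = auc(prs, labels)
auc_combined = auc(0.4 * prs + 0.5 * mrs, labels)
print(f"PRS alone: {auc_prs:.3f}, PRS + MRS: {auc_combined:.3f}")
```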

Molecular Mechanisms and Signaling Pathways

Key Signaling Pathways

Genetic and epigenetic studies have implicated several key molecular pathways in endometriosis pathogenesis:

  • Hormonal Signaling Pathways: Genes involved in estrogen biosynthesis (CYP19A1) and signaling (ESR1, GREB1) show strong associations with endometriosis [8] [3]. Progesterone resistance is mediated through reduced progesterone receptor expression and disrupted signaling pathways [3].

  • Inflammatory and Immune Pathways: Regulatory variants in immune-related genes (IL-6, MICB) demonstrate significant enrichment in endometriosis cohorts [9]. These variants modulate inflammatory responses and may contribute to the immune dysregulation characteristic of endometriosis.

  • Developmental Pathways: WNT4, a critical gene in reproductive tract development, contains polymorphisms associated with increased endometriosis risk [8]. This suggests that disruptions in developmental programming may contribute to disease susceptibility.

The following diagram illustrates the integration of genetic and environmental factors in endometriosis pathogenesis through these key signaling pathways:

[Diagram: Molecular Pathways in Endometriosis]
  • Genetic predisposition (PRS, heritability ~50%) → hormonal signaling (ESR1, GREB1, WNT4, CYP19A1), immune dysregulation (IL-6, MICB), tissue remodeling and adhesion, and pain pathways (shared genetics with migraine and back pain)
  • Environmental factors (EDCs, pollutants) → hormonal signaling and immune dysregulation
  • Hormonal signaling, immune dysregulation, tissue remodeling, and pain pathways → disease subphenotypes (ovarian, infiltrating, peritoneal)

Gene-Environment Interplay

Environmental factors modulate genetic risk through epigenetic mechanisms:

  • Endocrine-Disrupting Chemicals (EDCs): Exposure to EDCs can alter gene expression via DNA methylation changes in endometriosis-associated genes [9]. Regulatory variants in genes like IL-6 and CNR1 overlap with EDC-responsive regions, suggesting gene-environment interactions exacerbate disease risk [9].

  • Ancient Regulatory Variants: Some endometriosis-associated regulatory variants, including Neandertal-derived methylation sites in IL-6, show significant enrichment in modern patients [9]. These ancient variants may modulate immune and inflammatory responses that interact with contemporary environmental exposures.

Experimental Protocols and Research Workflows

PRS-PheWAS Methodology

The polygenic risk score phenome-wide association study (PRS-PheWAS) approach enables comprehensive assessment of pleiotropic effects:

  • Cohort Definition: The workflow involves curating unrelated individuals of European ancestry from biobanks (e.g., 159,855 males and 188,221 females from UK Biobank), with sensitivity analyses in females without endometriosis diagnoses (n = 182,789) [2].

  • PRS Calculation: SBayesR weightings are applied to adjusted GWAS summary statistics, excluding the MHC region. PRS is calculated using the PLINK 1.9 --score function and converted to z-scores for analysis [2].

  • Association Testing: Associations with phecodes, blood/urine biomarkers, and reproductive factors are tested using logistic regression (for phecodes) or linear regression (for biomarkers), adjusting for age and the first 10 genetic principal components [2].
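The association-testing step amounts to a logistic regression of each binary phenotype on the standardized PRS plus covariates. The sketch below fits such a model with a small Newton-Raphson routine written in NumPy, so it needs no statistics package. The data are simulated, the effect sizes are hypothetical, and the eleven covariate columns merely stand in for standardized age and ten genetic principal components.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns coefficients and SEs."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    # Hessian from the final iteration; near convergence it matches the fit.
    se = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return beta, se


rng = np.random.default_rng(1)
n = 20_000
prs = rng.standard_normal(n)              # simulated z-scored PRS
covars = rng.standard_normal((n, 11))     # stand-ins for age and 10 PCs

# Hypothetical true model: log-odds increase of 0.3 per SD of PRS.
logit = -2.5 + 0.3 * prs + 0.1 * covars[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

beta, se = fit_logistic(np.column_stack([prs, covars]), y)
print(f"OR per SD of PRS: {np.exp(beta[1]):.2f} (SE of log OR {se[1]:.3f})")
```

In a phenome-wide scan the same fit would be repeated once per phecode, with linear regression substituted for continuous biomarkers.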

The following workflow diagram illustrates the key steps in PRS-PheWAS analysis:

[Diagram: PRS-PheWAS workflow]
  • GWAS summary statistics (meta-analysis of multiple cohorts) → SBayesR effect-size re-weighting (excluding the MHC region) → PRS calculation (PLINK 1.9 score function with z-score conversion)
  • PRS calculation → analysis cohorts: female cohort (n = 188,221), male cohort (n = 159,855), and sensitivity cohort (females without diagnosis, n = 182,789)
  • Analysis cohorts plus phenotype data (ICD10 codes, biomarkers, reproductive factors) → association testing (logistic/linear regression adjusted for PCs and age)
  • Association testing → identification of pleiotropic effects (e.g., testosterone association)

Methylation Risk Score Modeling

Methylation risk score development for endometriosis involves specific analytical steps:

  • Quality Control and Covariate Adjustment: Following methylation quality control, samples are assessed for technical covariates (age, processing institution, genetic ancestry) significantly associated with DNA methylation principal components [5]. Surrogate variable analysis removes batch effects and hidden sources of variation.

  • MRS Construction: MRS is developed using multiple models (elastic net, lasso, ridge regression) with performance evaluation through training/test set splits based on independent cohort institutions [5]. The best-performing MRS incorporates 746 DNAm sites.

  • Variance Partitioning: Omics residual maximum likelihood (OREML) analyses quantify the proportion of variance in endometriosis status captured by DNA methylation independent of common genetic variants [5].
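The MRS construction step can be illustrated with ridge regression, one of the model families named above, trained on some institutions and evaluated on a held-out one. This is a sketch under stated assumptions: the methylation matrix, institution labels, and causal sites are all simulated, the penalty value is arbitrary, and the closed-form solver stands in for whatever software the original study used.

```python
import numpy as np


def ridge_weights(X, y, lam):
    """Closed-form ridge solution: (X'X + lam * I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(2)
n, n_sites = 300, 50
meth = rng.standard_normal((n, n_sites))    # simulated, covariate-adjusted DNAm
institution = rng.integers(0, 3, size=n)    # hypothetical institution labels

# Simulated case status driven by the first five sites plus noise.
liability = meth[:, :5].sum(axis=1) + rng.standard_normal(n)
status = (liability > 0).astype(float)

# Hold out one institution entirely so the test set is independent.
train = institution != 2
test = ~train

mu = meth[train].mean(axis=0)               # centre using training data only
w = ridge_weights(meth[train] - mu, status[train] - status[train].mean(), lam=10.0)
mrs_test = (meth[test] - mu) @ w            # methylation risk score, test set

r = np.corrcoef(mrs_test, status[test])[0, 1]
print(f"Correlation of MRS with status in held-out institution: {r:.2f}")
```

Splitting by institution rather than at random guards against a score that merely learns site-specific batch structure.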

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents and Resources for Endometriosis Genetic Studies

| Resource/Reagent | Specification | Research Application | Reference |
| --- | --- | --- | --- |
| GWAS Summary Statistics | Sapkota et al. 2017 meta-analysis (14,926 cases; 189,715 controls) combined with FinnGen Release 8 (13,456 cases; 100,663 controls) | PRS weight derivation using SBayesR | [2] |
| GTEx v8 Database | Tissue-specific eQTL data from 53 non-diseased human tissues, including uterus, ovary, vagina, colon, ileum, and whole blood | Functional mapping of endometriosis-associated variants | [7] |
| DNA Methylation Array | Genome-wide methylation profiling of endometrial tissue (318 controls, 590 cases) | Methylation risk score development and epigenetic quantitative trait loci detection | [5] |
| UK Biobank | Comprehensive health records and genetic data of ~500,000 individuals, including ICD10 diagnoses, biomarker data, and reproductive histories | PRS-PheWAS analysis and validation across multiple subphenotypes | [2] |
| Genomics England 100,000 Genomes | Whole-genome sequencing data from rare disease programs, including endometriosis patients | Identification of regulatory variants and ancient introgressed alleles | [9] |

The integration of heritability estimates with polygenic risk scoring across endometriosis subphenotypes provides powerful insights for advancing precision medicine approaches. While significant progress has been made in identifying genetic risk variants, several challenges and opportunities remain:

  • Improved Subphenotype Stratification: Future research should focus on refining genetic risk prediction for specific endometriosis subtypes, particularly deep infiltrating and ovarian endometriosis, which demonstrate distinct genetic architectures [6] [4].

  • Multi-omics Integration: Combining PRS with epigenetic markers, such as methylation risk scores, consistently enhances predictive power over genetic information alone [5]. The development of integrated risk models that incorporate genetic, epigenetic, and environmental factors will be essential for improving early detection and risk stratification.

  • Functional Validation: Advanced techniques including CRISPR-based screening and organoid models will be critical for validating the functional impact of identified genetic variants and their role in disease pathogenesis [9] [3].

  • Diverse Population Applications: Current genetic studies predominantly focus on European and East Asian populations. Expanding research to include diverse ancestral backgrounds is essential for ensuring equitable application of genetic discoveries across all populations.

The field of endometriosis genetics has evolved from initial heritability estimates to sophisticated polygenic risk assessment across subphenotypes. By leveraging these advances, researchers and drug development professionals can accelerate the development of personalized diagnostic and therapeutic strategies for this complex condition.

Endometriosis, a chronic and inflammatory gynecological condition affecting approximately 10% of reproductive-aged women, represents a substantial healthcare challenge characterized by diagnostic delays and complex etiology [10]. The disease demonstrates a significant heritable component, estimated at 47-52% from twin and family studies, prompting extensive research to uncover its genetic underpinnings [11]. Genome-wide association studies (GWAS) have emerged as a powerful hypothesis-free approach to identify common genetic variants underlying this complex condition. To date, multiple GWAS and meta-analyses across diverse populations have identified several genome-wide significant loci, with WNT4, VEZT, and GREB1 representing consistently replicated regions [11]. These discoveries provide crucial insights into biological pathways dysregulated in endometriosis and form the foundation for developing polygenic risk scores (PRS) aimed at predicting individual disease risk and understanding its clinical subphenotypes.

The translation of GWAS findings into clinically useful tools requires careful consideration of effect sizes, population-specific frequencies, and functional mechanisms of identified variants. This technical review comprehensively examines the key endometriosis susceptibility loci, their biological mechanisms, and their collective contribution to polygenic risk prediction across disease subphenotypes. We integrate fine-mapping data, functional genomic evidence, and multi-ancestry validation to provide researchers and drug development professionals with a rigorous resource for understanding endometriosis genetics and its applications in stratified medicine.

Key Endometriosis Susceptibility Loci and Their Functional Mechanisms

Established GWAS Loci and Their Characteristics

Table 1: Key Endometriosis Susceptibility Loci Identified through GWAS

| Locus/Gene | Lead SNP | Risk Allele (Frequency) | Odds Ratio (95% CI) | P-value | Primary Function |
| --- | --- | --- | --- | --- | --- |
| WNT4 | rs7521902 | A (0.49) | 1.20 (1.14-1.26) | 1.8×10⁻¹⁵ | Reproductive tract development, estrogen response |
| VEZT | rs10859871 | C (0.74) | 1.19 (1.14-1.25) | 4.7×10⁻¹⁵ | Cell adhesion, tumor suppressor |
| GREB1 | rs13394619 | NA | NA | 4.5×10⁻⁸ | Estrogen-regulated gene growth, steroid receptor cofactor |
| CDKN2B-AS1 | rs1537377 | C (0.57) | 1.17 (1.11-1.23) | 1.5×10⁻⁸ | Cell cycle regulation |
| FN1 | rs1250248 | A (0.18) | 1.87 (1.34-2.61) | 0.002 | Extracellular matrix formation |
| 7p15.2 | rs12700667 | A (0.79) | 1.22 (1.13-1.32) | 1.6×10⁻⁹ | Intergenic regulatory region |

Table 2: Sub-phenotype Associations for Key Endometriosis Loci

| Locus | Stage I/II Association | Stage III/IV Association | Ovarian Endometriosis | Infiltrating Endometriosis |
| --- | --- | --- | --- | --- |
| WNT4 | Moderate | Stronger | Yes | Yes |
| GREB1 | Limited | Stronger | Yes | Not reported |
| FN1 | Significant (P=0.0066) | Less pronounced | Not reported | Not reported |
| VEZT | Moderate | Stronger | Yes | Not reported |

Meta-analyses of GWAS datasets encompassing over 11,500 cases and 32,600 controls have confirmed six loci with genome-wide significance (P < 5 × 10⁻⁸), with most showing consistent directional effects across populations of European and Japanese ancestry [11]. The WNT4 locus demonstrates particularly strong and consistent associations, with fine-mapping studies identifying rs3820282 as a likely causal variant that introduces a high-affinity estrogen receptor alpha-binding site, dramatically increasing WNT4 transcription in endometrial stroma following estrogen stimulation [12] [13]. This mechanism represents a classic example of how non-coding regulatory variants can influence disease susceptibility through altered hormone response.

The GREB1 locus exhibits equally sophisticated regulation, functioning as a steroid receptor cofactor in a feedforward mechanism that governs differential hormone action in endometrial function versus endometriosis pathology [14]. In normal endometrial physiology, GREB1 controls progesterone responses in uterine stroma, affecting receptivity and decidualization, while in endometriosis, estrogen-induced GREB1 modulates estrogen-dependent gene expression to promote lesion growth [14]. This cell-type and context-specific functionality highlights the complexity of translating GWAS signals into mechanistic understanding.

Functional Characterization of Key Loci

Figure 1: Molecular Mechanisms of WNT4 and GREB1 in Endometriosis Pathogenesis

Functional genomic approaches have been essential for moving from statistical associations to biological mechanisms. For the WNT4 locus, CRISPR/Cas9-generated mouse models demonstrate that the human risk allele increases uterine Wnt4 transcription in proestrus and estrus by 1.48-3.27 log2 fold, specifically in endometrial stromal fibroblasts underlying the luminal epithelium [12]. This spatiotemporal specificity highlights the importance of the uterine microenvironment in mediating genetic risk. RNAscope in situ hybridization confirms this stromal-specific upregulation, which subsequently downregulates epithelial proliferation and induces progesterone-regulated pro-implantation genes [12].
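Because the reported increase is expressed on a log2 scale, converting it to a linear fold change makes the effect size easier to interpret. The sketch below assumes the quoted range is a log2 fold change, as the wording indicates.

```python
log2_fc_range = (1.48, 3.27)  # log2 fold-change range quoted above

# A log2 fold change of x corresponds to a linear fold change of 2**x.
linear_fold = [2 ** x for x in log2_fc_range]
print([round(f, 2) for f in linear_fold])
```

On that reading, the risk allele raises Wnt4 transcription by roughly 2.8-fold to 9.6-fold.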

For the GREB1 locus, chromatin immunoprecipitation sequencing (ChIP-seq) in human endometrial stromal cells (HESCs) reveals that GREB1 binds to over 2,000 genomic regions, approximately 50% of which are co-occupied by the progesterone receptor [14]. GREB1 knockdown impairs progesterone-induced FOXO1 expression and reduces PR occupancy on target genes, demonstrating its role as an essential PR cofactor [14]. This molecular function explains why GREB1 loss severely compromises female fertility in mouse models through impaired uterine responses to steroid hormones.
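A co-occupancy figure like the one above is typically obtained by intersecting two peak sets and counting how many peaks in one set overlap any peak in the other. The sketch below shows that calculation on toy intervals; the coordinates are hypothetical, intervals are treated as half-open, and a real analysis would use an interval-tree tool rather than this quadratic loop.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_co_occupied(query, reference):
    """Fraction of query peaks overlapping at least one reference peak."""
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)


# Toy peak sets with hypothetical coordinates.
greb1_peaks = [("chr1", 100, 200), ("chr1", 500, 600),
               ("chr2", 50, 150), ("chr3", 10, 90)]
pr_peaks = [("chr1", 150, 250), ("chr2", 140, 300), ("chr3", 200, 300)]

frac = fraction_co_occupied(greb1_peaks, pr_peaks)
print(f"{frac:.0%} of GREB1 peaks overlap a PR peak")
```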

Polygenic Risk Score Performance Across Endometriosis Subphenotypes

PRS Construction and Validation

Table 3: Polygenic Risk Score Performance Across Endometriosis Studies

| Study Population | Sample Size (Cases/Controls) | Number of SNPs in PRS | Odds Ratio per SD (95% CI) | Variance Explained (R²) |
| --- | --- | --- | --- | --- |
| Surgically Confirmed (Danish) | 249/348 | 14 | 1.59 (1.32-1.91) | Not reported |
| Danish Twin Registry | 140/316 | 14 | 1.50 (1.22-1.84) | Not reported |
| UK Biobank | 2,967/256,222 | 14 | 1.28 (1.23-1.33) | Not reported |
| Combined Danish | 389/664 | 14 | 1.57 (1.37-1.80) | Not reported |
| Greek Population | 166/168 | 2 (FN1, GREB1) | 1.87 (FN1 rs1250248) | Not reported |

Polygenic risk scores for endometriosis aggregate the effects of multiple susceptibility variants into a single predictive metric. Most studies have utilized 14 genome-wide significant SNPs identified from large meta-analyses, achieving statistically significant but clinically modest risk discrimination [6]. In Danish populations, each standard deviation increase in PRS was associated with 1.57-fold increased odds of endometriosis (P = 2.5×10⁻¹¹), with similar effects across major subtypes: ovarian (OR=1.72), infiltrating (OR=1.66), and peritoneal (OR=1.51) [6]. Notably, the same PRS was not associated with adenomyosis, suggesting distinct genetic architectures for these related gynecological conditions [6].

The discriminative accuracy of current endometriosis PRS remains insufficient for standalone clinical utility. One study found an inverse association between PRS and disease spread, but the association lost significance when tested as a p-for-trend [10]. This suggests that current PRS constructions may not adequately capture the genetic basis of severe disease presentations. However, PRS are consistently associated with endometriosis risk irrespective of clinical diagnosis, suggesting they measure genetic liability beyond manifested disease [15].
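The link between an odds ratio per standard deviation and discriminative accuracy can be made concrete with a standard approximation. If the score is normally distributed with equal variance in cases and controls, and the log odds ratio per SD equals the mean shift between the two groups, then the AUC is Φ(ln(OR)/√2). The sketch below applies that formula to the odds ratios reported in this article; the AUC values it yields are derived approximations under those assumptions, not figures from the cited studies.

```python
from math import log, sqrt
from statistics import NormalDist


def auc_from_or_per_sd(odds_ratio):
    """Approximate AUC under an equal-variance binormal model."""
    mean_shift = log(odds_ratio)   # case-control difference in SD units
    return NormalDist().cdf(mean_shift / sqrt(2))


for label, odds_ratio in [("surgically confirmed", 1.59), ("UK Biobank", 1.28)]:
    print(f"{label}: OR {odds_ratio} -> AUC ~ {auc_from_or_per_sd(odds_ratio):.2f}")
```

Under these assumptions the two odds ratios correspond to AUCs of roughly 0.63 and 0.57, which illustrates why a statistically robust association can still discriminate poorly between individuals.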

Methodological Considerations for PRS Implementation

Robust PRS analysis requires stringent quality control procedures for both base GWAS data and target genotypes. Key considerations include:

  • Heritability check: Base data should have chip-heritability (h²snp) > 0.05 to avoid misleading conclusions [16]
  • Standard GWAS QC: Genotyping rate > 0.99, sample missingness < 0.02, MAF > 1%, imputation info score > 0.8 [16]
  • Ancestry considerations: Population stratification must be controlled via principal components or ancestry-matched reference panels [17]
  • Sample size: Association testing requires a minimum of 100 individuals (effective sample size) to avoid underpowered results [16]
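The variant-level thresholds listed above translate directly into a filter. The sketch below applies them to a few made-up records; the `Variant` class, its field names, and the example values are hypothetical, and in practice these filters are applied with dedicated genotype QC tools rather than hand-written code.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant QC summary."""
    variant_id: str
    maf: float        # minor allele frequency
    info: float       # imputation info score
    call_rate: float  # genotyping rate across samples


def passes_qc(v, min_maf=0.01, min_info=0.8, min_call_rate=0.99):
    """Apply the MAF, info-score and genotyping-rate thresholds listed above."""
    return v.maf > min_maf and v.info > min_info and v.call_rate > min_call_rate


variants = [
    Variant("snp_a", maf=0.25, info=0.97, call_rate=0.999),   # passes
    Variant("snp_b", maf=0.005, info=0.95, call_rate=0.999),  # fails MAF
    Variant("snp_c", maf=0.12, info=0.60, call_rate=0.999),   # fails info
    Variant("snp_d", maf=0.30, info=0.92, call_rate=0.970),   # fails call rate
]
kept = [v.variant_id for v in variants if passes_qc(v)]
print(kept)
```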

Advanced PRS methods that incorporate multi-ancestry data and functional annotations show promise for improving predictive performance. For coronary artery disease, such approaches have increased the proportion of individuals identified with 3-fold increased risk from 8.3% to 20.0% of the population [17]. Similar methodologies applied to endometriosis could substantially enhance risk stratification, particularly if integrated with clinical risk factors and biomarkers.

Experimental Protocols for Functional Validation

In Vivo Modeling of Endometriosis Risk Variants

CRISPR/Cas9-mediated genome editing provides a powerful approach for validating human genetic associations in mouse models. The protocol for modeling the WNT4 rs3820282 variant involves:

  • Design of guide RNAs targeting the mouse genomic region homologous to human rs3820282, with 98% sequence conservation between species [12]
  • Microinjection of CRISPR components into fertilized mouse oocytes to introduce the precise nucleotide substitution
  • Genotype confirmation of live-born pups by PCR and Sanger sequencing across modified regions
  • Phenotypic characterization across estrous cycle stages, focusing on uterine Wnt4 expression patterns via qPCR and in situ hybridization [12]

This approach confirmed that the human risk allele significantly upregulates uterine Wnt4 expression specifically during proestrus and estrus, mirroring the estrogen-responsive regulation suspected in humans. Two independent knock-in lines showed consistent phenotypes, strengthening evidence for causality [12].
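The guide RNA design step usually begins by scanning the target region for candidate sites. The sketch below assumes the widely used SpCas9 nuclease, which recognizes a 20-nucleotide protospacer followed by an NGG PAM; the source does not say which nuclease was used. The input sequence is a toy string, not the mouse Wnt4 region, and only the forward strand is scanned.

```python
def find_spcas9_sites(seq, spacer_len=20):
    """List (position, protospacer, PAM) for NGG PAMs on the forward strand."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - spacer_len - 2):
        pam = seq[i + spacer_len:i + spacer_len + 3]
        if pam[1:] == "GG":           # N can be any base
            sites.append((i, seq[i:i + spacer_len], pam))
    return sites


toy_sequence = "ACGT" * 5 + "TGG"     # hypothetical 23-nt sequence
sites = find_spcas9_sites(toy_sequence)
print(sites)
```

A real design would also scan the reverse strand, score off-target risk, and favor sites whose cut position lies close to the intended substitution.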

In Vitro Mechanistic Studies in Primary Human Cells

Primary human endometrial stromal cell (HESC) models enable detailed molecular characterization of risk variants:

  • Isolation and culture of primary stromal cells from endometrial biopsies
  • Hormone treatment with estradiol (E2) or progestin (MPA) to simulate hormonal milieus
  • Gene knockdown using siRNA targeting candidate genes (e.g., GREB1) to assess functional requirements
  • Chromatin immunoprecipitation (ChIP) to map transcription factor binding and histone modifications
  • Co-immunoprecipitation assays to detect protein-protein interactions (e.g., GREB1-PR complex formation) [14]

Application of this pipeline demonstrated that GREB1 physically interacts with progesterone receptor following progestin treatment and is required for optimal PR occupancy at key target genes like FOXO1 [14]. Cut&Run sequencing further defined the GREB1 cistrome, revealing extensive overlap with PR binding sites in endometrial stroma.

Essential Research Reagents and Tools

Table 4: Key Research Reagents for Endometriosis Genetic Studies

| Reagent/Tool | Specific Application | Function/Utility | Example Use Case |
| --- | --- | --- | --- |
| CRISPR/Cas9 | Genome editing | Introduction of precise human risk variants into mouse genome | WNT4 rs3820282 functional validation [12] |
| Primary HESCs | In vitro modeling | Patient-derived stromal cells for hormone response studies | GREB1-PR interaction analysis [14] |
| RNAscope | Spatial transcriptomics | Localization of gene expression in tissue context | WNT4 stromal-specific expression [12] |
| ChIP/Cut&Run | Epigenomic profiling | Mapping transcription factor binding and chromatin states | GREB1 and PR cistrome definition [14] |
| TaqMan assays | Genotyping | Accurate SNP allele discrimination | Case-control association studies [18] |
| Illumina arrays | Genotyping | Genome-wide variant detection | PRS calculation and validation [10] |

Discussion and Future Directions

The integration of GWAS discoveries with functional genomics has illuminated key pathways in endometriosis pathogenesis, particularly those involving steroid hormone response, developmental patterning, and cellular growth regulation. The well-established loci near WNT4, VEZT, and GREB1 represent the tip of the genetic iceberg, with emerging evidence suggesting numerous additional loci await discovery through expanded sample sizes and diverse population inclusion.

Future research priorities should include:

  • Multi-ancestry GWAS meta-analyses to improve variant discovery and PRS portability across populations
  • Single-cell epigenomic profiling of endometriotic lesions to define cell-type-specific regulatory mechanisms
  • Integration of functional genomics data to prioritize causal variants and genes at association signals
  • Development of subtype-specific PRS that can predict disease severity and progression
  • Mendelian randomization studies to identify causal risk factors and potential therapeutic targets

The observation that lower testosterone levels may be causal for endometriosis highlights how genetic studies can reveal unexpected biological insights with translational potential [15]. As GWAS sample sizes expand and functional characterization methods advance, genetic discoveries will increasingly inform diagnostic stratification, prognostic assessment, and targeted therapeutic development for this complex condition.

Figure 2: Research Pipeline from Genetic Discovery to Clinical Translation in Endometriosis

Endometriosis represents a common inflammatory gynecological disorder affecting approximately 10% of reproductive-aged women worldwide, characterized by the presence of endometrial-like tissue outside the uterine cavity [19] [20]. This complex disease manifests through distinct subphenotypes that demonstrate unique clinical and molecular characteristics: superficial peritoneal endometriosis (PE), ovarian endometrioma (OE), and deep infiltrating endometriosis (DIE) [21] [22]. These subphenotypes are increasingly recognized as clinicopathologically distinct entities with potentially different underlying pathophysiological mechanisms [19]. Within the context of polygenic risk score (PRS) performance research, understanding these subphenotypes becomes paramount, as genetic susceptibility may vary across different manifestations of the disease. The traditional revised American Society for Reproductive Medicine (rASRM) classification system stages endometriosis from minimal (Stage I) to severe (Stage IV) based on surgical findings but correlates poorly with pain symptoms and treatment outcomes [22] [20]. This limitation has driven research toward molecular stratification approaches that may better reflect disease heterogeneity and inform personalized therapeutic strategies.

The recognition of distinct subphenotypes has emerged from observations that lesions at different anatomical locations exhibit varied clinical behavior, histopathological features, and molecular profiles [19]. Superficial peritoneal implants represent the earliest and most common form, while ovarian endometriomas form cysts within the ovaries, and deep infiltrating endometriosis penetrates into retroperitoneal structures [22]. This subphenotype framework provides a critical foundation for investigating the genetic architecture of endometriosis, particularly as it relates to PRS performance across different disease manifestations. Research indicates that these subphenotypes may represent distinct molecular entities rather than a disease continuum, with implications for both biomarker development and therapeutic targeting [21] [19].

Molecular Characterization of Endometriosis Subphenotypes

Cytokine Signatures in Peritoneal Fluid

The peritoneal fluid microenvironment reflects the inflammatory milieu associated with endometriosis and reveals distinct molecular profiles across subphenotypes. Multiplex immunoassays of 48 cytokines in peritoneal fluid from laparoscopically confirmed cases have identified cytokine signatures that distinguish endometriosis subphenotypes more accurately than traditional staging systems (p < 0.0001) [21] [19].

Table 1: Distinct Cytokine Signatures Differentiating Endometriosis Subphenotypes

| Comparison | Signature Size | Key Cytokines | Pathway Associations |
|---|---|---|---|
| PE vs. OE | 6 cytokines | IL-1α, IL-7, IL-8, MCP-1, MIF, TNF-α | Angiogenesis, immune cell recruitment |
| OE vs. DIE | 7 cytokines | IL-1α, IL-1RA, IL-8, IL-12p40, IL-12p70, IL-16, TNF-α | Inflammation, cell proliferation |
| PE vs. DIE | 6 cytokines | IL-8, IL-12p70, IL-16, MCP-1, MIF, TNF-α | ERK1/2, AKT, MAPK, STAT4 signaling |

Pathway analysis of these cytokine signatures has revealed associations with critical signaling pathways including ERK1/2, AKT, MAPK, and STAT4, which are linked to angiogenesis, cell proliferation, migration, and inflammation in the subphenotypes [19]. The clear separation of subphenotypes based on peritoneal fluid cytokines (cumulative principal component scores: 77% to 92%) significantly outperforms separation based on disease stages (43% to 59%), highlighting the molecular distinctness of these clinical entities [21]. These findings suggest that the subphenotypes may represent different biological processes and inflammatory microenvironments rather than a continuum of disease severity.

Experimental Protocol: Cytokine Profiling

The identification of subphenotype-specific cytokine signatures follows a standardized experimental workflow:

Sample Collection: Peritoneal fluid (PF) is collected during laparoscopic surgery from women with and without endometriosis. Participants are stratified according to subphenotype: PE, OE, or DIE, with confirmation by histological examination [19].

Cytokine Analysis: PF samples are analyzed using validated multiplex immunoassays (e.g., Luminex platform) capable of simultaneously quantifying 48 cytokines, chemokines, and growth factors. The assay includes technical replicates and appropriate controls to ensure reproducibility [21] [19].

Data Processing: Raw fluorescence data is converted to concentration values using standard curves for each analyte. Values below detection limits are handled using appropriate statistical methods, and data normalization is performed to account for technical variability.

Statistical Analysis: Partial least squares regression (PLSR) is employed to identify cytokine signatures that optimally distinguish between subphenotypes. Model performance is evaluated using cumulative principal component scores, with significance testing via permutation tests [19].

Pathway Analysis: Bioinformatic tools (e.g., Ingenuity Pathway Analysis, DAVID) are used to map differentially expressed cytokines to biological pathways and networks, revealing subphenotype-specific molecular mechanisms [21].
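The Data Processing step above leaves the handling of below-detection values open. The following is a minimal sketch of one common convention, assuming that readings below the limit of detection (LOD), or missing readings, are replaced by LOD/√2 before a log2 transform. The function name, the LOD, and the concentrations are hypothetical and are not taken from the cited studies.

```python
import math

def preprocess_cytokine(values, lod):
    """Substitute sub-LOD or missing readings, then log2-transform.

    Assumption: LOD / sqrt(2) substitution, one convention among several;
    the cited studies may have used a different method.
    """
    substitute = lod / math.sqrt(2)
    cleaned = [substitute if (v is None or v < lod) else v for v in values]
    return [math.log2(v) for v in cleaned]

# Hypothetical IL-8 concentrations (pg/mL) with an assumed LOD of 2.0 pg/mL.
il8_pg_ml = [15.2, 1.1, None, 240.0, 8.0]
il8_log2 = preprocess_cytokine(il8_pg_ml, lod=2.0)
```

Log-transformed values of this kind are typically what a downstream model such as PLSR would receive, since raw cytokine concentrations tend to be strongly right-skewed.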

Figure: Cytokine Signature Analysis Workflow. Sample collection (laparoscopic surgery) → multiplex immunoassay (48 cytokines) → data processing and normalization → PLSR modeling (subphenotype separation) → signature identification (discriminatory cytokines) → pathway analysis (biological processes).

Genetic Architecture Across Endometriosis Subphenotypes

Polygenic Risk Score Performance

Polygenic risk scores aggregate the effects of multiple genetic variants to quantify an individual's genetic susceptibility to a disease. For endometriosis, PRS has demonstrated utility across all major subphenotypes, though with varying effect sizes. Research using a 14-variant PRS derived from genome-wide association studies (GWAS) has revealed that genetic risk factors contribute to all types of endometriosis rather than specific locations [6].
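As an illustration of the aggregation described above, a PRS can be sketched as a weighted sum of risk-allele dosages, standardized within the cohort so that effects can be reported per standard deviation. The variant names, weights, and genotypes below are hypothetical placeholders, not the 14 variants or weights used in [6].

```python
from statistics import mean, stdev

def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    return sum(weights[snp] * dosages[snp] for snp in weights)

def standardize(scores):
    """Convert raw scores to z-scores within the cohort."""
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / sd for s in scores]

# Hypothetical per-allele log odds ratios for three placeholder variants.
weights = {"snp_a": 0.10, "snp_b": 0.15, "snp_c": 0.08}

# Hypothetical genotypes for three individuals.
cohort = [
    {"snp_a": 0, "snp_b": 1, "snp_c": 2},
    {"snp_a": 2, "snp_b": 2, "snp_c": 1},
    {"snp_a": 1, "snp_b": 0, "snp_c": 0},
]

raw_scores = [polygenic_risk_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)
```

In practice the weights would come from GWAS effect estimates and the standardization from a much larger reference sample; three individuals are used here only to keep the arithmetic visible.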

Table 2: Polygenic Risk Score Performance Across Endometriosis Subphenotypes

| Subphenotype | Odds Ratio (OR) | P-value | Cohort | Sample Size |
|---|---|---|---|---|
| Overall Endometriosis | 1.57 | 2.5 × 10⁻¹¹ | Danish Combined | 389 cases, 664 controls |
| Ovarian Endometrioma | 1.72 | 6.7 × 10⁻⁵ | Danish Combined | 75 cases |
| Deep Infiltrating | 1.66 | 2.7 × 10⁻⁹ | Danish Combined | 210 cases |
| Peritoneal | 1.51 | 2.6 × 10⁻³ | Danish Combined | 60 cases |
| Overall Endometriosis | 1.28 | < 2.2 × 10⁻¹⁶ | UK Biobank | 2,967 cases, 256,222 controls |

Notably, the PRS was not associated with adenomyosis (OR = 1.07, p = 0.71), suggesting that while adenomyosis shares histological features with endometriosis, it is not driven by the same common genetic risk variants [6]. This specificity supports the biological distinction between these conditions and highlights the potential of PRS for differential risk prediction. The somewhat lower odds ratio in the UK Biobank (1.28) likely reflects differences in case ascertainment, as this cohort relied on ICD-10 codes from hospital records rather than surgical confirmation [6].

Pleiotropic Effects and Testosterone Association

PRS phenome-wide association studies (PheWAS) have revealed intriguing pleiotropic effects of endometriosis genetic risk, including an association with lower testosterone levels [2]. This relationship was consistent across sexes, suggesting fundamental biological connections rather than consequences of the disease itself. Mendelian randomization analysis further supported a potential causal effect of lower testosterone on endometriosis risk, with implications for understanding disease mechanisms [2].

The genetic correlation between endometriosis and testosterone levels points to a role for endocrine pathways in disease pathogenesis. Lower testosterone may create a permissive environment for the establishment or growth of ectopic lesions, possibly through effects on inflammation, immune function, or cellular proliferation [2]. These findings align with clinical observations of altered hormonal profiles in endometriosis patients and suggest that genetic risk may operate partially through endocrine mechanisms.

Figure: PRS Performance Across Subphenotypes. GWAS meta-analysis (17,045 cases, 191,596 controls) → PRS derivation (14-SNP weighting) → cohort validation (surgically confirmed cases) → subphenotype analysis: peritoneal endometriosis (OR = 1.51), ovarian endometrioma (OR = 1.72), deep infiltrating endometriosis (OR = 1.66). Cohort validation also feeds the analysis of pleiotropic effects (lower testosterone).

Signaling Pathways in Endometriosis Subphenotypes

Pathway analysis of molecular data from endometriosis subphenotypes has revealed activation of distinct signaling cascades that may drive disease pathogenesis and progression. These pathways represent potential targets for subphenotype-specific therapeutic interventions and provide mechanistic insights into the observed clinical differences.

The ERK1/2, AKT, and MAPK pathways emerge as central regulators across subphenotypes, with variations in their activation patterns and downstream effects [19]. These pathways integrate signals from cytokines, growth factors, and hormonal stimuli to control critical cellular processes including proliferation, survival, and invasion. In deep infiltrating endometriosis, which demonstrates the most aggressive behavior, these pathways show heightened activation, potentially explaining the invasive characteristics of this subphenotype [21].

STAT4 signaling, particularly prominent in peritoneal and deep infiltrating endometriosis, links inflammatory cytokines to transcriptional programs that may perpetuate the disease microenvironment [19]. This pathway plays important roles in immune cell differentiation and function, suggesting immune involvement in subphenotype determination. Additionally, angiogenesis-related pathways driven by VEGF-A and other factors appear differentially activated across subphenotypes, reflecting variations in vascularization requirements for different lesion environments [19].

Figure: Key Signaling Pathways by Subphenotype. The subphenotype-specific cytokine milieu feeds four shared signaling pathways (MAPK, AKT, ERK1/2, STAT4). MAPK and AKT connect to all three outcomes: peritoneal (inflammation and early angiogenesis), ovarian (cyst formation and tissue remodeling), and deep infiltrating (tissue invasion and fibrosis). ERK1/2 connects to deep infiltrating only; STAT4 connects to peritoneal and deep infiltrating.

Research Reagent Solutions for Endometriosis Subphenotype Studies

Table 3: Essential Research Reagents for Endometriosis Subphenotype Investigation

| Reagent Category | Specific Examples | Research Application | Considerations |
|---|---|---|---|
| Multiplex Immunoassay Kits | Luminex 48-plex cytokine panels | Simultaneous quantification of inflammatory mediators in peritoneal fluid | Validate detection limits for low-abundance analytes |
| DNA Genotyping Arrays | Illumina Global Screening Array, Infinium Asian Screening Array | Genome-wide SNP data for PRS calculation | Ensure coverage of endometriosis-associated loci |
| RNA Extraction Kits | Qiagen RNeasy, TRIzol-based methods | Gene expression analysis from lesion tissue | Address challenges of fibrotic tissue in DIE samples |
| Pathway Inhibitors | ERK1/2 inhibitors (SCH772984), AKT inhibitors (MK-2206) | Functional validation of signaling pathways in model systems | Test specificity to avoid off-target effects |
| Antibody Panels | CD45 (immune cells), CD31 (endothelial cells), cytokeratin (epithelial cells) | Immunophenotyping of lesion microenvironment | Optimize for formalin-fixed paraffin-embedded tissue |
| Cell Culture Media | Specific formulations for endometrial stromal cells | In vitro models of lesion establishment and growth | Consider hormone supplementation to mimic menstrual cycle |

The selection of appropriate research reagents is critical for investigating the molecular distinctions between endometriosis subphenotypes. Multiplex immunoassay platforms enable comprehensive cytokine profiling that has been instrumental in identifying subphenotype-specific inflammatory signatures [21] [19]. For genetic studies, high-density genotyping arrays provide the data necessary for polygenic risk score calculation, with careful consideration of ancestry-matched reference panels to ensure accurate risk prediction across diverse populations [6] [23].

Functional studies require well-validated pathway inhibitors and cell culture models that recapitulate key aspects of each subphenotype. For instance, deep infiltrating endometriosis models should prioritize invasive capacity, while ovarian endometrioma models might focus on cyst formation mechanisms [22]. Antibody panels for tissue staining must be optimized for the unique microenvironment of endometriosis lesions, which often contain mixed cell populations and substantial fibrotic components [22] [20].

The delineation of endometriosis subphenotypes (ovarian, peritoneal, and deep infiltrating) represents a crucial advance in understanding this heterogeneous condition. Molecular evidence increasingly supports the concept that these are distinct entities with unique cytokine signatures and signaling pathway activation, and possibly with partly distinct genetic architectures [21] [6] [19]. The association of polygenic risk scores with all subphenotypes indicates shared genetic susceptibility. The variation in effect sizes may point to additional subphenotype-specific genetic factors yet to be characterized, although the subtype estimates rest on small case numbers (60 to 210 cases per subtype) and should be interpreted cautiously [6].

Future research directions should include larger subphenotype-stratified GWAS to identify genetic variants with subtype-specific effects, potentially revealing biological mechanisms unique to each form of the disease. Integration of multi-omics approaches—genomics, transcriptomics, proteomics—will provide a more comprehensive understanding of the molecular networks underlying each subphenotype [23]. Additionally, development of refined PRS models that incorporate subphenotype information could enhance predictive accuracy and clinical utility.

From a translational perspective, these findings highlight the potential for subphenotype-specific therapeutic approaches targeting the distinct signaling pathways and inflammatory environments characteristic of each form [19]. The association between endometriosis genetic risk and testosterone levels further suggests endocrine pathways that might be modulated for prevention or treatment [2]. As our understanding of endometriosis subphenotypes continues to evolve, so too will opportunities for personalized risk assessment and targeted interventions aligned with the specific molecular drivers of each patient's disease.

Endometriosis, a chronic systemic disease characterized by the presence of endometrial-like tissue outside the uterine cavity, affects approximately 10% of women of reproductive age worldwide [22] [3]. The diagnostic pathway for this condition remains challenging, with an average delay of 7 to 12 years from symptom onset to definitive surgical diagnosis [22] [3]. The condition has a substantial genetic component, with heritability estimates ranging from 47% to 51% [2]. Beyond its hallmark symptoms of pelvic pain and infertility, endometriosis frequently co-occurs with multiple other conditions, including pain disorders, osteoarthritis, and various autoimmune diseases [2]. Understanding the shared genetic architecture between endometriosis and these comorbid conditions provides not only insights into underlying biological mechanisms but also opportunities for improving polygenic risk score (PRS) performance across endometriosis subphenotypes.

The complex genetic landscape of endometriosis is characterized by polygenic inheritance, where numerous genetic variants collectively contribute to disease susceptibility. Recent genome-wide association studies (GWAS) have identified multiple risk loci associated with endometriosis, including genes such as WNT4, VEZT, and GREB1 [6] [3]. The aggregation of these susceptibility variants into polygenic risk scores offers a powerful approach to quantify genetic predisposition. However, the performance of endometriosis PRS varies across different disease subtypes and comorbid conditions, reflecting the underlying genetic heterogeneity [6]. This technical review examines the genetic sharing between endometriosis and its comorbid conditions, with particular emphasis on methodological approaches for investigating these relationships and their implications for refining PRS stratification in endometriosis research.

Genetic Architecture of Endometriosis and Shared Pathways

Established Genetic Risk Factors for Endometriosis

The genetic basis of endometriosis has been elucidated through large-scale GWAS meta-analyses, revealing significant associations across multiple genomic loci. The most recent and well-powered GWAS have identified 42 independent loci associated with endometriosis risk, which collectively explain up to 5.01% of disease variance [2]. These findings build upon earlier studies that initially identified 14 genome-wide significant single nucleotide polymorphisms (SNPs) from a meta-analysis comprising over 17,000 cases [6]. The identified loci implicate genes involved in sex hormone signaling (ESR1, GREB1, WNT4), developmental processes (HOXA10), and inflammatory pathways (IL1A) [6] [2] [3].

Table 1: Key Genetic Loci Associated with Endometriosis Risk

| Genomic Locus | Nearest Gene | Putative Function | Associated Endometriosis Subtypes |
|---|---|---|---|
| 1p36.12 | WNT4 | Sex hormone signaling, development | Ovarian, infiltrating [6] |
| 2p25.1 | GREB1 | Estrogen regulation | All subtypes [6] |
| 12q22 | VEZT | Cell adhesion | Ovarian, peritoneal [3] |
| 6q25.1 | ESR1 | Estrogen receptor | All subtypes [2] |
| 7p15.2 | HOXA10 | Developmental processes | Infiltrating [3] |
| 2q13 | IL1A | Inflammatory response | Peritoneal [3] |

The performance of polygenic risk scores derived from these established loci varies across endometriosis subtypes. In a study evaluating a 14-SNP PRS, each standard deviation increase in PRS was associated with endometriosis overall (OR = 1.57, p = 2.5 × 10⁻¹¹), with varying effect sizes across subtypes: ovarian (OR = 1.72, p = 6.7 × 10⁻⁵), infiltrating (OR = 1.66, p = 2.7 × 10⁻⁹), and peritoneal (OR = 1.51, p = 2.6 × 10⁻³) [6]. These differences in effect size are consistent with genetic heterogeneity within endometriosis, although the subtype estimates rest on modest case numbers, and they motivate subtype-specific PRS optimization.
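When only an odds ratio and a p-value are reported, an approximate confidence interval can be back-calculated to judge how far subtype estimates really differ. The sketch below assumes the p-values are two-sided and come from a Wald test on the log odds ratio; that is an assumption about the analysis in [6], and the resulting intervals are approximations, not values reported by the study. The function name is illustrative.

```python
import math
from statistics import NormalDist

def approx_ci(odds_ratio, p_value, level=0.95):
    """Approximate CI for an OR, assuming a two-sided Wald test on log(OR)."""
    norm = NormalDist()
    log_or = math.log(odds_ratio)
    z_obs = norm.inv_cdf(1 - p_value / 2)       # z-score implied by the p-value
    se = abs(log_or) / z_obs                    # implied standard error of log(OR)
    z_crit = norm.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se)

# Odds ratios and p-values as quoted in the text for the 14-SNP PRS.
reported = {
    "ovarian": (1.72, 6.7e-5),
    "infiltrating": (1.66, 2.7e-9),
    "peritoneal": (1.51, 2.6e-3),
}
intervals = {name: approx_ci(or_, p) for name, (or_, p) in reported.items()}
```

Under these assumptions the three back-calculated intervals overlap one another, which is worth keeping in mind when comparing the subtype point estimates.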

Hormonal Pathways: Testosterone as a Shared Genetic Factor

A key finding from recent PRS phenome-wide association studies (PheWAS) is the association between genetic liability to endometriosis and altered testosterone levels [2]. This relationship was identified through a multi-step analytical approach that integrated PRS-PheWAS with Mendelian randomization to infer causal directionality. The findings suggest that lower testosterone levels may be causal for both endometriosis and clear cell ovarian cancer, revealing a shared hormonal mechanism that extends beyond traditional estrogen-centric models of endometriosis pathophysiology [2].

The hormonal interplay involves multiple pathways, including altered steroidogenesis in endometrial stromal cells, estrogen-induced overexpression of nicotinamide N-methyltransferase (NNMT), and progesterone resistance mediated through dysregulated FKBP4 expression and microRNA-29c regulation [3]. These pathways not only contribute to endometriosis pathogenesis but also represent potential shared mechanisms with other hormone-sensitive conditions.

Diagram 1: Shared Genetic Pathways in Endometriosis and Comorbid Conditions. This diagram illustrates how genetic risk variants for endometriosis influence hormonal and immune pathways, contributing to both endometriosis and related autoimmune comorbidities through shared biological mechanisms. Pathways shown: genetic risk variants → altered steroidogenesis → decreased testosterone levels and estrogen dominance → endometriosis risk and progression; genetic risk variants → immune dysregulation → chronic inflammation → endometriosis risk and progression; decreased testosterone levels and chronic inflammation → autoimmune comorbidities.

Methodological Approaches for Investigating Genetic Sharing

Polygenic Risk Score Phenome-Wide Association Studies (PRS-PheWAS)

PRS-PheWAS represents a powerful methodological approach for investigating the pleiotropic effects of genetic liability to endometriosis across a broad spectrum of traits and conditions. This technique involves testing the association between an endometriosis PRS and multiple phenotypes in large biobanks, such as the UK Biobank [2]. The fundamental advantage of this approach is that it can identify genetic associations irrespective of disease diagnosis status, thereby capturing effects that may be present in undiagnosed individuals.

The standard workflow for PRS-PheWAS in endometriosis research involves several key steps: (1) derivation of PRS weightings from large-scale GWAS summary statistics using Bayesian methods such as SBayesR; (2) calculation of individual PRS in target cohorts; (3) association testing between PRS and multiple phenotypic categories, including ICD-10 diagnostic codes, blood and urine biomarkers, and reproductive factors; and (4) stratification by sex to identify sex-specific effects [2]. This approach has revealed numerous associations between genetic liability to endometriosis and other conditions, including migraine, irritable bowel syndrome, and depression [2].

Mendelian Randomization for Causal Inference

Mendelian randomization (MR) has emerged as an essential technique for investigating potential causal relationships between endometriosis and comorbid conditions. MR uses genetic variants as instrumental variables to assess causality while minimizing confounding from environmental factors [24] [25]. The standard MR approach requires that genetic variants satisfy three key assumptions: (1) robust association with the exposure, (2) independence from confounders, and (3) an effect on the outcome only through the exposure [24].
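The first assumption, robust association with the exposure, is often screened with a per-variant F-statistic. The sketch below uses the common approximation F ≈ (β/SE)² and the conventional rule of thumb F > 10. The variant names, effect sizes, and the threshold choice are illustrative assumptions, not values from the cited studies.

```python
def instrument_f_statistic(beta, se):
    """Approximate per-SNP F-statistic from the exposure effect and its SE."""
    return (beta / se) ** 2

# Hypothetical SNP-exposure associations.
instruments = [
    {"snp": "snp_a", "beta": 0.12, "se": 0.02},
    {"snp": "snp_b", "beta": 0.05, "se": 0.02},
]

# Keep instruments above the conventional weak-instrument threshold (F > 10).
strong = [
    inst["snp"]
    for inst in instruments
    if instrument_f_statistic(inst["beta"], inst["se"]) > 10
]
```

The second and third assumptions cannot be verified this directly; they are usually probed through sensitivity analyses rather than a single statistic.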

In the context of autoimmune disease and osteoarthritis, recent MR studies have employed univariable, multivariable, and two-step mediation analyses [24] [25]. These analyses have identified several autoimmune diseases with potential causal relationships to osteoarthritis, including celiac disease (OR = 1.061, 95% CI = 1.018-1.105, p = 0.005), Crohn's disease (OR = 1.235, 95% CI = 1.149-1.327, p = 9.44 × 10⁻⁹), ankylosing spondylitis (OR = 2.63, 95% CI = 1.21-5.717, p = 0.015), rheumatoid arthritis (OR = 1.082, 95% CI = 1.034-1.133, p = 0.001), and ulcerative colitis (OR = 1.175, 95% CI = 1.068-1.294, p = 0.001) [24] [25]. These findings demonstrate the utility of MR for elucidating shared genetic mechanisms across seemingly distinct disease domains.

Diagram 2: Mendelian Randomization Framework for Causal Inference. This diagram illustrates the three key assumptions of Mendelian randomization used to investigate causal relationships between autoimmune diseases and osteoarthritis, applicable to endometriosis comorbidity research. Relationships shown: genetic instruments (SNPs) → exposure, e.g., autoimmune disease (Assumption 1); instruments independent of confounders (Assumption 2); instruments affect the outcome, e.g., osteoarthritis, only via the exposure (Assumption 3); confounders → exposure and outcome.

Cross-Phenotype Meta-Analysis (CPMA) for Genetic Cluster Identification

Cross-phenotype meta-analysis represents an advanced methodological approach for identifying shared genetic architecture across multiple autoimmune diseases. This technique, applied to ten pediatric autoimmune diseases, has revealed 27 genome-wide loci, with 22 shared by at least two diseases and 19 shared by at least three [26]. These shared loci predominantly map to biological pathways involved in immune processes, including cell activation, proliferation, and signaling systems [26].

The CPMA approach enables researchers to identify clusters of autoimmune diseases with similar genetic architectures, providing insights into potential shared therapeutic targets. For instance, one study identified that rheumatoid arthritis and ankylosing spondylitis form one distinct cluster, while multiple sclerosis and autoimmune thyroid disease form another, with type 1 diabetes showing similarities to both groups [27]. These patterns of genetic sharing have important implications for understanding the comorbid relationships between endometriosis and specific autoimmune conditions.

Genetic Overlap with Specific Comorbid Conditions

Autoimmune Diseases: Shared Genetic Architecture

Endometriosis demonstrates significant genetic overlap with various autoimmune diseases, suggesting shared etiological pathways. Large-scale genetic studies have identified substantial pleiotropy, with many endometriosis risk loci also conferring susceptibility to autoimmune conditions [2] [26]. The genetic relationship appears to be particularly strong for certain autoimmune diseases, including rheumatoid arthritis, systemic lupus erythematosus, and inflammatory bowel disease [2] [27].

Table 2: Patterns of Genetic Sharing Between Endometriosis and Autoimmune Diseases

| Autoimmune Disease | Shared Genetic Loci | Key Shared Biological Pathways | Implications for Endometriosis PRS |
|---|---|---|---|
| Rheumatoid Arthritis | PTPN22, CTLA4, TNFAIP3 | T-cell signaling, immune regulation | Informs inflammatory subphenotype PRS [27] |
| Systemic Lupus Erythematosus | IL1A, IL-1 family genes | Innate immunity, cytokine signaling | Relevant for systemic manifestations [2] |
| Inflammatory Bowel Disease | NOD2, ATG16L1 | Autophagy, microbial defense | Guides GI symptom subphenotyping [27] |
| Celiac Disease | SH2B3, IL2-IL21 region | Immune cell differentiation | Informs nutrient malabsorption comorbidity [24] |
| Ankylosing Spondylitis | IL23R, ERAP1 | IL-23/Th17 pathway, antigen presentation | Relevant for axial pain components [24] |

The shared genetic architecture between endometriosis and autoimmune diseases predominantly involves pathways related to immune cell differentiation, cytokine signaling, and innate immunity [27]. Notably, many of the shared loci demonstrate stronger expression in specific immune cell types, such as B cells, offering potential targets for therapeutic interventions that could simultaneously address both endometriosis and comorbid autoimmune conditions [26].

Osteoarthritis: Inflammation as a Shared Mechanism

The relationship between endometriosis and osteoarthritis represents a compelling example of genetic sharing across traditionally distinct disease categories. Recent Mendelian randomization studies have provided evidence for a potential causal relationship between several autoimmune diseases and osteoarthritis [24] [25]. While osteoarthritis has historically been considered a degenerative "wear-and-tear" condition, these genetic findings support the involvement of inflammatory processes in its pathogenesis, creating conceptual overlap with endometriosis.

Transcriptome analysis has revealed that metabolism-related pathways play a key role in the comorbidity between autoimmune diseases and osteoarthritis [24] [25]. This observation aligns with findings in endometriosis, where metabolic reprogramming has been increasingly recognized as a contributor to disease pathogenesis. The genetic overlap between these conditions suggests the potential for shared therapeutic approaches targeting inflammatory and metabolic pathways.

Pain Disorders: Central Sensitization and Genetic Vulnerability

Chronic pain represents a central feature of endometriosis that frequently co-occurs with other chronic pain conditions, suggesting shared genetic vulnerability to pain sensitization. While specific genetic variants underlying this relationship remain to be fully elucidated, PRS-PheWAS studies have identified associations between genetic liability to endometriosis and other pain-related conditions, including migraine and irritable bowel syndrome [2].

The genetic sharing between endometriosis and other pain disorders may involve pathways related to neuroinflammation, central sensitization, and altered pain processing. Emerging evidence suggests that the genetic liability to endometriosis can influence pain perception and sensitization independent of the physical disease manifestation, as demonstrated by associations observed in males who carry endometriosis genetic risk factors but do not develop the condition [2]. This finding highlights the potential for using genetic data to identify individuals at risk for chronic pain syndromes beyond traditional diagnostic boundaries.

Experimental Protocols for Genetic Overlap Studies

Protocol 1: Polygenic Risk Score PheWAS Implementation

Objective: To identify pleiotropic associations between genetic liability to endometriosis and a broad range of phenotypes in large biobanks.

Materials and Reagents:

  • GWAS summary statistics for endometriosis from large consortia
  • Quality-controlled genotype data from biobank resources (e.g., UK Biobank)
  • Phenotypic data mapped to standardized coding systems (e.g., ICD-10, phecodes)
  • Computational resources for large-scale genetic analysis

Procedure:

  • PRS Construction: Derive PRS weightings using Bayesian methods (e.g., SBayesR) applied to endometriosis GWAS summary statistics, excluding the MHC region to avoid confounding [2].
  • Target Cohort Preparation: Curate genetically homogeneous subgroups (e.g., unrelated European individuals) from biobank resources, with separate analyses for females, males, and females without endometriosis diagnosis [2].
  • PRS Calculation: Compute individual PRS in the target cohort using the PLINK 1.9 --score function with the SBayesR weightings [2].
  • Phenotype Processing: Map diagnostic codes to phecodes, excluding codes with insufficient case numbers (<100 participants) [2].
  • Association Testing: Conduct logistic regression for binary traits and linear regression for continuous biomarkers, adjusting for age and genetic principal components [2].
  • Multiple Testing Correction: Apply false discovery rate (FDR) correction to account for the number of phenotypes tested.
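The multiple testing step in the procedure above can be sketched with the Benjamini-Hochberg procedure, one standard way to control the FDR. The source does not state which FDR method was used, so this choice is an assumption, and the phenotype labels and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical PRS-phenotype association p-values.
pvals = {"phecode_A": 0.001, "phecode_B": 0.04, "phecode_C": 0.03, "phecode_D": 0.20}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
significant = [name for name, q in qvals.items() if q < 0.05]
```

With only four tests the correction is mild; in a real PheWAS covering hundreds of phecodes and biomarkers the adjustment is considerably more stringent.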

Interpretation: Associations identified in all three cohorts (females, males, and females without diagnosis) suggest pleiotropic effects independent of disease manifestation, while sex-specific associations highlight the importance of hormonal or anatomical factors.

Protocol 2: Mendelian Randomization Analysis

Objective: To assess potential causal relationships between endometriosis and comorbid conditions using genetic instruments.

Materials and Reagents:

  • GWAS summary statistics for exposure (endometriosis) and outcome (comorbid condition)
  • Reference panel for linkage disequilibrium estimation
  • MR software packages (TwoSampleMR, MR-PRESSO)

Procedure:

  • Instrument Selection: Identify independent (r² < 0.01) genome-wide significant (p < 5 × 10⁻⁸) SNPs associated with the exposure [24] [25].
  • Data Harmonization: Align effect alleles for exposure and outcome datasets, excluding palindromic SNPs with intermediate allele frequencies [24] [25].
  • Primary MR Analysis: Apply inverse variance weighted (IVW) method as primary analysis, with supplementary methods including MR-Egger, weighted median, and simple mode [24] [25].
  • Sensitivity Analyses: Assess heterogeneity using Cochran's Q statistic, horizontal pleiotropy using MR-Egger intercept, and outliers using MR-PRESSO [24] [25].
  • Reverse MR Analysis: Perform bidirectional MR to assess potential reverse causation.
  • Multivariable MR: Adjust for potential confounders when appropriate data are available.
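The primary analysis and heterogeneity check in the procedure above can be sketched as a fixed-effect inverse variance weighted (IVW) estimate built from per-SNP Wald ratios, together with Cochran's Q. This is a simplified illustration using first-order weights; it is not the implementation in TwoSampleMR, and the effect sizes are hypothetical.

```python
import math

def ivw_estimate(instruments):
    """Fixed-effect IVW estimate from per-SNP Wald ratios.

    Each instrument is (beta_exposure, beta_outcome, se_outcome).
    Returns (causal estimate, standard error, Cochran's Q).
    """
    ratios = [by / bx for bx, by, _ in instruments]
    # First-order approximation: SE of each ratio is se_outcome / |beta_exposure|,
    # so the inverse-variance weight is (beta_exposure / se_outcome)^2.
    weights = [(bx / se) ** 2 for bx, _, se in instruments]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1 / total)
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    return beta, se, q

# Hypothetical harmonized SNP effects (exposure beta, outcome beta, outcome SE).
snps = [(0.10, 0.020, 0.005), (0.20, 0.050, 0.010), (0.10, 0.015, 0.005)]
beta_ivw, se_ivw, cochran_q = ivw_estimate(snps)
```

Cochran's Q is compared against a chi-squared distribution with (number of SNPs − 1) degrees of freedom; a large value relative to that reference suggests heterogeneity among the instruments.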

Interpretation: A consistent effect across multiple MR methods with no evidence of directional pleiotropy supports a potential causal relationship. Significant heterogeneity may indicate subtype-specific effects or pleiotropic mechanisms.
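The Data Harmonization step in the procedure can be illustrated with a small sketch that aligns the outcome effect to the exposure effect allele and drops palindromic SNPs (A/T or C/G) whose allele frequency is too close to 0.5 for the strand to be inferred. The 0.42 to 0.58 frequency window, the field names, and the decision to drop rather than strand-flip mismatched alleles are all assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_palindromic(allele_1, allele_2):
    """True for A/T and C/G SNPs, whose alleles look identical on both strands."""
    return COMPLEMENT[allele_1] == allele_2

def harmonize(exposure, outcome, freq_low=0.42, freq_high=0.58):
    """Return the outcome beta aligned to the exposure effect allele, or None.

    Assumption: both datasets report alleles on the same strand; strand
    flips are not attempted in this sketch.
    """
    if is_palindromic(exposure["ea"], exposure["oa"]) and \
            freq_low < exposure["eaf"] < freq_high:
        return None  # ambiguous palindromic SNP with intermediate frequency
    if (outcome["ea"], outcome["oa"]) == (exposure["ea"], exposure["oa"]):
        return outcome["beta"]
    if (outcome["ea"], outcome["oa"]) == (exposure["oa"], exposure["ea"]):
        return -outcome["beta"]  # effect allele swapped, so flip the sign
    return None  # alleles do not match

# Hypothetical records: ea = effect allele, oa = other allele, eaf = effect allele frequency.
swapped = harmonize({"ea": "A", "oa": "G", "eaf": 0.30}, {"ea": "G", "oa": "A", "beta": 0.05})
ambiguous = harmonize({"ea": "A", "oa": "T", "eaf": 0.50}, {"ea": "A", "oa": "T", "beta": 0.05})
```

Dedicated MR packages handle additional cases, such as inferring strand for palindromic SNPs from allele frequencies, that this sketch deliberately leaves out.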

Table 3: Key Research Reagents and Resources for Genetic Sharing Studies

| Resource Category | Specific Examples | Application in Endometriosis Genetics | Key Features |
|---|---|---|---|
| GWAS Summary Statistics | Sapkota et al. 2017 meta-analysis [2], FinnGen endometriosis data [2] | PRS construction, genetic correlation | Large sample sizes, diverse endometriosis subphenotypes |
| Biobank Resources | UK Biobank [6] [2], Danish Twin Registry [6] | PRS-PheWAS, validation studies | Deep phenotyping, genetic data, longitudinal follow-up |
| Analysis Software | PLINK [2], GCTB (for SBayesR) [2], TwoSampleMR [24] [25] | PRS calculation, MR analysis | Efficient processing of large genetic datasets |
| Genetic Instruments | 14-SNP PRS [6], 42-locus PRS [2] | Genetic overlap studies | Established effect sizes, validated associations |
| Pathway Analysis Tools | MAGMA, DEPICT | Biological mechanism elucidation | Gene set enrichment, tissue-specific expression |

Implications for Endometriosis Subphenotype Research and Therapeutics

The patterns of genetic sharing between endometriosis and its comorbid conditions have profound implications for refining PRS performance across endometriosis subphenotypes. The differential effect sizes of PRS across ovarian, infiltrating, and peritoneal subtypes [6] suggest that subtype-specific PRS optimization may enhance predictive accuracy and clinical utility. Furthermore, the identification of specific genetic overlaps with particular comorbid conditions may enable the development of PRS that not only predict endometriosis risk but also the likelihood of specific symptom profiles or comorbid conditions.

From a therapeutic perspective, the shared genetic architecture between endometriosis and autoimmune diseases offers opportunities for drug repurposing and novel target development. The identification of shared pathways, such as those involving B-cell activation or specific cytokine signaling, may guide the application of existing immunomodulatory therapies to endometriosis treatment [26]. Additionally, the genetic relationship with testosterone levels [2] suggests potential for hormonal interventions that extend beyond traditional estrogen-focused approaches.

Future research directions should include the development of integrated PRS that incorporate variants associated with both endometriosis and its key comorbidities, potentially offering improved stratification of patients based on their likely disease presentation and progression. Furthermore, investigation of the genetic relationships between endometriosis and comorbid conditions across diverse ancestral backgrounds represents a critical priority for addressing health disparities in endometriosis diagnosis and care.

The genetic sharing between endometriosis and comorbid conditions, including pain disorders, osteoarthritis, and autoimmune diseases, reveals complex pleiotropic relationships that extend beyond traditional diagnostic boundaries. Methodological advances in PRS-PheWAS, Mendelian randomization, and cross-phenotype meta-analysis have provided powerful tools for elucidating these relationships and their underlying biological mechanisms. The integration of these genetic insights into endometriosis subphenotype research holds significant promise for improving risk prediction, understanding disease heterogeneity, and developing targeted therapeutic approaches that address the multifaceted nature of this complex condition.

Endometriosis is a complex gynecological disorder with a significant genetic component, characterized by a polygenic architecture where numerous common genetic variants of small effect size collectively contribute to disease susceptibility. This whitepaper synthesizes current understanding of how genome-wide association studies (GWAS) have identified endometriosis risk loci and how their cumulative effects are quantified through polygenic risk scores (PRS). We examine the performance of PRS across different endometriosis subphenotypes, highlighting the increased genetic burden associated with moderate-to-severe disease stages. The functional characterization of risk variants through expression quantitative trait loci (eQTL) analysis reveals tissue-specific regulatory effects, providing insights into biological mechanisms. Emerging evidence suggests that genetic liability to endometriosis has pleiotropic effects on other traits, including hormonal factors. While current PRS models show promising discriminative ability, they have not yet reached clinical utility as stand-alone tools. Integration with clinical risk factors and symptoms may enable development of risk stratification tools to reduce diagnostic delays and improve patient outcomes.

Endometriosis is a common, estrogen-dependent inflammatory condition affecting approximately 10% of women of reproductive age, with familial clustering indicating a strong genetic component. Twin studies estimate the heritability of endometriosis at approximately 51% [11], while common single nucleotide polymorphisms (SNPs) account for approximately 26% of the disease variance [28]. The condition demonstrates complex inheritance patterns consistent with a polygenic architecture, where multiple genetic variants interact with environmental factors to influence disease risk.

The polygenic nature of endometriosis presents both challenges and opportunities for understanding its pathophysiology. Genome-wide association studies (GWAS) have successfully identified multiple risk loci, though individual variants confer only modest increases in risk. The cumulative effect of these variants can be quantified through polygenic risk scores (PRS), which aggregate the effects of many risk alleles into a single metric. These scores show particular utility for stratifying patients by disease subphenotypes, with stronger genetic effects observed in more severe disease forms [29].

This technical review examines the current state of knowledge regarding the polygenic architecture of endometriosis, with particular focus on PRS performance across disease subphenotypes. We provide detailed methodological frameworks for GWAS and PRS construction, analyze the functional characterization of risk variants, and discuss applications in both research and clinical contexts.

Genetic Architecture of Endometriosis

Historical Development of Genetic Studies

Initial genetic investigations of endometriosis employed candidate gene approaches, which were largely unsuccessful due to limited genomic coverage and inadequate sample sizes [11]. The advent of genome-wide association studies (GWAS) enabled hypothesis-free identification of common variants, revealing the highly polygenic nature of the condition. Early GWAS in Japanese and European populations identified the first robust associations, including variants in CDKN2B-AS1 and an intergenic region on 7p15.2 [11]. Subsequent meta-analyses substantially increased discovery power, identifying additional loci and highlighting the genetic correlation between European and Asian populations [30].

Table 1: Key GWAS and Meta-Analyses in Endometriosis Genetics

| Study | Sample Size (Cases/Controls) | Ancestry | Novel Loci Identified | Key Findings |
| --- | --- | --- | --- | --- |
| Uno et al. 2010 [11] | 1,907/5,292 | Japanese | CDKN2B-AS1 | First GWAS in Japanese population |
| Painter et al. 2011 [11] | 3,194/7,060 | European | 7p15.2 | First GWAS in European ancestry |
| Nyholt et al. 2012 [30] | 4,604/9,393 | Trans-ancestry | 6 novel loci | Demonstrated genetic correlation between populations |
| Sapkota et al. 2017 [28] | 17,045/191,596 | Trans-ancestry | 5 novel loci | Implicated genes in sex steroid hormone pathways |
| Recent meta-analysis [7] | 17,045+/191,596+ | Trans-ancestry | 42 loci | Identified tissue-specific regulatory effects |

Established Risk Loci and Biological Pathways

Current GWAS have identified 42 genetic loci associated with endometriosis risk [2]. The majority of these variants reside in non-coding regions, suggesting they exert their effects through gene regulation rather than protein coding changes. Several key biological pathways are enriched among the implicated genes:

  • Sex steroid hormone signaling: WNT4, GREB1, ESR1, FSHB, CCDC170 [28]
  • Developmental processes: WNT4, HOX genes [11]
  • Cell proliferation and carcinogenesis: CDKN2A/CDKN2B [11]
  • Cell adhesion and extracellular matrix: VEZT, FN1 [28]

Notably, many endometriosis risk loci show pleiotropic effects with other reproductive traits and hormonal cancers, suggesting shared biological mechanisms [2].

Polygenic Risk Score Development and Validation

Methodological Framework for PRS Construction

Polygenic risk scores are calculated as the weighted sum of risk alleles an individual carries, with weights typically derived from GWAS effect sizes. The standard approach involves:

  • GWAS Summary Statistics: Effect sizes (beta coefficients or odds ratios) and p-values from large-scale GWAS meta-analyses [28]
  • Clumping and Thresholding: Selecting independent SNPs via linkage disequilibrium pruning and applying p-value thresholds [6]
  • Bayesian Methods: Advanced approaches like SBayesR that incorporate prior assumptions about genetic architecture [2]
  • Validation: Testing PRS performance in independent cohorts not included in the discovery GWAS

The PRS for an individual is calculated as:

\[ PRS_i = \sum_{j=1}^{M} w_j \times G_{ij} \]

Where \(w_j\) is the weight for SNP \(j\) (typically the log odds ratio from GWAS), \(G_{ij}\) is the genotype of individual \(i\) for SNP \(j\) (coded as 0, 1, or 2 copies of the effect allele), and \(M\) is the number of SNPs included in the score [6].
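As a small worked instance of the formula above, the sketch below converts odds ratios to log-odds weights and sums them over effect-allele counts. The odds ratios and genotypes in the usage note are made-up illustrative values, not estimates from any study cited here, and the function names are hypothetical.

```python
import math


def log_odds_weights(odds_ratios):
    """Convert per-SNP odds ratios into additive log-odds weights."""
    return [math.log(odds_ratio) for odds_ratio in odds_ratios]


def polygenic_score(weights, genotypes):
    """Weighted sum of effect-allele counts (0, 1 or 2) for one individual."""
    if len(weights) != len(genotypes):
        raise ValueError("weights and genotypes must cover the same SNPs")
    return sum(w * g for w, g in zip(weights, genotypes))
```

For example, with illustrative odds ratios `[1.10, 1.20, 0.90]` and genotypes `[2, 1, 0]`, `polygenic_score(log_odds_weights([1.10, 1.20, 0.90]), [2, 1, 0])` adds twice the first weight to the second; the third SNP contributes nothing because the individual carries no copies of its effect allele.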

[Figure: PRS workflow across the discovery phase, PRS construction, and application phase. GWAS summary statistics pass through QC; QC'd SNPs are clumped into independent SNPs and assigned a weighting scheme; the weights are combined with target genotype data to give a PRS per individual, which is then validated.]

PRS Performance Across Populations and Subphenotypes

Multiple studies have validated PRS for endometriosis across diverse cohorts. A study using a 14-variant PRS demonstrated significant association with endometriosis risk in both Danish cohorts (OR = 1.57 per standard deviation increase, p = 2.5×10^-11) and the UK Biobank (OR = 1.28, p < 2.2×10^-16) [6]. The PRS was associated with all major subtypes of endometriosis, with the strongest effect for ovarian endometriosis (OR = 1.72) [6].

Table 2: Polygenic Risk Score Performance Across Endometriosis Subphenotypes

| Subphenotype | Cohort | Odds Ratio per SD | P-value | Sample Size (Cases/Controls) |
| --- | --- | --- | --- | --- |
| All endometriosis | Combined Danish | 1.57 | 2.5×10^-11 | 389/664 |
| All endometriosis | UK Biobank | 1.28 | <2.2×10^-16 | 2,967/256,222 |
| Ovarian (N80.1) | Combined Danish | 1.72 | 6.7×10^-5 | 75/664 |
| Infiltrating (N80.4-5) | Combined Danish | 1.66 | 2.7×10^-9 | 210/664 |
| Peritoneal (N80.2-3) | Combined Danish | 1.51 | 2.6×10^-3 | 60/664 |
| Stage B (rAFS III-IV) | Australian/UK | 1.38* | 5.8×10^-12 | 1,357/8,075 |
| Stage A (rAFS I-II) | Australian/UK | 1.15* | 0.015 | 1,680/8,075 |

*Genetic risk score based on increasing number of SNPs at p-value threshold <0.5 [29]

Notably, PRS was not associated with adenomyosis (N80.0), suggesting distinct genetic architecture despite clinical similarities [6]. This specificity supports the hypothesis that PRS captures endometriosis-specific risk rather than general susceptibility to gynecological disorders.

Polygenic Risk Score Performance Across Subphenotypes

Genetic Burden by Disease Severity

A key finding in endometriosis genetics is the differential genetic loading across disease stages. Multiple studies have demonstrated that common genetic variants contribute more substantially to moderate-severe (rAFS Stage III-IV) endometriosis compared to minimal-mild disease (rAFS Stage I-II) [29]. The common SNP-based heritability is significantly higher for Stage B endometriosis (0.35) than for Stage A disease (0.15) [29].

Further analysis refining the staging to four categories (minimal, mild, moderate, and severe) revealed a gradient of genetic burden, with increasing contribution of common genetic variation from minimal to severe disease [29]. This gradient effect suggests that more severe forms of endometriosis may represent a more genetically determined subset of the condition.

Tissue-Specific Regulatory Effects

Functional characterization of endometriosis risk variants through expression quantitative trait loci (eQTL) analysis reveals tissue-specific regulatory patterns. A recent study investigating 465 endometriosis-associated variants across six relevant tissues (uterus, ovary, vagina, sigmoid colon, ileum, and peripheral blood) found distinct regulatory profiles [7]:

  • In reproductive tissues (ovary, uterus, vagina): enrichment of genes involved in hormonal response, tissue remodeling, and adhesion
  • In intestinal tissues (colon, ileum) and blood: predominance of immune and epithelial signaling genes

Key regulated genes include MICB (immune evasion), CLDN23 (barrier function), and GATA4 (proliferative signaling) [7]. This tissue-specific functional annotation provides mechanistic insights into how genetic variants might contribute to different disease manifestations.

Pleiotropic Effects and Comorbidities

PRS-phenome wide association studies (PheWAS) demonstrate that genetic liability to endometriosis has pleiotropic effects on numerous other traits, even in individuals without diagnosed endometriosis [2]. Notable associations include:

  • Reproductive factors: Earlier age at menarche, shorter menstrual cycles [2]
  • Hormonal profiles: Lower testosterone levels in both females and males [2]
  • Pain conditions: Chronic pain syndromes, migraine [2]
  • Immune-mediated disorders: Asthma, allergic rhinitis [2]

Mendelian randomization analyses suggest that lower testosterone levels may be causal for endometriosis risk, revealing a potentially modifiable hormonal risk factor [2].

Functional Characterization of Risk Variants

From Association to Mechanism

The translation of GWAS findings into biological insights requires comprehensive functional characterization. Most endometriosis-associated variants reside in non-coding regions, suggesting they influence gene regulation rather than protein function [7]. Integration with multi-omics data provides a powerful approach to understanding variant function:

  • Expression quantitative trait loci (eQTL) mapping: Identifies associations between genetic variants and gene expression levels [7]
  • Chromatin interaction mapping: Reveals physical connections between risk variants and promoter regions
  • Epigenomic profiling: Identifies variants overlapping regulatory elements in relevant cell types

A systematic eQTL analysis of endometriosis risk variants found that 35% showed significant regulatory effects in at least one tissue, with the strongest effects observed in uterus and ovary [7].

[Figure: Data integration and analytical pipeline. GWAS lead variants are mapped to eQTLs using the GTEx v8 database (6 endometriosis-relevant tissues); candidate genes undergo functional annotation with MSigDB Hallmark gene sets and Cancer Hallmarks collections; the resulting mechanisms feed pathway analysis, which yields the prioritized genes MICB, CLDN23, and GATA4.]

Hormonal Pathways and Therapeutic Implications

Several novel endometriosis risk loci implicate genes with established roles in sex steroid hormone pathways, including FN1, CCDC170, ESR1, SYNE1, and FSHB [28]. Hormonal regulation appears to be a central mechanism in endometriosis genetics:

  • ESR1 (Estrogen Receptor Alpha): Contains multiple independent risk signals, highlighting the central role of estrogen signaling [28]
  • FSHB (Follicle-Stimulating Hormone Beta Subunit): Links pituitary gonadotropin regulation to endometriosis risk [28]
  • CCDC170: An estrogen-regulated gene adjacent to ESR1 with potential roles in cell migration [28]

These findings not only illuminate disease mechanisms but also highlight potential targets for therapeutic intervention, particularly for patients with specific genetic profiles.

Research Reagent Solutions

Table 3: Essential Research Tools for Endometriosis Genetic Studies

| Reagent/Resource | Function | Example Use | Key Features |
| --- | --- | --- | --- |
| GTEx Database v8 | Tissue-specific eQTL reference | Mapping regulatory consequences of risk variants | 6 endometriosis-relevant tissues; significance threshold FDR <0.05 [7] |
| MSigDB Hallmark Gene Sets | Curated biological pathway database | Functional interpretation of regulated genes | 50 well-defined biological states; Cancer Hallmarks collections [7] |
| GWAS Catalog (EFO_0001065) | Repository of published GWAS results | Variant selection and functional annotation | 465 unique endometriosis-associated variants with p<5×10^-8 [7] |
| UK Biobank | Population-based cohort with genetic data | PRS validation and phenome-wide association | ~500,000 participants; extensive phenotype data [2] |
| SBayesR | Bayesian method for PRS calculation | Effect size adjustment for PRS weighting | Accounts for genetic architecture; improves prediction accuracy [2] |
| Ensembl VEP | Variant effect predictor | Functional annotation of risk variants | Genomic location, functional region, associated gene [7] |

Experimental Protocols

Standardized PRS Development Protocol

Objective: To develop and validate a polygenic risk score for endometriosis using GWAS summary statistics and independent target cohorts.

Materials:

  • GWAS summary statistics from endometriosis meta-analysis
  • Genotype and phenotype data from independent validation cohort(s)
  • Computational resources for genetic analysis (PLINK, GCTB, R)

Procedure:

  • Data Preprocessing: Apply quality control filters to GWAS summary statistics (INFO > 0.8, MAF > 0.01, removal of strand-ambiguous SNPs)
  • PRS Model Training: Use SBayesR method with default parameters, excluding MHC region (chr6:25-35Mb) due to complex linkage disequilibrium [2]
  • Target Data Preparation: Perform standard genotype QC on target samples, including relatedness filtering, heterozygosity checks, and population structure assessment
  • PRS Calculation: Generate scores using the --score function in PLINK 1.9 with SBayesR-derived weights [2]
  • Association Testing: Fit logistic regression models with PRS as predictor and endometriosis status as outcome, adjusting for age and genetic principal components
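The summary-statistic filters named in the procedure (INFO > 0.8, MAF > 0.01, removal of strand-ambiguous SNPs, and exclusion of the MHC region at chr6:25-35Mb) can be expressed as a single predicate. This is a sketch under assumptions: the field names (`chr`, `pos`, `a1`, `a2`, `eaf`, `info`) are hypothetical, and real summary-statistic files use varying column conventions.

```python
# Allele pairs that are identical on both strands and therefore ambiguous.
AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def in_mhc(snp, start=25_000_000, end=35_000_000):
    """True if the SNP lies in the chr6:25-35Mb window given in the protocol."""
    return snp["chr"] == 6 and start <= snp["pos"] <= end


def keep_snp(snp, min_info=0.8, min_maf=0.01):
    """Apply the protocol's QC thresholds to one summary-statistic record."""
    maf = min(snp["eaf"], 1.0 - snp["eaf"])  # fold allele frequency to the minor allele
    return (
        snp["info"] > min_info
        and maf > min_maf
        and (snp["a1"], snp["a2"]) not in AMBIGUOUS_PAIRS
        and not in_mhc(snp)
    )
```

Applied as `[snp for snp in records if keep_snp(snp)]`, the predicate keeps the thresholds in one place so they can be reported alongside the resulting SNP count.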

Validation: Assess discriminative accuracy via odds ratios per standard deviation increase in PRS and area under the receiver operating characteristic curve (AUC-ROC)
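The two validation metrics can be illustrated without external libraries. The sketch below standardizes scores, fits a single-predictor logistic regression by Newton-Raphson, and computes the AUC as the probability that a case outranks a control. It is a simplification under stated assumptions: covariates such as age and genetic principal components are omitted, standardization uses the whole-sample mean and SD, the data must not be perfectly separable, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 so the slope is a log OR per SD."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix X'WX
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def auc(scores, labels):
    """Probability that a random case scores higher than a random control."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

The odds ratio per SD is `math.exp(b1)` after fitting on standardized scores. Reporting it together with the AUC is useful because a highly significant odds ratio can coexist with modest discrimination.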

Tissue-Specific eQTL Analysis Protocol

Objective: To characterize the regulatory effects of endometriosis-associated variants across relevant tissues.

Materials:

  • List of endometriosis-associated variants (p < 5×10^-8) from GWAS Catalog
  • Tissue-specific eQTL data from GTEx Portal v8
  • Functional annotation resources (MSigDB Hallmark gene sets)

Procedure:

  • Variant Selection: Curate 465 endometriosis-associated variants with valid rsIDs from GWAS Catalog [7]
  • eQTL Mapping: Cross-reference variants with significant eQTLs (FDR < 0.05) in six endometriosis-relevant tissues: uterus, ovary, vagina, sigmoid colon, ileum, and whole blood [7]
  • Effect Size Extraction: Record regulated gene, slope (effect size), adjusted p-value, and tissue for each significant eQTL
  • Gene Prioritization: Apply two complementary criteria: (a) genes regulated by the highest number of eQTL variants, (b) genes with the strongest regulatory effects (based on slope values) [7]
  • Functional Interpretation: Map prioritized genes to biological pathways using MSigDB Hallmark gene sets and Cancer Hallmarks collections

Analysis: Compare regulatory profiles across tissues and identify tissue-specific enriched pathways.
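The two gene prioritization criteria in the procedure (number of regulating eQTL variants, and strongest regulatory effect by slope) can be sketched as below. The row layout (`variant`, `gene`, `tissue`, `slope`, `fdr`) and the function name are assumptions for illustration and do not reflect the column names of any GTEx download; taking the largest absolute slope as "strongest" is likewise an assumed reading of the criterion.

```python
from collections import defaultdict


def prioritize_genes(eqtl_rows, fdr_threshold=0.05, top_n=3):
    """Rank genes by (a) distinct regulating variants and (b) largest |slope|."""
    variants_per_gene = defaultdict(set)
    strongest_effect = defaultdict(float)
    for row in eqtl_rows:
        if row["fdr"] >= fdr_threshold:
            continue  # keep only significant eQTLs
        gene = row["gene"]
        variants_per_gene[gene].add(row["variant"])
        strongest_effect[gene] = max(strongest_effect[gene], abs(row["slope"]))
    # Ties are broken alphabetically so the ranking is deterministic
    by_count = sorted(variants_per_gene,
                      key=lambda g: (-len(variants_per_gene[g]), g))[:top_n]
    by_effect = sorted(strongest_effect,
                       key=lambda g: (-strongest_effect[g], g))[:top_n]
    return by_count, by_effect
```

Keeping the two rankings separate reflects the protocol's description of them as complementary: a gene regulated by many variants of modest effect and a gene with one strong eQTL are both candidates, for different reasons.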

The polygenic architecture of endometriosis encompasses hundreds of common variants with small effect sizes that collectively contribute to disease risk. Polygenic risk scores effectively capture this cumulative genetic burden and demonstrate utility for subphenotype stratification, with stronger effects observed in moderate-to-severe disease. Functional characterization of risk variants reveals tissue-specific regulatory mechanisms, particularly in hormonal pathways. While current PRS models show significant associations with endometriosis risk, their discriminative accuracy remains insufficient for standalone clinical application. Future research directions should include: (1) development of trans-ancestry PRS to improve equity in genetic risk prediction, (2) integration of rare variants from whole-exome sequencing studies, (3) multi-omics approaches combining genomic, transcriptomic, and epigenomic data, and (4) implementation in longitudinal cohorts to assess utility for risk prediction and early intervention. The ongoing expansion of GWAS sample sizes and functional annotation resources will continue to enhance our understanding of endometriosis genetics and move the field toward precision medicine approaches.

Constructing and Validating Endometriosis PRS: Methodological Frameworks and Subtype Applications

SNP Selection and Weighting Strategies for Endometriosis PRS

Polygenic risk scores (PRS) have emerged as a powerful tool for quantifying an individual's genetic liability to complex diseases. For endometriosis, a condition with heritability estimated at 47-51%, PRS offers promising avenues for risk prediction, patient stratification, and understanding of shared disease aetiology [31] [10]. The performance and clinical utility of PRS for endometriosis depend on two core technical aspects: the selection of single nucleotide polymorphisms (SNPs) included in the score and the statistical methods used to weight their individual effects. This technical guide examines current SNP selection and weighting strategies within the broader context of endometriosis subphenotype research, addressing the need for standardized methodologies in genetic risk prediction for this complex gynecological disorder.

Established SNP Selection and Weighting Methods

Fundamental Approaches to SNP Selection

The selection of SNPs for inclusion in endometriosis PRS has traditionally followed two primary pathways: genome-wide significant SNP selection and clumping and thresholding methods.

Genome-wide Significant SNP Selection involves curating variants that surpass the conventional genome-wide significance threshold (P < 5 × 10^-8) from large-scale genome-wide association studies (GWAS). For endometriosis, multiple studies have utilized this approach with a focused set of SNPs. A 2021 study derived a PRS based on 14 genome-wide significant lead SNPs identified from a published GWAS meta-analysis comprising over 17,000 endometriosis cases [6]. Similarly, a 2022 clinical presentation study calculated PRS using 13 SNPs with p-values < 5 × 10^-8 that were present in their dataset [10]. While this approach ensures the inclusion of robustly associated variants, it potentially omits SNPs with smaller effect sizes that collectively contribute to disease risk.

Clumping and Thresholding (C+T) methods represent a more inclusive approach by incorporating SNPs below the genome-wide significance threshold. This method involves clumping SNPs to account for linkage disequilibrium (LD) and setting p-value thresholds for inclusion. The standard clumping parameters typically include an LD r² threshold of 0.1-0.2 within a specified genomic distance (e.g., 250-500 kb) [32] [16]. PRSice-2, a widely used software for PRS analysis, automates this process and allows for optimization of p-value thresholds using the target dataset [32]. This method enables the capture of a broader polygenic signal beyond genome-wide significant hits, potentially improving predictive power.
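The logic of clumping and thresholding can be sketched as a greedy pass over SNPs ordered by p-value. This is an illustration of the idea, not the algorithm as implemented in PRSice-2 or PLINK: the record layout, the pairwise `r2` lookup keyed by `frozenset`, and the default parameter values are assumptions chosen to fall within the ranges quoted above.

```python
def clump_and_threshold(snps, r2, p_threshold=5e-8, r2_threshold=0.1, window_kb=250):
    """Greedy C+T: keep the most significant SNP in each LD clump.

    snps: list of dicts with id, chr, pos and p.
    r2:   dict mapping frozenset({id1, id2}) to pairwise LD r-squared;
          pairs that are absent are treated as unlinked.
    """
    candidates = sorted((s for s in snps if s["p"] <= p_threshold),
                        key=lambda s: s["p"])
    kept = []
    for snp in candidates:
        in_existing_clump = any(
            snp["chr"] == index["chr"]
            and abs(snp["pos"] - index["pos"]) <= window_kb * 1000
            and r2.get(frozenset((snp["id"], index["id"])), 0.0) >= r2_threshold
            for index in kept
        )
        if not in_existing_clump:
            kept.append(snp)  # this SNP becomes the index SNP of a new clump
    return [s["id"] for s in kept]
```

Rerunning the function over a grid of `p_threshold` values and scoring each resulting SNP set in the target data mirrors the threshold optimization described above, with the usual caveat that thresholds tuned on the target dataset can overfit.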

Table 1: Standard Quality Control Parameters for PRS Analysis

| Data Type | QC Parameter | Threshold | Rationale |
| --- | --- | --- | --- |
| Base Data (GWAS Summary Statistics) | Heritability (h²snp) | > 0.05 | Ensures sufficient genetic signal for meaningful PRS [16] |
| Base Data (GWAS Summary Statistics) | Effect allele identification | Must be specified | Prevents spurious results from strand mismatches [16] |
| Base Data (GWAS Summary Statistics) | INFO score | > 0.8 | Ensures high imputation quality [10] |
| Target Data (Genotypes) | Sample missingness | < 0.02 | Removes poor-quality samples [16] |
| Target Data (Genotypes) | Minor allele frequency | > 0.01 | Reduces noise from rare variants [16] |
| Target Data (Genotypes) | Hardy-Weinberg equilibrium | P > 1×10^-5 | Excludes variants with genotyping errors [10] |
| Target Data (Genotypes) | Heterozygosity rate | ±3 SD from mean | Removes contaminated samples [10] |

Conventional Weighting Strategies

The weighting of SNP effects in PRS construction has evolved from simple to increasingly sophisticated methods.

Unadjusted Effect Size Weighting represents the most straightforward approach, where SNP effect sizes (beta coefficients or log odds ratios) from GWAS summary statistics are applied directly as weights in the PRS calculation, without shrinkage. The basic PRS formula is expressed as:

\[ PRS_j = \sum_{i=1}^{n} w_i \times G_{ij} \]

where \( PRS_j \) is the polygenic risk score for individual \( j \), \( w_i \) is the weight of SNP \( i \) derived from GWAS summary statistics, \( G_{ij} \) is the genotype of SNP \( i \) for individual \( j \) (coded as 0, 1, or 2 copies of the effect allele), and \( n \) is the total number of SNPs in the score [32].

Alternative Genetic Models can be implemented in PRS calculation software such as PRSice-2, which allows for different genetic models including additive (standard), dominant, recessive, and heterozygous models [32]. The coding of genotypes varies according to the selected model, affecting how the weighted scores are computed.

Scoring Methods include options beyond simple summation. PRSice-2 implements multiple scoring approaches: --score sum for the standard weighted sum, --score avg which divides the sum by the number of alleles included, --score std for standardized scores, and --score con-std for conditional standardization [32].
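The genetic models and scoring options described above can be illustrated with a short sketch. It shows one common way to recode genotypes under each model and to derive sum, average, and standardized scores; it is not PRSice-2's internal implementation, missing genotypes are not handled, and the function names and coding table are assumptions made for the example.

```python
from statistics import mean, stdev

# Assumed recoding of effect-allele counts (0, 1, 2) under each genetic model.
MODEL_CODING = {
    "add": {0: 0, 1: 1, 2: 2},  # additive: count of effect alleles
    "dom": {0: 0, 1: 1, 2: 1},  # dominant: any copy counts once
    "rec": {0: 0, 1: 0, 2: 1},  # recessive: only two copies count
    "het": {0: 0, 1: 1, 2: 0},  # heterozygous: only one copy counts
}


def score_sum(weights, genotypes, model="add"):
    """Weighted sum of recoded genotypes for one individual."""
    coding = MODEL_CODING[model]
    return sum(w * coding[g] for w, g in zip(weights, genotypes))


def score_avg(weights, genotypes, model="add", ploidy=2):
    """Sum divided by the number of alleles included (ploidy x SNP count)."""
    return score_sum(weights, genotypes, model) / (ploidy * len(genotypes))


def score_std(sums, reference=None):
    """Standardize scores; pass control scores as reference for a
    control-based (conditional) standardization."""
    ref = sums if reference is None else reference
    m, s = mean(ref), stdev(ref)
    return [(x - m) / s for x in sums]
```

Averaging makes scores comparable between individuals scored on different numbers of SNPs, while standardizing against controls expresses each score relative to the unaffected distribution.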

[Figure: PRS analysis workflow. Base data (GWAS summary statistics) undergo QC (heritability check, effect allele verification, INFO score filtering) and target data (individual genotypes) undergo QC (sample missingness, MAF filtering, HWE testing). Both feed SNP selection (genome-wide significant SNPs or clumping and thresholding), followed by effect size weighting (unadjusted effects or Bayesian methods), PRS calculation, and validation and optimization.]

Advanced Methods for Enhanced Prediction

Bayesian Polygenic Methods

Advanced Bayesian methods have demonstrated improved performance for endometriosis PRS by applying shrinkage to SNP effect sizes to account for linkage disequilibrium and varying genetic architectures.

SBayesR Approach was employed in a 2023 PRS-PheWAS study of endometriosis, where summary statistics from multiple European cohorts were meta-analyzed and subsequently adjusted using SBayesR implemented in GCTB 2.02 [31]. This method uses a mixture of normal distributions with different variances to shrink SNP effects, effectively assigning more weight to SNPs with stronger evidence of association while shrinking others toward zero. The study excluded the MHC region due to its complex LD structure, a common practice in PRS analysis to avoid spurious associations [31].
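The mixture prior described above can be written schematically as follows. This is a generic form for orientation only: the number of components \(C\), the mixing proportions \(\pi_c\), and the variance scaling factors \(\gamma_c\) are configuration choices that the studies summarized here do not report, so they are left symbolic.

\[ \beta_j \mid \pi, \sigma_\beta^2 \sim \pi_1 \delta_0 + \sum_{c=2}^{C} \pi_c \, \mathcal{N}\!\left(0, \gamma_c \sigma_\beta^2\right), \qquad \sum_{c=1}^{C} \pi_c = 1 \]

Here \(\beta_j\) is the effect of SNP \(j\), \(\delta_0\) is a point mass at zero, and \(\sigma_\beta^2\) is a common effect-size variance. SNPs assigned mostly to the zero component are shrunk toward zero, while SNPs assigned to components with larger \(\gamma_c\) retain more of their estimated effect.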

LDpred is another Bayesian method that infers the posterior mean effect size of each SNP using a prior on the distribution of effect sizes across the genome and LD information from a reference panel. Although the endometriosis studies reviewed here do not report using it, it is a widely used PRS method that could be applied to endometriosis risk prediction [16].

Machine Learning Approaches

Emerging machine learning techniques show promise for enhancing genomic prediction of endometriosis beyond traditional methods.

Deep Neural Network Approaches are being explored to capture complex, non-additive genetic effects in endometriosis. A 2025 study described "an extensive multi-variant deep neural network approach to enhance genomic prediction of endometriosis," suggesting the potential for machine learning to improve prediction accuracy by modeling higher-order interactions between genetic variants [33]. These methods can incorporate thousands of genetic variants without relying on strict p-value thresholds, potentially capturing a more comprehensive genetic signal.

Table 2: Comparison of PRS Performance Across Endometriosis Studies

| Study | SNP Selection Method | Weighting Approach | Sample Size | Performance (OR per SD) |
| --- | --- | --- | --- | --- |
| Søgaard et al. (2021) [6] | 14 genome-wide significant SNPs | Unadjusted effect sizes | 249 cases, 348 controls (clinical cohort) | OR = 1.59, p = 2.57×10^-7 |
| Søgaard et al. (2021) [6] | Same 14 SNPs | Unadjusted effect sizes | 2,967 cases, 256,222 controls (UK Biobank) | OR = 1.28, p < 2.2×10^-16 |
| Law et al. (2023) [31] | Multi-threshold | SBayesR | 159,855 males, 188,221 females (UK Biobank) | Significant associations with multiple health conditions |
| León et al. (2022) [10] | 13 genome-wide significant SNPs | Weighted and unweighted | 172 patients | Inverse associations with disease spread |

Subphenotype-Specific Strategies

Endometriosis exhibits considerable heterogeneity in clinical presentation, and emerging evidence suggests that genetic burden varies across disease stages and subtypes, necessitating tailored PRS approaches.

Stage-Specific Genetic Burden was demonstrated in a genetic burden analysis that found increasing polygenic contribution from minimal to severe endometriosis [29]. The study revealed that moderate and severe endometriosis (rAFS Stage III/IV) showed greater genetic burden than minimal or mild disease (rAFS Stage I/II), suggesting that PRS constructed from GWAS of advanced disease may have better predictive power for severe forms [29].

Subtype-Specific PRS Performance was evaluated in a 2021 study that tested PRS association across endometriosis subtypes, finding that the PRS was associated with ovarian (OR = 1.72), infiltrating (OR = 1.66), and peritoneal (OR = 1.51) endometriosis [6]. This indicates that genetic risk factors contribute to all major subtypes rather than being specific to certain locations, supporting the use of a unified PRS for general endometriosis risk prediction.

Differential PRS Associations with clinical presentations were explored in a 2022 study that identified inverse associations between endometriosis PRS and spread of endometriosis, involvement of the gastrointestinal tract, and hormone treatment, though with limited specificity and sensitivity [10]. This suggests that specific PRS may need to be developed to predict clinical presentations in patients with endometriosis.

Technical Implementation and Research Toolkit

Essential Research Reagents and Computational Tools

Implementation of robust PRS analysis requires specific computational tools and quality-controlled datasets.

Table 3: Research Reagent Solutions for Endometriosis PRS Analysis

| Tool/Reagent | Function | Implementation Example |
| --- | --- | --- |
| PRSice-2 [32] | PRS calculation and clumping | Command-line tool for automated C+T analysis with p-value threshold optimization |
| PLINK 1.9/2.0 [31] [10] | Genotype data management and PRS calculation | --score function for applying SNP weights to target genotypes |
| GCTB [31] | Bayesian PRS modeling | Implementation of SBayesR for effect size shrinkage using summary statistics |
| Illumina Global Screening Array [10] | Genotyping platform | Used in clinical studies for generating target genotype data |
| TOPMed Imputation Server [10] | Genotype imputation | Reference-based imputation to increase SNP coverage using TOPMed panel |
| FlashPCA2 [10] | Principal component analysis | Population structure correction in target datasets |

Workflow Integration and Validation

[Figure: Workflow integration. Base GWAS data (endometriosis summary statistics) and target genotype data (research cohort) feed PRS method selection, which produces the PRS output. Validation and interpretation of the output comprise association testing, subphenotype stratification, clinical correlation analysis, and pleiotropy assessment.]

Validation Procedures for endometriosis PRS must include association testing in independent cohorts, with particular attention to subtype-specific performance. The 2021 study by Søgaard et al. demonstrated a rigorous validation approach, testing PRS performance in three different cohorts: surgically confirmed cases from a specialized endometriosis center, cases from a twin registry based on ICD-10 codes, and a large replication analysis in the UK Biobank [6]. This multi-cohort approach strengthens the evidence for PRS validity across different ascertainment methods.

Pleiotropy Assessment through PRS-PheWAS represents an advanced application, as demonstrated in a 2023 study that investigated associations between endometriosis PRS and numerous health conditions, biomarkers, and reproductive factors across males, females, and females without endometriosis diagnoses [31]. This approach helps elucidate the broader phenomic impact of endometriosis genetic risk factors and reveals potential shared biological pathways with comorbid conditions.

SNP selection and weighting strategies for endometriosis PRS have evolved from simple approaches using a handful of genome-wide significant SNPs to sophisticated methods incorporating thousands of variants with Bayesian shrinkage or machine learning algorithms. The genetic architecture of endometriosis, with its subtype-specific burden and varying heritability across disease stages, necessitates careful consideration of both SNP selection parameters and weighting schemes. Optimal PRS construction for endometriosis research should account for the specific research question—whether predicting general risk, specific subphenotypes, or exploring genetic correlations with comorbid conditions. As GWAS sample sizes continue to grow and methods become more refined, PRS is poised to play an increasingly important role in endometriosis research, from elucidating biological mechanisms to potentially informing clinical stratification in the future.

The accuracy of endpoint ascertainment is a fundamental methodological consideration in endometriosis research, particularly for studies evaluating polygenic risk score (PRS) performance across disease subphenotypes. The diagnostic gold standard for endometriosis remains surgical visualization with histological confirmation, yet research practicality often necessitates using registry-based diagnoses from administrative health data or self-reporting [22] [34]. This technical guide examines the operational characteristics, validation evidence, and methodological implications of these divergent ascertainment approaches for genetic epidemiological studies.

The prolonged diagnostic delay of 7-11 years from symptom onset to surgical diagnosis exacerbates ascertainment challenges, as many cases remain undetected in population-based registries [22] [3]. Furthermore, endometriosis manifests as heterogeneous subphenotypes—superficial peritoneal endometriosis (SPE), ovarian endometriomas (OMA), and deep infiltrating endometriosis (DIE)—each with distinct clinical presentations and potentially different genetic architectures [22]. Understanding how diagnostic ascertainment methods capture this heterogeneity is crucial for interpreting PRS performance across subtypes.

Methodological Frameworks for Endometriosis Ascertainment

Surgical Confirmation as the Reference Standard

Surgical confirmation represents the diagnostic reference standard, characterized by direct visualization of lesions during laparoscopy or laparotomy, often accompanied by histological examination. The procedural methodology typically involves:

  • Preoperative preparation: Patients undergo standardized clinical assessment including symptom documentation and often preoperative imaging.
  • Surgical protocol: Systematic exploration of the pelvic cavity including uterosacral ligaments, pouch of Douglas, ovarian surfaces, and peritoneal surfaces.
  • Documentation: Completion of standardized operative forms such as the revised American Society for Reproductive Medicine (rASRM) worksheet documenting lesion location, size, depth, and morphology [34].
  • Histological confirmation: Excision of suspected lesions for pathological examination confirming endometrial glands and/or stroma.

This method allows for precise subphenotype classification according to established systems including rASRM, ENZIAN, and AAGL classifications [22]. However, surgical confirmation introduces selection biases as it typically captures patients with more severe symptoms, infertility, or those who have failed conservative management.

Registry-Based Diagnostic Approaches

Registry-based diagnoses utilize International Classification of Diseases (ICD) codes from administrative health databases, typically coded as N80.0-N80.9 for endometriosis and its subtypes. The methodological framework involves:

  • Data extraction: Retrieval of ICD codes from hospital discharge records, specialist visits, and procedural claims databases.
  • Case identification: Application of algorithm-based definitions using code frequency, encounter type, and code combinations.
  • Subtype classification: Categorization based on ICD subcodes (e.g., N80.1 for ovarian endometriosis, N80.3 for pelvic peritoneal endometriosis) [6].
  • Linkage capacity: Integration with other health datasets for comprehensive phenotyping, though this varies by registry infrastructure.

This approach enables large sample sizes and population-based sampling but is subject to coding inaccuracies, healthcare access biases, and variability in clinical diagnostic practices preceding code assignment.

Quantitative Validation Evidence

Agreement Between Ascertainment Methods

Recent validation studies provide quantitative metrics for interpreting registry-based diagnoses against the surgical gold standard. A 2024 analysis of the ENDO Study cohort (n=412) linked with the Utah Population Database offers key validation statistics [34]:

Table 1: Validation Metrics for Endometriosis Diagnoses in Administrative Health Data Versus Surgical Confirmation

Endometriosis Category | Sensitivity | Specificity | Agreement (Kappa) | Sample Size
Overall Endometriosis | 0.88 | 0.87 | 0.74 (Substantial) | 173
Superficial Endometriosis | 0.86 | 0.83 | 0.65 (Substantial) | 143
Ovarian Endometriomas | 0.82 | 0.92 | 0.58 (Moderate) | 38
Deep Infiltrating Endometriosis | 0.12 | 0.99 | 0.17 (Slight) | 58

These data reveal critical patterns: while overall endometriosis diagnosis shows substantial agreement between administrative data and surgical confirmation, deep infiltrating endometriosis is markedly under-ascertained in registry data [34]. This has profound implications for genetic studies targeting this specific subphenotype.
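The validation metrics above all derive from a 2×2 cross-tabulation of registry status against surgical status. The following is a minimal sketch of that calculation, not the ENDO Study analysis itself: the cell counts are hypothetical, the function names `agreement_metrics` and `kappa_label` are introduced here for illustration, and the descriptive bands are assumed to follow the Landis and Koch convention, which matches the labels used in the table.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa for a 2x2 table.

    tp: registry positive, surgery positive
    fp: registry positive, surgery negative
    fn: registry negative, surgery positive
    tn: registry negative, surgery negative
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement computed from the marginal totals of each method.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


def kappa_label(kappa):
    """Descriptive band for kappa (Landis and Koch convention, assumed)."""
    if kappa < 0:
        return "Poor"
    bands = [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
             (0.80, "Substantial"), (1.00, "Almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Hypothetical counts chosen only to mimic a low-sensitivity,
# high-specificity pattern like the one reported for deep infiltrating disease.
sens, spec, kappa = agreement_metrics(tp=6, fp=2, fn=44, tn=348)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"kappa={kappa:.2f} ({kappa_label(kappa)})")
```

With counts like these, specificity stays near 1 while kappa remains low, which illustrates why a registry can look accurate overall yet still miss most cases of a specific subphenotype.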

Self-Reported Endometriosis Validation

Emerging research also addresses the validity of self-reported endometriosis. A 2025 validation study within the Australian Longitudinal Study on Women's Health (ALSWH) found high agreement between self-report and clinical diagnosis [35]; the specific agreement metrics are not reproduced here. Earlier literature cited in these validation studies reports confirmation rates of 84-95% for self-reported endometriosis verified against surgical records [34].

Implications for Polygenic Risk Score Performance

Methodological Impact on Genetic Studies

The choice of ascertainment method significantly influences PRS performance metrics and downstream analyses. Evidence from recent studies demonstrates:

Table 2: PRS Performance Across Diagnostic Ascertainment Methods in Endometriosis

Study Cohort | Ascertainment Method | Sample Size | PRS Odds Ratio per SD | P-value | Subtype Information
Clinical Cohort [6] | Surgical confirmation | 249 cases, 348 controls | 1.59 | 2.57×10⁻⁷ | Complete subphenotyping
Danish Twin Registry [6] | ICD-10 codes from patient registry | 140 cases, 316 controls | 1.50 | 0.0001 | Limited to ICD subcodes
UK Biobank [6] | ICD-10 codes + self-report | 2,967 cases, 256,222 controls | 1.28 | <2.2×10⁻¹⁶ | Basic subtype differentiation

Odds ratios fall as ascertainment becomes less stringent and sample size grows (1.59 with surgical confirmation, 1.50 with registry ICD-10 codes, 1.28 with ICD-10 codes plus self-report). This pattern is consistent with dilution from misclassified cases and etiologically heterogeneous phenotypes, although the cohorts also differ in ways other than ascertainment. Notably, the PRS was associated with all three major subtypes in the combined Danish cohorts (ovarian: OR=1.72, infiltrating: OR=1.66, peritoneal: OR=1.51) [6], highlighting the value of precise phenotyping for elucidating subtype-specific genetic architectures.

Differential Misclassification Across Subphenotypes

The validation evidence demonstrates that misclassification varies substantially across endometriosis subphenotypes. Deep infiltrating endometriosis shows particularly poor sensitivity in administrative data (12%) despite high specificity (99%) [34]. This differential misclassification introduces substantial bias in genetic association studies:

  • Spectrum bias: Registry-based studies systematically under-ascertain deep infiltrating disease, potentially missing genetic variants specific to this aggressive subphenotype.
  • Effect size attenuation: Registry-coded cases that are not true cases (false positives, which occur even at high specificity) and true cases left among the controls (false negatives) both reduce statistical power and attenuate genetic effect size estimates.
  • Subtype misclassification: ICD coding systems do not perfectly align with clinically meaningful subphenotypes, potentially grouping etiologically distinct entities.

[Diagram: surgical confirmation → endometriosis subphenotypes (SPE, OMA, DIE) → registry-based sensitivity for each subphenotype (86%, 82%, 12%) → PRS performance]

Diagram 1: Differential ascertainment of endometriosis subphenotypes impacts PRS performance. Registry-based diagnoses show markedly lower sensitivity for deep infiltrating disease (12%) compared to superficial (86%) and ovarian (82%) forms [34].

Integrated Ascertainment Framework

For optimal PRS development and validation, a hybrid ascertainment approach leveraging the complementary strengths of both methods is recommended:

  • Primary cohort: Surgically confirmed cases for discovery-phase PRS development with complete subphenotype information.
  • Validation cohort: Large registry-based populations for PRS validation and generalizability assessment.
  • Statistical correction: Application of quantitative bias analysis techniques to account for differential misclassification.
  • Sensitivity analyses: Conducting analyses across multiple ascertainment definitions to assess robustness of findings.
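One simple form of the statistical correction step is to back-correct an observed registry prevalence using validated sensitivity and specificity. The sketch below uses the Rogan-Gladen estimator as an illustrative choice; the source does not specify which bias analysis technique to apply. The sensitivity and specificity are the deep infiltrating endometriosis values from the validation table, while the 2% true prevalence and all function names are hypothetical.

```python
def expected_observed_prevalence(true_prev, sensitivity, specificity):
    """Prevalence a registry would report under imperfect classification."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def corrected_prevalence(observed_prev, sensitivity, specificity):
    """Back-correct an observed prevalence (Rogan-Gladen estimator)."""
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_prev + specificity - 1.0) / youden
    # Sampling noise can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


def positive_predictive_value(true_prev, sensitivity, specificity):
    """Share of registry-coded cases that are true cases."""
    observed = expected_observed_prevalence(true_prev, sensitivity, specificity)
    return sensitivity * true_prev / observed


# Sensitivity and specificity for deep infiltrating disease come from the
# validation table; the 2% true prevalence is a hypothetical input.
SENS, SPEC, TRUE_PREV = 0.12, 0.99, 0.02
observed = expected_observed_prevalence(TRUE_PREV, SENS, SPEC)
print(f"observed prevalence:  {observed:.4f}")
print(f"corrected prevalence: {corrected_prevalence(observed, SENS, SPEC):.4f}")
print(f"PPV of a registry code: {positive_predictive_value(TRUE_PREV, SENS, SPEC):.3f}")
```

Under these assumed inputs, most registry-coded cases would be false positives despite 99% specificity, because the condition is rare and sensitivity is low. A full quantitative bias analysis would also propagate uncertainty in the sensitivity and specificity estimates.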

Standardized Operational Protocols

To maximize data quality within each ascertainment framework, implement standardized protocols:

Surgical confirmation protocols:

  • Utilize standardized operative forms (rASRM) completed immediately post-procedure
  • Document lesion location, size, appearance, and depth of infiltration
  • Collect photographic documentation where feasible
  • Obtain histopathological confirmation for excised lesions

Registry-based ascertainment protocols:

  • Apply algorithm-based case definitions requiring multiple codes across encounters
  • Incorporate procedure codes for laparoscopic confirmation where available
  • Implement subtype definitions using specific ICD subcodes
  • Link with prescription data to identify endometriosis-specific treatments
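The registry protocol above can be sketched as a small case-definition routine. This is an illustrative example only: the rule of at least two N80 codes on distinct dates is one possible reading of "multiple codes across encounters", the record layout and the function name `registry_cases` are hypothetical, and the subtype map covers only the subcodes named in this article.

```python
from collections import defaultdict
from datetime import date

# Subcode-to-subtype map limited to codes named in the text; any other
# N80 subcode is labelled "unspecified" in this sketch.
SUBTYPE_BY_CODE = {
    "N80.0": "adenomyosis",
    "N80.1": "ovarian",
    "N80.3": "pelvic peritoneal",
}


def registry_cases(encounters, min_distinct_dates=2):
    """Return {patient_id: sorted subtypes} for patients meeting the rule.

    encounters: iterable of (patient_id, visit_date, icd10_code) tuples.
    A patient qualifies with N80 codes on at least `min_distinct_dates` dates.
    """
    visit_dates = defaultdict(set)
    subtypes = defaultdict(set)
    for patient_id, visit_date, code in encounters:
        if not code.startswith("N80"):
            continue  # not an endometriosis code
        visit_dates[patient_id].add(visit_date)
        subtypes[patient_id].add(SUBTYPE_BY_CODE.get(code, "unspecified"))
    return {
        pid: sorted(subtypes[pid])
        for pid, dates in visit_dates.items()
        if len(dates) >= min_distinct_dates
    }


encounters = [
    ("P1", date(2020, 1, 5), "N80.1"),
    ("P1", date(2020, 6, 2), "N80.1"),
    ("P2", date(2021, 3, 9), "N80.3"),    # single encounter: not a case
    ("P3", date(2019, 4, 1), "N80.9"),
    ("P3", date(2019, 9, 30), "N80.3"),
    ("P4", date(2022, 2, 2), "K58.9"),    # unrelated code, ignored
]
print(registry_cases(encounters))
```

Procedure codes and prescription linkage, also listed in the protocol, would enter as additional qualifying conditions; they are left out here to keep the example short.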

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents and Resources for Endometriosis Cohort Studies

Resource Category | Specific Examples | Research Application | Technical Considerations
Validation Cohorts | ENDO Study (Utah operative cohort) [34] | Provides gold-standard phenotyping for algorithm validation | Limited diversity; surgical population
Biobanks | UK Biobank [6] [2], Danish Twin Registry [6] | Large-scale genetic studies with health record linkage | Heterogeneous phenotyping across sites
Genetic Arrays | Illumina Infinium MethylationEPIC BeadChip [36] | Epigenomic profiling of endometrial tissue | Cellular heterogeneity impacts interpretation
PRS Methods | SBayesR [2], LDpred | Polygenic risk score calculation | Sensitivity to ancestral background
Phenotyping Tools | rASRM operative forms [34], ENZIAN classification [22] | Standardized surgical documentation | Inter-rater variability requires training
Data Linkage Systems | Utah Population Database [34] | Links surgical data with longitudinal health records | Privacy protections limit granular data

The choice between surgical confirmation and registry-based diagnoses represents a fundamental trade-off between phenotyping precision and sample size in endometriosis genetic research. Surgical confirmation enables precise subphenotype characterization essential for elucidating subtype-specific genetic architectures but introduces selection biases and limits sample size. Registry-based diagnoses facilitate large-scale genetic studies but suffer from differential misclassification across subphenotypes, particularly for deep infiltrating disease.

For PRS studies targeting specific endometriosis subphenotypes, surgical confirmation remains preferable despite practical limitations. In registry-based studies, researchers should implement validated case definitions, acknowledge differential misclassification, and conduct sensitivity analyses to assess robustness of findings. The integration of novel data sources, including molecular markers and advanced imaging, promises to enhance future phenotyping approaches beyond this traditional dichotomy.

As endometriosis research advances toward personalized risk prediction and targeted interventions, precise phenotype ascertainment will remain the foundation upon which valid genetic discoveries are built.

This whitepaper provides a comprehensive technical analysis of performance metrics, specifically odds ratios, associated with various subtypes of endometriosis and their corresponding risks for ovarian cancer. Framed within a broader thesis on polygenic risk score performance across endometriosis subphenotypes, this guide synthesizes cutting-edge genetic epidemiology and clinical cohort studies to elucidate distinct risk profiles. Endometriosis, a complex inflammatory condition affecting approximately 6.3% to 11% of reproductive-aged women, is now recognized not as a single entity but as a spectrum of diseases with potentially divergent etiologies and oncogenic potentials [37]. Understanding these subtype-specific risk profiles is critical for refining genetic risk models and developing targeted surveillance and prevention strategies for at-risk populations. This document serves as a critical resource for researchers, scientists, and drug development professionals working to translate genetic discoveries into clinically actionable insights.

Quantitative Risk Profiles: Odds Ratios Across Subtypes

Endometriosis-Associated Ovarian Cancer Risk by Histotype

Table 1 summarizes the estimated causal relationships between genetically proxied endometriosis and major ovarian cancer histotypes, as determined by two-sample Mendelian randomization analysis. These estimates indicate that genetic liability to endometriosis increases risk for specific, but not all, ovarian cancer histotypes [38].

Table 1: Causal Effects of Endometriosis on Ovarian Cancer Histotypes via Mendelian Randomization

Ovarian Cancer Histotype | Odds Ratio (OR) | 95% Confidence Interval | P-value
Overall Ovarian Cancer | 1.18 | 1.10-1.28 | < 0.001
High-Grade Serous | 1.12 | 1.01-1.23 | 0.03
Clear Cell Carcinoma | 1.87 | 1.44-2.43 | < 0.001
Endometrioid Carcinoma | 1.48 | 1.30-1.69 | < 0.001
Low-Grade Serous | Not Significant | - | -
Invasive Mucinous | Not Significant | - | -

Anatomic Subtype-Specific Risk Profiles for Ovarian Cancer

Different anatomic subtypes of endometriosis demonstrate differential oncogenic potential. Table 2 presents odds ratios for ovarian cancer histotypes based on specific endometriosis locations, revealing distinct patterns of association that underscore their etiological heterogeneity [38].

Table 2: Anatomic Subtype-Specific Causal Effects on Ovarian Cancer Histotypes

Endometriosis Subtype | High-Grade Serous OR (95% CI) | Clear Cell OR (95% CI) | Endometrioid OR (95% CI)
Pelvic Peritoneal | Not Significant | 1.81 (1.52-2.16) | Not Significant
Deep Infiltrating | 1.10 (1.04-1.17) | Not Significant | 1.25 (1.13-1.40)
Ovarian | 1.09 (1.02-1.15) | 1.65 (1.46-1.86) | 1.48 (1.30-1.69)
Rectovaginal | Not Significant | Not Significant | 1.25 (1.04-1.51)

Polygenic Risk Score Performance Across Endometriosis Subtypes

Table 3 illustrates the association of a 14-SNP polygenic risk score with endometriosis and its major subtypes across multiple cohorts. These findings demonstrate that PRS captures increased risk for all types of endometriosis rather than site-specific susceptibility [6] [39] [40].

Table 3: Polygenic Risk Score Associations Across Endometriosis Subtypes

Study Cohort | Overall Endometriosis OR (p-value) | Ovarian Endometriosis OR (p-value) | Infiltrating Endometriosis OR (p-value) | Peritoneal Endometriosis OR (p-value)
Combined Danish Cohorts | 1.57 (p = 2.5×10⁻¹¹) | 1.72 (p = 6.7×10⁻⁵) | 1.66 (p = 2.7×10⁻⁹) | 1.51 (p = 2.6×10⁻³)
UK Biobank | 1.28 (p < 2.2×10⁻¹⁶) | - | - | -

Experimental Protocols and Methodologies

Mendelian Randomization Framework for Causal Inference

The most robust evidence for subtype-specific risks comes from Mendelian randomization (MR) studies, which utilize genetic variants as instrumental variables to infer causality while minimizing confounding bias inherent in observational studies [38]. The MR approach relies on three core assumptions: (1) genetic variations must be strongly associated with the exposure (endometriosis subtypes), (2) genetic variations must not be associated with confounders, and (3) genetic variations must affect the outcome (ovarian cancer) only through the exposure [38].

Instrumental Variable Selection: Genome-wide association study (GWAS) summary data for endometriosis subtypes were obtained from the FinnGen Consortium (20,190 cases, 130,160 controls of European ancestry) [38]. Ovarian cancer GWAS data came from the Ovarian Cancer Association Consortium (25,509 cases, 40,941 controls) [38]. Single nucleotide polymorphisms (SNPs) significantly associated with endometriosis (p < 5 × 10⁻⁸) were selected as instrumental variables, with linkage disequilibrium clumping (r² < 0.001) to ensure independence [38]. Weak instrument bias was assessed via F-statistics (range: 30.01-228.09), with all values >10 indicating robust instruments [38].

Statistical Analysis: The primary analysis used inverse variance weighted (IVW) meta-analysis to combine SNP-specific causal estimates [38]. Sensitivity analyses included MR-Egger regression (to assess directional pleiotropy), weighted median method (providing consistent estimates when up to 50% of information comes from invalid instruments), and MR-PRESSO (to identify and correct for outliers) [38]. Heterogeneity was assessed using Cochran's Q statistic, with random-effects models applied when significant heterogeneity was detected [38].
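The core of the analysis described above can be sketched from per-SNP summary statistics. This is a minimal illustration under stated assumptions, not the pipeline used in [38]: it uses first-order inverse-variance weights, a multiplicative random-effects adjustment, and the common approximation F ≈ (β/SE)² per instrument. The function name `ivw_mr` and all input values are hypothetical.

```python
import math


def ivw_mr(beta_exp, se_exp, beta_out, se_out):
    """Inverse variance weighted MR estimate from per-SNP summary statistics.

    beta_exp, se_exp: SNP-exposure effects and standard errors
    beta_out, se_out: SNP-outcome effects and standard errors
    """
    # Wald ratio per SNP, weighted by (beta_exp / se_out)^2 (first-order weights).
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    weights = [(bx / sy) ** 2 for bx, sy in zip(beta_exp, se_out)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se_fixed = math.sqrt(1.0 / total)
    # Cochran's Q measures heterogeneity of the per-SNP ratio estimates.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # Multiplicative random effects: inflate the SE only when Q exceeds df.
    scale = math.sqrt(max(1.0, q / df)) if df > 0 else 1.0
    return {
        "estimate": estimate,
        "se_fixed": se_fixed,
        "se_random": se_fixed * scale,
        "q": q,
        "df": df,
        # Approximate per-instrument F-statistic for weak-instrument checks.
        "f_stats": [(bx / sx) ** 2 for bx, sx in zip(beta_exp, se_exp)],
    }


# Hypothetical summary statistics for four instruments.
result = ivw_mr(
    beta_exp=[0.10, 0.08, 0.12, 0.09],
    se_exp=[0.010, 0.012, 0.011, 0.015],
    beta_out=[0.020, 0.012, 0.030, 0.015],
    se_out=[0.008, 0.009, 0.010, 0.009],
)
low = result["estimate"] - 1.96 * result["se_random"]
high = result["estimate"] + 1.96 * result["se_random"]
print(f"OR = {math.exp(result['estimate']):.2f} "
      f"(95% CI {math.exp(low):.2f}-{math.exp(high):.2f}), Q = {result['q']:.2f}")
print("min F-statistic:", round(min(result["f_stats"]), 1))
```

MR-Egger, the weighted median, and MR-PRESSO are not covered by this sketch; dedicated packages would normally be used for those sensitivity analyses.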

Polygenic Risk Score Construction and Validation

Polygenic risk scores aggregate the effects of multiple genetic risk variants into a single measure of genetic susceptibility [6] [40]. The PRS methodology employed in the cited studies followed this protocol:

Variant Selection: A 14-variant PRS was derived from the largest endometriosis GWAS meta-analysis published at the time, comprising over 17,000 cases [6] [40]. These SNPs represented genome-wide significant lead variants from the discovery GWAS.

Score Calculation: The PRS was calculated as the weighted sum of risk alleles: PRS = β₁SNP₁ + β₂SNP₂ + ... + β₁₄SNP₁₄, where β represents the effect size (log odds ratio) of each SNP from the original GWAS [6]. Scores were standardized to a mean of 0 and standard deviation of 1 for analysis.
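The weighted sum and standardization step can be illustrated as follows. This is a sketch, assuming risk-allele dosages coded 0, 1, or 2 and using three hypothetical SNP identifiers and weights in place of the 14 variants and published effect sizes, which are not reproduced here. The choice of the sample standard deviation for scaling is also an assumption.

```python
import statistics

# Hypothetical log odds ratios; the real score uses 14 GWAS lead variants.
WEIGHTS = {"rs_a": 0.10, "rs_b": 0.15, "rs_c": 0.07}


def weighted_prs(dosages, weights):
    """Sum of risk-allele dosages (0, 1, 2) weighted by per-SNP log odds ratios."""
    return sum(beta * dosages[snp] for snp, beta in weights.items())


def standardize(scores):
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


# Hypothetical genotypes for four individuals.
cohort = {
    "id1": {"rs_a": 0, "rs_b": 1, "rs_c": 2},
    "id2": {"rs_a": 2, "rs_b": 1, "rs_c": 0},
    "id3": {"rs_a": 1, "rs_b": 0, "rs_c": 1},
    "id4": {"rs_a": 2, "rs_b": 2, "rs_c": 2},
}
raw = {pid: weighted_prs(geno, WEIGHTS) for pid, geno in cohort.items()}
z_scores = dict(zip(raw, standardize(list(raw.values()))))
for pid in cohort:
    print(pid, round(raw[pid], 3), round(z_scores[pid], 3))
```

Standardizing within the analysis cohort is what lets the regression coefficient be read as an odds ratio per standard deviation of the score.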

Validation Cohorts: The PRS was validated across three independent cohorts: (1) surgically confirmed cases from a Western Danish endometriosis referral center (249 cases, 348 controls), (2) cases identified from the Danish Twin Registry based on ICD-10 codes (140 cases, 316 controls), and (3) replication in the UK Biobank (2,967 cases, 256,222 controls) [6] [40]. Association analyses used logistic regression with PRS as predictor and endometriosis status as outcome, adjusting for principal components to account for population stratification.
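The association analysis can be sketched as a logistic regression of case status on the standardized score. The example below is a simplified illustration on simulated data, not a reproduction of the cited analyses: it fits only an intercept and the PRS term with a hand-written Newton-Raphson routine and omits the principal-component covariates. The true log odds ratio of 0.4, the sample size, and the function name `fit_logistic` are assumptions made for the demonstration.

```python
import math
import random


def fit_logistic(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson fit of logit P(y = 1) = a + b * x.

    Returns (a, b, se_b). Covariates such as principal components are
    omitted to keep the sketch short.
    """
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    # Slope variance is the matching diagonal entry of the inverse information
    # matrix, taken from the last iteration (stable once the fit has converged).
    se_b = math.sqrt(h00 / det)
    return a, b, se_b


# Simulated cohort: standardized PRS with an assumed true log OR of 0.4 per SD.
random.seed(0)
TRUE_LOG_OR = 0.4
prs = [random.gauss(0.0, 1.0) for _ in range(4000)]
status = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(-1.0 + TRUE_LOG_OR * x))) else 0
    for x in prs
]
a, b, se_b = fit_logistic(prs, status)
print(f"OR per SD = {math.exp(b):.2f} "
      f"(95% CI {math.exp(b - 1.96 * se_b):.2f}-{math.exp(b + 1.96 * se_b):.2f})")
```

In practice a statistics package would be used so that principal components and other covariates can be included in the same model.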

Clinical Cohort Studies with Long-Term Follow-Up

Long-term hospital-based cohort studies provide critical insights into the natural history and recurrence patterns of different endometriosis subtypes [41]. The methodology for these studies typically includes:

Patient Recruitment: Medical records of all patients undergoing surgery for endometriosis during a defined period (e.g., 1997-2018) are reviewed [41]. Inclusion criteria typically require surgically confirmed endometriosis recurrence, defined as subsequent surgery for endometriosis after previous complete surgical excision [41].

Subtype Classification: Three primary subtypes are defined based on surgical and histopathological findings: superficial peritoneal endometriosis (SUP), ovarian endometrioma (OMA), and deep infiltrating endometriosis (DIE) [41]. Each subtype is confirmed through visual inspection during laparoscopy and histopathological examination of excised tissue.

Outcome Measures: The primary outcomes are time to recurrence and variation in endometriosis subtype between first and recurrent surgeries [41]. Statistical analyses include Kaplan-Meier survival curves for recurrence-free survival, Cox proportional hazards models for time-to-event data, and logistic regression for subtype transitions.

Pathophysiological Mechanisms and Signaling Pathways

The association between specific endometriosis subtypes and distinct ovarian cancer histotypes suggests underlying biological mechanisms that may drive malignant transformation. The following diagram illustrates the proposed signaling pathways linking endometriosis subtypes to ovarian cancer development:

[Diagram: genetic risk variants → endometriosis subtypes. Pelvic peritoneal endometriosis → chronic local inflammation; ovarian endometrioma → iron-induced oxidative stress; deep infiltrating endometriosis → tissue remodeling and fibrosis. All three converge on an inflammatory microenvironment → DNA damage and genomic instability → somatic mutations → malignant transformation → clear cell carcinoma, endometrioid carcinoma, or high-grade serous carcinoma.]

Figure 1: Proposed Pathophysiological Pathways from Endometriosis Subtypes to Ovarian Cancer

This mechanistic model illustrates how different endometriosis subtypes create distinct microenvironments conducive to specific ovarian cancer histotypes. Pelvic peritoneal endometriosis is strongly linked to chronic inflammation and shows particular specificity for clear cell carcinoma development [38] [42]. Ovarian endometriomas involve iron-induced oxidative stress from recurrent hemorrhage, creating conditions favorable for both clear cell and endometrioid carcinomas [38]. Deep infiltrating endometriosis promotes tissue remodeling and fibrosis, with associations spanning multiple histotypes including endometrioid and high-grade serous carcinomas [38].

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 4 catalogs key reagents and methodologies essential for investigating endometriosis subtypes and their associated ovarian cancer risks.

Table 4: Research Reagent Solutions for Endometriosis Subtype Studies

Reagent/Methodology | Function/Application | Example Implementation
FinnGen Consortium GWAS Data | Provides genetic association summary statistics for endometriosis subtypes | Source of genetic instruments for Mendelian randomization studies [38]
OCAC Ovarian Cancer GWAS | Offers genomic data for ovarian cancer histotype analysis | Outcome data for causal inference analyses [38]
14-SNP Polygenic Risk Score | Quantifies aggregated genetic susceptibility to endometriosis | PRS construction using effect sizes from largest endometriosis GWAS meta-analysis [6]
CD138/Syndecan-1 Immunohistochemistry | Identifies plasma cells for diagnosis of chronic endometritis | Marker for endometrial inflammatory profile in peritoneal endometriosis [42]
Laparoscopic Visualization & Staging | Gold standard for endometriosis diagnosis and subtyping | Surgical confirmation of SUP, OMA, and DIE subtypes according to rASRM classification [41] [42]
MR-PRESSO Statistical Package | Detects and corrects for horizontal pleiotropy in MR studies | Outlier removal and distortion testing in causal inference analyses [38]
Utah Population Database | Population-based resource linking pedigrees with health data | Retrospective cohort studies of endometriosis-ovarian cancer associations [37]

The comprehensive analysis of performance metrics across endometriosis subtypes reveals a complex landscape of subtype-specific ovarian cancer risks. The data demonstrate that pelvic peritoneal lesions show particular specificity for clear cell carcinoma, while deep infiltrating endometriosis exhibits broader associations across multiple histotypes. The polygenic risk scores currently available capture general endometriosis susceptibility rather than subtype-specific risk, highlighting an important limitation in current genetic prediction models. These findings underscore the necessity for refined classification systems that integrate anatomical, molecular, and genetic data to improve risk stratification. For drug development professionals, these insights suggest potential opportunities for subtype-targeted therapeutic strategies and prevention protocols. Future research should focus on elucidating the precise molecular mechanisms driving the subtype-specific malignant transformation and developing more sophisticated polygenic risk models that can accurately predict not just overall endometriosis risk, but specifically the high-risk subtypes associated with ovarian cancer development.

Polygenic Risk Score Prediction for Endometriosis

Endometriosis is a complex, chronic inflammatory condition affecting approximately 10% of women of reproductive age, characterized by the growth of endometrial-like tissue outside the uterus [31] [43] [44]. It presents a substantial diagnostic challenge, with an average delay of 7-10 years from symptom onset to definitive diagnosis, primarily because the current gold standard requires invasive laparoscopic surgery [31] [44]. The disease demonstrates significant heterogeneity in its clinical presentation, anatomical location, and treatment response, creating a pressing need for better stratification tools and early detection methods [45] [44].

The genetic component of endometriosis is substantial, with heritability estimates ranging from 47% to 51% [31] [10]. This strong genetic basis has motivated the development of polygenic risk scores (PRS) which aggregate the effects of numerous genetic variants into a single measure of genetic liability [6] [43]. Large-scale biobanks, particularly the UK Biobank and Danish health registries, have become invaluable resources for developing and validating these PRS, providing the extensive genotyped populations necessary for robust statistical analysis [31] [6] [43].

This technical guide examines the application of PRS for endometriosis within biobank populations, focusing specifically on methodologies and findings from the UK Biobank and Danish registry studies. Framed within broader research on PRS performance across endometriosis subphenotypes, this review provides researchers and drug development professionals with a comprehensive analysis of current capabilities, methodological considerations, and clinical translational potential.

Polygenic Risk Score Performance in Biobank Populations

Discriminatory Performance Across Cohorts

Studies conducted in both Danish and UK Biobank populations have consistently demonstrated the association between endometriosis PRS and disease risk, though effect sizes vary across cohorts and endometriosis subtypes.

Table 1: Performance Metrics of Endometriosis PRS Across Biobank Studies

Cohort | Case Definition | Sample Size (Cases/Controls) | Odds Ratio per SD | P-value | Subtypes Analyzed
Danish Clinical Cohort | Surgically confirmed | 249/348 | 1.59 | 2.57×10⁻⁷ | Ovarian, Infiltrating, Peritoneal
Danish Twin Registry | ICD-10 codes | 140/316 | 1.50 | 0.0001 | Ovarian, Infiltrating, Peritoneal
Combined Danish Cohorts | Mixed | 389/664 | 1.57 | 2.5×10⁻¹¹ | All major subtypes
UK Biobank | ICD-10 codes | 2,967/256,222 | 1.28 | <2.2×10⁻¹⁶ | All major subtypes

The Danish cohorts, particularly those with surgically confirmed cases, showed larger effect sizes than the UK Biobank [6] [43]. This difference may reflect variations in case ascertainment, with surgical confirmation potentially identifying more severe cases. Among specific subtypes in the combined Danish cohorts, ovarian endometriosis had the largest point estimate (OR = 1.72), followed by infiltrating (OR = 1.66) and peritoneal (OR = 1.51) endometriosis [43]. Importantly, the PRS was not associated with adenomyosis (N80.0), suggesting distinct genetic architectures between these related conditions [6] [43].

Performance Across Genetic Ancestries

PRS performance shows significant variation across the genetic ancestry continuum, an important consideration for equitable application [46]. A comprehensive evaluation of the UK Biobank PRS Release demonstrated that accuracy decreases individual-to-individual along the continuum of genetic distances from the training data, with a Pearson correlation of -0.95 between genetic distance and PRS accuracy averaged across 84 traits [46].

Table 2: PRS Performance Across Genetic Ancestries in UK Biobank Testing Subgroup

Genetic Ancestry | Sample Size in Testing Subgroup | Relative Performance* | Key Considerations
European | 97,608 | Reference | Best performance due to match with training population
South Asian | 9,542 | Moderate decrease | Portability affected by genetic distance
East Asian | 2,864 | Significant decrease | Substantial portability gap
African | 9,476 | Largest decrease | Greatest need for diverse reference data

*Relative performance compared to European ancestry based on multiple traits [47]

This ancestry-based performance decay highlights the critical need for diverse training populations and careful interpretation of PRS across different genetic backgrounds [46] [47]. When applying PGS models trained on individuals labelled as white British in the UK Biobank to individuals with European ancestries in external cohorts, individuals in the furthest genetic distance decile have 14% lower accuracy relative to the closest decile [46].

Methodological Approaches

PRS Development and Validation Workflows

The development of polygenic risk scores for endometriosis follows a structured pipeline from genotyping to clinical application:

[Pipeline diagram, spanning data preparation and validation phases: GWAS meta-analysis → quality control → genotype imputation → PRS generation → biobank validation → subtype analysis → clinical correlation]

Genotyping and Quality Control Protocols

Robust quality control procedures are essential for reliable PRS calculation. The standard pipeline includes:

  • Sample Quality Control: Samples with ≥15% missing genotypes are excluded first; after marker QC, samples with ≥5% missing genotypes are also excluded [10]. Related samples (PI_HAT > 0.1875) are excluded, along with samples whose genotyped sex cannot be determined and those with high heterozygosity rates (exceeding three standard deviations from the mean) [10].

  • Marker Quality Control: Removal of markers with non-called alleles, missing call rates > 0.05, Hardy-Weinberg equilibrium P-value < 1×10^−5, and those showing significant differential missingness between cases and controls (P < 1×10^−5) [10]. Only autosomal SNPs are retained for analysis.

  • Population Stratification: Principal components are calculated using pruned SNP sets without linkage disequilibrium, with outliers excluded (deviation >6 times interquartile range) [10]. These components are included as covariates in association analyses to control for population stratification.
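
The thresholds above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the pipeline used in [10]: it assumes genotypes are stored as 0/1/2 allele counts with `None` for a no-call, uses a simple 1-df chi-square test for Hardy-Weinberg equilibrium (PLINK uses an exact test), and omits the initial 15% sample filter, relatedness, sex checks, and differential missingness. All function and parameter names are hypothetical.

```python
import math
from statistics import mean, stdev

MISSING = None  # assumed encoding of a no-call; genotypes are 0/1/2 allele counts


def missing_rate(calls):
    """Fraction of no-calls in a list of genotype calls."""
    return sum(g is MISSING for g in calls) / len(calls) if calls else 0.0


def hwe_pvalue(calls):
    """Chi-square (1 df) test of Hardy-Weinberg proportions for one marker."""
    obs = [sum(g == k for g in calls) for k in (0, 1, 2)]
    n = sum(obs)
    if n == 0:
        return 1.0
    p = (obs[1] + 2 * obs[2]) / (2 * n)  # frequency of the counted allele
    if p <= 0.0 or p >= 1.0:
        return 1.0  # monomorphic marker: nothing to test
    exp = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def heterozygosity(calls):
    """Share of heterozygous calls among non-missing genotypes."""
    called = [g for g in calls if g is not MISSING]
    return sum(g == 1 for g in called) / len(called) if called else 0.0


def run_qc(genotypes, marker_miss_max=0.05, sample_miss_max=0.05,
           hwe_min_p=1e-5, het_sd=3.0):
    """genotypes: dict sample_id -> list of calls (same marker order).

    Returns (kept_sample_ids, kept_marker_indices).
    """
    samples = list(genotypes)
    n_markers = len(genotypes[samples[0]])
    # Marker QC: call rate and Hardy-Weinberg equilibrium.
    keep_markers = []
    for j in range(n_markers):
        column = [genotypes[s][j] for s in samples]
        if missing_rate(column) <= marker_miss_max and hwe_pvalue(column) >= hwe_min_p:
            keep_markers.append(j)
    # Sample QC on the retained markers: missingness, then heterozygosity outliers.
    rows = {s: [genotypes[s][j] for j in keep_markers] for s in samples}
    low_missing = [s for s in samples if missing_rate(rows[s]) < sample_miss_max]
    if len(low_missing) < 2:
        return low_missing, keep_markers
    het = {s: heterozygosity(rows[s]) for s in low_missing}
    values = list(het.values())
    mu, sd = mean(values), stdev(values)
    keep_samples = [s for s in low_missing
                    if sd == 0 or abs(het[s] - mu) <= het_sd * sd]
    return keep_samples, keep_markers
```

In practice these filters are applied with dedicated tools such as PLINK; the sketch only makes the order of operations and the cut-offs explicit.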

PRS Calculation Methods

Two primary approaches have been employed in endometriosis PRS studies:

  • SBayesR Method: A Bayesian method implemented in GCTB 2.02 for adjusting GWAS summary statistics effect sizes, performed with default settings while excluding the MHC region and imputing sample size [31]. This method was used in the PRS-PheWAS analysis of UK Biobank data.

  • Clumping and Thresholding: A more straightforward method implemented in PLINK software, calculating both unweighted (counting risk alleles) and weighted scores (using beta values of effect sizes) [10]. The Danish registry studies utilized a 14-SNP PRS derived from lead SNPs identified in a large endometriosis GWAS meta-analysis [6] [43].
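
The difference between the unweighted and weighted scores described above can be made concrete with a short sketch. It assumes each individual's genotype is given as a risk-allele dosage (0, 1, or 2) per SNP and that the weights are per-allele log odds ratios from a GWAS; the SNP identifiers and weights used in the test data are placeholders, not values from the cited studies.

```python
from statistics import mean, stdev


def unweighted_prs(dosages, snps):
    """Count of risk alleles carried across the SNPs in the score."""
    return sum(dosages.get(snp, 0) for snp in snps)


def weighted_prs(dosages, betas):
    """Sum of risk-allele dosages weighted by their GWAS effect sizes (betas)."""
    return sum(beta * dosages.get(snp, 0) for snp, beta in betas.items())


def standardize(scores):
    """Convert raw scores to z-scores so effects can be reported per SD."""
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / sd for s in scores]
```

Standardizing the score within the analysis cohort is what allows results to be reported as an odds ratio per standard deviation increase in PRS.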

Advanced Analytical Applications

PRS-PheWAS for Pleiotropy Mapping

A polygenic risk score phenome-wide association study (PRS-PheWAS) was conducted in the UK Biobank to investigate the pleiotropic effects of genetic liability to endometriosis [31]. This approach tested associations between the endometriosis PRS and numerous health conditions, biomarkers, and reproductive factors across females, males, and females without an endometriosis diagnosis.

The workflow for this analysis proceeded systematically:

[Figure: PRS-PheWAS workflow. Endometriosis PRS and Phenotype Data → Group Stratification → Association Testing → Mendelian Randomization. Analysis subgroups: Females, Males, Females without Diagnosis. Phenotype categories: ICD-10 Diagnoses, Blood/Urine Biomarkers, Reproductive Factors.]

Key findings from this PRS-PheWAS included:

  • Multiple health conditions, biomarkers, and reproductive factors were associated with genetic liability to endometriosis across all groups, including males and females without diagnosed endometriosis [31].

  • Differences in associated traits between males and females highlighted the importance of sex-specific pathways in the overlap of endometriosis with many other traits [31].

  • A particularly significant association was identified between genetic liability to endometriosis and lower testosterone levels, with Mendelian randomization analyses suggesting that lower testosterone may be causal for both endometriosis and clear cell ovarian cancer [31].

Molecular Subtyping and Clinical Correlations

Transcriptomic analyses have revealed distinct molecular subtypes of endometriosis with implications for treatment response:

  • Stroma-Enriched Subtype (S1): Characterized by fibroblast activation and extracellular matrix remodeling in the ectopic milieu [45].

  • Immune-Enriched Subtype (S2): Marked by upregulation of immune pathways and a stronger positive correlation with predicted immunotherapy response; strongly associated with failure of, or intolerance to, hormone therapy [45].

These subtypes are defined by lesion transcriptomes rather than by PRS. They nonetheless suggest that a PRS able to capture subtype membership could inform not just disease risk but also therapeutic strategy, particularly given the association between the S2 subtype and hormone therapy resistance [45].

Research Reagent Solutions

Table 3: Essential Research Reagents and Resources for Endometriosis PRS Studies

| Resource Category | Specific Examples | Application in Endometriosis PRS Research |
| --- | --- | --- |
| Genotyping Arrays | Illumina Global Screening Array [10] | Initial genotyping of samples for PRS calculation |
| Imputation Reference Panels | TOPMed Version R2 on GRCh38 [10] | Genotype imputation to increase SNP coverage |
| Analysis Software | PLINK 1.9/2.0 [31] [10], GCTB 2.02 [31], FlashPCA [10] | PRS calculation, quality control, population stratification |
| Biobank Resources | UK Biobank PRS Release [47], Danish National Patient Register [6] [43] | Validation cohorts with extensive phenotype data |
| Laboratory Assays | Proseek Multiplex Inflammation I kit [10], ELISA for AXIN1 [10] | Analysis of inflammatory proteins associated with endometriosis |
| Statistical Analysis Tools | R statistical environment, SPSS [10], ConsensusClusterPlus [45] | Statistical analysis and subtype identification |

The application of polygenic risk scores for endometriosis in biobank populations has substantially advanced our understanding of the genetic architecture of this complex condition. Research utilizing the UK Biobank and Danish health registries has demonstrated that PRS can effectively stratify endometriosis risk across different subtypes, with particularly strong performance for infiltrating and ovarian forms of the disease.

However, important limitations remain. The current discriminative accuracy of endometriosis PRS is not yet sufficient for standalone clinical utility [43]. Furthermore, performance varies significantly across the genetic ancestry continuum, raising equity concerns that must be addressed through more diverse training populations [46] [47]. Additionally, the association between PRS and specific clinical presentations or symptoms remains unclear, with one study finding no correlation between PRS and inflammatory proteins or TSH receptor antibodies [10].

Future research directions should focus on developing more sophisticated PRS that incorporate rare variants, epigenetic markers, and clinical risk factors [44]. Additionally, increasing diversity in genetic studies is imperative to ensure equitable benefits across all populations [46] [47]. As these tools evolve, integration of PRS with transcriptomic subtyping and clinical biomarkers promises to enable truly personalized approaches to endometriosis risk prediction, prevention, and treatment.

Machine Learning Approaches for Enhanced PRS Modeling in Endometriosis

The integration of machine learning (ML) with polygenic risk score (PRS) modeling represents a transformative frontier in endometriosis research. This technical guide details how ML methodologies are addressing the limitations of traditional PRS by enhancing predictive accuracy, elucidating subtype-specific risk architectures, and integrating multifactorial data streams. Deploying these advanced models requires meticulously curated genomic data, robust computational frameworks, and specialized analytical pipelines. The ensuing protocols and resources provide a foundational toolkit for researchers and drug development professionals aiming to translate genetic discoveries into refined stratification tools and targeted therapeutic strategies.

Current Landscape of Endometriosis PRS and Its Limitations

Endometriosis is a complex gynecological disorder with a significant genetic component, exhibiting a heritability estimated between 47% and 51% [48] [31]. Polygenic risk scores, which aggregate the effects of many genetic variants into a single measure of genetic liability, have become a standard tool for quantifying this risk. However, traditional PRS models for endometriosis face several critical challenges that limit their clinical utility and biological insight.

Table 1: Performance of Traditional Endometriosis PRS Across Cohorts

| Cohort | Cases/Controls | Odds Ratio (OR) per SD increase in PRS | p-value | Key Finding |
| --- | --- | --- | --- | --- |
| Surgically Confirmed (Danish) | 249 / 348 | 1.59 | 2.57 × 10⁻⁷ | Validates PRS in a clinical cohort [6] |
| Danish Twin Registry | 140 / 316 | 1.50 | 0.0001 | Confirms association in registry data [6] |
| UK Biobank (Replication) | 2,967 / 256,222 | 1.28 | < 2.2 × 10⁻¹⁶ | Replicates in a large, independent biobank [6] |
| Combined Danish Cohorts | 389 / 664 | 1.57 | 2.5 × 10⁻¹¹ | Demonstrates consistent effect [6] |

A primary limitation is the modest predictive power of existing scores. As shown in Table 1, while PRS consistently shows a significant association with endometriosis risk, the discriminative accuracy is not yet sufficient for standalone clinical diagnosis [6] [10]. Furthermore, traditional PRS often fails to capture the heterogeneity of the disease. For instance, a PRS based on 14 genome-wide significant SNPs was associated with all major subtypes of endometriosis (ovarian, infiltrating, peritoneal) but was not associated with adenomyosis, suggesting a distinct genetic etiology for this related condition [6]. This underscores the need for models that can differentiate between disease subphenotypes.
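
To see why odds ratios of this size translate into modest discrimination, it can help to convert an OR per SD into an approximate AUC. The sketch below is a back-of-the-envelope calculation, not a result reported in [6] or [10]: it assumes the PRS is normally distributed with equal variance in cases and controls, in which case the log odds ratio per SD equals the standardized mean difference d and AUC = Φ(d/√2).

```python
import math


def auc_from_or_per_sd(odds_ratio):
    """Approximate AUC implied by an odds ratio per SD of a normal score.

    Assumes equal-variance normal score distributions in cases and
    controls, so the case-control mean shift (in SD units) is ln(OR).
    """
    d = math.log(odds_ratio)
    # Phi(d / sqrt(2)) written with the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(d / 2.0))


if __name__ == "__main__":
    # Odds ratios per SD taken from Table 1.
    for label, or_sd in [("Danish surgical", 1.59), ("Danish twin", 1.50),
                         ("UK Biobank", 1.28), ("Combined Danish", 1.57)]:
        print(f"{label}: OR/SD = {or_sd:.2f}, approximate AUC = {auc_from_or_per_sd(or_sd):.3f}")
```

Under these assumptions the odds ratios in Table 1 correspond to AUC values of roughly 0.57 to 0.63, well short of what a standalone diagnostic test would require.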

Another layer of complexity arises from the interaction between genetic risk and comorbid conditions. A recent study using the UK and Estonian Biobanks found that the comorbidity burden was positively correlated with endometriosis PRS in women without endometriosis, but negatively correlated in women with the disease [49]. This indicates a complex interplay where the clinical manifestation of genetic risk is modified by other physiological factors. ML approaches are uniquely positioned to model these non-linear interactions and integrate diverse data types, paving the way for more powerful, personalized risk assessment.

Machine Learning-Enhanced Workflows for Endometriosis PRS

Machine learning algorithms move beyond the linear assumptions of traditional PRS by identifying complex, non-additive interactions between genetic variants and integrating genetic data with clinical and molecular phenotypes. Below are detailed methodologies for key experimental approaches.

Multimodal Data Integration with Gradient Boosting

Objective: To develop a unified predictive model for endometriosis by integrating PRS with a wide array of clinical diagnoses, lifestyle factors, and female health-relevant data.

Experimental Workflow:

  • Data Curation and Cohort Definition: From a resource like the UK Biobank, extract genotyping and phenotypic data for a cohort of individuals with an ICD-10 diagnosis of endometriosis (e.g., N80) and a matched control group. Critical preprocessing steps include:
    • Relatedness Removal: Retain only one individual from each kinship group to ensure sample independence.
    • Age Matching: Implement a stochastic matching protocol to align the birth year distribution of cases and controls, minimizing age-based confounding.
    • Data Field Selection: Collate over 1,000 variables encompassing ICD-10 medical history, self-reported questionnaires, lifestyle data, and female-specific health factors [50].
  • PRS Calculation and Feature Engineering:
    • Calculate the endometriosis PRS for each individual using state-of-the-art methods (e.g., SBayesR for effect size weighting) [31].
    • Handle missing data using model-specific methods (e.g., CatBoost's native handling) or imputation.
    • The PRS is then treated as one feature among many in the larger dataset.
  • Model Training and Validation:
    • Algorithm Selection: Employ gradient boosting frameworks such as CatBoost or XGBoost, which are particularly adept at handling tabular data with mixed types and missing values.
    • Training: Train the model to classify endometriosis cases from controls using the combined feature set.
    • Validation: Perform rigorous cross-validation and hold-out validation to assess model performance, reporting metrics like the Area Under the ROC Curve (ROC-AUC).
  • Model Interpretation:
    • Utilize explainable AI (XAI) tools like SHAP (SHapley Additive exPlanations) to quantify the marginal contribution of each feature (including PRS, specific ICD-10 codes like irritable bowel syndrome, and menstrual cycle length) to the final prediction [50]. This provides biological and clinical insights beyond mere prediction.
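
As an illustration of the age-matching step in this workflow, the sketch below draws controls at random from the pool sharing each case's birth year, without replacement. The exact matching protocol in [50] is not described here, so this is one plausible reading of "stochastic matching"; the function name, the 1:1 default ratio, and the rule of dropping cases that cannot be fully matched are all assumptions.

```python
import random
from collections import defaultdict


def match_controls_by_birth_year(cases, controls, ratio=1, seed=0):
    """Randomly match each case to `ratio` controls born in the same year.

    cases, controls: dict mapping participant id -> birth year.
    Returns dict case_id -> list of matched control ids. Controls are
    used at most once; cases without enough same-year controls are dropped.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for control_id, year in controls.items():
        pool[year].append(control_id)
    for ids in pool.values():
        rng.shuffle(ids)  # random order makes the draw stochastic
    matched = {}
    for case_id, year in cases.items():
        available = pool[year]
        if len(available) < ratio:
            continue  # assumption: unmatched cases are excluded
        matched[case_id] = [available.pop() for _ in range(ratio)]
    return matched
```

Because matching is exact on birth year, the matched controls reproduce the cases' birth-year distribution by construction; looser schemes (for example, matching within a window of years) trade some balance for sample size.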

[Diagram: Multi-modal Data Sources → Genotyping Data (PRS) and Clinical & Lifestyle Data → Gradient Boosting Model (e.g., CatBoost) → Integrated Risk Prediction and SHAP Analysis for Feature Importance]

Diagram 1: Multimodal ML workflow for endometriosis risk prediction.

PRS-PheWAS for Pleiotropy and Comorbidity Analysis

Objective: To systematically identify the pleiotropic effects of genetic liability for endometriosis on other diagnoses, biomarkers, and reproductive factors, independent of disease diagnosis.

Experimental Workflow:

  • Cohort and PRS Preparation:
    • Calculate the endometriosis PRS for all individuals in a large biobank (e.g., UK Biobank), creating three distinct analysis groups: all females, males (to reveal sex-agnostic effects), and females without an endometriosis diagnosis (a sensitivity analysis) [31].
  • Phenome-Wide Association Study (PheWAS):
    • Phenotype Processing: Map ICD-10 codes to curated phecodes. Include continuous biomarkers from blood/urine assays (log-transformed and adjusted for confounders like statin use) and female reproductive factors.
    • Association Testing: For each phecode and biomarker, run a regression model with the phenotype as the dependent variable and the standardized PRS as the independent variable, adjusting for age and genetic principal components.
  • Causal Inference Follow-up:
    • For significant associations discovered in the PheWAS, employ Mendelian Randomization (MR) to investigate potential causal relationships. For example, this approach has been used to suggest a causal effect of lower testosterone levels on endometriosis risk [31].
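
The association-testing step can be sketched for a single continuous biomarker as an ordinary least-squares regression of the phenotype on the standardized PRS plus covariates. This is a simplified illustration in plain Python, not the analysis code from [31]: binary phecodes would normally be tested with logistic regression, the p-value uses a normal approximation to the t statistic, and all names are hypothetical.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Z-score a list of numbers."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def prs_effect(phenotype, prs_z, covariates):
    """Return (beta, se, p) for the standardized PRS, adjusted for covariates.

    covariates: list of columns, e.g. [age, pc1, pc2, ...].
    """
    cols = [prs_z] + covariates
    x = [[1.0] + [c[i] for c in cols] for i in range(len(phenotype))]
    k = len(x[0])
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y for row, y in zip(x, phenotype)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [y - sum(b * v for b, v in zip(beta, row)) for y, row in zip(phenotype, x)]
    sigma2 = sum(e * e for e in resid) / (len(phenotype) - k)
    unit = [1.0 if i == 1 else 0.0 for i in range(k)]
    se = math.sqrt(sigma2 * solve(xtx, unit)[1])  # index 1 is the PRS column
    p = math.erfc(abs(beta[1] / se) / math.sqrt(2)) if se > 0 else 0.0
    return beta[1], se, p
```

In a full PheWAS this function would be called once per phenotype and per analysis group, with the resulting p-values corrected for the number of phenotypes tested.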

Non-Genomic Screening to Triage for PRS Validation

Objective: To develop a rapid, cost-effective pre-screening tool using non-genomic data to identify individuals at high risk, who could then be prioritized for genetic testing or more invasive diagnostics.

Experimental Workflow:

  • Sample and Data Collection:
    • Recruit symptomatic patients presenting with pelvic pain and an indication for MRI.
    • Collect non-invasive samples (e.g., urine) and acquire high-dimensional data. For example, use Attenuated Total Reflection Fourier-Transform Infrared (ATR-FTIR) spectroscopy on urine samples, generating ~1,700 spectral variables per patient that represent a complete biochemical profile [51].
  • ML Model Development for Triage:
    • Algorithm Training: Use a portion of the data (e.g., 70%) to train ML classifiers. One effective method is Linear Discriminant Analysis combined with a Genetic Algorithm and Monte Carlo resampling (MC-GA-LDA).
    • Model Tuning: Develop algorithm variants tuned for high sensitivity (e.g., 93%) to minimize false negatives, making it suitable for a screening context where the goal is to rule out the disease [51].
  • Clinical Utility Assessment:
    • Validate the model on a held-out test set (30%). The high-sensitivity model can significantly reduce unnecessary MRI referrals (e.g., by 42%) by prioritizing patients most likely to have endometriosis for further workup, which could include PRS analysis [51].
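
The tuning step for a high-sensitivity variant can be illustrated independently of the classifier itself. The sketch below assumes some model has already produced a continuous risk score per patient; it splits patients 70/30, picks the highest cut-off on the training scores that still reaches the target sensitivity, and reports sensitivity and specificity at that cut-off. It does not implement MC-GA-LDA, and the function names are hypothetical.

```python
import math
import random


def train_test_split(ids, train_frac=0.7, seed=0):
    """Shuffle ids and split them into training and held-out test sets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def threshold_for_sensitivity(scores, labels, target=0.93):
    """Highest cut-off at which at least `target` of cases score >= cut-off."""
    case_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    needed = min(len(case_scores), math.ceil(target * len(case_scores)))
    return case_scores[needed - 1]


def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)
```

The cut-off should be chosen on training data only and then evaluated on the held-out set; patients scoring below it are the ones for whom further imaging could be deferred.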

Signaling Pathways and Functional Genomics

Understanding the functional mechanisms through which endometriosis-risk variants operate is critical for refining PRS and identifying druggable pathways. Integrating GWAS findings with functional genomic data reveals the tissue-specific regulatory architecture of genetic risk.

Table 2: Tissue-Specific Regulatory Profiles of Endometriosis Risk Loci

| Tissue | Prominent Biological Hallmarks | Key Regulator Genes |
| --- | --- | --- |
| Sigmoid Colon & Ileum | Immune response, epithelial signaling | MICB, CLDN23 |
| Ovary, Uterus, Vagina | Hormonal response, tissue remodeling, cell adhesion | GATA4 |
| Peripheral Blood | Systemic immune and inflammatory signals | - |

A systematic analysis of endometriosis-associated GWAS variants against the GTEx eQTL database reveals distinct tissue-specific regulatory profiles [7]. As summarized in Table 2, risk variants exert their effects through different biological processes in reproductive versus intestinal tissues. This suggests that a more powerful PRS could be constructed by prioritizing variants based on their functional activity in disease-relevant tissues.

[Diagram: GWAS Hit List → GTEx eQTL Analysis → Reproductive Tissues (Ovary, Uterus), Intestinal Tissues (Colon, Ileum), and Peripheral Blood. Reproductive Tissues → Hallmark Pathways: Hormonal Response, Tissue Remodeling. Intestinal Tissues → Hallmark Pathways: Immune Regulation, Epithelial Signaling.]

Diagram 2: Tissue-specific functional characterization of risk variants.

The pathways identified through these integrative analyses point toward specific therapeutic opportunities. Drug-repurposing analyses based on the implicated biological systems have highlighted potential interventions currently used for breast cancer and preterm birth prevention [52]. Furthermore, the finding that genetic liability to lower testosterone may be causal for endometriosis opens up novel avenues for therapeutic targeting [31].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents and Resources for ML-PRS Studies

| Resource Category | Specific Tool / Assay | Function in Experimental Pipeline |
| --- | --- | --- |
| Genotyping & Biobanks | UK Biobank, Estonian Biobank, FinnGen | Provide large-scale genomic and phenotypic data for model training and validation [49] [50] [31]. |
| Genotyping Technology | Illumina Global Screening Array | High-throughput genotyping of research samples to generate PRS input data [10]. |
| Functional Genomics | GTEx eQTL Database (v8) | Annotates GWAS variants with tissue-specific gene regulatory information [7]. |
| Phenotype Processing | Phecode Map (v1.2) | Standardizes ICD-10 codes into analyzable phenotype groups for PheWAS [31]. |
| Proteomic Analysis | Proseek Multiplex Inflammation I Kit (Olink) | Quantifies 92 inflammatory serum proteins for integration with PRS models [10]. |
| Spectroscopic Analysis | ATR-FTIR Spectrometer (e.g., Bruker ALPHA II) | Generates biochemical profile spectra from urine for non-genomic ML models [51]. |
| ML & Statistical Software | PLINK, CatBoost, XGBoost, SHAP, GCTB (for SBayesR) | Software packages for core genetics, machine learning, and model interpretation [50] [31]. |

The application of machine learning to polygenic risk scoring is fundamentally advancing the research landscape for endometriosis. By moving beyond simple, additive genetic models, ML enables the development of integrated, multifactorial risk stratification systems that account for disease heterogeneity, complex comorbidities, and tissue-specific biology. The experimental frameworks and tools detailed in this guide provide a roadmap for building more predictive and biologically interpretable models. The continued growth of large, diverse biobanks, coupled with advances in functional genomics and explainable AI, will be crucial for translating these sophisticated models into tangible benefits for patient stratification and the development of novel therapeutics.

Limitations and Enhancement Strategies for Subtype-Specific Risk Prediction

Endometriosis is a complex, chronic inflammatory condition affecting approximately 10% of women of reproductive age, characterized by the presence of endometrial-like tissue outside the uterine cavity [6] [10]. Despite significant advances in understanding its genetic architecture, the clinical translation of polygenic risk scores (PRS) remains challenged by the disease's substantial heterogeneity. Current PRS models, derived from genome-wide association studies (GWAS), demonstrate promising but incomplete discriminative ability across diverse clinical presentations [6] [10]. This technical analysis examines the performance gaps of endometriosis PRS across different subphenotypes, exploring the molecular foundations of this heterogeneity and proposing methodological frameworks for enhanced risk stratification.

The fundamental limitation of current PRS approaches lies in their predominantly generalized construction, which often fails to capture the spectrum of molecular mechanisms driving distinct clinical manifestations. While GWAS have identified numerous susceptibility loci, these variants are primarily non-coding and likely exert tissue-specific regulatory effects that remain poorly characterized in the context of different endometriosis subphenotypes [7]. This gap is particularly problematic for drug development, where targeted therapeutic strategies require precise patient stratification based on underlying disease drivers rather than blanket genetic risk assessment.

Current Performance Data of Endometriosis PRS

Discriminative Ability Across Populations

Table 1: Performance of a 14-SNP PRS for Endometriosis Across Cohorts

| Cohort | Cases/Controls | Odds Ratio (per SD) | P-value | Subtype Analysis |
| --- | --- | --- | --- | --- |
| Danish Surgical Cohort | 249/348 | 1.59 | 2.57×10^-7 | Surgically confirmed (ASRM II-IV) |
| Danish Twin Registry | 140/316 | 1.50 | 0.0001 | ICD-10 coded cases |
| Combined Danish Cohorts | 389/664 | 1.57 | 2.5×10^-11 | All major subtypes |
| - Ovarian (N80.1) | - | 1.72 | 6.7×10^-5 | Specific subtype |
| - Infiltrating (N80.4-N80.5) | - | 1.66 | 2.7×10^-9 | Specific subtype |
| - Peritoneal (N80.2-N80.3) | - | 1.51 | 2.6×10^-3 | Specific subtype |
| UK Biobank Replication | 2,967/256,222 | 1.28 | <2.2×10^-16 | Large-scale validation |

Data adapted from [6]

As demonstrated in Table 1, while PRS consistently shows association with endometriosis risk across diverse cohorts, the effect sizes vary considerably. The reduction in odds ratio observed in the larger UK Biobank cohort (OR=1.28) compared to the smaller Danish surgical cohort (OR=1.59) suggests potential spectrum bias or differences in case ascertainment methods [6]. Importantly, the PRS showed no significant association with adenomyosis (N80.0), indicating some specificity to endometriosis pathogenesis mechanisms [6] [39].

Limitations in Clinical Presentation Prediction

Table 2: PRS Associations with Clinical Presentation Features

| Clinical Feature | Association Direction | Statistical Significance | Cohort Details |
| --- | --- | --- | --- |
| Disease Spread | Inverse | Lost significance (p-trend) | 172 patients, surgical confirmation [10] |
| Gastrointestinal Involvement | Inverse | Lost significance (p-trend) | 172 patients, surgical confirmation [10] |
| Hormone Treatment Response | Inverse | Lost significance (p-trend) | 172 patients, surgical confirmation [10] |
| Inflammatory Proteins (AXIN1, ST1A1, CXCL9) | No correlation | Non-significant | Multiplex immunoassay [10] |
| TSH Receptor Antibodies (TRAb) | No correlation | Non-significant | Electrochemiluminescence immunoassay [10] |

Data adapted from [10]

A critical limitation emerges when examining specific clinical presentations. As summarized in Table 2, a dedicated study of 172 surgically confirmed endometriosis patients found that PRS showed inverse associations with disease spread, gastrointestinal involvement, and hormone treatment that failed to maintain statistical significance when calculated as p for trend [10]. This indicates that current PRS models lack the sensitivity to predict disease severity or specific phenotypic manifestations, severely limiting their clinical utility for personalized treatment approaches.
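
A "p for trend" across ordered PRS groups is commonly obtained with a Cochran-Armitage test, which asks whether the proportion of patients with a feature changes linearly across the groups. The sketch below shows that calculation; whether [10] used this exact test is an assumption, and the counts in the example are invented purely to show the mechanics, not taken from the study.

```python
import math


def cochran_armitage(events, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    events[i] of totals[i] patients in ordered group i have the feature.
    Returns (z, p); a negative z indicates a decreasing (inverse) trend.
    """
    t = list(scores) if scores is not None else list(range(len(events)))
    n = sum(totals)
    p_bar = sum(events) / n
    numerator = sum(ti * (r - ni * p_bar) for ti, r, ni in zip(t, events, totals))
    s1 = sum(ni * ti for ni, ti in zip(totals, t))
    s2 = sum(ni * ti * ti for ni, ti in zip(totals, t))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n)
    z = numerator / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical counts: patients with GI involvement per PRS quartile.
    z, p = cochran_armitage(events=[25, 22, 20, 16], totals=[50, 50, 50, 50])
    print(f"z = {z:.2f}, p for trend = {p:.3f}")
```

With these made-up counts the proportions fall steadily across quartiles, yet the trend test does not reach p < 0.05, which mirrors the pattern described above: a visible inverse association that is not statistically robust in a small cohort.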

Molecular Subtypes: Explaining PRS Limitations

Transcriptomically Defined Endometriosis Subtypes

Recent transcriptomic profiling has revealed fundamental molecular heterogeneity in endometriosis that likely explains the limitations of current PRS approaches. Unsupervised clustering of 198 ectopic endometriosis lesions identified two distinct subtypes:

  • Stroma-Enriched Subtype (S1): Characterized by fibroblast activation and extracellular matrix remodeling pathways [53]
  • Immune-Enriched Subtype (S2): Dominated by immune pathway upregulation and stronger correlation with immunotherapy response [53]

These subtypes demonstrate significant clinical relevance, with the S2 subtype strongly associated with failure of or intolerance to hormone therapy [53]. This stratification provides a biological basis for the observed poor correlation between PRS and treatment response noted in Table 2.

Tissue-Specific Regulatory Mechanisms

The functional characterization of endometriosis-associated genetic variants reveals additional complexity. An analysis of 465 genome-wide significant variants found that they exhibit tissue-specific regulatory effects as expression quantitative trait loci (eQTLs) [7]:

  • Reproductive tissues (uterus, ovary, vagina): Enrichment of genes involved in hormonal response, tissue remodeling, and adhesion
  • Intestinal tissues (sigmoid colon, ileum) and peripheral blood: Predominance of immune and epithelial signaling genes

This tissue-specific regulatory pattern suggests that current PRS models, which typically aggregate genetic effects across tissues, may obscure important subtype-specific risk mechanisms.

[Figure: GWAS-Identified Risk Variants → Tissue-Specific eQTL Effects → Stroma-Enriched Subtype (S1) and Immune-Enriched Subtype (S2). S1 → ECM Remodeling, Fibroblast Activation. S2 → Immune Activation, Hormone Therapy Resistance.]

Molecular Heterogeneity in Endometriosis Pathogenesis

Experimental Protocols for Subtype-Specific PRS Development

Transcriptomic Subtyping Methodology

Protocol: Identification of Molecular Subtypes via Unsupervised Clustering

  • Data Acquisition and Preprocessing

    • Obtain endometriosis lesion transcriptomic data from GEO (e.g., GSE141549 with 198 samples)
    • Log2-transform expression values and remove batch effects using the ComBat function from the sva package
    • Validate batch effect removal via principal component analysis (PCA)
  • Consensus Clustering

    • Apply ConsensusClusterPlus package with parameters: maxK=10, reps=10,000, pItem=0.8, pFeature=1
    • Use K-means cluster algorithm with Euclidean distance
    • Determine optimal cluster number (k=2) based on consensus matrix and cluster consensus score
  • Biological Characterization

    • Perform weighted gene co-expression network analysis (WGCNA) to identify subtype-specific modules
    • Estimate cell type composition using xCell analysis and CIBERSORT
    • Conduct functional enrichment (GO, KEGG) using clusterProfiler
    • Predict immunotherapy response using EaSIeR package [53]
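
The idea behind the consensus clustering step can be sketched without the R package. The code below is a simplified pure-Python illustration, not a reimplementation of ConsensusClusterPlus: it repeatedly subsamples 80% of the samples, runs plain k-means with Euclidean distance, and records how often each pair of samples lands in the same cluster when both are drawn. The default of 100 repetitions is chosen for speed and is far below the 10,000 listed in the protocol.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means with squared Euclidean distance; returns labels."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k=2, reps=100, p_item=0.8, seed=0):
    """Fraction of subsampled runs in which each pair of samples co-clusters."""
    rng = random.Random(seed)
    n = len(points)
    m = max(k, round(p_item * n))  # number of samples drawn per repetition
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(reps):
        idx = rng.sample(range(n), m)
        labels = kmeans([points[i] for i in idx], k, rng)
        for (i, li), (j, lj) in combinations(list(zip(idx, labels)), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if li == lj:
                together[i][j] += 1
                together[j][i] += 1
    return [[1.0 if i == j else
             (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
             for j in range(n)] for i in range(n)]
```

A clean choice of k shows up as a consensus matrix whose entries are close to 0 or 1; values spread around 0.5 indicate samples that switch clusters between resamples, which is the signal used to compare candidate cluster numbers.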

PRS Construction and Validation Workflow

Protocol: Development of Subtype-Informed PRS

  • Variant Selection and Functional Annotation

    • Curate endometriosis-associated variants from GWAS Catalog (EFO_0001065, p<5×10^-8)
    • Annotate variants using Ensembl Variant Effect Predictor (VEP)
    • Cross-reference with GTEx v8 database for tissue-specific eQTL effects (FDR<0.05)
  • Subtype-Stratified PRS Calculation

    • Perform GWAS meta-analysis using METAL with genomic control
    • Adjust effect sizes using Bayesian methods (SBayesR in GCTB 2.02)
    • Calculate PRS using the PLINK 1.9 --score function for overall and subtype-specific models
  • Validation in Phenotyped Cohorts

    • Test association with endometriosis subtypes defined by ICD-10 codes (Table 1)
    • Assess correlation with clinical presentation features (Table 2)
    • Validate in independent cohorts (e.g., UK Biobank, Danish registries) [6] [2]
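
One way to make a PRS "subtype-informed" is to partition the score by the tissues in which each variant acts as an eQTL. The sketch below is a hypothetical illustration of that idea rather than a published method: the tissue groupings follow the reproductive and intestinal categories described in this section, the tissue labels are written in GTEx style as an assumption, and the variant identifiers and weights in the test data are placeholders.

```python
# Assumed tissue groupings, using GTEx-style tissue labels.
REPRODUCTIVE = {"Uterus", "Ovary", "Vagina"}
INTESTINAL = {"Colon_Sigmoid", "Small_Intestine_Terminal_Ileum"}


def partitioned_prs(dosages, variants):
    """Compute an overall PRS plus tissue-partitioned component scores.

    dosages: dict variant id -> risk-allele dosage (0, 1, or 2).
    variants: list of dicts with keys 'id', 'beta', and 'eqtl_tissues'
              (set of tissues where the variant is a significant eQTL).
    A variant active in both tissue groups contributes to both components.
    """
    scores = {"overall": 0.0, "reproductive": 0.0, "intestinal": 0.0}
    for v in variants:
        contribution = v["beta"] * dosages.get(v["id"], 0.0)
        scores["overall"] += contribution
        if v["eqtl_tissues"] & REPRODUCTIVE:
            scores["reproductive"] += contribution
        if v["eqtl_tissues"] & INTESTINAL:
            scores["intestinal"] += contribution
    return scores
```

Each component score can then be tested against the ICD-10 subtypes and clinical features separately, which is where a tissue-specific signal, if one exists, would be expected to appear.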

[Figure: Patient Recruitment & Sample Collection → Surgical Confirmation & Phenotypic Data Collection → Genotyping & Variant Annotation and Transcriptomic Profiling & Molecular Subtyping → Subtype-Specific Variant Selection → PRS Calculation & Weight Optimization → Validation in Independent Cohorts]

Subtype-Informed PRS Development Workflow

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents for Endometriosis Subtype Studies

| Reagent/Technology | Application | Key Features | Representative Use |
| --- | --- | --- | --- |
| Olink Target 96 (Inflammation) | Multiplex protein quantification | 92 inflammatory proteins, proximity extension assay | Serum protein analysis in PRS-clinical correlation studies [10] [54] |
| Illumina Global Screening Array | Genotyping | High-throughput SNP array | PRS calculation in cohort studies [10] |
| Proseek Multiplex Inflammation I kit | Inflammation biomarker analysis | 92-protein panel, normalized protein expression (NPX) | Correlation of inflammatory proteins with clinical symptoms [10] |
| TOPMed Imputation Server | Genotype imputation | Reference panel: TOPMed Version R2 on GRCh38 | Imputation of missing genotypes for PRS calculation [10] |
| GTEx v8 Database | Tissue-specific eQTL analysis | Normalized effect sizes (slope) across 54 tissues | Functional annotation of endometriosis risk variants [7] |
| xCell & CIBERSORT | Cell type decomposition | Tissue infiltration scores from transcriptomic data | Immune-stromal characterization of endometriosis subtypes [53] |

The current performance gaps in endometriosis PRS stem fundamentally from the molecular heterogeneity of the disease and the limitations of one-size-fits-all genetic risk models. The identification of distinct molecular subtypes (stroma-enriched and immune-enriched) with differential treatment responses provides both an explanation for these limitations and a pathway forward [53]. Future PRS development must incorporate tissue-specific regulatory information [7] and stratify by molecular subtypes to achieve the precision required for meaningful clinical application, particularly in drug development contexts where targeting specific pathogenic mechanisms is paramount. The integration of transcriptomic subtyping with genetic risk assessment represents the most promising approach for developing predictive models that can genuinely inform personalized therapeutic strategies for endometriosis patients.

Inverse Associations with Disease Spread and Gastrointestinal Involvement

Emerging research reveals a paradoxical relationship in endometriosis wherein a higher genetic predisposition, quantified by polygenic risk scores (PRS), is inversely associated with the spread of the disease and its involvement of the gastrointestinal (GI) tract. This whitepaper synthesizes evidence from recent clinical and genetic studies, detailing the quantitative data, experimental methodologies, and putative biological mechanisms underlying this counterintuitive phenomenon. Framed within a broader thesis on PRS performance across endometriosis subphenotypes, this review provides researchers and drug development professionals with a technical guide to the current state of the art, highlighting the potential for genetic profiling to refine patient stratification and uncover novel pathophysiology.

Endometriosis is a common, estrogen-dependent chronic inflammatory gynecological disorder, characterized by the presence of endometrial-like tissue outside the uterine cavity [55] [3]. It affects approximately 5–15% of women of reproductive age, with a heritability estimated at 47–51% [2]. The clinical presentation is profoundly heterogeneous, ranging from superficial peritoneal lesions to deep infiltrating disease that can involve the ovaries, pelvic peritoneum, and gastrointestinal tract [56].

The development of polygenic risk scores (PRS)—a weighted sum of an individual's risk alleles derived from genome-wide association studies (GWAS)—has provided a powerful tool to quantify genetic susceptibility to complex diseases like endometriosis [6]. A compelling and counterintuitive finding is emerging from PRS research: a higher genetic load for endometriosis is associated with less severe disease manifestations in specific anatomical contexts, particularly concerning disease spread and GI tract involvement [57]. This inverse association challenges simple linear models of genetic risk and suggests the existence of distinct genetic architectures underlying different disease subphenotypes. Understanding this relationship is critical for refining predictive models and developing targeted therapies.

Quantitative Evidence of Inverse Associations

Key studies have systematically investigated the association between PRS and specific clinical presentations of endometriosis, quantifying the relationship with disease spread and GI involvement.

Table 1: Summary of Key Studies on Inverse Associations with Endometriosis Subphenotypes

| Study Cohort | Sample Size (Cases) | PRS Construction | Association with Disease Spread | Association with GI Involvement | Key Findings |
|---|---|---|---|---|---|
| Clinical Cohort (2022) [57] | 172 | Based on previous GWAS | Inverse association identified with the spread of endometriosis. | Inverse association identified with involvement of the GI tract. | Significance was lost when calculated as p for trend; specificity and sensitivity were low. |
| Danish & UK Biobank (2021) [6] | 249 (surgically confirmed) | 14 SNPs from a large meta-GWAS | PRS was associated with all major subtypes (ovarian, infiltrating, peritoneal). | Not explicitly studied for GI tract. | PRS was not associated with adenomyosis, suggesting different genetic drivers. |
| UK Biobank PRS-PheWAS (2023) [2] | 2,967 cases (UK Biobank) | Bayesian method (SBayesR) on meta-analysis | Pleiotropic effects were found irrespective of diagnosis. | Not explicitly studied. | Genetic liability to endometriosis was associated with lower testosterone levels; Mendelian randomization suggested that lower testosterone may causally increase endometriosis risk. |

The most direct evidence comes from a 2022 study that explicitly tested the association between a PRS and clinical presentation in 172 endometriosis patients [57]. The study reported inverse associations between the PRS and both the spread of endometriosis and the specific involvement of the gastrointestinal tract. However, the authors noted that the statistical significance for these associations was lost when a p for trend was calculated, and the overall specificity and sensitivity of the PRS for predicting these subphenotypes were low [57]. This indicates that while the inverse relationship is observable, the current PRS models lack the discriminatory power for standalone clinical prediction of subphenotypes.

In a larger, multi-cohort study, a 14-SNP PRS was significantly associated with all major subtypes of endometriosis, including ovarian (OR = 1.72), infiltrating (OR = 1.66), and peritoneal (OR = 1.51) [6]. This suggests the PRS captures a general genetic risk for developing endometriosis across anatomical locations, rather than a risk skewed towards a specific subtype. The critical finding that this PRS was not associated with adenomyosis reinforces the notion that distinct pathological entities within the spectrum of endometriosis-related diseases have unique genetic drivers [6].

Detailed Experimental Protocols for Key Studies

To enable replication and critical evaluation, this section details the experimental methodologies from the pivotal studies cited.

Clinical cohort study of PRS and clinical presentation (N=172) [57]

1. Patient Cohort Identification and Phenotyping:

  • Patient Selection: Women with endometriosis (N=172) were identified at the Department of Gynecology. Diagnosis was confirmed through established clinical methods.
  • Phenotype Data Collection: All participants completed comprehensive questionnaires covering sociodemographic factors, lifestyle habits, and medical history.
  • Symptom Registration: Bowel symptoms were quantitatively registered using the Visual Analog Scale for Irritable Bowel Syndrome (VAS-IBS).
  • Biomarker Analysis: Blood samples were collected from all participants. DNA was extracted for genotyping. Inflammatory proteins and TSH receptor antibodies (TRAb) in serum were analyzed.

2. Genotyping and PRS Calculation:

  • DNA Extraction and Genotyping: DNA was extracted from blood samples using standard protocols. Samples were genotyped on an appropriate genotyping array.
  • PRS Derivation: The polygenic risk score was calculated for each individual based on the summary statistics from previous genome-wide association studies (GWAS) of endometriosis. The score represents the weighted sum of risk alleles an individual carries.

3. Statistical Analysis:

  • Association Testing: The association between the calculated PRS and clinical variables (spread of disease, GI tract involvement, hormone treatment) was tested using appropriate statistical models, such as logistic or linear regression.
  • Sensitivity/Specificity Assessment: The predictive performance of the PRS for clinical presentation was evaluated by calculating the sensitivity and specificity.
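The sensitivity/specificity step can be illustrated with a minimal sketch. It assumes a single PRS cut-off applied to a binary clinical outcome; the function name, the cut-off, and the example values are hypothetical and are not taken from the cited study. Because the reported association is inverse, the example treats scores at or below the cut-off as predicting GI involvement.

```python
def sensitivity_specificity(scores, labels, threshold, higher_is_positive=True):
    """Return (sensitivity, specificity) for a single score cut-off.

    labels use 1 for the outcome of interest (e.g. GI involvement) and 0 otherwise.
    Set higher_is_positive=False when lower scores predict the outcome.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        if higher_is_positive:
            predicted = score >= threshold
        else:
            predicted = score <= threshold
        if label == 1 and predicted:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical standardized PRS values and GI-involvement labels (illustration only).
example_prs = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
example_gi = [1, 1, 0, 1, 0, 0]
sens, spec = sensitivity_specificity(
    example_prs, example_gi, threshold=0.0, higher_is_positive=False
)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data the cut-off identifies two of the three patients with GI involvement and none of the patients without it. Real cohorts would typically evaluate many cut-offs rather than one.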

Danish cohorts and UK Biobank study of a 14-SNP PRS [6]

1. Cohort Descriptions and Definition of Cases/Controls:

  • Clinical Cohort (Surgically Confirmed): 249 surgically confirmed endometriosis cases were enrolled from a Danish hospital. All cases were confirmed by laparoscopy and histology, with ASRM stages II–IV. Controls were 348 age-matched female blood donors without an endometriosis diagnosis.
  • Danish Twin Registry (DTR) Biobank Cohort: 140 unrelated women with an endometriosis diagnosis (ICD-10 codes N80.1–N80.9) from the national patient registry. Controls were 316 age-matched women without an N80 diagnosis.
  • UK Biobank Replication Cohort: 2,967 cases with endometriosis (ICD-10 codes N80.1–N80.9) and 256,222 controls (no N80 diagnosis or self-reported endometriosis) were identified from the UK Biobank.

2. Assay Design and Genotyping:

  • SNP Selection: The 14 genome-wide significant lead SNPs from a large endometriosis GWAS meta-analysis (17,045 cases; 191,596 controls) were selected for the PRS. One SNP failed assay design and was replaced by a region-wide associated proxy SNP (rs77294520 in the GREB1 locus).
  • Genotyping: Genotyping in the Danish cohorts was performed using platform-specific methods. For the UK Biobank, the 14 SNPs were identified from the imputed genetic data ("best guess" genotypes).

3. PRS Calculation and Statistical Analysis:

  • Score Calculation: The PRS for each individual was calculated as the sum of risk alleles weighted by their effect sizes (log(odds ratio)) from the original GWAS.
  • Association Testing: The association between the PRS and endometriosis (and its subtypes) was tested using logistic regression, adjusting for relevant covariates like principal components of genetic ancestry. Odds ratios (OR) were calculated per standard deviation increase in the PRS.
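The score calculation described above can be sketched in a few lines. This is a minimal illustration, assuming risk-allele counts coded 0, 1 or 2 and weights given as odds ratios; the three odds ratios and the four genotype vectors are hypothetical placeholders, not values from the cited GWAS. Standardizing the score is what allows effects to be reported per standard deviation.

```python
import math
import statistics


def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts, with weights equal to log(odds ratio)."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, odds_ratios)
    )


def standardize(values):
    """Convert raw scores to z-scores so effects can be expressed per SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]


# Hypothetical weights and genotypes for three SNPs (illustration only).
example_odds_ratios = [1.10, 1.08, 1.15]
example_genotypes = [[0, 1, 2], [1, 1, 0], [2, 2, 1], [0, 0, 0]]

raw_scores = [polygenic_risk_score(g, example_odds_ratios) for g in example_genotypes]
z_scores = standardize(raw_scores)
```

Under the log-linear assumption of logistic regression, an odds ratio of 1.57 per SD corresponds to roughly 1.57² ≈ 2.46 for a score two SDs above the mean.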

Proposed Biological Mechanisms Underlying Inverse Associations

The observed inverse associations suggest that a higher genetic load for endometriosis might trigger compensatory biological pathways or that distinct genetic variants are linked to localized versus widespread disease. Two primary, non-mutually exclusive mechanisms are supported by the literature.

The Hormonal Pathway: Testosterone as a Mediator

A landmark PRS-phenome-wide association study (PheWAS) revealed that genetic liability to endometriosis is associated with lower levels of testosterone [2]. Follow-up Mendelian randomization analyses suggested that lower testosterone may have a causal effect on increasing endometriosis risk.

[Diagram: PRS → Testosterone (inversely associated; lower levels) → Endometriosis risk (causal effect of low level) → Lesion proliferation → Disease spread; Testosterone → Disease spread (potential protective effect of higher levels)]

Figure 1: Hormonal Pathway Linking PRS and Disease Spread

This pathway posits that a specific genetic profile (high PRS) predisposes individuals to lower circulating testosterone. Since testosterone may have a protective effect against the establishment and growth of ectopic lesions, individuals with a high PRS (and thus lower testosterone) might be more likely to develop initial endometriosis. However, this same hormonal milieu could be less conducive to the specific processes required for deep infiltration and GI tract involvement, potentially through modulation of immune cell function or fibrosis, leading to the observed inverse association with severe spread [2] [3].

The Gut-Microbiota-Immune Axis

The gut microbiota plays a significant role in regulating systemic inflammation and estrogen metabolism (the "estrobolome") [58] [55] [59]. Dysbiosis, characterized by a shift in microbial communities, is frequently reported in endometriosis patients.

[Diagram: PRS → Gut dysbiosis (potential link via immune function); Gut dysbiosis → LPS (↑ Gram-negative bacteria) → TLR4 → NF-κB → Systemic inflammation → Lesion growth; Gut dysbiosis → Estrogen levels (↑ β-glucuronidase activity, estrobolome) → Lesion growth; Lesion growth → GI disease involvement]

Figure 2: Gut-Microbiota-Immune Axis in Endometriosis

A high-PRS genetic background might be linked to a gut microbiome configuration that, while permissive for the initial establishment of endometriosis (via elevated systemic inflammation and estrogen levels), simultaneously creates an environment that is resistant to the deep infiltration of the intestinal wall. For instance, specific microbial communities could influence the local immune landscape in the peritoneal cavity or the integrity of the gastrointestinal mucosal barrier, thereby limiting the ability of lesions to penetrate the GI tract [58] [55] [59]. This would manifest as an inverse association between PRS and GI involvement.

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Research Reagents for Endometriosis PRS and Mechanistic Studies

| Reagent / Material | Function / Application | Example / Note |
|---|---|---|
| Genotyping Arrays | Genome-wide genotyping of DNA samples to determine individual genotypes for PRS calculation. | Illumina Global Screening Array, Infinium arrays. |
| GWAS Summary Statistics | Source of SNP effect sizes (odds ratios) and p-values used as weights for PRS calculation. | Data from large consortia like Sapkota et al. (2017) or FinnGen [6] [2]. |
| PRS Calculation Software | Software tools to compute polygenic risk scores for individuals in a cohort. | PLINK 1.9 --score function, PRSice, LDpred [6] [2]. |
| ELISA Kits / Multiplex Immunoassays | Quantification of protein biomarkers in serum/plasma (e.g., inflammatory cytokines, hormones). | For measuring IL-1β, IL-18, TGF-β, TNF-α, testosterone, estradiol. |
| 16S rRNA Sequencing Reagents | Profiling the composition of the gut microbiota to identify dysbiosis. | Kits for amplification and sequencing of the 16S rRNA gene (e.g., targeting V4 region). |
| TLR4/NF-κB Pathway Inhibitors/Agonists | Mechanistic studies to validate the role of microbial components in inflammation. | Lipopolysaccharides (LPS) as TLR4 agonists; TAK-242 as a TLR4 signaling inhibitor. |
| Cell Culture Models | In vitro studies of endometriotic epithelial and stromal cell behavior. | Immortalized human endometriotic cell lines (e.g., 12Z cell line). |

The inverse association between a high polygenic risk score for endometriosis and the spread of disease or gastrointestinal involvement presents a paradox that underscores the complexity of the disease's genetic architecture. The evidence is suggestive, but standalone PRS models lack the sensitivity and specificity needed for clinical subphenotype prediction. The integration of PRS with other data layers, such as hormone levels (e.g., testosterone), gut microbiome profiles, and inflammatory biomarkers, is a promising avenue for building more powerful predictive models.

For drug development, these findings highlight the need for therapies that target specific biological pathways (e.g., testosterone-mediated effects or TLR4/NF-κB signaling) which may be more relevant for patients with certain genetic backgrounds and disease manifestations. Future research must prioritize large-scale, deeply phenotyped cohorts with genomic, microbiomic, and hormonal data to disentangle these complex relationships and fully realize the potential of polygenic risk scoring in endometriosis patient stratification and personalized treatment.

Endometriosis, a complex gynecological disorder affecting approximately 10% of reproductive-aged women, demonstrates a substantial heritable component, with genetic factors accounting for an estimated 50% of disease susceptibility [5] [60]. While polygenic risk scores (PRS) have emerged as valuable tools for aggregating the effects of numerous genetic variants, their predictive power remains limited for clinical implementation, with studies reporting area under the curve (AUC) values typically ranging from 0.546 to 0.636 [6] [61]. This limitation stems from the modest effect sizes of individual risk variants and the inability of PRS to capture the significant environmental contributions to disease pathogenesis [6] [61].
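For readers less familiar with the AUC values quoted above, the metric can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control. The sketch below computes it directly from that definition; the function name and the example scores are hypothetical and serve only to illustrate the calculation.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case-control pairs in which the case scores higher.

    Tied pairs count as half a win, matching the Mann-Whitney formulation.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical standardized PRS values (illustration only).
example_cases = [0.2, 0.9, 1.4]
example_controls = [-0.5, 0.2, 0.4, 1.0]
example_auc = auc_from_scores(example_cases, example_controls)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to no discrimination, which puts the reported range of 0.546 to 0.636 only modestly above chance.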

The integration of epigenetic data, particularly DNA methylation, offers a promising approach to enhance risk prediction models. DNA methylation represents a dynamic interface between genetic predisposition and environmental exposures, potentially capturing both inherited and acquired risk factors [5] [62]. Recent evidence indicates that DNA methylation profiles in endometrial tissue can capture approximately 15.4-24.2% of the variance in endometriosis status, with a significant portion (12-16.1%) remaining after accounting for genetic variation [5] [36]. This independent contribution highlights the potential of methylation risk scores (MRS) to complement traditional PRS and improve risk stratification across diverse endometriosis subphenotypes.

Quantitative Foundations: MRS and PRS Performance Metrics

Table 1: Performance Comparison of Risk Prediction Models in Endometriosis

| Model Type | Key Components | AUC/Performance | Variance Explained | Sample Size | Reference |
|---|---|---|---|---|---|
| PRS (14-SNP) | 14 genetic variants from GWAS | OR = 1.57-1.59 per SD | ~26.2% (SNP heritability) | 249 cases, 348 controls | [6] |
| MRS | 746 DNAm sites | AUC = 0.6748 | 12-16.1% (independent of genetics) | 908 samples | [5] |
| Combined PRS+MRS | Genetic + epigenetic markers | Consistently higher than PRS alone | 37% combined (20.9% genetics + 16.1% DNAm) | 984 participants | [5] [36] |
| Multi-PRS Model | 40 PRSs across multiple traits | AUC = 0.636 | N/R | 1,996 women | [61] |
| Phenotype-Only Questionnaire | CA125, fatigue, gynecological symptoms | AUC = 0.904 | N/R | 506 participants | [61] |

Table 2: DNA Methylation Variance Components in Endometrial Tissue

| Variance Component | Proportion Explained | Biological Interpretation | Study Details |
|---|---|---|---|
| Total DNAm Variance | 24.2% | Combined genetic and environmental influences | Analysis of 759,345 DNAm sites in 984 samples [36] |
| DNAm Variance (independent of genetics) | 16.1% | Pure epigenetic contribution after controlling for SNPs | OREML models including GRM and ORM [5] [36] |
| Genetic Variance | 20.9% | Common SNP-based heritability | Simultaneous modeling with DNAm [36] |
| Combined Genetic + Epigenetic | 37% | Total variance captured by integrated model | [36] |
| Menstrual Cycle Phase | 4.30% | Hormonal influence on methylation patterns | After SVA correction [36] |

Methodological Framework: MRS Development and Integration

Sample Processing and Quality Control

The development of robust MRS models requires stringent quality control protocols across multiple processing stages. For endometrial tissue studies, the initial sample collection phase should incorporate standardized surgical techniques and precise menstrual cycle dating through histological assessment according to Noyes' criteria [36]. Following tissue acquisition, DNA extraction should be performed using standardized kits such as the DNeasy Blood & Tissue Kit, with DNA quality verification through spectrophotometry or fluorometry [63].

For methylation analysis, the Illumina Infinium MethylationEPIC BeadChip platform provides comprehensive genome-wide coverage of over 850,000 CpG sites [36]. Quality control should include:

  • Probe filtering (removal of probes with detection p-value > 0.01)
  • Sample exclusion based on low signal intensity or mismatch between genetic and methylation sex calls
  • Normalization using standardized methods such as functional normalization or dasen [5] [36]
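The probe-filtering step can be sketched as follows. The source gives only the detection p-value cut-off (0.01); how many samples may fail before a probe is dropped is not stated, so the `max_failed_fraction` parameter is an assumption, as are the function name and the placeholder probe identifiers.

```python
def filter_probes(detection_p, max_p=0.01, max_failed_fraction=0.0):
    """Return the IDs of probes that pass detection p-value filtering.

    detection_p maps a probe ID to its per-sample detection p-values.
    A sample fails a probe when its p-value exceeds max_p; a probe is kept
    when the fraction of failing samples is at most max_failed_fraction
    (an assumed rule, since the source does not specify it).
    """
    kept = []
    for probe_id, p_values in detection_p.items():
        failed = sum(1 for p in p_values if p > max_p)
        if failed / len(p_values) <= max_failed_fraction:
            kept.append(probe_id)
    return kept


# Placeholder probe IDs and detection p-values for three samples (illustration only).
example_detection_p = {
    "probe_a": [0.001, 0.002, 0.0005],
    "probe_b": [0.001, 0.2, 0.003],
    "probe_c": [0.02, 0.05, 0.3],
}
strict = filter_probes(example_detection_p)
lenient = filter_probes(example_detection_p, max_failed_fraction=0.5)
```

With the strict default, only `probe_a` survives; allowing up to half of the samples to fail also retains `probe_b`.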

Batch effects from technical variables (array processing date, position) and variation from other covariates (age, institution) must be addressed through surrogate variable analysis (SVA), which has been shown to effectively reduce false positives while preserving biological signals [5] [36].

MRS Construction and Statistical Analysis

The construction of MRS follows a multi-step analytical pipeline with specific considerations for endometriosis applications:

  • Differential Methylation Analysis: Identify significantly associated CpG sites using linear models adjusted for key covariates including age, menstrual cycle phase, genetic ancestry, and technical batch effects. The model typically takes the form:

    M-value ~ Endometriosis_status + Age + Cycle_phase + Genetic_PCs + SV1...SVk

    where M-values represent logit-transformed beta values for improved statistical properties [36].

  • Feature Selection: Apply genome-wide significance thresholds (Bonferroni-corrected p < 6.58×10⁻⁸ for EPIC array) to identify robustly associated CpG sites. In endometriosis, studies have identified significant signals in genes including ELAVL4 and TNPO2 in advanced stage disease [36].

  • Weighted Score Calculation: Generate MRS using effect size-weighted sums of methylation values:

    MRS = Σ(β_i × DNAm_i)

    where β_i represents the effect size estimate for each CpG site i from the discovery analysis [5].

  • Model Validation: Implement rigorous train-test validation splits, ideally separating samples by recruitment institution to ensure independence. Performance evaluation should include AUC calculations, sensitivity analyses across disease stages, and assessment of subtype-specific predictive ability [5].
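Three of the steps above lend themselves to a short numerical sketch: the beta-to-M-value transform, the Bonferroni threshold, and the weighted MRS sum. The code assumes the conventional log2 form of the logit transform and a family-wise alpha of 0.05, neither of which is stated in the source. Whether DNAm_i enters the score as a beta value or an M-value is also not specified, so the example uses arbitrary numbers. CpG identifiers and weights are hypothetical.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Logit-transform a methylation beta value: M = log2(beta / (1 - beta)).

    Beta is clamped to [eps, 1 - eps] so fully (un)methylated sites stay finite.
    """
    clamped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clamped / (1.0 - clamped))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def methylation_risk_score(methylation, weights):
    """MRS = sum over weighted CpG sites of effect size times methylation value."""
    return sum(weight * methylation[cpg] for cpg, weight in weights.items())


# 759,345 is the number of DNAm sites reported for the endometrial analysis.
threshold = bonferroni_threshold(759_345)

# Hypothetical effect sizes and methylation values (illustration only).
example_weights = {"cpg_1": 0.5, "cpg_2": -0.25}
example_methylation = {"cpg_1": 2.0, "cpg_2": 1.0, "cpg_3": 3.0}
example_mrs = methylation_risk_score(example_methylation, example_weights)
```

Dividing 0.05 by 759,345 sites gives approximately 6.58×10⁻⁸, which agrees with the threshold quoted in the feature-selection step. Sites without a weight, such as `cpg_3` here, do not contribute to the score.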

[Diagram: Sample collection (endometrial tissue) → DNA extraction and quality control → Methylation array processing and QC → Differential methylation analysis → Feature selection (p-value and effect size) → MRS calculation (weighted sum) → Model validation (train-test split) → PRS+MRS integration]

MRS Development and Validation Workflow

Multi-Omic Integration with PRS

The integration of MRS with PRS requires careful consideration of genetic and epigenetic relationships. Critical steps include:

  • mQTL Analysis: Identify methylation quantitative trait loci (mQTLs) where genetic variants influence DNA methylation levels. Recent large-scale studies have detected 118,185 independent cis-mQTLs in endometrial tissue, including 51 associated with endometriosis risk [36]. These represent prime candidates for integrated risk modeling.

  • Variance Partitioning: Use omics residual maximum likelihood (OREML) analyses to quantify the proportion of disease variance captured by genetic (GRM) and methylation (ORM) relationship matrices [5] [36]. This approach demonstrated that combining both matrices captured 37% of endometriosis variance, significantly exceeding either component alone.

  • Clinical Subphenotype Stratification: Evaluate model performance across endometriosis subtypes, including rASRM stages (I-IV), lesion characteristics (ovarian, peritoneal, deeply infiltrating), and infertility associations. Current evidence indicates stronger epigenetic effects in advanced stage (III/IV) disease [36].

Pathway Integration: Biological Mechanisms and Therapeutic Implications

Functional Genomics of Endometriosis-Associated Methylation

DNA methylation alterations in endometriosis converge on several key pathological pathways that may inform both risk prediction and therapeutic targeting:

  • Hormonal Response Pathways: Key genes including ESR1 (estrogen receptor), PGR (progesterone receptor), and HOXA10 exhibit disease-specific methylation patterns associated with progesterone resistance and estrogen dominance [64] [65] [63]. Hyperestrogenism resulting from CYP19/aromatase hypomethylation creates a permissive environment for ectopic lesion growth [62] [63].

  • Immune-Inflammatory Regulation: Methylation changes in genes encoding inflammatory mediators (COX-2, IL-12B, TNF-α) contribute to the characteristic inflammatory microenvironment of endometriosis [64] [62]. Genome-wide analyses identify enrichment in HTLV infection, PI3K-Akt, and oxytocin signaling pathways [63].

  • Tissue Remodeling and Cell Adhesion: Aberrant methylation in extracellular matrix (ECM) interaction pathways, including adherens junctions, focal adhesion, and regulation of actin cytoskeleton, facilitates ectopic implantation and survival [36] [60].

  • Oxidative Stress Response: The interplay between oxidative stress and epigenetic modifications creates a feed-forward loop that promotes disease progression, with oxidative stress both influencing and being influenced by DNA methylation patterns [62].

[Diagram: Methylation Risk Score → Hormonal dysregulation pathway (HOXA10, ESR1, PGR); Methylation Risk Score → Immune-inflammatory pathway (COX-2, IL-12B); Methylation Risk Score → Tissue remodeling pathway (ECM genes); Methylation Risk Score → Oxidative stress response (SF-1, GATA6); all four pathways → Endometriosis subphenotypes (stage, location, infertility)]

MRS-Informed Endometriosis Pathogenesis Pathways

The Scientist's Toolkit: Essential Research Reagents and Platforms

Table 3: Essential Research Resources for Endometriosis Epigenetic Studies

| Category | Specific Product/Platform | Key Applications | Performance Considerations |
|---|---|---|---|
| DNA Methylation Profiling | Illumina Infinium MethylationEPIC BeadChip | Genome-wide CpG methylation analysis (850,000+ sites) | Coverage includes enhancers, gene bodies, promoters; suitable for formalin-fixed samples [36] |
| Targeted Methylation Analysis | Zymo Research EZ DNA Methylation Kit | Bisulfite conversion for targeted sequencing | High conversion efficiency (>99%); compatible with multiple sample types [63] |
| DNA Extraction | DNeasy Blood & Tissue Kit (Qiagen) | High-quality DNA from endometrial tissues | Effective for difficult tissues; minimal contaminant carryover [63] |
| Bioinformatic Analysis | R packages: minfi, sva, DMRcate | Preprocessing, batch correction, DMR identification | Integration with Bioconductor; comprehensive QC metrics [5] [36] |
| Multi-omic Integration | OREML, MOA, METASOFT | Variance partitioning, cross-omics analysis | Accounts for relatedness; handles mixed models [5] [36] |
| Validation Platforms | Pyrosequencing, bisulfite sequencing | Targeted validation of significant CpG sites | Quantitative results; high sensitivity and reproducibility [60] |

The integration of methylation risk scores with traditional polygenic risk scores extends endometriosis risk prediction beyond static genetic assessment to incorporate dynamic molecular measures that reflect both genetic predisposition and environmental influences. Current evidence demonstrates that MRS captures significant disease variance independent of PRS, with combined models capturing approximately 37% of the variance in endometriosis status [5] [36].

Future research directions should prioritize several key areas:

  • Multi-ancestry validation of existing MRS models to ensure equitable application across diverse populations
  • Longitudinal studies examining methylation dynamics across disease progression and treatment response
  • Single-cell epigenomic profiling to resolve cellular heterogeneity within endometrial tissues
  • Integration of additional omics layers including histone modifications, non-coding RNAs, and proteomic data
  • Development of non-invasive biomarkers based on methylation signatures in blood or uterine fluids

For drug development applications, MRS may facilitate patient stratification for clinical trials, particularly for therapies targeting specific molecular subtypes. The identified methylation signatures highlight potential therapeutic targets, including chromatin-modifying enzymes and methylation-sensitive signaling pathways [64] [62]. As epigenetic therapies advance, MRS could guide personalized treatment approaches based on individual methylation profiles, ultimately improving outcomes for women across the endometriosis disease spectrum.

Combining PRS with Inflammatory Biomarkers and Hormonal Profiles

Endometriosis is a complex, chronic inflammatory gynecological disease characterized by the presence of endometrial-like tissue outside the uterine cavity, affecting approximately 10% of women of reproductive age [66] [3]. The diagnostic journey for endometriosis remains challenging, with an average delay of 7-11 years from symptom onset to surgical confirmation, underscoring the critical need for non-invasive diagnostic strategies [31] [3]. This whitepaper explores the integration of polygenic risk scores (PRS) with inflammatory biomarkers and hormonal profiles to enhance risk prediction, disease stratification, and understanding of endometriosis subphenotypes.

The heterogeneous nature of endometriosis manifests in varied clinical presentations, treatment responses, and molecular profiles. Current classification systems based solely on surgical findings fail to predict therapeutic outcomes or correlate well with symptom severity [66] [45]. Emerging evidence suggests that molecular subtyping may provide superior stratification, with recent transcriptomic analyses identifying distinct stroma-enriched and immune-enriched subtypes that demonstrate varying responses to hormone therapy [45]. Within this context, the integration of PRS with inflammatory and hormonal biomarkers offers a promising multidimensional approach to deciphering endometriosis heterogeneity and advancing personalized medicine.

Theoretical Foundations of Individual Modalities

Polygenic Risk Scores in Endometriosis

Polygenic risk scores aggregate the effects of numerous genetic variants to quantify an individual's inherited susceptibility to endometriosis. The heritability of endometriosis is estimated at 47-51%, with genome-wide association studies (GWAS) identifying multiple risk loci including genes involved in the development and regulation of the female reproductive tract [31] [10]. PRS derived from these studies have shown a consistent association with endometriosis across the Danish and UK cohorts summarized in Table 1, although the effect size per standard deviation varies between cohorts.

Table 1: Performance Characteristics of Endometriosis PRS Across Cohorts

| Cohort | Cases/Controls | Odds Ratio per SD | P-value | Subtypes Assessed |
|---|---|---|---|---|
| Danish Surgical Cohort | 249/348 | 1.59 | 2.57×10⁻⁷ | Ovarian, Infiltrating, Peritoneal |
| Danish Twin Registry | 140/316 | 1.50 | 0.0001 | Population-based |
| Combined Danish Cohorts | 389/664 | 1.57 | 2.5×10⁻¹¹ | All major subtypes |
| UK Biobank | 2,967/256,222 | 1.28 | <2.2×10⁻¹⁶ | Large-scale validation |

PRS demonstrates differential performance across endometriosis subtypes, with the strongest association observed for ovarian endometriosis (OR=1.72) followed by infiltrating (OR=1.66) and peritoneal (OR=1.51) subtypes [39] [67]. Notably, PRS shows no significant association with adenomyosis, suggesting distinct genetic architectures between these related conditions [39]. This specificity underscores the value of PRS in elucidating biological mechanisms underlying different endometriosis subphenotypes.

Inflammatory Biomarkers

Endometriosis is characterized by a chronic inflammatory state that drives disease progression and symptom manifestation. The inflammatory microenvironment involves complex interactions between immune cells, cytokines, chemokines, and growth factors.

Table 2: Key Inflammatory Biomarkers in Endometriosis

| Biomarker Category | Specific Analytes | Alteration in Endometriosis | Functional Role |
|---|---|---|---|
| Macrophage Factors | MIF, MCP-1 | Increased in peritoneal fluid | Recruitment of macrophages, promotion of angiogenesis and cell survival |
| Cytokines | IL-1, TNF-α | Elevated in ectopic lesions | Pro-inflammatory signaling, pain mediation |
| Chemokines | CXCL9 | Altered expression | Immune cell recruitment and activation |
| Growth Factors | VEGF, FGF | Increased in ectopic sites | Angiogenesis, lesion establishment |
| Nuclear Factors | NF-κB | Activated | Master regulator of inflammatory response |

Macrophage migration inhibitory factor (MIF) deserves particular attention as it enhances levels of anti-apoptotic proteins during retrograde menstruation, promotes cell survival, and activates pathways involving migration, invasion, and angiogenesis [68]. The nuclear factor kappa B (NF-κB) pathway serves as a central regulator of inflammation in endometriosis, controlling transcription of pro-inflammatory cytokines, cell adhesion molecules, and survival factors [68].

Hormonal Profiles

Endometriosis is an estrogen-dependent disorder characterized by hormonal imbalances that create a permissive environment for lesion establishment and growth. Key hormonal alterations include estrogen dominance, progesterone resistance, and perturbations in androgen signaling.

Recent Mendelian randomization studies have identified a causal relationship between lower testosterone levels and endometriosis risk, suggesting that reduced androgen signaling may contribute to disease pathogenesis [31]. This finding is further supported by observations of reduced testosterone concentrations in follicular fluid of endometriosis patients undergoing assisted reproductive technologies [3].

Progesterone resistance represents another hallmark of endometriosis, manifested through reduced expression of progesterone receptors (particularly PR-B), disrupted signaling pathways, and altered regulation of downstream targets [3]. The enzyme aromatase (CYP19A1), responsible for converting androgens to estrogens, shows increased expression in endometrial tissues of endometriosis patients and demonstrates promising diagnostic accuracy with 79% sensitivity and 89% specificity [3].
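Sensitivity and specificity alone do not say how informative a positive result is in practice, because that also depends on prevalence. The sketch below converts the reported 79% sensitivity and 89% specificity into predictive values and likelihood ratios. It assumes, purely for illustration, the roughly 10% population prevalence cited in this review; a symptomatic clinic population would have a different prevalence, and the function names are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test accuracy and disease prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-), which do not depend on prevalence."""
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity


# Reported accuracy for aromatase; the 10% prevalence is an assumed setting.
ppv, npv = predictive_values(0.79, 0.89, 0.10)
lr_pos, lr_neg = likelihood_ratios(0.79, 0.89)
print(f"PPV={ppv:.2f}, NPV={npv:.2f}, LR+={lr_pos:.1f}, LR-={lr_neg:.2f}")
```

At 10% prevalence, a positive result corresponds to a PPV of about 0.44 and a negative result to an NPV of about 0.97, so under this assumption the marker would be better suited to ruling disease out than to confirming it.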

Integration of Multimodal Data

Analytical Approaches for Data Integration

Integrating PRS with inflammatory biomarkers and hormonal profiles requires sophisticated computational approaches that can handle the multidimensional nature of the data. Machine learning techniques, particularly deep neural networks, have shown promise in enhancing genomic prediction of endometriosis by capturing complex non-linear relationships between genetic variants and disease phenotypes [33].

Statistical methods for data integration include:

  • Multivariate Logistic Regression: Modeling combined effects of PRS, inflammatory markers, and hormonal measures on disease risk and subphenotypes
  • Cluster Analysis: Identifying patient subgroups based on integrated molecular profiles
  • Pathway Enrichment Analysis: Mapping multimodal signatures to biological pathways
  • Network-Based Approaches: Constructing interaction networks between genetic loci, inflammatory mediators, and hormonal factors

The integration of these disparate data types requires careful normalization, dimensionality reduction, and validation in independent cohorts to ensure robustness and generalizability.
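A minimal sketch of the first approach, multivariate logistic regression on normalized inputs, is given below. It uses only the Python standard library and plain gradient descent so that it stays self-contained; a real analysis would more likely use an established statistics package. All feature values, the learning rate, and the iteration count are hypothetical choices for illustration.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column of a row-major table to mean 0 and SD 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, row):
    """Predicted probability of disease; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


def log_loss(weights, rows, labels, eps=1e-12):
    """Mean negative log-likelihood of the labels under the model."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(predict(weights, row), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def fit_logistic(rows, labels, learning_rate=0.1, n_iter=2000):
    """Batch gradient descent on the logistic log-loss, starting from zero weights."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(n_iter):
        gradient = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            error = predict(weights, row) - y
            gradient[0] += error
            for j, x in enumerate(row, start=1):
                gradient[j] += error * x
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


# Hypothetical rows: [PRS, cytokine level, testosterone level]; 1 = case, 0 = control.
raw_rows = [
    [1.2, 5.1, 0.8], [0.7, 4.2, 1.0], [0.3, 6.0, 0.9], [-0.2, 4.8, 1.4],
    [-0.9, 3.0, 1.6], [-0.4, 3.9, 1.5], [0.5, 2.8, 1.1], [-1.1, 4.4, 1.8],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

features = zscore_columns(raw_rows)
weights = fit_logistic(features, labels)
print("intercept and coefficients:", [round(w, 2) for w in weights])
```

Standardizing first puts the PRS, the inflammatory marker, and the hormone measure on a common scale, so each coefficient can be read as a log odds ratio per SD of its predictor. With only eight toy observations the coefficients carry no biological meaning; independent validation cohorts remain essential.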

Experimental Workflows for Multimodal Assessment

A standardized workflow for collecting and analyzing multimodal data is essential for generating comparable results across research studies. The following diagram illustrates an integrated experimental pipeline:

[Diagram: Integrated experimental pipeline. Nodes: Patient recruitment; Biospecimen collection. Streams: DNA extraction → Genotyping → PRS calculation; Serum/plasma collection → Inflammatory assays → Cytokine profiling; Blood collection → Hormonal assays → Endocrine profiling; Clinical data collection. All streams → Data integration and multimodal analysis → Subphenotype identification, Risk stratification, Therapeutic prediction]

This integrated workflow enables researchers to capture the complex interactions between genetic predisposition, inflammatory processes, and endocrine dysregulation that collectively drive endometriosis pathogenesis and heterogeneity.

Molecular Subtyping and Therapeutic Implications

Transcriptomically Defined Subtypes

Recent transcriptomic analyses have revealed distinct molecular subtypes of endometriosis that transcend traditional anatomical classification systems. Unsupervised clustering of ectopic lesion gene expression profiles identifies two main subtypes: stroma-enriched (S1) and immune-enriched (S2) [45].

The stroma-enriched subtype (S1) is characterized by:

  • Enrichment in fibroblast activation pathways
  • Extracellular matrix remodeling
  • Tissue organization and fibrotic processes

The immune-enriched subtype (S2) demonstrates:

  • Upregulation of immune and inflammatory pathways
  • Enhanced lymphocyte infiltration and activation
  • Stronger correlation with immunotherapy response

These molecular subtypes show distinct clinical behaviors, particularly in their response to hormone therapy. The immune-enriched subtype is significantly associated with failure or intolerance to conventional hormone treatments, suggesting the potential for alternative therapeutic approaches targeting immune pathways [45].

Signaling Pathways in Endometriosis Subphenotypes

The integration of PRS, inflammatory markers, and hormonal profiles reveals intricate signaling networks that drive different endometriosis subphenotypes. The following diagram illustrates key pathways and their interactions:

Figure: Key signaling pathways in endometriosis subphenotypes. Genetic risk variants, estrogen signaling, and inflammatory triggers each feed into NF-κB activation. NF-κB activation drives cytokine production, cell survival, angiogenesis, and progesterone resistance. Cytokine production -> immune cell recruitment -> chronic inflammation -> pain and infertility; progesterone resistance -> hormone therapy failure.

The NF-κB pathway serves as a central integrator of genetic, inflammatory, and hormonal signals in endometriosis. Activation of this pathway promotes cytokine production, cell survival, and angiogenesis and contributes to progesterone resistance, collectively driving disease progression and therapeutic challenges [68].

Experimental Protocols and Methodologies

PRS Generation and Validation

Sample Preparation and Genotyping:

  • Collect peripheral blood samples in EDTA tubes or use saliva collection kits for DNA isolation
  • Extract genomic DNA using standardized commercial kits (e.g., Qiagen DNeasy)
  • Perform genotyping using Illumina Global Screening Array or similar platforms
  • Apply rigorous quality control: remove samples with ≥15% missing genotypes, exclude markers with call rates <95%, exclude markers that deviate from Hardy-Weinberg equilibrium (p<1×10⁻⁵), and remove population outliers via principal component analysis
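
The two call-rate filters in the list above can be sketched as follows. This is a toy illustration on an in-memory genotype matrix, not a replacement for PLINK-based QC. The function name `qc_filter`, the use of `None` for a missing call, and the example data are assumptions made for this sketch. Samples are filtered first, and markers are then evaluated on the retained samples only.

```python
def qc_filter(genotypes, max_sample_missing=0.15, min_marker_call_rate=0.95):
    """Return (kept_sample_indices, kept_marker_indices).

    genotypes: one list per sample; each entry is 0, 1, 2, or None (missing call).
    """
    n_markers = len(genotypes[0])
    # Step 1: keep samples whose missing rate is below the threshold
    # (samples at or above 15% missingness are removed).
    kept_samples = [
        i for i, row in enumerate(genotypes)
        if sum(g is None for g in row) / n_markers < max_sample_missing
    ]
    if not kept_samples:
        return [], []
    # Step 2: keep markers whose call rate among retained samples is >= 95%.
    kept_markers = [
        j for j in range(n_markers)
        if sum(genotypes[i][j] is not None for i in kept_samples) / len(kept_samples)
        >= min_marker_call_rate
    ]
    return kept_samples, kept_markers


# Hypothetical toy data: 3 samples x 10 markers.
toy = [
    [0, 1, 2, 0, 1, 0, 2, 1, 0, 0],        # complete
    [1, 0, 2, None, 1, 0, 2, 1, 0, 1],     # 10% missing: kept
    [None, None, 2, 0, 1, 0, 2, 1, 0, 1],  # 20% missing: removed
]
samples, markers = qc_filter(toy)
```

In this toy case the third sample is dropped, and the fourth marker is then dropped because only one of the two retained samples has a call for it.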

PRS Calculation:

  • Obtain GWAS summary statistics from large-scale endometriosis studies (e.g., Sapkota et al. 2017 meta-analysis combined with FinnGen data)
  • Apply Bayesian methods (SBayesR) for effect size adjustment and SNP weighting
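  • Calculate PRS using the PLINK 1.9 score function, for example: plink --bfile <genotype_prefix> --score prs_weights.txt 1 2 4 header --out <output_prefix> (columns 1, 2, and 4 of the weights file hold the variant ID, effect allele, and weight; the names in angle brackets are placeholders)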
  • Normalize PRS to z-scores within the study population to enable comparison across cohorts
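
The scoring and normalization steps above reduce to a weighted sum followed by standardization. The sketch below shows that arithmetic in plain Python under stated assumptions: the weights, dosages, and function names are hypothetical, genotypes are complete, and the population standard deviation is used for the z-score. As far as we are aware, PLINK 1.9 reports an averaged score by default; with no missing genotypes this differs from the plain sum only by a constant factor, so the z-scores agree.

```python
from statistics import mean, pstdev


def weighted_prs(dosages, weights):
    """Sum of effect-allele dosages (0/1/2) times per-SNP weights for one person."""
    return sum(d * w for d, w in zip(dosages, weights))


def zscores(values):
    """Standardize scores within the study population."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("all scores identical; z-scores undefined")
    return [(v - mu) / sd for v in values]


# Hypothetical per-SNP weights (e.g. shrunken effect sizes) for three SNPs.
weights = [0.10, 0.20, -0.05]
# Hypothetical effect-allele dosages for three individuals.
cohort = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]

raw = [weighted_prs(d, weights) for d in cohort]
z = zscores(raw)
```

Standardizing within each cohort places scores on a common scale, but a z-score computed in one population is not directly comparable to one computed in a population with a different score distribution.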

Validation Approaches:

  • Assess association with endometriosis risk using logistic regression adjusted for principal components and age
  • Evaluate subtype-specific associations by stratifying cases according to anatomical location
  • Determine discriminative accuracy via receiver operating characteristic (ROC) analysis
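
Discriminative accuracy from ROC analysis is usually summarized as the area under the curve (AUC), which equals the probability that a randomly chosen case has a higher score than a randomly chosen control. A minimal sketch of that definition follows; the function name and data are hypothetical, and the pairwise loop is meant for clarity rather than for large cohorts.

```python
def roc_auc(scores, labels):
    """AUC as P(case score > control score), counting ties as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical standardized PRS values and case (1) / control (0) labels.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Here three of the four case-control pairs are ranked correctly, so the AUC is 0.75.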

Inflammatory Biomarker Profiling

Multiplex Immunoassays:

  • Utilize proximity extension assay platforms (e.g., Olink Bioscience Inflammation panel) for simultaneous quantification of 92 inflammatory proteins
  • Alternatively, employ multiplex ELISA systems or Luminex technology for cytokine profiling
  • Process serum/plasma samples according to manufacturer protocols, ensuring consistent pre-analytical handling
  • Express results in Normalized Protein eXpression (NPX) values on log2 scale for Olink data
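
Because NPX values are on a log2 scale, a difference of one NPX unit corresponds to a doubling of the relative signal. The helper below, with a hypothetical name and inputs, converts an NPX difference into a fold change. NPX is a relative measure, so this gives a ratio between samples, not an absolute concentration.

```python
def npx_fold_change(npx_a, npx_b):
    """Relative fold change of sample A over sample B from log2-scale NPX values."""
    return 2 ** (npx_a - npx_b)


# Hypothetical NPX values for one protein in two samples.
fold = npx_fold_change(5.0, 4.0)
```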

Specific Inflammatory Assays:

  • Quantify macrophage migration inhibitory factor (MIF) using commercial ELISA kits
  • Measure monocyte chemoattractant protein-1 (MCP-1) as key chemokine in endometriosis
  • Analyze NF-κB pathway activation through phospho-protein assays or target gene expression
  • Assess macrophage polarization markers (CD68, CD163, CD206) via flow cytometry or immunohistochemistry

Hormonal Profiling

Sex Steroid Quantification:

  • Measure serum testosterone levels using liquid chromatography-tandem mass spectrometry (LC-MS/MS) for highest accuracy
  • Quantify estradiol, progesterone, LH, and FSH via electrochemiluminescence immunoassays
  • Assess bioavailable testosterone through calculation of free androgen index or direct measurement
  • Consider menstrual cycle phase at sample collection for interpretation of results
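
The free androgen index mentioned above is conventionally computed as 100 times total testosterone divided by sex hormone-binding globulin (SHBG), with both in nmol/L. The sketch below assumes SHBG has been measured alongside testosterone, which the protocol above does not state explicitly; the function name and values are hypothetical.

```python
def free_androgen_index(total_testosterone_nmol_l, shbg_nmol_l):
    """FAI = 100 * total testosterone / SHBG, both in nmol/L."""
    if shbg_nmol_l <= 0:
        raise ValueError("SHBG must be positive")
    return 100.0 * total_testosterone_nmol_l / shbg_nmol_l


# Hypothetical values: testosterone 1.5 nmol/L, SHBG 50 nmol/L.
fai = free_androgen_index(1.5, 50.0)
```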

Functional Hormonal Assays:

  • Evaluate aromatase (CYP19A1) expression in menstrual blood or endometrial tissue via qRT-PCR
  • Assess progesterone responsiveness through measurement of FKBP4 and progesterone receptor isoforms
  • Analyze estrogen metabolism profiles via quantification of urinary 2-hydroxyestrone and 4-hydroxyestrone

The Scientist's Toolkit

Table 3: Essential Research Reagents and Platforms

Category | Specific Product/Platform | Application in Endometriosis Research
Genotyping | Illumina Global Screening Array | Genome-wide SNP profiling for PRS calculation
Genotyping | TOPMed Imputation Server | Phasing and imputation of missing genotypes
PRS Calculation | PLINK (v1.9+) | PRS generation and basic genetic association analysis
PRS Calculation | GCTB (v2.02) | Bayesian methods for SNP weighting (SBayesR)
Inflammatory Profiling | Olink Proseek Multiplex Inflammation I | Simultaneous quantification of 92 inflammatory proteins
Inflammatory Profiling | RBM InflammationMAP v1.1 | 54-analyte multi-analyte profile for inflammatory patterns
Hormonal Assays | LC-MS/MS platforms | Gold standard for steroid hormone quantification
Hormonal Assays | Electrochemiluminescence Immunoassays | High-sensitivity measurement of reproductive hormones
Cell Analysis | xCell Analysis Package | Estimation of immune cell type enrichment from transcriptomic data
Cell Analysis | CIBERSORT | Digital cytometry for estimating immune cell fractions
Data Integration | ConsensusClusterPlus | Unsupervised molecular subtyping through consensus clustering
Data Integration | WGCNA Package | Weighted gene co-expression network analysis

The integration of polygenic risk scores with inflammatory biomarkers and hormonal profiles represents a paradigm shift in endometriosis research, moving beyond singular approaches to embrace the multidimensional nature of the disease. This integrated framework enables refined subphenotyping, improved risk prediction, and insights into the biological mechanisms underlying different disease manifestations.

Future research directions should focus on:

  • Developing standardized protocols for multimodal biomarker assessment across research centers
  • Validating integrated biomarkers in diverse, well-characterized cohorts with detailed subphenotype information
  • Exploring longitudinal dynamics of inflammatory and hormonal markers in relation to disease progression
  • Investigating the utility of integrated biomarkers for predicting treatment response and guiding therapeutic choices
  • Leveraging artificial intelligence and machine learning approaches to uncover complex patterns within multimodal data

As these integrated approaches mature, they hold significant promise for transforming endometriosis from a surgically diagnosed disease to one characterized through molecular signatures, ultimately enabling earlier intervention, personalized treatment strategies, and improved quality of life for affected individuals.

Addressing Population Stratification and Ancestry-Specific Effects in PRS

Polygenic risk scores (PRS) have emerged as powerful tools for quantifying an individual's genetic predisposition to complex diseases. However, their clinical utility is severely limited by a critical challenge: population stratification and ancestry-specific effects. The performance of PRS developed primarily in European-ancestry populations deteriorates substantially when applied to individuals of diverse genetic backgrounds [69]. This transferability problem stems from fundamental differences in allele frequencies, linkage disequilibrium (LD) patterns, and population-specific causal variants across ancestrally diverse populations [70].

Within endometriosis research, where subphenotype characterization is crucial for understanding disease mechanisms and progression, these challenges are particularly acute. Endometriosis exhibits a heritability of 47-51% [11] [2], making genetic risk prediction highly promising, yet current PRS explain only a fraction of this heritability and perform suboptimally across global populations. This technical guide provides comprehensive methodologies for addressing population stratification and developing ancestry-aware PRS, with specific application to endometriosis subphenotype research.

Fundamental Concepts: Genetic Architecture and Stratification

The attenuation of PRS performance across diverse populations arises from several interconnected factors:

  • Allele Frequency Differences: Variants identified in genome-wide association studies (GWAS) often show substantial frequency differences across populations. A risk allele common in one population may be rare or absent in another, reducing portability [70].
  • Linkage Disequilibrium Variation: LD patterns differ across populations due to distinct demographic histories. Causal variants may be tagged by different proxy SNPs, or the same SNP may tag different causal variants in different populations [71].
  • Causal Variant Heterogeneity: Some disease-associated loci are population-specific, exemplified by the G1 and G2 variants in APOL1 associated with chronic kidney disease in individuals of African ancestry but largely absent in other groups [70].
  • Environmental Interactions: Gene-environment interactions can modify genetic effect sizes across populations with different environmental exposures and lifestyle factors.

Impact of Demographic History on Stratification

The ability to correct for population stratification depends critically on demographic history. Recent population structure (originating within the past 100 generations) presents particular challenges:

Diagram summary: Under recent structure (about 100 generations), common variants (MAF>5%) carry little information for PCA-based correction (about 3% of variance explained), whereas rare variants (MAF<1%) are highly informative. Under perpetual structure, both common variants (about 50% of variance explained) and rare variants are highly informative.

Figure 1: Differential impact of demographic history on genetic variant informativeness for population structure correction. Recent structure is better captured by rare variants, while perpetual structure is captured by both common and rare variants [71].

Table 1: Characteristics of Population Structure Types

Structure Type | Time Depth | Common Variant Informativeness | Rare Variant Informativeness | Recommended Correction Approach
Recent Structure | ~100 generations | Low (explains ~3% of spatial variance) | High | Rare-variant PCA or IBD-based methods
Perpetual Structure | Infinite horizon | High (explains ~50% of spatial variance) | High | Common-variant PCA or LMMs
Admixed Populations | Variable | Moderate (depends on admixture timing) | Moderate to High | Local ancestry-aware methods

Methodological Approaches for Ancestry-Aware PRS Development

Multi-Ancestry GWAS Strategies

Developing robust PRS begins with the GWAS stage, where several strategies can improve cross-ancestry portability:

  • Ancestry-Specific GWAS: Conduct separate GWAS in distinct ancestry groups to identify population-specific effects. For example, the HBA1/2 locus for red blood cell traits shows extreme ancestry specificity, with 11 of 14 conditionally independent SNPs being monomorphic in one or more ancestries [72].
  • Multi-Ancestry Meta-Analysis: Combine summary statistics across diverse populations using fixed-effects or random-effects models. This approach must account for heterogeneity in effect sizes across populations [70].
  • Finemapping and Causal Variant Prioritization: Finemapping increases PRS portability by identifying putatively causal variants rather than LD-dependent tag SNPs. This approach has been shown to improve transferability of PRS across ancestries [69].
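
For the fixed-effects option in the meta-analysis strategy above, a common choice is inverse-variance weighting, with Cochran's Q as a simple check for effect-size heterogeneity across ancestry groups. The sketch below shows the arithmetic for a single variant. The function name and the per-ancestry estimates are hypothetical, and this is not necessarily the exact model used in the cited studies.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted estimate, its standard error, and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Q is compared against a chi-square distribution with k - 1 degrees of freedom.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q


# Hypothetical log-odds-ratio estimates for one SNP in two ancestry groups.
beta, se, q = fixed_effects_meta(betas=[0.2, 0.4], ses=[0.1, 0.1])
```

With equal standard errors the pooled estimate is the simple average (0.3). A large Q relative to its degrees of freedom would argue for a random-effects or heterogeneity-aware model instead.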

PRS Construction Methods for Diverse Populations

Several advanced statistical methods have been developed specifically for cross-ancestry PRS construction:

  • PRS-CSx: This Bayesian method uses a continuous shrinkage prior to incorporate multiple ancestry-specific GWAS summary statistics, allowing for effect size heterogeneity across populations. In cardiovascular disease risk prediction, PRS-CSx outperformed European-centric scores across all ancestry groups [69].
  • Ancestry-Score Weighting: For admixed individuals, compute ancestry-specific PRS and combine them using local ancestry proportions. This approach achieved better calibration (lower Brier score) in admixed populations compared to single-ancestry scores [69].
  • LD-Aware Methods: Methods that account for ancestry-specific LD patterns, such as LDpred2 and lassosum, can be applied within each ancestry group before meta-analysis.
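
The ancestry-score weighting idea above can be illustrated with a deliberately simplified sketch that combines ancestry-specific standardized scores using genome-wide ancestry proportions. Methods based on local ancestry weight individual chromosomal segments, which this example does not attempt; the function name, ancestry labels, and numbers are all hypothetical.

```python
def ancestry_weighted_prs(scores_by_ancestry, ancestry_proportions):
    """Weighted average of ancestry-specific scores using ancestry proportions."""
    total = sum(ancestry_proportions.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(
        proportion * scores_by_ancestry[ancestry]
        for ancestry, proportion in ancestry_proportions.items()
    )


# Hypothetical admixed individual: 25% EUR, 75% AFR ancestry.
combined = ancestry_weighted_prs(
    scores_by_ancestry={"EUR": 1.0, "AFR": -0.5},
    ancestry_proportions={"EUR": 0.25, "AFR": 0.75},
)
```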

Table 2: Performance Comparison of PRS Methods Across Ancestries in CAD Risk Prediction

PRS Method | European Ancestry OR/SD | African Ancestry OR/SD | East Asian Ancestry OR/SD | Admixed Calibration (Brier Score)
Multi-ancestry PRS-CSx | 1.63 (1.52-1.75) | 1.53 (1.15-2.05) | 1.54 (1.28-1.86) | 0.06085
GPS_CAD (European-centric) | 1.49 (1.40-1.59) | 1.04 (0.79-1.39) | 1.26 (1.05-1.52) | 0.06089
AllelicaCADEUR_2020 | 1.56 (1.46-1.67) | 1.24 (0.92-1.67) | 1.41 (1.17-1.69) | 0.06095
multiGRS_CAD | 1.51 (1.41-1.61) | 1.30 (0.96-1.76) | 1.38 (1.15-1.66) | 0.06107

Data adapted from [69] demonstrating superior performance of multi-ancestry methods across diverse populations. OR/SD = Odds Ratio per Standard Deviation.
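
The Brier score used as the calibration metric in Table 2 is the mean squared difference between predicted probability and observed outcome, with lower values being better. A minimal sketch with hypothetical predictions follows. A model that assigns everyone the prevalence π scores π(1−π), so for uncommon outcomes all Brier scores are small in absolute terms and differences between methods show up only in later decimal places.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted_probs) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted_probs, outcomes)) / len(outcomes)


# Hypothetical predictions for four individuals.
bs = brier_score([0.1, 0.9, 0.2, 0.7], [0, 1, 0, 1])

# Baseline: predicting the prevalence (10%) for everyone in a cohort of ten.
baseline = brier_score([0.1] * 10, [1] + [0] * 9)
```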

Experimental Design Considerations for Endometriosis

Endometriosis research presents unique challenges due to disease heterogeneity and subphenotypes:

  • Subphenotype Stratification: Endometriosis encompasses distinct subtypes (ovarian, peritoneal, infiltrating) with potentially different genetic architectures. Most known loci show stronger effects for rAFS Stage III/IV disease [11] [28].
  • Case Ascertainment: Surgical confirmation remains the gold standard but introduces selection bias. Combining surgically confirmed cases with broader ICD-based definitions can increase sample size while maintaining specificity [6] [2].
  • Genetic Correlation Leverage: Genetic correlations between endometriosis subphenotypes and other traits can be used to improve power. For example, a PRS-phenome-wide association study (PheWAS) revealed an association between endometriosis PRS and testosterone levels, pointing to a possible causal relationship [2].

Implementation Protocols for Endometriosis PRS

Quality Control and Ancestry Determination

Robust QC procedures are essential prior to PRS development:

Workflow summary: Sample QC: sample call rate < 95%; sex discordancy check; relatedness filtering (π̂ < 0.2); genetic vs. self-reported ancestry. Variant QC: variant call rate < 99%; Hardy-Weinberg equilibrium (p>1e-8); imputation quality (R²>0.8); ancestry-specific MAF filters. Ancestry inference: PCA with reference panels; ADMIXTURE analysis; identity-by-descent clustering. Both QC tracks feed into ancestry inference.

Figure 2: Comprehensive quality control workflow for multi-ancestry PRS development, incorporating sample- and variant-level QC with robust ancestry inference [70].

Multi-Ancestry GWAS Protocol for Endometriosis

A standardized protocol for endometriosis GWAS in diverse populations:

  • Cohort Preparation: Harmonize endometriosis subphenotype definitions across studies using revised American Fertility Society (rAFS) staging where available [11] [28].

  • Stratified Analysis: Perform GWAS separately in each ancestry group (European, African, East Asian, etc.) with ancestry-appropriate covariates:

    • First 10-20 genetic principal components
    • Age and study-specific covariates
    • Genotyping platform or batch effects
  • Meta-Analysis: Combine ancestry-specific results using sample-size weighted meta-analysis or heterogeneous effects models (e.g., RE2) for loci showing heterogeneity [28].

  • Finemapping: Apply statistical finemapping methods (e.g., SuSiE, FINEMAP) within each ancestry to identify putative causal variants.

PRS Validation and Calibration

Robust validation of ancestry-aware endometriosis PRS requires:

  • Independent Validation Cohorts: Test PRS performance in completely independent datasets with similar ancestry composition.
  • Ancestry-Specific Thresholds: Establish ancestry-specific risk thresholds based on the PRS distribution within each group. For CAD risk, thresholds identified 12-24% of individuals with twofold increased risk depending on ancestry [69].
  • Clinical Utility Assessment: Evaluate reclassification improvement when adding PRS to established risk factors. For cardiovascular disease, PRS provided a net reclassification improvement of 10.70-13.14% for intermediate-risk individuals [69].
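
Ancestry-specific thresholds can be derived by taking the same upper fraction of the PRS distribution within each group, so that a given raw score may be classified differently depending on the reference distribution. The sketch below illustrates this under stated assumptions: the group labels, scores, and 20% cut-off are hypothetical and are not the thresholds reported in [69].

```python
def ancestry_specific_thresholds(prs_by_group, top_fraction=0.2):
    """Per-group score cut-off marking the top `top_fraction` of each distribution."""
    thresholds = {}
    for group, scores in prs_by_group.items():
        ranked = sorted(scores)
        # Number of individuals in the top fraction (at least one).
        k = max(1, round(top_fraction * len(ranked)))
        thresholds[group] = ranked[-k]
    return thresholds


def is_high_risk(score, group, thresholds):
    """Flag a score that reaches its own group's threshold."""
    return score >= thresholds[group]


# Hypothetical raw PRS values in two ancestry groups with shifted distributions.
prs = {
    "group_A": [0.1, 0.5, 0.9, 1.3, 1.7],
    "group_B": [-1.0, -0.5, 0.0, 0.5, 1.0],
}
cutoffs = ancestry_specific_thresholds(prs)
```

With these toy values a raw score of 1.2 is above the cut-off in group_B but below it in group_A, which is the situation that a single pooled threshold would misclassify.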

Table 3: Research Reagent Solutions for Ancestry-Aware Endometriosis PRS Development

Resource Category | Specific Tools | Application in Endometriosis PRS | Key Considerations
GWAS Processing | REGENIE, SAIGE, PLINK | Case-control association testing for endometriosis and subphenotypes | Account for binary and quantitative subphenotypes; use Firth correction for rare variants
Ancestry Inference | PCA, ADMIXTURE, GRAF | Genetic ancestry determination in multi-ethnic cohorts | Use reference panels (1000 Genomes, gnomAD) for projection; assess admixture proportions
Fine-mapping | SuSiE, FINEMAP, PolyFun | Identify causal variants at endometriosis risk loci | Leverage cross-ancestry information to improve resolution; incorporate functional annotations
PRS Methods | PRS-CSx, LDpred2, CT-SLEB | Construction of ancestry-aware risk scores | Tune hyperparameters within each ancestry; assess portability metrics
Validation | PheWAS, ROC analysis, NRI | Clinical utility assessment for endometriosis risk prediction | Evaluate across subphenotypes; assess improvement over clinical factors alone

Endometriosis-Specific Considerations and Applications

Genetic Architecture of Endometriosis

The genetic architecture of endometriosis informs PRS development strategies:

  • Heritability and SNP Heritability: Twin studies estimate heritability at 51% [11], while common SNPs explain approximately 26% of disease variance [28].
  • Shared and Specific Loci: Current GWAS have identified 42 risk loci for endometriosis [2], with most showing consistent effects across populations but some exhibiting heterogeneity [11].
  • Subphenotype Specificity: Most loci show stronger effects for moderate-to-severe (rAFS Stage III/IV) disease, highlighting the importance of stratified analyses [11] [28].

Hormonal Pathways and Pleiotropy

Endometriosis PRS development must account for its hormonal etiology and pleiotropic effects:

  • Hormonal Mechanisms: Novel loci identified through large meta-analyses implicate genes involved in sex steroid hormone pathways (FN1, CCDC170, ESR1, SYNE1, FSHB) [28].
  • Pleiotropic Effects: PRS-phenome-wide association studies reveal that genetic liability to endometriosis associates with multiple other conditions even in undiagnosed individuals, suggesting shared biological pathways [2].
  • Mendelian Randomization: MR studies suggest causal effects of lower testosterone on endometriosis risk, highlighting potential for incorporating endocrine biomarkers into risk prediction [2].

Addressing population stratification and ancestry-specific effects is not merely a statistical challenge but an essential requirement for equitable implementation of PRS in endometriosis research and clinical care. The methodologies outlined in this guide provide a framework for developing ancestry-aware PRS that perform robustly across diverse populations.

Future efforts should focus on: (1) expanding GWAS diversity to include currently underrepresented populations; (2) developing methods that efficiently leverage admixed individuals as biological bridges between ancestry groups; (3) integrating functional genomics data to improve fine-mapping and biological interpretation; and (4) validating PRS in clinical settings for endometriosis risk stratification and early intervention.

As sample sizes continue to grow through initiatives like All of Us, Biobank Japan, and H3Africa, the opportunities for developing clinically useful, ancestry-informed PRS for endometriosis subphenotypes will expand, ultimately enabling more personalized approaches to diagnosis, prevention, and treatment.

Comparative Performance Analysis: PRS Versus Other Biomarkers and Clinical Tools

Endometriosis is a complex gynecological disorder affecting approximately 10% of reproductive-aged women, characterized by the presence of endometrial-like tissue outside the uterus. The disease presents substantial diagnostic challenges, with average delays of 7-11 years between symptom onset and definitive diagnosis via laparoscopy. Polygenic risk scores (PRS) have emerged as promising tools for quantifying genetic susceptibility by aggregating the effects of multiple genetic variants into a single metric. Understanding how PRS performs across differently ascertained cohorts—from deeply phenotyped clinical samples to large biobank populations—is crucial for developing clinically applicable risk stratification tools. This technical analysis examines the performance characteristics of an endometriosis PRS across Danish clinical and registry-based cohorts compared with replication data from the UK Biobank.

Cohort Characteristics and Study Design

Cohort Descriptions and Definitions

The validation strategy utilized three distinct cohorts to assess PRS performance across different ascertainment methods and population structures.

Table 1: Cohort Characteristics and Endometriosis Definitions

Cohort | Sample Size (Cases/Controls) | Case Ascertainment Method | Control Definition | Subtype Information
Danish Clinical Cohort | 249/348 | Surgical confirmation with histology + ASRM stages II-IV | Age-matched blood donors without ICD-10 N80 diagnosis | Detailed subtype classification available
Danish Twin Registry (DTR) | 140/316 | ICD-10 codes (N80.1-N80.9) from Danish National Patient Registry | Age-matched unrelated women without N80 diagnosis | Subtypes derived from ICD-10 codes
UK Biobank (Replication) | 2,967/256,222 | ICD-10 codes (N80.1-N80.9) from hospital records + self-report | No N80 diagnosis + no self-reported endometriosis | Limited subtype resolution

Endometriosis Subtype Classification

The study employed a standardized approach to classify endometriosis subtypes across cohorts based on ICD-10 codes, with severity ranking from severe to mild:

  • Infiltrating (N80.4, N80.5): Endometriosis of rectovaginal septum, vagina, and intestine
  • Ovarian (N80.1): Endometriosis of ovary
  • Peritoneal (N80.2, N80.3): Endometriosis of fallopian tube and pelvic peritoneum
  • Other (N80.6, N80.8, N80.9): Cutaneous scar, thorax, and unspecified locations
  • Adenomyosis (N80.0): Treated as a separate disease entity

Experimental Protocols and Methodologies

PRS Derivation and Genotyping

The polygenic risk score was derived from 14 genome-wide significant lead SNPs identified in a published GWAS meta-analysis comprising 17,045 endometriosis cases and 191,596 controls [6]. One lead SNP (rs760794) failed assay design and was replaced with rs77294520, which showed region-wide association after conditioning on the index SNP in the GREB1 locus.

Genotyping Methods:

  • Danish Cohorts: Custom genotyping assays followed by quality control procedures including Hardy-Weinberg equilibrium testing and sample-level call rate checks
  • UK Biobank: Imputed data (best guess genotypes) from the UK Biobank resource with standard quality control metrics applied

PRS Calculation: The PRS was calculated as the sum of risk alleles weighted by their effect sizes (log(odds ratios)) from the discovery GWAS. Each individual's genotype for the 14 SNPs was converted to a dosage of effect alleles (0, 1, or 2) and multiplied by the corresponding weight. The resulting weighted sums were standardized to z-scores for analysis.
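
The calculation described above can be written compactly as follows. The notation is introduced here for illustration and is not taken from the original study: G_ij is the effect-allele dosage of individual i at SNP j, and the weights are the discovery-GWAS log odds ratios.

```latex
% Weighted allele sum over the 14 lead SNPs
\mathrm{PRS}_i = \sum_{j=1}^{14} \hat{\beta}_j \, G_{ij},
\qquad \hat{\beta}_j = \ln(\mathrm{OR}_j),
\qquad G_{ij} \in \{0, 1, 2\}

% Standardization to z-scores within the analysis cohort
z_i = \frac{\mathrm{PRS}_i - \overline{\mathrm{PRS}}}{s_{\mathrm{PRS}}}
```

Here the bar denotes the cohort mean and s the cohort standard deviation of the raw scores, so an odds ratio "per SD" refers to a one-unit change in z.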

Statistical Analysis Framework

The analytical approach employed consistent statistical methods across cohorts to ensure comparability:

  • Association Testing: Logistic regression models assessing the relationship between PRS (per standard deviation increase) and endometriosis case-control status
  • Covariate Adjustment: Models included principal components to account for population stratification
  • Subtype Analysis: Separate logistic regression models for each endometriosis subtype
  • Adenomyosis Comparison: Parallel analysis of adenomyosis cases to evaluate specificity of genetic associations
  • Statistical Software: Primary analyses conducted using R and PLINK with custom scripts for cohort-specific adaptations
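
To make the "odds ratio per standard deviation" concrete, the sketch below fits a logistic regression with an intercept and a single predictor by Newton-Raphson and exponentiates the slope. It is a simplified illustration: the principal-component covariates used in the study are omitted, the function name and data are hypothetical, and real analyses would use R or another established package. A binary predictor is used in the check because the maximum-likelihood odds ratio then equals the 2x2-table odds ratio, which makes the result easy to verify; in practice x would be the standardized PRS.

```python
import math


def fit_logistic_1d(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of the slope from the information matrix of the last iteration.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Hypothetical data: x=0 has 3 cases / 7 controls, x=1 has 6 cases / 4 controls.
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
b0, b1, se_b1 = fit_logistic_1d(x, y)
odds_ratio = math.exp(b1)
```

For these toy counts the fitted odds ratio matches the table value (6/4)/(3/7) = 3.5. The sketch does not guard against perfectly separated data, where the estimates diverge.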

Performance Results Across Cohorts

Primary Association Results

Table 2: PRS Performance Across Validation Cohorts

Cohort | Odds Ratio (per SD) | 95% Confidence Interval | P-value | Discriminative Accuracy
Danish Clinical | 1.59 | 1.33-1.89 | 2.57×10^-7 | Moderate
Danish Twin Registry | 1.50 | 1.22-1.84 | 0.0001 | Moderate
Combined Danish | 1.57 | 1.37-1.80 | 2.5×10^-11 | Moderate
UK Biobank | 1.28 | 1.24-1.33 | <2.2×10^-16 | Limited

The PRS demonstrated significant association with endometriosis across all cohorts, with the strongest effects observed in the surgically confirmed Danish clinical cohort. The effect size attenuation in the UK Biobank likely reflects differences in case ascertainment, with the Danish clinical cohort representing more severe, surgically confirmed cases.
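
The entries in Table 2 are internally linked: under a normal approximation on the log-odds scale, the standard error can be recovered from the width of the 95% confidence interval, and the z statistic and two-sided p-value follow from it. The sketch below applies this to the Danish clinical cohort row. The function name is hypothetical, and because the published values are rounded, the recovered numbers are only approximate.

```python
import math


def z_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover (se, z, two-sided p) of a log odds ratio from its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Danish clinical cohort: OR 1.59 (95% CI 1.33-1.89), as reported in Table 2.
se, z, p = z_from_or_ci(1.59, 1.33, 1.89)
```

The recovered z statistic is a little above 5 and the p-value is on the order of 10^-7, in line with the reported value.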

Subtype-Specific Performance

Table 3: Subtype-Specific Associations in Combined Danish Cohorts

Endometriosis Subtype | Odds Ratio (per SD) | P-value | Case Count
Ovarian (N80.1) | 1.72 | 6.7×10^-5 | 75
Infiltrating (N80.4, N80.5) | 1.66 | 2.7×10^-9 | 210
Peritoneal (N80.2, N80.3) | 1.51 | 2.6×10^-3 | 60
All Endometriosis | 1.57 | 2.5×10^-11 | 389

The PRS showed consistent performance across endometriosis subtypes, suggesting that it captures genetic risk for endometriosis broadly rather than risk specific to particular disease localizations. The similar effect sizes across subtypes are consistent with a largely shared genetic architecture, although the subtype case counts are small.

Specificity Analysis: Adenomyosis Comparison

A critical validation analysis tested the PRS association with adenomyosis (N80.0) to evaluate specificity. The PRS showed no significant association with adenomyosis in either the DTR (25 cases) or UK Biobank (1,883 cases), supporting the hypothesis that adenomyosis is not driven by the same common genetic risk variants as endometriosis [6].

Visualization of Experimental Workflows

Cohort Recruitment and Analysis Pipeline

Figure: Endometriosis PRS validation workflow. Discovery phase: GWAS meta-analysis (17,045 cases, 191,596 controls) -> 14 GWAS-significant SNPs identified -> PRS derivation (weighted allele sum). Validation cohorts: Danish Clinical Cohort (surgically confirmed; 249 cases, 348 controls), Danish Twin Registry (ICD-10 coded; 140 cases, 316 controls), and UK Biobank (ICD-10 + self-report; 2,967 cases, 256,222 controls). Analysis phase: primary association (logistic regression) -> subtype analysis (ovarian, infiltrating, peritoneal) -> specificity analysis (adenomyosis comparison).

Performance Comparison Logic

Figure: PRS performance comparison framework. Cohort characteristics map onto performance metrics: case ascertainment (surgical vs ICD codes) and disease severity (referral center vs population) -> effect size (odds ratio per SD); sample size (statistical power) -> statistical significance; population heterogeneity -> discriminative accuracy. Effect size, statistical significance, and discriminative accuracy together inform clinical utility, which is interpreted as a trade-off between effect size and generalizability.

The Scientist's Toolkit

Table 4: Essential Research Reagents and Computational Tools

Resource Category | Specific Tool/Resource | Application in Study
Genotyping Platforms | Custom SNP arrays, UK Biobank imputed data | Genotype generation and quality control
Statistical Software | R, PLINK, METAL, GCTB (SBayesR) | PRS calculation, association testing, meta-analysis
Cohort Resources | Danish National Patient Registry, UK Biobank | Case ascertainment and phenotype data
Bioinformatics Tools | ConsensusClusterPlus, WGCNA, xCell | Subtype classification and functional analysis
Reference Data | GWAS catalog, ENCODE cCREs, Epimap | Functional annotation and interpretation

Discussion and Research Implications

Interpretation of Performance Differences

The differential performance of the endometriosis PRS across cohorts reveals important considerations for clinical translation:

  • Ascertainment Intensity: The higher odds ratio in surgically confirmed cases (OR=1.59) compared to registry-based cases (OR=1.28-1.50) suggests that PRS captures genetic risk for more severe, clinically recognized disease
  • Sample Size Trade-offs: While the UK Biobank provided superior statistical power, the effect size attenuation highlights limitations of biobank phenotyping for complex traits
  • Clinical Utility: The consistent performance across subtypes supports broad application, though current discriminative accuracy remains insufficient for standalone clinical use

Integration with Clinical Risk Factors

Recent evidence demonstrates significant interactions between endometriosis PRS and diagnosed comorbidities. The absolute increase in endometriosis prevalence conveyed by uterine fibroids, heavy menstrual bleeding, and dysmenorrhea was greater in individuals with high endometriosis PRS compared to low PRS [73] [49]. This supports a model where PRS could enhance existing clinical risk prediction by identifying women who would benefit most from intensive diagnostic investigation.

Future Directions

The 14-SNP PRS represents an early approach to endometriosis genetic risk prediction. More recent combinatorial analytical approaches have identified 1,709 disease signatures comprising 2,957 unique SNPs, revealing 77 novel gene associations beyond previous GWAS findings [74]. Additionally, transcriptomic analyses have identified distinct endometriosis subtypes (stroma-enriched and immune-enriched) with differential responses to hormone therapy [45], suggesting future PRS development could benefit from subtype-specific approaches.

This multi-cohort validation demonstrates that endometriosis PRS performs consistently across different ascertainment methods and populations, with measurable effect sizes that are robust but currently insufficient for standalone clinical prediction. The Danish clinical cohort provided stronger genetic effects while the UK Biobank enabled powerful replication, illustrating the complementary value of both deeply phenotyped clinical samples and large biobanks. Future research should focus on integrating PRS with clinical risk factors, exploring subtype-specific genetic architectures, and leveraging more powerful PRS methods to improve discriminative accuracy for clinical application.

This whitepaper provides a technical comparison of the discriminatory accuracy of three biomarker classes—polygenic risk scores (PRS), circulating microRNAs (miRNAs), and protein biomarkers—within endometriosis research. Endometriosis presents significant diagnostic challenges, with an average latency of 7-11 years from symptom onset to surgical diagnosis, creating an urgent need for non-invasive diagnostic solutions [66] [75]. Current research is increasingly focused on molecular subtyping to predict treatment responses, particularly given that first-line hormone therapy is effective in only approximately 40% of patients [45]. This analysis synthesizes current evidence on biomarker performance, highlighting that while PRS establishes genetic predisposition, circulating miRNAs demonstrate superior diagnostic accuracy, and multi-analyte approaches combining miRNAs with classical proteins show the most promising results for clinical application. The integration of these biomarkers holds potential for transforming endometriosis management through early detection, subphenotype classification, and personalized treatment strategies.

Quantitative Comparison of Discriminatory Accuracy

Table 1: Comparative Diagnostic Performance of Biomarker Classes in Endometriosis

Biomarker Class | Specific Biomarkers | Sensitivity (%) | Specificity (%) | AUC/Other Metrics | Evidence Level
Polygenic Risk Score (PRS) | 13-SNP weighted score [10] | Not reported | Not reported | Inverse association with disease spread & hormone treatment; low sensitivity/specificity [10] | Single study (N=140)
Circulating miRNAs (Single) | let-7b [76] | ~69.1 | ~69.1 | AUC: 0.691 [76] | Multiple studies
Circulating miRNAs (Single) | miR-451a, miR-20a-5p [77] | Significant differential expression | Significant differential expression | Promising ROC analysis (specific values not reported) [77] | Single validation study
Circulating miRNAs (Panels) | let-7b, let-7d, let-7f (proliferative phase) [76] | High | High | AUC: 0.929 [76] | Single study (N=48)
Circulating miRNAs (Panels) | miR-200 family, miR-141, others [78] | 83.92 | 89.82 | Bivariate model [78] | Meta-analysis (50 articles)
Protein Biomarkers | CA-125 [78] [76] | Limited alone | Limited alone | Elevated in other conditions [78] | Established use
Multi-Analyte | miRNA panels + CA-125/HE4 [78] | 93.39 | 92.71 | Bivariate model [78] | Meta-analysis
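
Sensitivity and specificity alone do not show how a test would behave in a given population. As a rough illustration, the sketch below converts the multi-analyte figures from the table into positive and negative predictive values using Bayes' rule. The 10% prevalence is the general-population figure quoted elsewhere in this review and is used here only as an assumption; prevalence in symptomatic referral populations would differ, and the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Multi-analyte row of the table, at an assumed 10% prevalence
ppv, npv = predictive_values(0.9339, 0.9271, 0.10)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

At low prevalence, even a test with sensitivity and specificity above 90% yields a substantial share of false positives, which is one reason validation in the intended-use population matters.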

Table 2: Functional Characteristics and Clinical Applicability

Characteristic | Polygenic Risk Score (PRS) | Circulating miRNAs | Protein Biomarkers
Primary Role | Determines genetic predisposition [10] | Detects active disease; monitors treatment response [79] | Indicates inflammation/disease presence [78]
Temporal Dynamics | Static (lifetime risk) | Dynamic (reflects current status) [79] | Dynamic (reflects current status)
Stage Detection | Not established | Early and late-stage detection possible [78] | Limited early-stage detection
Therapy Guidance | Limited association with hormone treatment found [10] | Predicts progesterone resistance [79] | Limited
Key Advantages | Lifetime risk assessment | High stability, minimally invasive, tissue-specific [80] | Standardized assays
Major Limitations | Low predictive power for subphenotypes [10] | Lack of standardization [80] [75] | Low specificity [78]

Detailed Experimental Protocols

Polygenic Risk Score (PRS) Analysis

Objective: To calculate a PRS for endometriosis and investigate its association with clinical presentations and hormone treatment response [10].

Sample Preparation:

  • DNA Extraction: Obtain DNA from blood samples (buffy coat) using standardized kits (e.g., MagMAX DNA Multi-sample Ultra 2.0).
  • Genotyping: Perform genotyping using microarray platforms (e.g., Illumina Global Screening Array on an iScan system).
  • Quality Control (QC): Implement a rigorous QC pipeline: exclude samples with ≥15% missing genotypes; remove markers with call rates <95%; exclude related samples (PI_HAT > 0.1875); remove markers violating Hardy-Weinberg equilibrium (p < 1×10⁻⁵).
  • Imputation: Use reference panels (e.g., TOPMed on GRCh38) for genotype imputation, retaining high-quality variants (INFO score >0.80, MAF >0.01).
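
The QC thresholds listed above can be expressed as simple filters. The following is a minimal sketch, not the pipeline used in the cited study: the function names are hypothetical, and the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square test as a simplification, whereas tools such as PLINK typically apply an exact test.

```python
import math

# Thresholds taken from the QC bullets above
MAX_SAMPLE_MISSING = 0.15    # exclude samples with >=15% missing genotypes
MIN_MARKER_CALL_RATE = 0.95  # remove markers with call rate <95%
MAX_PI_HAT = 0.1875          # exclude related samples above this PI_HAT
MIN_HWE_P = 1e-5             # remove markers with HWE p below this value

def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium (simplified)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def sample_passes(missing_rate: float, max_pi_hat: float) -> bool:
    """Sample-level QC: missingness and relatedness to any retained sample."""
    return missing_rate < MAX_SAMPLE_MISSING and max_pi_hat <= MAX_PI_HAT

def marker_passes(call_rate: float, n_aa: int, n_ab: int, n_bb: int) -> bool:
    """Marker-level QC: call rate and Hardy-Weinberg equilibrium."""
    return (call_rate >= MIN_MARKER_CALL_RATE
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= MIN_HWE_P)

print(marker_passes(0.99, 25, 50, 25))  # genotype counts in perfect HWE
print(marker_passes(0.99, 50, 0, 50))   # no heterozygotes: strong HWE violation
```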

PRS Calculation:

  • Base Data: Utilize summary statistics from large-scale genome-wide association studies (GWAS).
  • Clumping & Thresholding: Select independent, significantly associated SNPs (p < 5×10⁻⁸).
  • Scoring: Apply both unweighted (risk allele count) and weighted (effect size β-weighted) methods using PLINK software.
  • Model Adjustment: Include principal components (PCs) as covariates to control for population stratification.
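
The two scoring schemes described above differ only in whether each risk-allele count is multiplied by its GWAS effect size. The sketch below shows both, plus standardization so that downstream odds ratios can be reported per SD. The SNP identifiers, effect sizes, and genotypes are invented for illustration; in practice this step is usually done with PLINK's scoring functionality rather than hand-written code.

```python
from statistics import mean, stdev

# Hypothetical per-SNP effect sizes (log odds ratios); not taken from any GWAS
BETAS = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": 0.20}

def unweighted_prs(risk_allele_counts: dict) -> int:
    """Sum of risk-allele counts (0, 1, or 2 per SNP)."""
    return sum(risk_allele_counts[snp] for snp in BETAS)

def weighted_prs(risk_allele_counts: dict) -> float:
    """Sum of risk-allele counts, each weighted by its effect size."""
    return sum(BETAS[snp] * risk_allele_counts[snp] for snp in BETAS)

def standardize(scores: list) -> list:
    """Convert raw scores to z-scores within the cohort."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Toy cohort of three individuals (hypothetical genotypes)
cohort = [
    {"snp_a": 0, "snp_b": 1, "snp_c": 2},
    {"snp_a": 2, "snp_b": 2, "snp_c": 0},
    {"snp_a": 1, "snp_b": 0, "snp_c": 1},
]
raw_scores = [weighted_prs(person) for person in cohort]
z_scores = standardize(raw_scores)
print(raw_scores, z_scores)
```

Note that the second individual carries the most risk alleles but not the highest weighted score, which shows how weighting can reorder individuals relative to a simple allele count.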

Validation: Assess PRS association with clinical traits (disease spread, gastrointestinal involvement, hormone treatment) using logistic regression, calculating odds ratios (OR) with 95% confidence intervals (CI) and p-for-trend [10].

Circulating miRNA Profiling and Validation

Objective: To identify and validate differentially expressed circulating miRNAs in endometriosis patients versus controls [76] [79] [77].

Sample Collection and Processing:

  • Collection: Draw peripheral blood into sterile tubes (no additive for serum; EDTA for plasma). Centrifuge at 1000-2000×g for 10 minutes to isolate serum/plasma.
  • Storage: Aliquot and immediately freeze supernatant at -80°C until analysis to preserve miRNA integrity.

RNA Extraction:

  • Isolation: Use commercial kits (e.g., miRVana RNA Isolation Kit, miRNeasy Advanced Micro Kit) with 400μL serum/plasma input.
  • Quantification: Determine RNA yield and purity using spectrophotometry (NanoDrop) or fluorometry (Qubit RNA HS Assay).

miRNA Expression Analysis:

  • Screening (Discovery Phase): Employ high-throughput methods:
    • Next-Generation Sequencing (NGS): Prepare libraries with kits (e.g., QIAseq microRNA Library Kit), sequence on Illumina platforms (74bp paired-end). Analyze data via bioinformatics pipelines (e.g., RNA-seq Analysis Portal, DESeq2 algorithm) [79].
  • Validation (Confirmation Phase): Use targeted, highly sensitive methods:
    • Reverse Transcription Quantitative PCR (RT-qPCR): Reverse transcribe RNA using either poly(A) tailing or miRNA-specific stem-loop primers. Perform qPCR with SYBR Green or TaqMan chemistry on platforms (e.g., Bio-Rad iCycler). Normalize data using stable reference genes (e.g., U6 snRNA) [76] [77].

Data Analysis:

  • Differential Expression: Calculate relative expression using the 2^(-ΔΔCt) method. Apply statistical tests (Wilcoxon, t-test) with multiple testing correction (FDR).
  • Diagnostic Performance: Perform Receiver Operating Characteristic (ROC) analysis to determine Area Under Curve (AUC), sensitivity, and specificity for individual miRNAs and panels [76] [77].
  • Pathway Analysis: Use tools like Ingenuity Pathway Analysis (IPA) to identify biological pathways and target genes of dysregulated miRNAs [79].
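
The two core calculations in this list, relative expression by the 2^(-ΔΔCt) method and the area under the ROC curve, can be sketched in a few lines. The Ct values and expression levels below are invented for illustration, and the AUC is computed through its rank-based (Mann-Whitney) interpretation rather than with a dedicated statistics package.

```python
def fold_change_ddct(ct_target_case: float, ct_ref_case: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression (case vs. control) by the 2^(-ddCt) method."""
    d_ct_case = ct_target_case - ct_ref_case   # normalize to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_case - d_ct_ctrl
    return 2.0 ** (-dd_ct)

def roc_auc(case_values: list, control_values: list) -> float:
    """AUC as the probability that a random case scores above a random control.

    Ties count as one half.
    """
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))

# Hypothetical example: the target miRNA crosses threshold two cycles earlier
# in cases after normalization, i.e. fourfold higher expression
print(fold_change_ddct(24.0, 20.0, 26.0, 20.0))

# Hypothetical expression levels for a candidate miRNA
print(roc_auc([3, 4, 5], [1, 2, 3]))
```

The pairwise AUC is quadratic in sample size, which is acceptable for the cohort sizes typical of miRNA validation studies; a rank-based implementation would be preferable for large datasets.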

Protein Biomarker Assays

Objective: To quantify established and novel protein biomarkers in serum/plasma.

Multiplex Immunoassays:

  • Technology: Proximity Extension Assay (e.g., Olink Proseek Multiplex Inflammation I panel) allowing simultaneous measurement of 92 proteins.
  • Procedure: Incubate sample with antibody pairs tagged with DNA; upon binding, DNA tags hybridize and are extended by DNA polymerase, creating a quantifiable template for qPCR.
  • Output: Data reported as Normalized Protein Expression (NPX) on a log2 scale.
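
Because NPX is on a log2 scale, a difference between two NPX values for the same protein corresponds to a fold change on the linear scale. The helper below is a minimal illustration with a hypothetical function name; NPX is a relative unit, so the comparison is only meaningful within one protein assay.

```python
def npx_fold_change(npx_a: float, npx_b: float) -> float:
    """Linear-scale ratio of sample A to sample B from log2-scale NPX values."""
    return 2.0 ** (npx_a - npx_b)

# A difference of 1 NPX corresponds to a doubling of the measured signal
print(npx_fold_change(6.0, 5.0))
```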

Enzyme-Linked Immunosorbent Assay (ELISA):

  • Procedure: Use commercial kits (e.g., MBS762601 for AXIN1) following manufacturer's protocol: coat plate with capture antibody, add sample/standards, detect with enzyme-conjugated antibody, and measure signal after substrate addition.
  • Quantification: Generate standard curve for absolute concentration calculation.

Electro-Chemiluminescence Immunoassay (ECLIA):

  • Application: Quantify clinical biomarkers like CA-125 and TSH receptor antibodies (TRAb) on automated systems (e.g., Roche Modular Analytics E170) [10].

Signaling Pathways and Workflow Visualization

Multi-Analyte Biomarker Integration Logic

[Diagram: Patient Sample (Blood/Saliva) is processed in three parallel branches. Biomarker Isolation: DNA Extraction, miRNA Extraction, Protein Isolation. Analytical Platform: PRS Analysis (Genotyping + Algorithm), miRNA Profiling (Sequencing/RT-qPCR), Protein Quantification (Multiplex Immunoassay). Data Integration & Clinical Decision: all three branches feed a Multi-Analyte Model, which leads to Personalized Diagnosis & Therapy Selection.]

Diagram 1: Multi-Analyte Biomarker Integration Logic - This workflow illustrates the parallel processing of different biomarker classes from a single patient sample, culminating in an integrated diagnostic model that enhances predictive power for personalized clinical decisions.

Circulating miRNA Sequencing Workflow

[Diagram: Wet Lab Processing: Saliva/Blood Collection (with DNA/RNA Shield) → Centrifugation (10,000×g, 20 min) → RNA Extraction (miRNeasy Advanced Kit) → NGS Library Prep (QIAseq miRNA Library Kit) → Library Pooling & Quality Control → Illumina Sequencing (74bp paired-end). Bioinformatics Analysis: FASTQ Files (Demultiplexing) → Alignment to Reference Genome → Differential Expression (DESeq2, FDR<0.01) → Pathway Analysis (IPA, Gene Ontology) and RT-qPCR Validation of Candidate miRNAs → Diagnostic/Prognostic Biomarker Signature.]

Diagram 2: Circulating miRNA Sequencing Workflow - This end-to-end protocol details the process from sample collection to biomarker identification, highlighting critical quality control steps and the transition from discovery to validation phases.

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Tools for Endometriosis Biomarker Studies

Category | Product/Technology | Manufacturer | Primary Application | Key Features
Sample Collection & Storage | DNA/RNA Shield Safe Collection Kit | Zymo Research | Stabilize nucleic acids in saliva during collection/transport [79] | Preserves miRNA integrity, prevents degradation
Nucleic Acid Extraction | miRNeasy Advanced Micro Kit | Qiagen | High-quality total RNA extraction from serum/plasma/saliva [79] | Includes RNA cleanup, high recovery of small RNAs
miRNA Library Prep | QIAseq microRNA Library Kit | Qiagen | NGS library preparation for miRNA sequencing [79] | Unique molecular indexes for accurate quantification
Multiplex Protein Assay | Proseek Multiplex Inflammation I | Olink Bioscience | Simultaneous measurement of 92 inflammatory proteins [10] | Proximity Extension Assay technology, high specificity
Genotyping | Illumina Global Screening Array | Illumina | Genome-wide SNP genotyping for PRS calculation [10] | High-density coverage, optimized for imputation
miRNA Detection | miRVana RNA Isolation Kit | Applied Biosystems | Total RNA isolation including small RNAs [76] | Efficient recovery of miRNA, compatible with multiple platforms
miRNA Detection | NCode miRNA First-Strand cDNA Synthesis | Life Technologies | Reverse transcription for miRNA qPCR analysis [76] | Poly(A) tailing-based method, high sensitivity

The comparative analysis presented in this whitepaper reveals a clear hierarchy in discriminatory accuracy among biomarker classes for endometriosis. PRS currently demonstrates limited utility for subphenotype stratification or predicting therapeutic response, though it establishes genetic predisposition [10]. Circulating miRNAs show significantly greater promise, with panels achieving a sensitivity of 83.92% and a specificity of 89.82% in a meta-analysis [78]. The most compelling results emerge from multi-analyte approaches that combine miRNA signatures with classical protein biomarkers like CA-125, achieving a sensitivity of 93.39% and a specificity of 92.71% [78].

Future research should prioritize several key areas: First, standardization of pre-analytical and analytical protocols for miRNA quantification is critical to enable cross-study comparisons and clinical translation [80] [75]. Second, larger validation studies in diverse populations are needed to confirm the preliminary findings reported in many miRNA studies [77]. Third, integrated models that combine PRS for risk stratification with dynamic biomarkers (miRNAs and proteins) for active disease detection and monitoring represent the most promising path forward [45] [80]. Finally, biomarker discovery must be linked to therapeutic implications, particularly for predicting progesterone resistance, which affects approximately one-third of patients [79].

The emerging paradigm of molecular subtyping in endometriosis, such as the identification of stroma-enriched and immune-enriched subtypes, offers a framework for developing truly personalized treatment approaches [45]. By leveraging the complementary strengths of different biomarker classes, researchers and drug developers can advance both the understanding of endometriosis pathophysiology and the clinical management of this complex condition.

Endometriosis and adenomyosis are prevalent gynecological disorders that significantly impact women's health, causing symptoms such as pelvic pain, abnormal uterine bleeding, and infertility. While both conditions involve the presence of endometrial-like tissue outside its normal location, they represent distinct clinical entities with different pathological features and clinical management implications. Endometriosis is characterized by the growth of endometrial tissue outside the uterine cavity, predominantly affecting pelvic structures such as the ovaries, uterosacral ligaments, and pelvic peritoneum [81]. In contrast, adenomyosis involves the invasion of endometrial tissue into the myometrial wall of the uterus [82]. Within the broader thesis on polygenic risk score performance across endometriosis subphenotypes, this review examines the genetic evidence distinguishing these two conditions, with implications for targeted drug development and personalized treatment approaches.

The etiology of both conditions remains incompletely understood, though genetic factors play a substantial role. Family and twin studies estimate the heritability of endometriosis at approximately 50%, with common genetic variants accounting for roughly 26% of disease risk [83]. Until recently, the genetic relationship between endometriosis and adenomyosis remained largely unexplored, but emerging evidence from large-scale genetic studies now provides compelling data on their distinct genetic architectures.

Genetic Evidence for Distinct Disease Entities

Polygenic Risk Score Analyses

Polygenic risk scores (PRS) aggregate the effects of many genetic variants to quantify an individual's genetic susceptibility to a particular condition. A pivotal study investigating the discriminative ability of a 14-variant PRS for endometriosis found a significant association with endometriosis risk across multiple cohorts, including surgically confirmed cases and population-based biobanks [6] [39]. Each standard deviation increase in the PRS was associated with endometriosis (OR = 1.57, p = 2.5×10⁻¹¹) and its major subtypes: ovarian (OR = 1.72), infiltrating (OR = 1.66), and peritoneal (OR = 1.51) [6].

Crucially, this same PRS demonstrated no significant association with adenomyosis, suggesting that adenomyosis is not driven by the same common genetic risk variants as endometriosis [6] [39]. This differential performance was consistent across both the Danish Twin Registry cohort and the UK Biobank replication analysis, providing robust evidence for distinct genetic architectures.

Table 1: Performance of Endometriosis PRS Across Phenotypes

Phenotype | Cohort | Odds Ratio (per SD) | P-value | Sample Size (Cases/Controls)
Endometriosis (combined) | Danish cohorts | 1.57 | 2.5×10⁻¹¹ | 389/664
Ovarian endometriosis | Danish cohorts | 1.72 | 6.7×10⁻⁵ | 75/664
Infiltrating endometriosis | Danish cohorts | 1.66 | 2.7×10⁻⁹ | 210/664
Peritoneal endometriosis | Danish cohorts | 1.51 | 2.6×10⁻³ | 60/664
Endometriosis | UK Biobank | 1.28 | <2.2×10⁻¹⁶ | 2,967/256,222
Adenomyosis | UK Biobank | Not significant | - | 1,883/256,222

Recent Multi-Ancestry Genome-Wide Association Studies

The most recent and largest genome-wide association study (GWAS) for endometriosis and adenomyosis, published as a preprint in 2025, provides further evidence of genetic distinction [52]. This multi-ancestry study of approximately 1.4 million women (including 105,869 cases) identified 80 genome-wide significant associations, with 37 novel loci. Notably, the study reported five loci that represent the first genetic variants specifically associated with adenomyosis, marking a significant advancement in understanding the unique genetic architecture of this condition [52].

Fine-mapping and colocalization analyses in this study uncovered causal loci for over 50 endometriosis-related associations, with multi-omics integration revealing that genetic variation influences endometriosis risk through transcriptomic, epigenetic, and proteomic regulation across multiple tissues [52]. These findings converge on pathways involved in immune regulation, tissue remodeling, and cell differentiation, providing insights into the distinct pathogenic mechanisms underlying these disorders.

Mendelian Randomization Studies for Adenomyosis

A 2025 Mendelian randomization study identified 24 novel protein-coding genes potentially causally linked to adenomyosis, none of which had been previously reported in the context of this disease [82]. The most relevant candidate genes identified were ARHGEF35, AMT, RCVRN, GMPPB, and INTS1. Bioinformatics analysis indicated that these genes play critical roles in essential biological functions, including base-excision repair, negative regulation of various cell cycle processes, and metabolism-related pathways in adenomyosis [82].

Differential gene expression analysis comparing adenomyosis patients and healthy controls revealed that DNA2 and INTS1 displayed high expression levels, whereas EFCAB2, HLA-DQA2, and RPS26 exhibited low expression levels. The receiver operating characteristic curve analysis for the Predictive Diagnostic Index revealed an area under the curve of 0.8 for the combined analysis of the five risk genes, suggesting promise as therapeutic targets and biomarkers for early diagnosis [82].

Table 2: Key Genetic Studies Differentiating Endometriosis and Adenomyosis

Study Type | Key Findings | Implications
PRS Analysis | Endometriosis PRS not associated with adenomyosis [6] [39] | Distinct genetic architectures; different underlying biology
Multi-ancestry GWAS | Five novel loci specific to adenomyosis identified [52] | First genetic variants specifically linked to adenomyosis
Mendelian Randomization | 24 novel genes causally linked to adenomyosis [82] | Potential new therapeutic targets and biomarkers
Genetic Correlation | Endometriosis shares genetic architecture with pain conditions [2] [83] | Explains comorbidity patterns and symptom overlap

Methodological Approaches for Genetic Differentiation

Cohort Design and Phenotyping

The studies cited employed rigorous methodological approaches to ensure reliable differentiation between endometriosis and adenomyosis. The PRS analysis utilized three distinct cohorts: surgically confirmed endometriosis cases from a specialized referral center, cases identified from registry data using ICD-10 codes, and a large replication cohort from the UK Biobank [6]. This multi-cohort approach enhanced the generalizability and robustness of the findings.

Adenomyosis cases were carefully defined to exclude women with coexisting endometriosis, using the ICD-10 code N80.0 (endometriosis of uterus) without other endometriosis diagnoses (N80.1-N80.9) [2]. This precise phenotyping was crucial for ensuring the genetic specificity observed.
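
The case definition described above amounts to a simple rule over each participant's diagnosis codes. The sketch below is one possible encoding of that rule; the function name and example records are hypothetical, and it assumes codes are recorded in the dotted ICD-10 form used in the text.

```python
# Endometriosis codes other than N80.0, as listed in the text (N80.1-N80.9)
OTHER_ENDOMETRIOSIS_CODES = {f"N80.{i}" for i in range(1, 10)}

def is_adenomyosis_only(icd10_codes) -> bool:
    """True if a participant has N80.0 and no other endometriosis diagnosis."""
    codes = {code.strip().upper() for code in icd10_codes}
    has_adenomyosis = "N80.0" in codes
    has_other_endometriosis = bool(codes & OTHER_ENDOMETRIOSIS_CODES)
    return has_adenomyosis and not has_other_endometriosis

# Hypothetical participant records
print(is_adenomyosis_only(["N80.0", "D25.9"]))  # adenomyosis without endometriosis
print(is_adenomyosis_only(["N80.0", "N80.1"]))  # coexisting ovarian endometriosis
```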

Genotyping and Polygenic Risk Score Calculation

The PRS was derived from 14 genetic variants identified in a published GWAS meta-analysis with more than 17,000 endometriosis cases [6]. When one lead SNP (rs760794) failed assay design, researchers included rs77294520, which was region-wide associated after conditioning on the index SNP in the GREB1 locus, demonstrating appropriate methodological adaptation.

For the PRS-PheWAS study, endometriosis PRS weightings were developed using summary statistics from seven European cohorts included in the Sapkota et al. 2017 meta-analysis (14,926 cases; 189,715 controls), meta-analyzed alongside endometriosis GWAS summary statistics from FinnGen Release 8 (13,456 cases, 100,663 controls) [2]. A Bayesian method (SBayesR) was used for adjusting the GWAS summary statistics effect sizes, performed with default settings while excluding the MHC region and imputing sample size [2].
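
Combining summary statistics from several cohorts is commonly done per SNP with a fixed-effect, inverse-variance-weighted meta-analysis. The source does not state which meta-analysis method was used, so the sketch below illustrates that common choice as an assumption, with invented effect sizes and standard errors.

```python
import math

def ivw_meta(betas: list, ses: list):
    """Fixed-effect inverse-variance-weighted estimate for one SNP.

    Returns (pooled beta, pooled standard error).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se

# Hypothetical per-cohort log odds ratios and standard errors for one SNP
beta, se = ivw_meta([0.10, 0.20], [0.10, 0.10])
print(f"pooled beta = {beta:.3f}, pooled SE = {se:.4f}")
```

More precise studies (smaller standard errors) receive more weight, and the pooled standard error is always smaller than that of any single contributing cohort.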

The adenomyosis Mendelian randomization study employed summary data-based MR (SMR) analysis using single nucleotide polymorphisms as instrumental variables, along with expression quantitative trait loci (eQTL) data from whole blood and uterus as exposures and adenomyosis as the outcome [82]. The top cis-eQTL within the cis-region of a probe having the most potent effect on the gene's expression was selected as the instrumental variable. Multi-SNP SMR method was used as a sensitivity analysis to mitigate potential bias from using a single variant [82].
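
In the single-instrument SMR setting described above, the effect of gene expression on the trait is estimated as the ratio of the SNP-trait effect to the SNP-expression effect, and tested with an approximate 1-degree-of-freedom chi-square statistic built from the two z-scores. The sketch below follows that standard formulation; the input values are invented, and it omits the HEIDI heterogeneity test and the multi-SNP extension.

```python
import math

def smr_test(b_zx: float, se_zx: float, b_zy: float, se_zy: float):
    """Single-SNP SMR estimate.

    b_zx, se_zx: effect of the top cis-eQTL on gene expression (exposure).
    b_zy, se_zy: effect of the same SNP on the trait (outcome).
    Returns (b_xy, se_xy, t_smr, p_value).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # ratio estimate of expression effect on trait
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    se_xy = abs(b_xy) / math.sqrt(t_smr) if t_smr > 0 else float("inf")
    return b_xy, se_xy, t_smr, p_value

# Hypothetical eQTL and GWAS effects for one probe
b_xy, se_xy, t_smr, p_value = smr_test(0.5, 0.05, 0.1, 0.02)
print(f"b_xy = {b_xy:.2f}, T_SMR = {t_smr:.1f}, p = {p_value:.2e}")
```

The test statistic is bounded by the weaker of the two associations, which is why a strong eQTL instrument is a prerequisite for this type of analysis.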

[Diagram: Genetic Instrument (SNPs) → Gene Expression (eQTL), DNA Methylation, and Protein Levels; each of these → Endometriosis Risk and Adenomyosis Risk.]

Diagram 1: Mendelian Randomization Approach for Identifying Causal Genes. Solid arrows represent established pathways for endometriosis; dashed arrows represent distinct pathways for adenomyosis.

Biological Pathways and Mechanisms

Shared and Distinct Biological Pathways

Multi-omics integration from the large-scale GWAS revealed that genetic variation influences endometriosis risk through transcriptomic, epigenetic, and proteomic regulation across multiple tissues, converging on pathways involved in immune regulation, tissue remodeling, and cell differentiation [52]. Drug-repurposing analyses highlighted potential therapeutic interventions currently used for breast cancer and preterm birth prevention [52].

For adenomyosis, bioinformatics analysis indicates that identified risk genes play critical roles in essential biological functions, including base-excision repair, negative regulation of various cell cycle processes, and metabolism-related pathways [82]. These findings suggest fundamentally different pathogenic mechanisms despite some overlapping symptoms.

Hormonal Mechanisms

A PRS phenome-wide association study revealed an association between genetic liability to endometriosis and lower testosterone levels [2]. Follow-up analysis using Mendelian randomization approaches suggested lower testosterone may be causal for both endometriosis and clear cell ovarian cancer [2]. This finding provides important insights into the hormonal basis of endometriosis and potential avenues for therapeutic intervention.

The association with testosterone levels was not observed for adenomyosis, further supporting distinct endocrine pathways in these two conditions. This differential hormone sensitivity may explain their different response profiles to various hormonal treatments.

[Diagram: Genetic Risk Variants → Immune Dysregulation, Hormonal Alterations, Tissue Remodeling, and Pain Sensitivity; each of these → Endometriosis and Adenomyosis.]

Diagram 2: Biological Pathways in Endometriosis and Adenomyosis. Solid arrows represent stronger established pathways for endometriosis; dashed arrows represent modified pathways for adenomyosis.

Clinical and Research Implications

Diagnostic Applications

The differential genetic profiles between endometriosis and adenomyosis have significant implications for diagnostic development. The distinct genetic signatures suggest potential for genetic tests to supplement current diagnostic approaches, which primarily rely on imaging and invasive procedures [84] [81].

Transvaginal ultrasonography remains the first-line imaging modality for assessing adnexal masses and suspected endometriosis, while MRI is used as a secondary diagnostic tool to better characterize these lesions [84]. However, genetic biomarkers could potentially help differentiate between these conditions earlier in the diagnostic process, reducing the current diagnostic delay for endometriosis, which averages 7-11 years [2].

Therapeutic Development and Drug Repurposing

The identification of distinct genetic risk factors and biological pathways for endometriosis and adenomyosis opens new avenues for therapeutic development. Drug-repurposing analyses from the large-scale GWAS highlighted potential therapeutic interventions currently used for breast cancer and preterm birth prevention [52]. The identification of specific genes and pathways for adenomyosis provides new potential targets for this historically understudied condition [82].

The finding that lower testosterone levels may be causal for endometriosis suggests novel endocrine treatment approaches [2]. Furthermore, the shared genetic architecture between endometriosis and pain conditions indicates that treatments targeting pain pathways may be particularly beneficial for endometriosis patients, independent of disease modification [2] [83].

Table 3: Research Reagent Solutions for Genetic Studies

Research Tool | Application | Key Features
GWAS Summary Statistics | Genetic association discovery | Large sample sizes (>100,000 participants)
eQTL Data (GTEx) | Mapping genetic variants to gene expression | Tissue-specific expression data (uterus, whole blood)
SBayesR | PRS effect size adjustment | Bayesian method for improved prediction accuracy
SMR Analysis | Causal inference between gene expression and disease | Integrates GWAS and eQTL data; identifies causal genes
ICD-10 Code Mapping | Phenotype definition in biobanks | Enables precise case identification (N80.0 vs N80.1-N80.9)

The accumulating genetic evidence clearly demonstrates that endometriosis and adenomyosis are distinct entities with different genetic risk profiles, despite some overlapping clinical features. Polygenic risk scores developed for endometriosis show no significant association with adenomyosis, and recent large-scale genetic studies have identified variants specific to each condition. These findings have important implications for the development of targeted therapies and diagnostic tools, moving beyond the historical tendency to group these conditions together. Future research should focus on further elucidating the specific biological pathways driving each condition and translating these genetic findings into clinical applications that improve patient care and outcomes.

For researchers in this field, the methodological approaches outlined, including precise phenotyping, advanced PRS calculation methods, and integrative multi-omics analyses, provide a roadmap for continuing to unravel the genetic complexity of these conditions. As genetic datasets continue to expand and diversify, our understanding of the distinct genetic architectures of endometriosis and adenomyosis will be further refined, enabling more personalized approaches to diagnosis and treatment.

The integration of polygenic risk scores (PRS) with established clinical risk factors and patient-reported symptoms represents a transformative approach for endometriosis risk stratification. This technical review synthesizes current evidence on the performance of such combined models, detailing their enhanced discriminative ability over PRS alone. We provide comprehensive quantitative comparisons, detailed experimental methodologies for model development, and resource guidance to facilitate implementation in research and clinical trial settings. For drug development professionals, these integrated models offer promising avenues for improved patient cohort stratification in clinical trials and development of personalized therapeutic strategies.

Endometriosis, affecting approximately 10% of women of reproductive age, presents significant diagnostic challenges with current delays averaging 7-10 years [85]. The complex pathogenesis involving genetic, inflammatory, and hormonal factors necessitates multifactorial risk assessment approaches. Polygenic risk scores, which aggregate the effects of multiple genetic variants into a single metric, provide a foundational genetic risk component but demonstrate limited discriminatory power as standalone tools [6] [10]. The integration of PRS with clinical manifestations and symptom profiles creates synergistic models that significantly enhance predictive accuracy and clinical utility across endometriosis subtypes.

Quantitative Performance of Combined Models

Discriminatory Performance Across Model Configurations

Table 1: Performance Metrics of PRS-Clinical Combined Models

Model Configuration | Cohort | Sample Size (Cases/Controls) | Key Predictive Features | Performance (OR per SD or ROC-AUC)
PRS Only (14-SNP) | Danish Surgical Cohort | 249/348 | 14 GWAS-identified SNPs | OR=1.59 per SD [6]
PRS Only (14-SNP) | UK Biobank | 2,967/256,222 | 14 GWAS-identified SNPs | OR=1.28 per SD [6]
Machine Learning Combined | UK Biobank | 5,924/142,723 | Genetic variants + ICD-10 history + female health data + lifestyle factors | ROC-AUC=0.81 [50]
PRS + Subtype Analysis | Combined Danish | 389/664 | Infiltrating, ovarian, and peritoneal subtypes | OR=1.51-1.72 per SD [6]

Clinical Feature Contributions to Predictive Models

Table 2: Relative Contribution of Clinical Features in Combined Models

Feature Category | Specific Features | Impact Measurement | Model Context
Comorbidity Profiles | Irritable bowel syndrome (IBS) | High SHAP value [50] | Machine Learning Model
Reproductive History | Menstrual cycle length | High SHAP value [50] | Machine Learning Model
Symptom Patterns | Chronic pelvic pain, dysmenorrhea | Clinical assessment correlation [10] | PRS + Clinical Assessment
Endometriosis Subtypes | Ovarian, infiltrating, peritoneal | OR=1.72, 1.66, 1.51 respectively [6] | PRS Subtype Stratification
Previous Diagnoses | ICD-10 history prior to endometriosis diagnosis | Significantly more diagnoses in cases [50] | Retrospective Model

Experimental Protocols for Combined Model Development

Genotyping and PRS Calculation Protocol

Sample Preparation and Quality Control

  • DNA Extraction: Obtain from whole blood samples collected in EDTA tubes [10]
  • Genotyping Platform: Utilize Illumina Global Screening Array on Illumina iScan system [10]
  • Quality Control Pipeline:
    • Exclude samples with ≥15% missing rates
    • Remove markers with call rates <95%
    • Exclude related samples (PI_HAT > 0.1875)
    • Remove markers violating Hardy-Weinberg equilibrium (p < 1×10⁻⁵)
    • Eliminate population outliers via principal component analysis [10]
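
The thresholds above can be sketched in plain Python. This is a minimal illustration, not the pipeline used in [10]: the genotype matrix layout, the `None` encoding for missing calls, and every function name are assumptions, and a 1-df chi-square test stands in for the exact Hardy-Weinberg test that dedicated tools apply. Relatedness and ancestry filtering are omitted.

```python
import math

MISSING = None  # assumed encoding for a missing genotype call


def sample_missing_rate(genotypes):
    """Fraction of missing calls for one sample (list of 0/1/2/None)."""
    return sum(g is MISSING for g in genotypes) / len(genotypes)


def marker_call_rate(column):
    """Fraction of samples with a non-missing call at one marker."""
    return sum(g is not MISSING for g in column) / len(column)


def hwe_chi2_pvalue(column):
    """Chi-square (1 df) Hardy-Weinberg p-value; a simplification of the exact test."""
    obs = [sum(g == k for g in column) for k in (0, 1, 2)]
    n = sum(obs)
    if n == 0:
        return 1.0
    p = (2 * obs[2] + obs[1]) / (2 * n)  # frequency of the counted allele
    q = 1.0 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic marker: the test is not informative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # survival function for 1 df


def qc_filter(matrix, max_sample_missing=0.15, min_call_rate=0.95, hwe_p=1e-5):
    """Return (kept sample indices, kept marker indices).

    matrix is a list of samples, each a list of genotype calls per marker.
    Samples are filtered first; marker statistics use the kept samples only.
    """
    keep_samples = [i for i, row in enumerate(matrix)
                    if sample_missing_rate(row) < max_sample_missing]
    kept_rows = [matrix[i] for i in keep_samples]
    keep_markers = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in kept_rows]
        if marker_call_rate(column) >= min_call_rate and hwe_chi2_pvalue(column) >= hwe_p:
            keep_markers.append(j)
    return keep_samples, keep_markers
```

Filtering samples before markers is a design choice made here for illustration; the source does not state the order in which its filters were applied.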

Imputation and SNP Selection

  • Imputation Server: TOPMed Version R2 on GRCh38 [10]
  • Quality Thresholds: INFO score >0.80, MAF >0.01 [10]
  • PRS Model SNPs: 13 genome-wide significant SNPs (p < 5×10⁻⁸) from endometriosis GWAS [10]
  • Calculation Method: Weighted sum of risk alleles using PLINK software with effect sizes (beta values) as weights [10]
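
The source computes the score with PLINK; the snippet below only restates the arithmetic of the weighted sum in plain Python. The SNP identifiers and beta values are placeholders invented for illustration, not the SNPs or weights from [10], and standardizing against a reference group is an assumption about how "per SD" effects would be obtained.

```python
import statistics


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts.

    dosages: dict mapping SNP id -> risk-allele count (0, 1, or 2)
    weights: dict mapping SNP id -> GWAS effect size (beta, log odds ratio)
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for SNPs: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


def standardize(scores, reference_scores):
    """Express scores in SD units of a reference group (for example, controls)."""
    mean = statistics.fmean(reference_scores)
    sd = statistics.stdev(reference_scores)
    return [(s - mean) / sd for s in scores]


# Placeholder weights and one individual's genotypes (hypothetical values).
example_weights = {"snp_a": 0.10, "snp_b": 0.20, "snp_c": 0.05}
example_dosages = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
example_score = polygenic_risk_score(example_dosages, example_weights)
```

With these placeholder values the score is 2 × 0.10 + 0 × 0.20 + 1 × 0.05 = 0.25.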

Figure: PRS workflow. Sample collection (whole blood) → DNA extraction → genotyping (Illumina iScan) → quality control (sample missingness <15%, marker call rate >95%, HWE compliance) → imputation (TOPMed server) → post-imputation QC (INFO score >0.80, MAF >0.01) → SNP selection (p < 5×10⁻⁸) → PRS calculation (weighted risk alleles) → PRS output.

Clinical Data Integration and Model Training

Clinical Phenotype Assessment

  • Symptom Documentation: Utilize structured questionnaires covering sociodemographic factors, lifestyle habits, and medical history [10]
  • Bowel Symptoms: Implement Visual Analog Scale for Irritable Bowel Syndrome (VAS-IBS) for abdominal pain, constipation, diarrhea, bloating, and nausea [10]
  • Surgical Confirmation: Verify endometriosis diagnosis via laparoscopy/laparotomy with histological examination [6] [10]
  • Subtype Classification: Categorize according to ICD-10 codes (N80.1-N80.9) with severity ranking [6]

Machine Learning Integration Protocol

  • Data Partitioning: Divide into endometriosis diagnosis group (ICD-10: N80) and matched control group [50]
  • Feature Engineering: Include over 1000 variables from biobank data covering female health, lifestyle, self-reported data, genetic variants, and medical history [50]
  • Model Training: Apply gradient boosting algorithms (CatBoost) with k-fold cross-validation [50]
  • Model Interpretation: Utilize SHAP (SHapley Additive exPlanations) for feature importance analysis [50]
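
To make the SHAP step concrete, the sketch below computes exact Shapley values for a three-feature toy risk function by enumerating feature subsets. It is an illustration of the attribution idea only: the study in [50] applied SHAP to a CatBoost model with over 1000 variables, where exact enumeration is infeasible and tree-specific algorithms are used instead. The toy model, its coefficients, and the feature meanings are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only usable for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        attributions.append(total)
    return attributions


def toy_risk(z):
    """Hypothetical risk function: z = [PRS, IBS indicator, cycle-length indicator]."""
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


phi = shapley_values(toy_risk, x=[1, 1, 1], baseline=[0, 0, 0])
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split equally between the two features involved.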

Figure: Clinical integration workflow. Clinical data collection → symptom assessment (structured questionnaires, VAS-IBS scores, pain documentation) → surgical confirmation (laparoscopy/histology) → subtype classification (ICD-10 N80.1-N80.9) → data preprocessing (missing value imputation, feature normalization, age matching) → model training (gradient boosting with CatBoost, cross-validation, hyperparameter tuning) → model interpretation (SHAP analysis, feature importance, clinical validation) → final combined model (PRS + clinical).

Biological Pathways and Mechanistic Insights

The biological plausibility of combined PRS-clinical models is reinforced by emerging research on endometriosis pathogenesis. Recent studies utilizing Mendelian randomization have identified potential causal relationships between specific plasma proteins and endometriosis development.

Key Pathway Insights:

  • RSPO3 Association: Mendelian randomization analysis identifies RSPO3 as a potential causal protein in endometriosis pathogenesis, confirmed through colocalization analysis and clinical validation [86]
  • Inflammatory Signaling: Combined models capture interactions between genetic predispositions and inflammatory mediators like IL-6, MCP-1, and TNFRSF9 [10] [85]
  • Adenomyosis Differentiation: PRS demonstrates specificity for endometriosis without association with adenomyosis (OR not significant), suggesting distinct pathogenic mechanisms [6]
  • Hormonal Regulation: Genetic variants in hormonal regulation pathways (GREB1 locus) interact with clinical manifestations related to menstrual cycle characteristics [6] [50]

Figure: Endometriosis pathways. Genetic predisposition (GWAS-identified SNPs) acts on molecular pathways (RSPO3 signaling, hormonal regulation, inflammatory mediators), which drive cellular processes (angiogenesis, fibrosis, neuronal innervation). Genetic, molecular, and cellular factors each contribute to clinical symptoms (chronic pelvic pain, gastrointestinal symptoms, infertility), which lead to endometriosis diagnosis (surgical confirmation).

Research Reagent Solutions and Essential Materials

Table 3: Essential Research Materials for PRS-Clinical Model Implementation

Category | Specific Product/Platform | Application Context | Function
Genotyping | Illumina Global Screening Array | PRS Calculation | Genome-wide SNP profiling [10]
Genotyping Platform | Illumina iScan System | PRS Calculation | High-throughput screening system [10]
Imputation Server | TOPMed Version R2 | PRS Refinement | Missing genotype imputation [10]
Statistical Package | PLINK 1.9 | PRS Calculation | Genetic association analysis [10]
Protein Analysis | Proseek Multiplex Inflammation 1 kit | Biomarker Validation | Inflammatory protein profiling [10]
Immunoassay | Human R-Spondin3 ELISA Kit | Mechanistic Validation | RSPO3 protein quantification [86]
Machine Learning | CatBoost Gradient Boosting | Combined Model Development | Integrated model training [50]
Model Interpretation | SHAP (SHapley Additive exPlanations) | Feature Importance Analysis | Model explainability [50]

Discussion and Future Directions

Integrating PRS with clinical risk factors and symptoms advances endometriosis risk prediction beyond what either approach achieves alone. The current PRS on its own shows modest effect sizes (OR=1.28-1.59 per standard deviation) [6], whereas a combined machine learning model reports a ROC-AUC of 0.81 [50]. These figures come from different studies and different metrics (an odds ratio per SD is a measure of association, not of discrimination), so they indicate the direction of improvement rather than a like-for-like comparison. Discrimination at this level would support applications in both clinical practice and therapeutic development.
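
To put the odds ratios and the AUC on a common footing, a per-SD odds ratio can be translated into an approximate AUC under a simplifying assumption that is not made in the source: that the score is normally distributed with equal variance in cases and controls. Under that binormal model the standardized mean difference equals ln(OR) and AUC = Φ(ln(OR)/√2). The function name below is illustrative.

```python
import math


def auc_from_or_per_sd(odds_ratio):
    """Approximate AUC implied by an odds ratio per SD.

    Assumes an equal-variance binormal model, where the standardized mean
    difference between cases and controls is d = ln(OR) and
    AUC = Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2)).
    """
    d = math.log(odds_ratio)
    return 0.5 * (1.0 + math.erf(d / 2.0))


auc_surgical = auc_from_or_per_sd(1.59)  # Danish surgical cohort OR [6]
auc_biobank = auc_from_or_per_sd(1.28)   # UK Biobank OR [6]
```

Under these assumptions the two reported odds ratios correspond to AUCs of about 0.63 and 0.57. These are back-of-envelope translations, not values reported in [6].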

For drug development professionals, these combined models offer particularly valuable applications:

  • Clinical Trial Enrichment: Improved patient stratification for more targeted recruitment and reduced heterogeneity in trial populations
  • Biomarker Development: Identification of mechanistically distinct patient subgroups for personalized therapeutic approaches [87]
  • Clinical Trial Endpoints: Integration of combined risk scores as intermediate endpoints or enrichment biomarkers
  • Target Validation: Cross-referencing of genetic findings with novel drug targets like RSPO3 [86]

Future development should focus on refining subtype-specific models, expanding diverse ancestral representation in training datasets, and validating combined models in prospective clinical settings. The integration of additional data modalities, including imaging findings and novel circulating biomarkers, will further enhance model performance and clinical utility across the endometriosis spectrum.

Within the complex landscape of endometriosis research, the development of accurate diagnostic and prognostic tools remains a paramount challenge. This technical guide examines two powerful genomic approaches transforming our capabilities: transcriptomic biomarkers identified via machine learning and polygenic risk scores (PRS). Framed within broader thesis research on PRS performance across endometriosis subphenotypes, we provide an in-depth comparison of their methodological foundations, performance metrics, and clinical applicability. The integration of machine learning with high-throughput genomic data is advancing non-invasive diagnostic solutions, potentially overcoming the limitations of current invasive diagnostic standards and the modest predictive power of existing PRS models for this heterogeneous condition [88] [6] [89].

Transcriptomic biomarkers are genes or non-coding RNAs whose expression patterns, measured via technologies like RNA sequencing (RNA-Seq), are characteristic of a disease state. In endometriosis, these biomarkers reflect the active molecular pathways in diseased tissues and can be identified through machine learning classification of transcriptomic data [88] [90].

Polygenic Risk Scores (PRS) aggregate the cumulative effect of many genetic variants (often single nucleotide polymorphisms - SNPs) across the genome, each with small effect size, to quantify an individual's genetic susceptibility to a disease. For endometriosis, PRS are typically derived from genome-wide association study (GWAS) summary statistics [6].

Table 1: Fundamental Characteristics of Transcriptomic Biomarkers and PRS

Feature | Transcriptomic Biomarkers | Polygenic Risk Score (PRS)
Basis | Gene expression levels (dynamic) | Genetic sequence variants (static)
Data Source | RNA-Sequencing, Microarrays | GWAS summary statistics, Genotyping arrays
Temporal Dynamics | Can change with disease state, environment | Fixed at birth, lifelong risk indicator
Primary Tissue | Often disease-relevant tissue (e.g., endometrium) or blood | Typically calculated from blood or saliva DNA
Machine Learning Role | Core to feature selection and classification model development | Can be integrated as one feature within larger predictive models

Methodological Approaches and Experimental Protocols

Identifying Transcriptomic Biomarkers with Machine Learning

The discovery of transcriptomic biomarkers follows a structured pipeline that integrates bioinformatics with machine learning [88].

Sample collection (endometriosis vs. control) → RNA extraction and RNA-Seq → bioinformatic pre-processing (QC, alignment, quantification) → feature selection (filter low-count genes) → machine learning classification (AdaBoost, XGBoost, Bagged CART) → model validation (5-fold cross-validation) → biomarker extraction (gene importance ranking)

Figure 1: Workflow for transcriptomic biomarker discovery using machine learning.

Protocol 1: Transcriptomic Biomarker Discovery [88]

  • Sample Collection & RNA Sequencing: Collect endometrial tissue biopsies via a minimally invasive procedure. Cases are confirmed via laparoscopy and histology. Isolate RNA and prepare libraries for sequencing using a platform like Illumina NextSeq.
  • Bioinformatic Pre-processing:
    • Quality Control: Use FastQC to verify raw sequence quality.
    • Adapter Trimming: Employ Cutadapt to remove adapter sequences and poor-quality bases.
    • Alignment: Map reads to a reference genome (e.g., hg38) using an aligner like Bowtie2.
    • Quantification: Generate read counts for each gene using HTSeq.
    • Filtering: Remove genes with low expression (e.g., keep only genes with at least 1 count per million in at least n samples, where n is the smallest group size).
  • Machine Learning Classification:
    • Data Preparation: Divide data into training and test sets. Normalize gene expression data.
    • Model Training: Train multiple classifiers, such as:
      • Bagged Classification and Regression Trees (CART)
      • XGBoost
      • AdaBoost
      • Stochastic Gradient Boosting
    • Validation: Evaluate model performance using five-fold cross-validation.
    • Performance Metrics: Calculate accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score.
  • Biomarker Extraction: Rank genes based on their "variable importance" or "feature importance" metric from the best-performing model. Top-ranked genes are candidate biomarkers.
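
The low-expression filter in the pre-processing step can be sketched as follows. This is a plain-Python illustration under assumed conventions (a dict of raw counts per gene, library size taken as the column sum); the gene names and counts are invented, and the cited study may have used a dedicated package with different defaults.

```python
def counts_per_million(counts):
    """Convert raw counts to counts per million (CPM).

    counts: dict mapping gene -> list of raw read counts, one per sample.
    Library size is taken as the sum of counts in each sample.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("a sample has zero total counts")
    return {gene: [c * 1e6 / size for c, size in zip(row, lib_sizes)]
            for gene, row in counts.items()}


def filter_low_expression(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return {gene: row for gene, row in counts.items()
            if sum(v >= min_cpm for v in cpm[gene]) >= min_samples}


# Invented counts for three samples; each library sums to 1,000,000 reads.
raw_counts = {
    "gene_a": [500000, 600000, 700000],
    "gene_b": [500000, 399998, 300000],
    "gene_c": [0, 2, 0],
}
kept = filter_low_expression(raw_counts, min_cpm=1.0, min_samples=2)
```

Here `gene_c` reaches 1 CPM in only one sample and is dropped, while the other two genes are retained. In practice `min_samples` would be set to the size of the smallest group, as described above.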

Developing and Validating a Polygenic Risk Score

The PRS development pipeline relies on large-scale genetic association data [6].

GWAS meta-analysis (identify significant SNPs) → SNP selection and effect size estimation → PRS calculation (weighted sum of risk alleles) → cohort validation (independent case-control cohorts) → subphenotype analysis (ovarian, infiltrating, peritoneal)

Figure 2: Standard workflow for polygenic risk score development and validation.

Protocol 2: Polygenic Risk Score Construction and Validation [6]

  • GWAS Summary Statistics: Obtain effect sizes (beta coefficients) and p-values for SNPs from a large endometriosis GWAS meta-analysis. The largest published meta-analysis includes over 17,000 cases.
  • SNP Selection and Weighting: Select genome-wide significant SNPs (e.g., 14 top-associated SNPs from a meta-analysis). Effect sizes from the GWAS are used as weights. More advanced methods like SBayesR can be used to adjust SNP weights.
  • PRS Calculation: For each individual in a target cohort, the PRS is calculated using PLINK software as a weighted sum of risk alleles:
    • \( \mathrm{PRS} = \sum_{i=1}^{n} \beta_i \times \text{Count}_i \)
    • Where \( \beta_i \) is the effect size (weight) of SNP i from the GWAS, and \( \text{Count}_i \) is the number of effect alleles (0, 1, 2) the individual carries for SNP i.
  • Validation in Independent Cohorts: Test the association of the PRS with endometriosis status in independent cohorts, such as:
    • Surgically Confirmed Cases: Patients with laparoscopy and histologically confirmed endometriosis.
    • Registry-Based Cases: Cases identified via ICD-10 codes in national patient registries.
    • Large Biobanks: e.g., UK Biobank.
  • Subtype Analysis: Evaluate the PRS association with major endometriosis subtypes (e.g., ovarian, infiltrating, peritoneal) to assess performance across subphenotypes.
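
The validation step reports an odds ratio per SD of the PRS. One way to obtain such a figure is a logistic regression of case status on the standardized score; the sketch below fits that model by Newton-Raphson in plain Python. It is an assumption-laden illustration: it includes no covariates (the source does not list its adjustment set here), the function names are invented, and the scores in the usage note are made up.

```python
import math
import statistics


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to the intercept
            g1 += (yi - p) * xi     # gradient with respect to the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def odds_ratio_per_sd(scores, is_case):
    """Odds ratio for a one-SD increase in the score (SD taken over all samples)."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    z = [(s - mean) / sd for s in scores]
    _, slope = fit_logistic(z, is_case)
    return math.exp(slope)


# Made-up, overlapping scores for eight controls and eight cases.
control_scores = [-1.2, -0.8, -0.3, 0.0, 0.2, 0.5, 0.9, -0.6]
case_scores = [-0.4, 0.1, 0.4, 0.7, 1.0, 1.3, -0.1, 1.6]
labels = [0] * len(control_scores) + [1] * len(case_scores)
example_or = odds_ratio_per_sd(control_scores + case_scores, labels)
```

The same function can be applied to one subtype at a time (cases of that subtype versus all controls) to mirror the subtype analysis. The fit does not converge if cases and controls are perfectly separated by the score.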

Comparative Performance and Quantitative Data

Diagnostic and Predictive Performance

Direct comparison of performance metrics reveals the distinct strengths of each approach.

Table 2: Performance Comparison of Transcriptomic Biomarkers vs. PRS in Endometriosis

Metric | Transcriptomic Biomarkers (ML Model) | Polygenic Risk Score (PRS)
Primary Use Case | Disease classification & diagnosis | Genetic risk stratification
Reported Accuracy | 85.7% (Bagged CART) [88] | N/A (Provides odds ratio)
Sensitivity/Specificity | 100% / 75% [88] | N/A
Key Performance Indicator |  | Odds Ratio (OR) per SD increase: 1.57 - 1.72 (across subtypes) [6]
Area Under Curve (AUC) | Not reported in cited studies | ~0.6748 (Methylation Risk Score combined with PRS) [91]
Validation Cohort | 16 cases, 22 controls (internal cross-validation) [88] | 249 surgically confirmed cases, 348 controls; replicated in UK Biobank (2,967 cases) [6]
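
The classification metrics listed for the transcriptomic model all derive from a confusion matrix. The sketch below shows the definitions; the counts in the example are hypothetical, chosen only because they reproduce the rounded accuracy, sensitivity, and specificity in the table. The confusion matrix itself is not given in this review.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical counts: 3 true positives, 1 false positive, 3 true negatives, 0 false negatives.
example_metrics = classification_metrics(tp=3, fp=1, tn=3, fn=0)
```

With so few samples, a single reclassified control shifts specificity by 25 percentage points, which is one reason small-cohort accuracy figures and large-cohort odds ratios should be compared with caution.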

Identified Biomarkers and Genetic Associations

The outputs of these methodologies are specific genes and genetic variants.

Table 3: Specific Biomarkers and Variants Identified by Transcriptomic and PRS Approaches

Approach | Identified Biomarkers / Key Findings | Notes / Subtype Associations
Transcriptomic ML | CUX2, CLMP, CEP131, EHD4, CDH24, ILRUN, LINC01709, HOTAIR, SLC30A2, NKG7 [88] | Biomarkers identified from variable importance in Bagged CART model.
PRS (14-SNP) | SNPs from top GWAS loci (e.g., in GREB1 region) [6] | Associated with all subtypes: Ovarian (OR=1.72), Infiltrating (OR=1.66), Peritoneal (OR=1.51). Not associated with adenomyosis.
Integrated ML & Genetics | Adenosine kinase, Enoyl-CoA hydratase, CCR4-NOT subunit 7 [92] | Three core biomarkers identified by combining GWAS data with machine learning.

The Scientist's Toolkit: Essential Research Reagents and Platforms

Successful implementation of these methodologies requires a suite of specialized reagents, software, and analytical tools.

Table 4: Key Research Reagent Solutions for Transcriptomic and PRS Studies

Category | Item | Function / Application
Wet-Lab Reagents | Illumina NextSeq RNA-Seq Platform | High-throughput mRNA sequencing for transcriptomic data generation [88].
Wet-Lab Reagents | Tissue Biopsy Kits | Minimally invasive collection of endometrial tissue samples [88].
Wet-Lab Reagents | DNA Genotyping Arrays | Genome-wide genotyping of single nucleotide polymorphisms (SNPs) for PRS calculation [6].
Bioinformatics Tools | FastQC | Quality control check on raw sequence data [88].
Bioinformatics Tools | Cutadapt | Removes adapter sequences and other contaminant sequences from reads [88].
Bioinformatics Tools | Bowtie2 / TopHat | Alignment of sequencing reads to a reference genome (e.g., hg38) [88].
Bioinformatics Tools | HTSeq | Generation of read count data for each gene [88].
Analytical Software | R / Python with scikit-learn | Machine learning model development and classification (e.g., Bagged CART, XGBoost) [88] [93].
Analytical Software | PLINK | Whole-genome association analysis toolset, used for PRS calculation [6].
Analytical Software | GCTB (SBayesR) | Bayesian tool for adjusting GWAS summary statistics for improved PRS [6].
Analytical Software | Glmnet (R) | Implementation of LASSO regression for feature selection in high-dimensional data [93].

Integrated and Emerging Approaches

The field is moving beyond standalone applications towards integrated models. Evidence shows that combining a Methylation Risk Score (MRS), which captures non-genetic, environmental influences on DNA, with a PRS yields a classification performance (AUC) consistently higher than using the PRS alone [91]. Furthermore, advanced machine learning techniques such as neural networks indicate that the predictive value of PRS is greatest when it is combined with rich, structured clinical data from electronic health records (EHRs) rather than used in isolation [94].

Another innovative approach merges transcriptome-wide association studies (TWAS) with patient-derived transcriptomic data and machine learning (e.g., LASSO, Boruta algorithms) to pinpoint a minimal set of predictive genes with high biological interpretability, as demonstrated in venous thromboembolism research [95]. This methodology is directly applicable to endometriosis investigations.

Both transcriptomic biomarkers, powered by machine learning, and polygenic risk scores offer distinct and valuable paths for advancing endometriosis research. Transcriptomic ML models currently show superior performance for direct disease classification, achieving high diagnostic accuracy by capturing active disease state signals. In contrast, PRS provides a static measure of genetic predisposition, effective for risk stratification across subphenotypes but with lower standalone predictive power.

The future of endometriosis diagnostics and risk prediction lies in multi-modal integration. Combining the dynamic molecular snapshot from transcriptomics (and other omics like methylomics) with the foundational genetic risk from PRS, and contextualizing both within a framework of detailed clinical data, promises to unlock the robust, personalized risk models needed to finally shorten the long diagnostic odyssey for millions of patients. For researchers focused on PRS performance across subphenotypes, these integrative approaches are essential for explaining the significant portion of disease variance that remains unaccounted for by current genetic models.

Conclusion

The current evidence demonstrates that polygenic risk scores show significant but subtype-dependent performance for endometriosis risk prediction, with strongest associations for ovarian and infiltrating subtypes. However, stand-alone PRS models currently lack sufficient discriminative accuracy for specific clinical presentations, highlighting the need for integrated approaches combining genetic risk with epigenetic markers, inflammatory profiles, and detailed clinical phenotyping. Future research directions should prioritize developing subtype-specific PRS models, expanding diverse population representation, and leveraging machine learning for multi-omics integration. For drug development, these findings underscore the potential of PRS for patient stratification in clinical trials and identifying shared biological pathways with comorbid conditions that may reveal novel therapeutic targets. The evolving PRS landscape promises to transform endometriosis from a surgically diagnosed condition to one with pre-symptomatic risk stratification capabilities, ultimately reducing diagnostic delays and enabling personalized intervention strategies.

References