Accounting for Epistasis in PRSs Through the Coalescent

P. Fournier & F. Larribe (STATQAM, UQAM)

New Statistical Methods in Genetic Studies

June 2nd, 2022

PRS & Epistasis


  • Weight estimation is based on linear models

Assumes (among other things):

  • Additivity
  • Linearity


  • Gene-gene interaction
  • Major difficulty in the analysis of GWAS data [6]
  • Epistasis-aware models are possible; however, naïve ones are intractable ($\mathcal O(2^m)$ terms for $m$ markers)
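
To give a sense of scale for the $\mathcal O(2^m)$ count, here is a minimal sketch that counts the coefficients of a fully saturated interaction model, assuming one coefficient per subset of $m$ markers. The function name `saturated_term_count` is hypothetical and used only for illustration.

```python
from math import comb

def saturated_term_count(m: int) -> int:
    """Number of coefficients when every subset of m markers gets its own term.

    Subsets of size 0 give the intercept, size 1 the additive effects,
    and size >= 2 the interaction (epistatic) terms.
    """
    return sum(comb(m, k) for k in range(m + 1))

# The count doubles with every additional marker.
for m in (10, 20, 30):
    print(m, saturated_term_count(m))
```

Even a few dozen markers already give more terms than any realistic sample size can support, which is the sense in which the naïve model is intractable.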

That being said,

  • Some phenotypes are simple
  • Some forms of epistasis might be reflected in additive effects [7]

Epistasis-aware models

  • Interaction learning [8, 9]
  • Machine learning

For machine learning:

  • Marker selection is "the major factor that impacts on a machine learning model’s predictive performance" [10]
  • The mechanism through which markers affect the phenotype might not be known

Model-Free Genotype Based Prediction


Goal: compute the likelihood of phenotype $\varphi^*$ given the underlying genotype $h_0^*$.
Exploit the information in the sequence of haplotype-phenotype pairs: $$ H_0 \bigtriangleup \Phi = \left\lbrace (h_0^1, \varphi_1), \ldots, (h_0^n, \varphi_n) \right\rbrace. $$
A bit of notation: $$ H_0^* = H_0 \cup \lbrace h_0^* \rbrace, \quad \Phi^* = \Phi \cup \lbrace \varphi^* \rbrace $$


Assuming unrelatedness, $$ L(\varphi^* | h_0^*, H_0, \Phi) \propto f(H_0^*, \Phi^*) $$
The law of total probability allows the introduction of evolutionary history in the form of genealogies: $$ f(H_0^*, \Phi^*) = \int_{\color{#4d7e65} \mathcal G} f(H_0^*, \Phi^* | {\color{#4d7e65} G}) g({\color{#4d7e65} G}) \text d{\color{#4d7e65} G} $$

Assumption: Conditional Independence

$$ (H_0^* | {\color{#4d7e65} G}) \perp (\Phi^* | {\color{#4d7e65} G}) $$
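
Under this assumption the joint density factorizes inside the integral over genealogies. The Monte Carlo approximation on the right is only a sketch of how the integral might be evaluated: it assumes that genealogies $G_1, \ldots, G_N$ can be sampled from $g$, and the symbol $N$ is introduced here for illustration.

$$ f(H_0^*, \Phi^*) = \int_{\mathcal G} f(H_0^* | G) \, f(\Phi^* | G) \, g(G) \, \text dG \approx \frac{1}{N} \sum_{i = 1}^N f(H_0^* | G_i) \, f(\Phi^* | G_i), \quad G_i \sim g $$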

Computing the Likelihood


  • Discrete phenotype: $\Phi^* \in \lbrace 0, 1 \rbrace^{n + 1}$
  • Quantity of interest: $L(\varphi^* = 1 | H_0^*, \Phi)$
  • No recombination between causal markers
  • $G \sim$ ARG [11]
  • $(\Phi^* | G) \sim$ ???

First question: How to compute the exact likelihood (not up to a constant)?
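
One possible answer for a binary phenotype, offered as an assumption rather than as the authors' method: since $\varphi^*$ takes only two values, an unknown proportionality constant cancels once the joint density has been evaluated at both. The function and argument names below are hypothetical.

```python
def normalized_likelihood(joint_case: float, joint_control: float) -> float:
    """Return L(phi* = 1 | ...) from two unnormalized joint densities.

    joint_case    is proportional to f(H0*, Phi*) evaluated with phi* = 1
    joint_control is proportional to f(H0*, Phi*) evaluated with phi* = 0
    Both must share the same (unknown) proportionality constant.
    """
    total = joint_case + joint_control
    if total <= 0:
        raise ValueError("at least one joint density must be positive")
    return joint_case / total

# Made-up values sharing the same unknown constant.
print(normalized_likelihood(3e-12, 1e-12))
```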

Conditional Density of the Phenotypes

For each marginal tree $T_i$, we compute the marginal density $f(\varphi^* = 1, \Phi | T_i)$.

Select tree $T^*$ based on absolute pointwise mutual information:

$$ T^* = \operatorname*{arg\,max}_T \left\vert \text{pmi}(\Phi^*, T) \right\vert, \qquad \text{pmi}(\Phi^*, T) = \log \frac{f(\Phi^* | T)}{f(\Phi^*)} $$
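
The selection rule can be sketched as follows, assuming the usual definition of pointwise mutual information as a log ratio and assuming that the conditional densities $f(\Phi^* | T_i)$ and the marginal density $f(\Phi^*)$ have already been computed. The function name and the numeric values are made up for illustration.

```python
import math

def select_tree(cond_densities, marginal_density):
    """Index of the marginal tree maximizing |log(f(Phi* | T_i) / f(Phi*))|."""
    def abs_pmi(density):
        return abs(math.log(density / marginal_density))
    return max(range(len(cond_densities)), key=lambda i: abs_pmi(cond_densities[i]))

# Hypothetical densities for four marginal trees.
densities = [0.010, 0.012, 0.040, 0.009]
best = select_tree(densities, marginal_density=0.010)
print(best)
```

Taking the absolute value means that a tree under which the observed phenotypes are much less likely than under the marginal is treated as just as informative as one under which they are much more likely.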

Conditional Density II

  • $f(\Phi | T_i) \rightarrow f(\Phi)$ as ${\text{TMRCA}(T_i) \rightarrow \infty}$
  • $\varphi_k \sim \mathcal B(p)$, $\Phi \sim \mathcal B(n, p)$

Assume conditional independence given the ancestor:

$$ f(\varphi_k | T_i, \Phi \setminus \lbrace \varphi_k \rbrace) = f(\varphi_k | p_{T_i}(k), \Phi\vert_{p_{T_i}(k)}) $$ where $p_{T_i}(k)$ is the parent of sequence $k$ in $T_i$ and ${\Phi\vert_x =\lbrace \varphi \in \Phi : \varphi \text{ descendant of } x \rbrace}$.
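
To make the notation concrete, here is a small sketch on a hypothetical four-leaf tree stored as a child-to-parent map. The tree, the phenotype values and the helper names are all assumptions made for illustration. The optional `exclude` argument drops $\varphi_k$ itself from the restricted set, which matches conditioning on $\Phi \setminus \lbrace \varphi_k \rbrace$.

```python
# Hypothetical toy tree: child -> parent. Leaves 1..4 are the sampled sequences.
parent = {1: "a", 2: "a", 3: "b", 4: "b", "a": "r", "b": "r"}
phenotype = {1: 1, 2: 1, 3: 0, 4: 1}

def leaves_below(node):
    """Leaves of the subtree rooted at `node` (the node itself if it is a leaf)."""
    children = [c for c, p in parent.items() if p == node]
    if not children:
        return [node]
    return [leaf for c in children for leaf in leaves_below(c)]

def phenotypes_below(node, exclude=None):
    """Phenotypes of the leaves descending from `node`, optionally skipping one leaf."""
    return [phenotype[leaf] for leaf in leaves_below(node) if leaf != exclude]

k = 3
# Parent of sequence k, and the phenotypes of its other descendants.
print(parent[k], phenotypes_below(parent[k], exclude=k))
```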

Conditional Density III

$$ \begin{align*} &f(\Phi^* | T^* )\\ &\quad = f(\varphi^* | \Phi, T^*) \prod_{k = 1}^n f(\varphi_k| \Phi_{k - 1}, T^*)\\ &\quad = f(\varphi^* | p_{T^*}(*), \Phi\vert_{p_{T^*}(*)}) \prod_{k = 1}^n f(\varphi_k | p_{T^*}(k), \Phi\vert_{p_{T^*}(k)}) \end{align*} $$ where $\Phi_{k - 1} = \lbrace \varphi_1, \ldots, \varphi_{k - 1} \rbrace$.

Conditional Density: Single Phenotype

  • $\alpha(t): \mathbb R_+ \to [0, 1]$ strictly increasing such that $\alpha(0) = 0$ and $\alpha(t) \to 1$ as $t \to \infty$
  • $t_k = \text{TMRCA}(\Phi\vert_{p_{T_i}(k)})$
  • $h$: U-shaped beta-binomial mass function
$$ f(\varphi_k | t_k, \Phi\vert_{p_{T_i}(k)}) = \alpha(t_k) f(\Phi) + (1 - \alpha(t_k)) h(\Phi). $$
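
The mixture can be sketched numerically under several assumptions that are not in the slides: $\alpha(t) = 1 - e^{-\lambda t}$ is one admissible choice of weight function, $f$ is read as a binomial mass function on the number of cases among the descendants, and $h$ is a beta-binomial with shape parameters $a = b = 0.5$, which makes it U-shaped. All function and parameter names are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def beta_binomial_pmf(k, n, a, b):
    # U-shaped whenever a < 1 and b < 1: mass concentrates on k = 0 and k = n.
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def alpha(t, rate=1.0):
    """Assumed weight: strictly increasing, alpha(0) = 0, alpha(t) -> 1."""
    return 1.0 - math.exp(-rate * t)

def mixture_density(k, n, t, p, a=0.5, b=0.5, rate=1.0):
    """alpha(t) * binomial + (1 - alpha(t)) * beta-binomial, for k cases out of n."""
    w = alpha(t, rate)
    return w * binomial_pmf(k, n, p) + (1 - w) * beta_binomial_pmf(k, n, a, b)

# Recent ancestor (small t): descendants tend to share a phenotype.
# Ancient ancestor (large t): counts look like independent draws.
for t in (0.0, 1.0, 10.0):
    print(t, [round(mixture_density(k, 4, t, p=0.3), 3) for k in range(5)])
```

With this reading, a small TMRCA puts most of the weight on the U-shaped component, so closely related sequences are expected to be nearly all cases or nearly all controls, while a large TMRCA recovers the marginal behaviour $f(\Phi | T_i) \rightarrow f(\Phi)$.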


  1. Crouch, D. J. M., & Bodmer, W. F. (2020). Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants. Proceedings of the National Academy of Sciences of the United States of America, 117(32), 18924–18933.
  2. Meuwissen, T. H. E., Hayes, B. J., & Goddard, M. E. (2001). Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps. Genetics, 157(4), 1819–1829.
  3. Guindo-Martínez, M., et al. (2021). The impact of non-additive genetic associations on age-related complex diseases. Nature Communications, 12(1), 1–14.
  4. Pozarickij, A., Williams, C., & Guggenheim, J. A. (2020). Non-additive (dominance) effects of genetic variants associated with refractive error and myopia. Molecular Genetics and Genomics, 295(4), 843.
  5. Non Additive Genetic Effects Portal - Home. (n.d.). Retrieved May 25, 2022, from
  6. Furlong, L. I. (2013). Human diseases through the lens of network biology. Trends in Genetics, 29(3), 150–159.
  7. Mäki-Tanila, A., & Hill, W. G. (2014). Influence of Gene Interaction on Complex Trait Variation with Multilocus Models. Genetics, 198(1), 355–367.
  8. Massi, M. C., Franco, N. R., Ieva, F., Manzoni, A., Paganoni, A. M., & Zunino, P. (2020). High-Order Interaction Learning via Targeted Pattern Search. MOX Report 59/2020. Retrieved May 25, 2022, from
  9. Franco, N. R., Massi, M. C., et al. (2021). Development of a method for generating SNP interaction-aware polygenic risk scores for radiotherapy toxicity. Radiotherapy and Oncology, 159, 241–248.
  10. Ho, D. S. W., Schierding, W., Wake, M., Saffery, R., & O’Sullivan, J. (2019). Machine learning SNP based prediction for precision medicine. Frontiers in Genetics, 10, 267.
  11. Griffiths, R. C., & Marjoram, P. (1996). An ancestral recombination graph. In P. Donnelly & S. Tavaré (Eds.), IMA Volume on Mathematical Population Genetics (pp. 257–270). Springer-Verlag, New York.