Accounting for Epistasis in PRSs Through the Coalescent

P. Fournier & F. Larribe (STATQAM, UQAM)

New Statistical Methods in Genetic Studies

June 2nd, 2022

PRS & Epistasis

Assumptions

  • Weight estimation is based on linear models

Assumes (among other things):

  • Additivity
  • Linearity

Epistasis?

  • Gene-Gene interaction
  • Major difficulty in the analysis of GWAS data [6]
  • Epistasis-aware models are possible, but naïve ones are intractable ($\mathcal O(2^m)$ terms for $m$ markers)
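
To make the $\mathcal O(2^m)$ growth concrete, the sketch below counts the terms of a naïve model under the assumption that it carries one coefficient per non-empty subset of markers. The optional `max_order` cap is a hypothetical addition showing how restricting the interaction order tames the count.

```python
from math import comb


def n_interaction_terms(m, max_order=None):
    """Number of non-empty marker subsets of size <= max_order among m markers."""
    if max_order is None:
        max_order = m
    return sum(comb(m, k) for k in range(1, max_order + 1))


full = n_interaction_terms(10)          # all orders: 2**10 - 1
pairwise = n_interaction_terms(10, 2)   # main effects plus pairs only
```

With 10 markers the full model has 1023 terms, while stopping at pairwise interactions leaves 55.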

That being said,

  • Some phenotypes are simple
  • Some forms of epistasis might be reflected in additive effects [7]

Epistasis-aware models

  • Interaction learning [8, 9]
  • Machine learning

For machine learning:

  • Marker selection is "the major factor that impacts on a machine learning model’s predictive performance" [10]
  • The mechanism through which markers affect the phenotype might not be known

Model-Free Genotype Based Prediction

Overview

Goal: compute the likelihood of phenotype $\varphi^*$ given the underlying genotype $h_0^*$.
Exploit information from a sample of paired haplotype-phenotype observations: $$ H_0 \bigtriangleup \Phi = \left\lbrace (h_0^1, \varphi_1), \ldots, (h_0^n, \varphi_n) \right\rbrace. $$
A bit of notation: $$ H_0^* = H_0 \cup \lbrace h_0^* \rbrace, \quad \Phi^* = \Phi \cup \lbrace \varphi^* \rbrace $$

Likelihood

Assuming unrelatedness, $$ L(\varphi^* | h_0^*, H_0, \Phi) \propto f(H_0^*, \Phi^*) $$
The law of total probability allows the introduction of evolutionary history in the form of genealogies: $$ f(H_0^*, \Phi^*) = \int_{\color{#4d7e65} \mathcal G} f(H_0^*, \Phi^* | {\color{#4d7e65} G}) g({\color{#4d7e65} G}) \text d{\color{#4d7e65} G} $$

Assumption: Conditional Independence

$$ (H_0^* | {\color{#4d7e65} G}) \perp (\Phi^* | {\color{#4d7e65} G}) $$
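
Substituting this assumption into the genealogy integral factorizes the integrand. This is a direct consequence of the two preceding statements, written with the same symbols:

$$ f(H_0^*, \Phi^*) = \int_{\mathcal G} f(H_0^* | G) \, f(\Phi^* | G) \, g(G) \, \text dG $$

The haplotype term $f(H_0^* | G)$ and the phenotype term $f(\Phi^* | G)$ can therefore be modelled separately, conditional on a genealogy.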

Computing the Likelihood

Context

  • Discrete phenotype: $\Phi^* \in \lbrace 0, 1 \rbrace^{n + 1}$
  • Quantity of interest: $L(\varphi^* = 1 | H_0^*, \Phi)$
  • No recombination between causal markers
  • $G \sim$ ARG (ancestral recombination graph) [11]
  • $(\Phi^* | G) \sim$ ???

First question: How to compute the exact likelihood (not just up to a constant)?
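
One way to remove the unknown proportionality constant, given the binary phenotype listed above, is to evaluate the joint density for both values of $\varphi^*$ and normalize. This section does not say whether the authors proceed this way; the sketch only illustrates the arithmetic, and the function and argument names are hypothetical.

```python
def normalized_likelihood(joint_case, joint_control):
    """Return f(phi*=1, .) / (f(phi*=1, .) + f(phi*=0, .)).

    joint_case    : joint density evaluated with phi* = 1
    joint_control : joint density evaluated with phi* = 0
    Both may be known only up to the same multiplicative constant.
    """
    total = joint_case + joint_control
    if joint_case < 0 or joint_control < 0 or total <= 0:
        raise ValueError("densities must be non-negative with a positive sum")
    return joint_case / total


# Hypothetical unnormalized values; the shared constant cancels in the ratio.
p_case = normalized_likelihood(0.003, 0.001)
```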

Conditional Density of the Phenotypes

For each marginal tree $T_i$, we compute the marginal density $f(\varphi^* = 1, \Phi | T_i)$.

Select tree $T^*$ based on absolute pointwise mutual information:

$$ T^* = \operatorname*{arg\,max}_{T_i} \left\vert \text{pmi}(\Phi^*, T_i) \right\vert, \qquad \text{pmi}(\Phi^*, T_i) = \log \frac{f(\Phi^* | T_i)}{f(\Phi^*)} $$
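
The selection rule can be sketched as follows, assuming the usual definition of pointwise mutual information as the log ratio of the conditional density to the marginal density, and assuming the densities $f(\Phi^* | T_i)$ have already been computed for every marginal tree. Names and numbers are hypothetical.

```python
from math import log


def select_tree(cond_densities, marginal_density):
    """Return (index, pmi) of the tree maximizing |log(f(Phi*|T_i) / f(Phi*))|."""
    pmis = [log(c / marginal_density) for c in cond_densities]
    best = max(range(len(pmis)), key=lambda i: abs(pmis[i]))
    return best, pmis[best]


# Hypothetical densities for three marginal trees and the marginal f(Phi*).
best_index, best_pmi = select_tree([0.02, 0.01, 0.002], 0.01)
```

Taking the absolute value means a tree under which the observed phenotypes are much less likely than under the marginal is treated as being as informative as one under which they are much more likely.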

Conditional Density II

  • $f(\Phi | T_i) \rightarrow f(\Phi)$ as ${\text{TMRCA}(T_i) \rightarrow \infty}$
  • $\varphi_k \sim \mathcal B(p)$, $\Phi \sim \mathcal B(n, p)$

Assume conditional independence given the ancestor:

$$ f(\varphi_k | T_i, \Phi \setminus \lbrace \varphi_k \rbrace) = f(\varphi_k | p_{T_i}(k), \Phi\vert_{p_{T_i}(k)}) $$ where $p_{T_i}(k)$ is the parent of sequence $k$ and ${\Phi\vert_x =\lbrace \varphi \in \Phi : \varphi \text{ descendant of } x \rbrace}$.
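
The restriction $\Phi\vert_x$ can be illustrated with a small sketch. It assumes a marginal tree stored as a child-to-parent dictionary whose leaves are the sampled sequences; this representation and all names are hypothetical.

```python
def phenotypes_under(parent_of, phenotypes, x):
    """Phenotypes of the sampled sequences (leaves) that descend from node x."""

    def descends_from_x(node):
        # Walk up from the parent of `node` until x or the root is reached.
        node = parent_of.get(node)
        while node is not None:
            if node == x:
                return True
            node = parent_of.get(node)
        return False

    return {leaf: phi for leaf, phi in phenotypes.items() if descends_from_x(leaf)}


# Hypothetical tree: leaves 1..4, internal nodes "a" and "b", root "r".
parent_of = {1: "a", 2: "a", 3: "b", 4: "b", "a": "r", "b": "r", "r": None}
phenotypes = {1: 1, 2: 0, 3: 1, 4: 1}
k = 1
restricted = phenotypes_under(parent_of, phenotypes, parent_of[k])
```

Following the definition literally, the result contains every sampled descendant of the parent of $k$, including sequence $k$ itself; here it is `{1: 1, 2: 0}`.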

Conditional Density III

$$ \begin{align*} &f(\Phi^* | T^* )\\ &\quad = f(\varphi^* | \Phi, T^*) \prod_{k = 1}^n f(\varphi_k| \Phi_{k - 1}, T^*)\\ &\quad = f(\varphi^* | p_{T^*}(*), \Phi\vert_{p_{T^*}(*)}) \prod_{k = 1}^n f(\varphi_k | p_{T^*}(k), \Phi\vert_{p_{T^*}(k)}) \end{align*} $$ where $\Phi_{k - 1} = \lbrace \varphi_1, \ldots, \varphi_{k - 1} \rbrace$.

Conditional Density: Single Phenotype

  • $\alpha(t): \mathbb R_+ \to [0, 1]$ strictly monotonic such that $\alpha(0) = 0$ and $\alpha(t) \to 1$ as $t \to \infty$
  • $t_k = \text{TMRCA}(\Phi\vert_{p_{T_i}(k)})$
  • $h$: U-shaped beta-binomial mass function
$$ f(\varphi_k | t_k, \Phi\vert_{p_{T_i}(k)}) = \alpha(t_k) f(\Phi) + (1 - \alpha(t_k)) h(\Phi). $$
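
A minimal numerical sketch of this mixture follows. The slide only constrains $\alpha$ to be strictly monotonic with $\alpha(0) = 0$ and $\alpha(t) \to 1$; the exponential form `1 - exp(-rate * t)` is an assumption chosen because it satisfies those constraints. The beta-binomial parameters and the function names are likewise hypothetical.

```python
from math import exp, lgamma


def alpha(t, rate=1.0):
    """Assumed weight 1 - exp(-rate * t): alpha(0) = 0, increasing towards 1."""
    return 1.0 - exp(-rate * t)


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """Beta-binomial mass at k successes out of n; U-shaped when a < 1 and b < 1."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def mixture(t, f_value, h_value, rate=1.0):
    """alpha(t) * f_value + (1 - alpha(t)) * h_value."""
    w = alpha(t, rate)
    return w * f_value + (1.0 - w) * h_value
```

At $t = 0$ the mixture reduces to the $h$ term, so closely related sequences are governed by the U-shaped component; as the TMRCA grows, the weight shifts towards the marginal term, in line with the limit stated on the "Conditional Density II" slide.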

References

  1. Crouch, D. J. M., & Bodmer, W. F. (2020). Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants. Proceedings of the National Academy of Sciences of the United States of America, 117(32), 18924–18933. https://doi.org/10.1073/pnas.2005634117
  2. Meuwissen, T. H. E., Hayes, B. J., & Goddard, M. E. (2001). Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps. Genetics, 157(4), 1819–1829. https://doi.org/10.1093/GENETICS/157.4.1819
  3. Guindo-Martínez, M., et al. (2021). The impact of non-additive genetic associations on age-related complex diseases. Nature Communications, 12(1), 1–14. https://doi.org/10.1038/s41467-021-21952-4
  4. Pozarickij, A., Williams, C., & Guggenheim, J. A. (2020). Non-additive (dominance) effects of genetic variants associated with refractive error and myopia. Molecular Genetics and Genomics, 295(4), 843. https://doi.org/10.1007/S00438-020-01666-W
  5. Non Additive Genetic Effects Portal - Home. (n.d.). Retrieved May 25, 2022, from https://nage.hugeamp.org/
  6. Furlong, L. I. (2013). Human diseases through the lens of network biology. Trends in Genetics, 29(3), 150–159. https://doi.org/10.1016/J.TIG.2012.11.004
  7. Mäki-Tanila, A., & Hill, W. G. (2014). Influence of Gene Interaction on Complex Trait Variation with Multilocus Models. Genetics, 198(1), 355–367. https://doi.org/10.1534/GENETICS.114.165282
  8. Massi, M. C., Franco, N. R., Ieva, F., Manzoni, A., Paganoni, A. M., & Zunino, P. (2020). High-Order Interaction Learning via Targeted Pattern Search. MOX Report 59/2020. Retrieved May 25, 2022, from https://www.mate.polimi.it/biblioteca/add/qmox/59-2020.pdf
  9. Franco, N. R., Massi, M. C., et al. (2021). Development of a method for generating SNP interaction-aware polygenic risk scores for radiotherapy toxicity. Radiotherapy and Oncology, 159, 241–248. https://doi.org/10.1016/j.radonc.2021.03.024
  10. Ho, D. S. W., Schierding, W., Wake, M., Saffery, R., & O’Sullivan, J. (2019). Machine learning SNP based prediction for precision medicine. Frontiers in Genetics, 10(MAR), 267. https://doi.org/10.3389/fgene.2019.00267
  11. Griffiths, R. C., & Marjoram, P. (1996). An ancestral recombination graph. IMA Volume on Mathematical Population Genetics (P. Donnelly and S. Tavare, Eds.), Springer-Verlag, New York, 257–270