- Weight estimation is based on **linear models**

Assumes (among other things):

- Additivity
- Linearity

- Gene-Gene interaction
- Major difficulty in the analysis of GWAS data^{6}
- Epistasis-aware models are possible; however, naïve ones are intractable ($\mathcal O(2^m)$ terms)
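
To make the $\mathcal O(2^m)$ count concrete, here is a minimal sketch that enumerates the terms of a naïve fully epistatic model, assuming one term per subset of the $m$ markers (intercept, main effects, and every higher-order interaction). The function names and marker labels are hypothetical and only serve the illustration.

```python
from itertools import combinations


def interaction_terms(markers):
    """List every subset of markers; each subset is one term of the naive model.

    Order 0 is the intercept, order 1 the additive (main) effects,
    order >= 2 the epistatic interactions.
    """
    terms = []
    for order in range(len(markers) + 1):
        terms.extend(combinations(markers, order))
    return terms


def n_terms_full(m):
    """Number of terms when every interaction order is included."""
    return 2 ** m


def n_terms_additive(m):
    """Number of terms of a purely additive model (intercept + m main effects)."""
    return m + 1


markers = ["m1", "m2", "m3"]  # hypothetical marker labels
terms = interaction_terms(markers)
print(len(terms), n_terms_full(len(markers)), n_terms_additive(len(markers)))
```

With only 20 markers the full model already has more than a million terms, against 21 for the additive one.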

That being said,

- Some phenotypes are simple
- Some forms of epistasis might be reflected in additive effects^{7}

- Interaction learning^{8,9}
- Machine learning

For machine learning:

- Marker selection is "the major factor that impacts on a machine learning model’s predictive performance"^{10}
- The mechanism through which markers affect the phenotype might not be known

Exploit information from paired haplotype-phenotype sequence:
$$
H_0 \bigtriangleup \Phi =
\left\lbrace (h_0^1, \varphi_1), \ldots, (h_0^n, \varphi_n) \right\rbrace.
$$

A bit of notation:
$$
H_0^* = H_0 \cup \lbrace h_0^* \rbrace, \quad
\Phi^* = \Phi \cup \lbrace \varphi^* \rbrace
$$

The law of total probability allows the introduction of **evolutionary history** in the form of genealogies:
$$
f(H_0^*, \Phi^*)
= \int_{\color{#4d7e65} \mathcal G} f(H_0^*, \Phi^* | {\color{#4d7e65} G}) g({\color{#4d7e65} G}) \text d{\color{#4d7e65} G}
$$
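
One generic way to approximate an integral of this form is plain Monte Carlo: draw genealogies from $g$ and average the conditional density. The sketch below is a toy under loud assumptions, not the method of the talk: a genealogy is reduced to a single positive number, `sample_genealogy` and `conditional_density` are hypothetical stand-ins, and they are chosen so that the exact value of the integral is $1/2$.

```python
import math
import random


def monte_carlo_marginal(conditional_density, sample_genealogy, n_draws, seed=0):
    """Approximate the integral of f(data | G) g(G) dG by averaging over draws G ~ g."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        genealogy = sample_genealogy(rng)
        total += conditional_density(genealogy)
    return total / n_draws


def sample_genealogy(rng):
    """Toy stand-in for a genealogy sampler: G ~ Exponential(1)."""
    return rng.expovariate(1.0)


def conditional_density(genealogy):
    """Toy stand-in for f(H_0^*, Phi^* | G): exp(-G)."""
    return math.exp(-genealogy)


# Exact value for this toy: integral of exp(-g) * exp(-g) over g >= 0, i.e. 1/2.
estimate = monte_carlo_marginal(conditional_density, sample_genealogy, 100_000)
print(estimate)
```

A real sampler over ancestral recombination graphs is far more involved; only the averaging step carries over.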

- Discrete phenotype: $\Phi^* \in \lbrace 0, 1 \rbrace^{n + 1}$
- Quantity of interest: $L(\varphi^* = 1 | H_0^*, \Phi)$
- No recombination between causal markers
- $G \sim$ ARG^{11}
- $(\Phi^* | G) \sim$ ???

First question: How to compute **exact** likelihood (not up to a constant)?

For each marginal tree $T_i$, we compute the **marginal density** $f(\varphi^* = 1, \Phi | T_i)$.
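
If the quantity of interest is read as a conditional probability, the two marginal densities $f(\varphi^* = 1, \Phi | T_i)$ and $f(\varphi^* = 0, \Phi | T_i)$ give it through Bayes' rule for a binary outcome. The sketch below shows only that normalization step; the function name and the numeric inputs are hypothetical, and how the densities themselves are obtained is not covered.

```python
def predictive_probability(density_case, density_control):
    """P(phi* = 1 | Phi, T_i) from the two joint marginal densities.

    density_case    : f(phi* = 1, Phi | T_i)
    density_control : f(phi* = 0, Phi | T_i)
    """
    total = density_case + density_control
    if total <= 0.0:
        raise ValueError("the two marginal densities must not both be zero")
    return density_case / total


# Hypothetical densities for one marginal tree.
probability = predictive_probability(0.03, 0.01)
print(probability)
```

Because the denominator is the sum of the two numerators, both densities must be on the same scale for the ratio to be meaningful.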

Select tree $T^*$ based on absolute pointwise mutual information:

$$
T^* = \operatorname*{arg\,max}_T \left\vert \text{pmi}(\Phi^*, T) \right\vert,
\quad
\text{pmi}(\Phi^*, T) = \log \frac{f(\Phi^* | T)}{f(\Phi^*)}
$$

- $f(\Phi | T_i) \rightarrow f(\Phi)$ as ${\text{TMRCA}(T_i) \rightarrow \infty}$
- $\varphi_k \sim \mathcal B(p)$, $\Phi \sim \mathcal B(n, p)$
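
The selection rule can be sketched as follows, using the standard log-ratio definition of pointwise mutual information and an i.i.d. Bernoulli baseline for $f(\Phi^*)$. The tree labels, the tree-conditional densities, and the value of $p$ are made-up inputs for illustration; computing $f(\Phi^* | T)$ itself is outside the scope of this sketch.

```python
import math


def bernoulli_baseline(phenotypes, p):
    """f(Phi) when the phenotypes are i.i.d. Bernoulli(p)."""
    n_cases = sum(phenotypes)
    n_controls = len(phenotypes) - n_cases
    return p ** n_cases * (1 - p) ** n_controls


def pmi(density_given_tree, baseline):
    """Pointwise mutual information: log of f(Phi | T) / f(Phi)."""
    return math.log(density_given_tree / baseline)


def select_tree(densities_by_tree, baseline):
    """Return the tree label whose |pmi| is largest."""
    return max(
        densities_by_tree,
        key=lambda label: abs(pmi(densities_by_tree[label], baseline)),
    )


phenotypes = [1, 0, 0, 1]                                   # hypothetical sample
baseline = bernoulli_baseline(phenotypes, p=0.5)            # 0.5 ** 4
densities_by_tree = {"T1": 0.0625, "T2": 0.125, "T3": 0.01}  # hypothetical f(Phi | T)
best = select_tree(densities_by_tree, baseline)
print(best)
```

Taking the absolute value means a tree under which the observed phenotypes are much less likely than the baseline is treated as just as informative as one under which they are much more likely.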

Assume conditional independence given the ancestor:

$$
f(\varphi_k | T_i, \Phi \setminus \lbrace \varphi_k \rbrace) = f(\varphi_k | p_{T_i}(k), \Phi\vert_{p_{T_i}(k)})
$$

where $p_{T_i}(k)$ is the parent of sequence $k$ in $T_i$ and ${\Phi\vert_x = \lbrace \varphi \in \Phi : \varphi \text{ descendant of } x \rbrace}$.

- $\alpha: \mathbb R_+ \to [0, 1]$ strictly monotonic such that $\alpha(0) = 0$ and $\alpha(t) \to 1$ as $t \to \infty$
- $t_k = \text{TMRCA}(\Phi\vert_{p_{T_i}(k)})$
- $h$: U-shaped beta-binomial mass function
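
The two ingredients above can be illustrated in isolation. The sketch assumes $\alpha(t) = 1 - e^{-\lambda t}$, which is only one function satisfying the stated conditions, and a beta-binomial with both shape parameters below 1, which is what makes its mass function U-shaped. The rate and shape values are arbitrary assumptions, and the sketch does not show how $\alpha$, $t_k$ and $h$ are combined in the model.

```python
import math


def alpha(t, rate=1.0):
    """One hypothetical choice: increasing, alpha(0) = 0, alpha(t) -> 1 as t -> infinity."""
    return 1.0 - math.exp(-rate * t)


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) for K ~ BetaBinomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


n = 10
# Shape parameters below 1 push the mass toward k = 0 and k = n (U shape).
pmf = [beta_binomial_pmf(k, n, a=0.5, b=0.5) for k in range(n + 1)]
print([round(x, 3) for x in pmf])
print(alpha(0.0), alpha(1.0), alpha(10.0))
```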

1. Crouch, D. J. M., & Bodmer, W. F. (2020). Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants. *Proceedings of the National Academy of Sciences of the United States of America*, *117*(32), 18924–18933. https://doi.org/10.1073/pnas.2005634117
2. Meuwissen, T. H. E., Hayes, B. J., & Goddard, M. E. (2001). Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps. *Genetics*, *157*(4), 1819–1829. https://doi.org/10.1093/GENETICS/157.4.1819
3. Guindo-Martínez, M., et al. (2021). The impact of non-additive genetic associations on age-related complex diseases. *Nature Communications*, *12*(1), 1–14. https://doi.org/10.1038/s41467-021-21952-4
4. Pozarickij, A., Williams, C., & Guggenheim, J. A. (2020). Non-additive (dominance) effects of genetic variants associated with refractive error and myopia. *Molecular Genetics and Genomics*, *295*(4), 843. https://doi.org/10.1007/S00438-020-01666-W
5. *Non Additive Genetic Effects Portal - Home*. (n.d.). Retrieved May 25, 2022, from https://nage.hugeamp.org/
6. Furlong, L. I. (2013). Human diseases through the lens of network biology. *Trends in Genetics*, *29*(3), 150–159. https://doi.org/10.1016/J.TIG.2012.11.004
7. Mäki-Tanila, A., & Hill, W. G. (2014). Influence of Gene Interaction on Complex Trait Variation with Multilocus Models. *Genetics*, *198*(1), 355–367. https://doi.org/10.1534/GENETICS.114.165282
8. Massi, M. C., Franco, N. R., Ieva, F., Manzoni, A., Paganoni, A. M., & Zunino, P. (2020). High-Order Interaction Learning via Targeted Pattern Search. *MOX Report 59/2020*. Retrieved May 25, 2022, from https://www.mate.polimi.it/biblioteca/add/qmox/59-2020.pdf
9. Franco, N. R., Massi, M. C., et al. (2021). Development of a method for generating SNP interaction-aware polygenic risk scores for radiotherapy toxicity. *Radiotherapy and Oncology*, *159*, 241–248. https://doi.org/10.1016/j.radonc.2021.03.024
10. Ho, D. S. W., Schierding, W., Wake, M., Saffery, R., & O’Sullivan, J. (2019). Machine learning SNP based prediction for precision medicine. *Frontiers in Genetics*, *10*(MAR), 267. https://doi.org/10.3389/fgene.2019.00267
11. Griffiths, R. C., & Marjoram, P. (1996). An ancestral recombination graph. In P. Donnelly & S. Tavaré (Eds.), *IMA Volume on Mathematical Population Genetics* (pp. 257–270). Springer-Verlag, New York.

P. Fournier &
F. Larribe
(STATQAM — UQAM)
New Statistical Methods in Genetic Studies
SSC Annual Meeting (Online)
June 2^{nd}, 2022