EvoGenomics.AI is a network for AI applications in evolutionary genomics kindly funded by Imperial College London, CNRS and TUM.
Two of the main scientific advances of recent years are the capacity to obtain genome sequences at massive scale and the improved predictive power of Artificial Intelligence (AI). On one side, high-throughput sequencing technologies allow cost-effective generation of -omics datasets for both model and non-model species. On the other side, recent advances in AI make previously unattainable tasks feasible.
By integrating these two disciplines, it is now possible to efficiently infer the evolutionary history of species of interest from their genomes. In fact, machine learning (ML) algorithms have shown promising results in the inference of past demographic history, hybridization and admixture events, spatial structure, life-history traits and genes under natural (Darwinian) selection. ML algorithms automatically tune their internal parameters to maximise prediction accuracy. Supervised ML algorithms require a data set with known outputs to learn the relationship between input and output. Deep learning (DL) is a class of ML algorithms that can learn which features of the data are sufficient to perform the task.
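As a toy illustration of the supervised setting described above, the sketch below simulates a training set in which the output (the population mutation rate theta) is known, and fits a one-parameter-pair model that predicts theta from a single summary of the data (the number of segregating sites). It is a minimal sketch under stated assumptions, not a method used by the network: the standard neutral coalescent is assumed, and all names (`simulate_segregating_sites`, `fit_linear`, `predict_theta`, the sample size and the range of theta) are hypothetical choices made for the example.

```python
import random


def simulate_segregating_sites(theta, n_samples, rng):
    """Simulate the number of segregating sites for n_samples sequences
    under a standard neutral coalescent (toy model, assumed here)."""
    # While k lineages remain, the waiting time to the next coalescence
    # is exponential with rate k*(k-1)/2 (time in coalescent units).
    total_length = 0.0
    for k in range(n_samples, 1, -1):
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    # Mutations fall on the tree as a Poisson process with rate theta/2
    # per unit of branch length; draw the count by summing unit-rate
    # exponential inter-arrival times until the mean is exceeded.
    mean = theta / 2 * total_length
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(42)
N_SAMPLES = 20  # hypothetical number of sampled sequences

# Training set: inputs (segregating sites) with known outputs (theta).
train_theta = [rng.uniform(1, 50) for _ in range(5000)]
train_s = [simulate_segregating_sites(t, N_SAMPLES, rng) for t in train_theta]

# "Learning": tune the two internal parameters to minimise squared error.
slope, intercept = fit_linear(train_s, train_theta)


def predict_theta(s):
    return intercept + slope * s


# Evaluate on simulations that were not used for training.
test_theta = [rng.uniform(1, 50) for _ in range(1000)]
test_s = [simulate_segregating_sites(t, N_SAMPLES, rng) for t in test_theta]
mae = sum(abs(predict_theta(s) - t) for s, t in zip(test_s, test_theta)) / len(test_theta)

# Baseline that ignores the data and always predicts the training mean.
mean_theta = sum(train_theta) / len(train_theta)
baseline_mae = sum(abs(mean_theta - t) for t in test_theta) / len(test_theta)

print(f"model MAE: {mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

A deep learning approach would replace the hand-picked summary and the linear model with a network that takes the raw alignment as input and learns its own features, but the training principle (simulate data with known outputs, then fit) is the same.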
Our network brings together experts in the field of AI/ML applied to genomic data for evolutionary inferences. Our expertise ranges from the development of mathematical models to understand genome evolution of hosts and their parasites to the application of DL algorithms for historical inferences.
Aurélien Tellier develops mathematical models to understand genome evolution of 1) hosts and their parasites, or 2) plants, invertebrates, bacteria, or fungi that undergo some form of dormancy (a metabolically inactive state in the environment or within a host). He builds statistical methods based on Approximate Bayesian Computation (ABC) or the Sequential Markovian Coalescent (SMC) to 1) infer past demographic history and life-history/ecological traits, such as dormancy or selfing, and 2) reveal genes under coevolution in the genomes of interacting species. The models he uses have a special feature: they are Markovian in two dimensions, in time and along the genome (in space). He has recently become interested in testing for violations of these models by building non-Markovian processes along the genome and testing for their existence in real-world data. He is interested in using AI to study complex host-parasite coevolutionary histories and non-Markovian processes.
Matteo Fumagalli’s work aims at understanding how species genetically adapt to their environment. He develops statistical and computational tools to identify signatures of natural selection from population genomic data. He is particularly interested in quantifying how much pathogen- and diet-driven selection has shaped human susceptibility to complex diseases. He has recently introduced the use of DL and convolutional neural networks to test for positive and balancing selection in the human genome.
Flora Jay has a strong interest in demographic inference, as well as in the joint inference of selection and demography, data visualization and data generation. With this in mind, she develops Bayesian, ABC and DL approaches for studying ancient and/or present-day populations. She is especially keen on proposing methods that are tailored to genomic data, such as visualization tools adapted to temporal DNA or neural network architectures designed to account for population genetic data invariance. She has recently applied these tools to (archaic) human, cattle and bacterial populations. Finally, she is developing DNADNA, a software package that aims to facilitate the use, sharing and implementation of neural networks in the population genetics community.
We develop software for ML and DL inferences from population genomic data. In this network, we are interested in providing user-friendly applications and proposing new algorithms to process genetic data. We are also keen to provide educational opportunities for students and research staff in an effort to set good-practice standards in AI applications for evolutionary genomics.