Biome & AI
# Tools 🔨
The first-of-its-kind continuous benchmarking platform for metagenomics classifiers, featuring multi-objective ranking and effective distribution of containerised software. It enables:
- Users to make informed choices and to obtain standardised and easy-to-use tools, and
- Method developers to showcase novel approaches and to get trusted benchmarks for publications.
BUSCO assesses the "expected" gene content. It can be used for
- quality control of genomics data sets,
- applications in comparative genomics,
- gene predictor training, and
OrthoDB is a database of "equivalent" genes across species. Pioneering hierarchical orthology, it provides:
- the broadest coverage from animals to microbes and viruses
- evolutionary and functional annotations of orthologs
A tool that ranks potential miRNA targets with thermodynamic, evolutionary, probabilistic, and sequence-based feature predictors. It is available as a web server and a Python software library.
Tools for processing phylogenetic trees: re-rooting, trimming, pruning, condensing, drawing (ASCII graphics or SVG).
Human virus screening from high-throughput sequencing data
A collaborative project to create a Snakemake workflow for assembly, annotation, and genomic binning of metagenomic sequence data.
A collaborative project for prophage annotation.
# MicroBiome / AI ✨
# Why metagenomics? - an open approach.
# Viromics of clinical samples
was the subject of our productive collaboration with the HUG virology lab.
# Epstein-Barr virus sequence diversity in chronically infected patients & G2GWAS.
# ACI-1 beta-lactamase (AMR gene)
is widespread across human gut microbiomes in Negativicutes due to transposons harboured by tailed prophages.
# Negativicutes bacteria
are Gram-negative, with two cell membranes, although they radiated from Gram-positive Firmicutes, which have only a single membrane; their phages adapted accordingly.
# Why Machine Learning? - the way forward.
# Genomics / Sequencing 🐜
# Comparative genomics
was part of many animal and arthropod sequencing projects, and we have contributed to some of these.
We have also sequenced a few ourselves:
# Dipluran Campodea augens
is a blind, soil-dwelling, ancestrally wingless hexapod.
With Diplura as sister clade to Insecta, the C. augens genome represents a key out-group reference for studying the emergence of genomic innovations in insects. It is blind but displays light-avoidance behaviour, and it has the ability to regenerate lost body appendages.
- We uncovered a massive expansion of the chemosensory gene family of ionotropic receptors (IRs), by far the largest IR family known in the animal kingdom to date. This enormous expansion likely reflects adaptation to soil life and might compensate for the loss of vision.
# Damselfly Calopteryx splendens
belongs to Palaeoptera, an early-radiating clade of winged insects.
Odonate species have been extensively used for studying insect behaviour, ecology and evolution. Our genomic data will propel such studies to the molecular level, as it will be the first publication describing a genome of an Odonata representative. No less importantly, our data will bring us closer to studying the evolution of early insect traits such as the emergence of wings and metamorphosis.
We have found
- a detoxification gene that has not been found in the genome of any other insect,
- a unique gene architecture in the peptidoglycan recognition protein (PGRP) family of immunity-related proteins,
- a copy of the conserved odorant receptor coreceptor (Orco) gene, but no other odorant receptors (ORs).
# Drain fly Psychoda alternata
The accidental sequencing of a mostly complete drain fly transcriptome and of a novel Rhabdovirus-like virus.