Metagenomics wiki


Alpha and beta diversity

Alpha diversity measures community richness within a single environment; beta diversity measures the variation between different environments
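
The entry above does not name a specific metric. As a minimal sketch, assuming the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity (two common choices), and using made-up taxon counts:

```python
import math

def shannon_index(counts):
    """Alpha diversity: Shannon index H = -sum(p_i * ln p_i) over taxa with count > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(counts_a, counts_b):
    """Beta diversity: Bray-Curtis dissimilarity (0 = identical, 1 = no shared taxa)."""
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))

# Hypothetical taxon counts for two samples (same taxon order in both lists)
sample_a = [10, 10, 10, 10]   # four equally abundant taxa
sample_b = [40, 0, 0, 0]      # a single dominant taxon

print(shannon_index(sample_a), shannon_index(sample_b))
print(bray_curtis(sample_a, sample_b))
```

The evenly distributed sample has the higher Shannon index, and the Bray-Curtis value grows as the two samples share fewer counts.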


Assembly

Aligning and merging short fragments of sequenced DNA in order to reconstruct the original genome.

Coverage depth

Average number of times a base of a genome is sequenced.
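
A common back-of-the-envelope estimate is coverage = (number of reads × read length) / genome size. A minimal sketch, with illustrative numbers that are not taken from any real data set:

```python
def coverage_depth(num_reads, read_length, genome_size):
    """Expected average coverage: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_size

# Hypothetical run: 1 million reads of 150 bp against a 5 Mb genome
print(coverage_depth(1_000_000, 150, 5_000_000))  # 30.0
```

This is an average only; actual per-base coverage varies along the genome.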


Contigs

Contiguous fragments of DNA sequence from an incomplete draft genome.

GC content

The GC content of a DNA sequence is the percentage of nucleotides that are either G or C.
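
The definition translates directly into a few lines of code. A minimal sketch, assuming the sequence is a plain string and that ambiguous bases such as N count toward the total length:

```python
def gc_content(seq):
    """Percentage of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0  # assumption: an empty sequence is reported as 0 %
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)

print(gc_content("ATGC"))  # 50.0
```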

Giga base pairs (Gb)

Size of a metagenomic sample, measured in billions (10^9) of base pairs

Horizontal gene transfer (HGT)

Exchange or absorption of genetic material independent of reproduction

Environmental gene tags (EGTs)

Short DNA sequences that characterize microbial environments.


Library

Collection of biological DNA fragments prepared for sequencing

Microbial ecosystem

A microbial ecosystem is the community of all microbes living in a specific environment


Metatranscriptomics

Identifying the complete set of transcripts (RNA-seq) from microbial environments


Metaproteomics

Profiling of community-wide protein abundances

Multilocus sequence typing (MLST)

Technique to detect variability of housekeeping genes for identifying bacterial strains

Multiplex sequencing

DNA fragments from different samples are pooled and sequenced all together
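
After a pooled run, reads have to be assigned back to their samples (demultiplexing). The sketch below assumes a simplified setup in which every read starts with an inline barcode of fixed length and only exact matches are accepted; the barcodes and sample names are hypothetical, and real platforms often use separate index reads and tolerate mismatches:

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode; unknown barcodes go to 'undetermined'."""
    barcode_len = len(next(iter(barcode_to_sample)))  # assumes equal-length barcodes
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        bins[sample].append(insert)
    return dict(bins)

# Hypothetical barcodes and pooled reads
barcodes = {"ACGT": "soil", "TGCA": "gut"}
pooled = ["ACGTGGGAAA", "TGCACCCTTT", "ACGTTTTGGG", "NNNNAAAAAA"]
print(demultiplex(pooled, barcodes))
```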

Operational taxonomic units (OTUs)

Sequence-based species clusters defined by 16S rRNA gene sequence similarity
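
To illustrate clustering by similarity, here is a deliberately simplified sketch. It assumes the sequences are already aligned to equal length, measures identity position by position, and uses a 97 % threshold, which is a widely used convention but is not stated in the entry above. Dedicated clustering tools use proper alignment and more careful heuristics.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.97):
    """Put each sequence into the first OTU whose centroid it matches, else start a new OTU."""
    centroids, clusters = [], []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No existing centroid is similar enough: seq becomes a new centroid
            centroids.append(seq)
            clusters.append([seq])
    return clusters

# Hypothetical 100 bp "16S" fragments
s1 = "A" * 100
s2 = "A" * 98 + "CC"        # 98 % identical to s1
s3 = "A" * 90 + "C" * 10    # 90 % identical to s1
print(len(greedy_otus([s1, s2, s3])))  # 2
```

Note that greedy clustering depends on input order, since the first sequence of each cluster acts as its centroid.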

Orthologous genes

Genes in different species that evolved from a common ancestral gene and usually retain the same function


Pan-genome

The entire gene set of a species

Phred score (Q score)

Quality value for each sequenced base, encoding the estimated probability that the base call is wrong
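
The score relates to the error probability P by Q = -10 · log10(P), so Q20 corresponds to a 1 in 100 chance of a wrong base call. A minimal sketch, assuming the common Phred+33 ASCII encoding of FASTQ quality strings (older files may use a different offset):

```python
import math

def phred_to_error_prob(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def decode_quality_string(qual, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]

print(decode_quality_string("I5!"))  # [40, 20, 0]
```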

Reads per kilobase per million mapped reads (RPKM)

Normalization of read counts by gene length and sequencing depth, used to compare the coverage of genes
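
Written out, RPKM = (reads mapped to the gene × 10^9) / (gene length in bp × total mapped reads). A minimal sketch with illustrative numbers:

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads on a 2,000 bp gene, 10 million mapped reads in total
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```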

Ribosomal sequences 16S

Used to identify and compare bacteria based on differences in their 16S ribosomal RNA gene sequence

Sampling depth

The amount of sequencing per sample; the practical question is how deep is enough for metagenomic shotgun sequencing

Strain-level metagenomics

Identifying the gene composition of individual strains in metagenomic samples