Calculate sequencing depth

How to calculate the optimal sampling depth (sample size)

Minimum sequencing depth needed to reach a required genome coverage of a target species in a metagenomic sample

samplingDepth = genomeLength x coverage x 100 / abundance

genomeLength - bacterial genome length (bp)

coverage - required depth of coverage (X)

abundance - expected relative abundance (percent) of the target bacterial species
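The formula translates directly into code. Below is a minimal Python sketch, assuming genome length in base pairs and abundance in percent as defined above. The function `sampling_depth` and its parameter names are illustrative and not part of any tool.

```python
def sampling_depth(genome_length_bp: float, coverage: float, abundance_percent: float) -> float:
    """Return the number of bases (bp) to sequence so that a genome of
    `genome_length_bp` reaches `coverage` X when it makes up
    `abundance_percent` % of the sample.

    samplingDepth = genomeLength x coverage x 100 / abundance
    """
    if not 0 < abundance_percent <= 100:
        raise ValueError("abundance_percent must be in (0, 100]")
    return genome_length_bp * coverage * 100 / abundance_percent


# Bases needed for 1X coverage of a 3 Mbp genome at 10 % abundance
print(sampling_depth(3_000_000, 1, 10))  # → 30000000.0 (30 Mbp)
```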


Required depth of coverage

at least 5 to 10X for metagenome-assembled genomes (MAGs)

at least 1 to 2X for assembly-free taxonomic or functional profiling (MetaPhlAn, HUMAnN) or strain detection (PanPhlAn)
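Because the required depth scales linearly with coverage, moving from profiling (1 to 2X) to MAG assembly (5 to 10X) multiplies the sequencing effort by the same factor. The sketch below tabulates the depth at both ends of each range. The 4 Mbp genome and 1 % abundance are illustrative values only, and `COVERAGE_TARGETS` and `depth_ranges_gbp` are hypothetical names.

```python
# Coverage ranges (X) from the list above
COVERAGE_TARGETS = {
    "profiling / strain detection": (1, 2),
    "MAG assembly": (5, 10),
}


def depth_ranges_gbp(genome_length_bp, abundance_percent):
    """Map each use case to its (min, max) sampling depth in Gbp,
    using samplingDepth = genomeLength x coverage x 100 / abundance."""
    return {
        use: tuple(genome_length_bp * cov * 100 / abundance_percent / 1e9 for cov in covs)
        for use, covs in COVERAGE_TARGETS.items()
    }


# Illustrative: a 4 Mbp genome at 1 % abundance
for use, (low, high) in depth_ranges_gbp(4_000_000, 1).items():
    print(f"{use}: {low:.1f} to {high:.1f} Gbp")
# → profiling / strain detection: 0.4 to 0.8 Gbp
# → MAG assembly: 2.0 to 4.0 Gbp
```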

Expected taxon abundance

Values can be estimated from previous 16S rRNA gene studies.
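As a rough sketch of such an estimate, assuming a table of 16S read counts per taxon from an earlier study, the expected abundance in percent is simply each taxon's share of all reads. The taxon names and counts below are made up for illustration, `relative_abundance_percent` is a hypothetical helper, and this naive proportion does not correct for 16S biases such as differing gene copy numbers between taxa.

```python
def relative_abundance_percent(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw 16S read counts per taxon into relative abundance (%).
    Naive proportion: no copy-number or other bias correction."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("count table is empty")
    return {taxon: 100 * n / total for taxon, n in counts.items()}


# Hypothetical counts from a previous 16S survey
counts_16s = {"Taxon A": 9_000, "Taxon B": 950, "Taxon C": 50}
abundance = relative_abundance_percent(counts_16s)
print(abundance["Taxon C"])  # → 0.5
```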

Example

Genome length: 5Mbp

Required coverage: 2X

Expected abundance level: 0.5%

5Mbp x 2 x 100 / 0.5 = 2Gbp

5,000,000 x 2 x 100 / 0.5 = 2,000,000,000

A sampling depth of 2Gbp is required to reach 2X coverage of a 5Mbp genome present at 0.5% relative abundance.
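The worked example can be checked in code, together with the inverse obtained by rearranging the same formula: coverage = samplingDepth x abundance / (genomeLength x 100). This is a sketch; `sampling_depth`, `achieved_coverage`, `MBP` and `GBP` are illustrative names.

```python
MBP = 1_000_000
GBP = 1_000_000_000


def sampling_depth(genome_length_bp, coverage, abundance_percent):
    """Bases to sequence: genomeLength x coverage x 100 / abundance."""
    return genome_length_bp * coverage * 100 / abundance_percent


def achieved_coverage(depth_bp, genome_length_bp, abundance_percent):
    """Inverse of the formula: coverage that a given sampling depth yields."""
    return depth_bp * abundance_percent / (genome_length_bp * 100)


# Worked example: 5 Mbp genome, 2X coverage, 0.5 % abundance
depth = sampling_depth(5 * MBP, 2, 0.5)
print(depth / GBP)                              # → 2.0 (Gbp)
print(achieved_coverage(depth, 5 * MBP, 0.5))   # → 2.0 (X)
```

The inverse is handy when the sequencing budget is fixed and you want to know what coverage a low-abundance species will receive.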


How many reads?

Example: 2 x 200bp paired-end reads (read length 200bp)

readPairs = samplingDepth / readLength / 2

2,000,000,000 bp / 200 bp / 2 = 5,000,000 read pairs

Sequencing 2Gbp with 2 x 200bp reads therefore yields 5 million read pairs, equivalent to 10 million single reads.
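The read-count step can be sketched the same way. This assumes 2 x readLength paired-end sequencing, so each pair contributes 2 x readLength bases. It also rounds up, so the target depth is never undershot, which is a design choice rather than part of the original formula. `read_pairs_needed` is a hypothetical name.

```python
def read_pairs_needed(depth_bp: int, read_length_bp: int) -> int:
    """Number of read pairs needed for `depth_bp` bases with 2 x read_length_bp reads."""
    bases_per_pair = 2 * read_length_bp
    # Integer ceiling division: a partial pair still counts as a whole pair
    return (depth_bp + bases_per_pair - 1) // bases_per_pair


pairs = read_pairs_needed(2_000_000_000, 200)
print(pairs, "read pairs =", 2 * pairs, "single reads")
# → 5000000 read pairs = 10000000 single reads
```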

See also:

Typical sequencing depths in metagenomic studies

Depth and breadth of coverage

Illumina Coverage Calculator