Calculate sequencing depth
How to calculate the optimal sampling depth (sample size)
Minimum sequencing depth for a required species genome coverage in metagenomic samples
samplingDepth = genomeLength x coverage x 100 / abundance
genomeLength - bacterial genome length (bp)
coverage - required depth of coverage (X)
abundance - expected relative abundance level (percent) of the targeted bacterial species
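The formula above can be sketched as a small Python helper. This is only an illustration: the function name `sampling_depth` and its parameter names are made up here, and the input check on the abundance range is an added assumption rather than part of the formula.

```python
def sampling_depth(genome_length_bp: float, coverage: float, abundance_percent: float) -> float:
    """Return the minimum sampling depth in bp.

    Implements: samplingDepth = genomeLength x coverage x 100 / abundance
    (abundance is given in percent, hence the factor 100).
    """
    # Assumption: a relative abundance must lie in (0, 100] percent.
    if not 0 < abundance_percent <= 100:
        raise ValueError("abundance_percent must be in the range (0, 100]")
    return genome_length_bp * coverage * 100 / abundance_percent


# Illustrative values: 5 Mbp genome, 2X coverage, 0.5% abundance
print(sampling_depth(5_000_000, 2, 0.5))  # 2000000000.0, i.e. 2 Gbp
```

Because the result scales inversely with abundance, halving the expected abundance doubles the required sampling depth.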
Required → Depth of coverage
at least 5 to 10 X for Metagenome-Assembled Genomes (MAG)
at least 1 to 2 X for assembly-free taxonomic or functional profiling (MetaPhlAn, HUMAnN) or → strain detection (PanPhlAn)
Expected taxa abundance
values might be estimated from previous 16S studies
Example
Genome length: 5Mbp
Required coverage: 2X
Expected abundance level: 0.5%
5Mbp x 2 x 100 / 0.5 = 2Gbp
5,000,000 x 2 x 100 / 0.5 = 2,000,000,000
A sampling depth of 2Gbp is required to obtain 2X coverage of a 5Mbp genome present at a 0.5% abundance level.
How many reads?
Example: 2 x 200bp read length (paired reads)
pairedReads = samplingDepth / readLength / 2 (two reads per pair)
2,000,000,000 bp / 200 bp / 2 = 5,000,000 paired reads
Sequencing 2Gbp with a read length of 200bp gives a total of 5 million paired reads, equivalent to 10 million single reads.
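The conversion from sampling depth to read counts can be sketched in Python as follows. The function name `paired_reads` is hypothetical, and rounding up to a whole read pair is an assumption added here so that the target depth is never undershot.

```python
import math


def paired_reads(sampling_depth_bp: float, read_length_bp: int) -> int:
    """Return the number of read pairs needed to reach the sampling depth.

    Each pair contributes 2 x read_length_bp bases.
    """
    # Assumption: round up so the target depth is always reached.
    return math.ceil(sampling_depth_bp / (2 * read_length_bp))


pairs = paired_reads(2_000_000_000, 200)
print(pairs)      # 5000000 paired reads
print(2 * pairs)  # 10000000 single reads
```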
See also:
→ Typical sequencing depths in metagenomic studies
→ Depth and breadth of coverage
→ Illumina Coverage Calculator