Tools‎ > ‎SAMtools‎ > ‎

SAMtools: get breadth of coverage

Get breadth of coverage of a reference genome.

How many bases of my reference genome are covered by my short reads sample at a level of 5X or higher?

# set shell variables to your file-names
REF_GENOME_FILE=my_ref_genome.fna
WGS_SAMPLE=my_short_read_sample.fastq
MIN_COVERAGE_DEPTH=5

1) Map short reads against reference genome (bowtie2)

# create bowtie2 index database (database name: refgenome)

bowtie2-build ${REF_GENOME_FILE} refgenome
ls
refgenome.1.bt2
refgenome.2.bt2
refgenome.3.bt2
refgenome.4.bt2
refgenome.rev.1.bt2
refgenome.rev.2.bt2


# map reads and sort bam file
bowtie2 -x refgenome --no-unal -U ${WGS_SAMPLE} -S - -p 12 | \
 samtools view -bS - | \
 samtools sort -m 5G - mapping_result_sorted.bam


   -U ${WGS_SAMPLE} consider all reads as unpaired reads
   -S -             bowtie output in SAM format, write to standard out
   -x refgenome     use reference genome bowtie2-database
   --no-unal        suppress SAM records for reads that failed to align
   -p 12            use 12 processors

2) Get breadth of coverage (SAMtools)

# create samtools index
samtools index mapping_result_sorted.bam

# get total number of bases covered at MIN_COVERAGE_DEPTH or higher
samtools mpileup mapping_result_sorted.bam | awk -v X="${MIN_COVERAGE_DEPTH}" '$4>=X' | wc -l
32876

# get length of reference genome
bowtie2-inspect -s refgenome | awk '{ FS = "\t" } ; BEGIN{L=0}; {L=L+$3}; END{print L}'
45678

Result: breadth of reference genome coverage
total number of covered bases: 32876  (with >= 5X coverage depth)
Depth of coverage (average per-base coverage): 0.719 X (32876 ÷ 45678) (total number of covered bases divided by reference genome length)
percent: 71.9%  (0.719 × 100) (coverage breadth in percent)