Consensus sequence

get consensus sequence (of most frequent bases) based on short reads, mapped against a reference sequence (gene or complete genome)

1) Map short reads against reference gene sequence

# Create bowtie2 database

bowtie2-build REFERENCE.fasta REF_DB

# bowtie2 mapping

bowtie2 -x REF_DB -U SAMPLE.fastq --no-unal -S SAMPLE.sam

# samtools:  sort .sam file and convert to .bam file

samtools view -bS SAMPLE.sam | samtools sort - -o SAMPLE.bam

2) Get consensus sequence from .bam file

# Get consensus fastq file

samtools mpileup -uf REFERENCE.fasta SAMPLE.bam | bcftools call -c | vcfutils.pl vcf2fq > SAMPLE_cns.fastq

  # vcfutils.pl is part of bcftools

# Convert .fastq to .fasta and set bases of quality lower than 20 to N

seqtk seq -aQ64 -q20 -n N SAMPLE_cns.fastq > SAMPLE_cns.fasta

see also

https://samtools.github.io/bcftools/howtos/consensus-sequence.html