QIIME pre-processing

Raw data processing

1) Join forward and reverse reads

# Sequencing .fastq.gz files are in the folder IlluminaPairedReads/ (two files per sample: R1 = forward reads, R2 = reverse reads)

ls IlluminaPairedReads

SampleA_L001_R1_001.fastq.gz

SampleA_L001_R2_001.fastq.gz

SampleB_L001_R1_001.fastq.gz

SampleB_L001_R2_001.fastq.gz
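# Optional check before joining: every R1 file should have an R2 mate. The sketch below is not part of QIIME; missing_mates is a hypothetical helper name, and it assumes the Illumina naming shown above, where the two files of a pair differ only in _R1_ versus _R2_.

```shell
# Hypothetical helper (not a QIIME command): print every R1 file in a folder
# that has no matching R2 file. Assumes mates differ only in _R1_ vs _R2_.
missing_mates() (
    for r1 in "$1"/*_R1_*.fastq.gz; do
        [ -e "$r1" ] || continue                      # folder has no R1 files
        name=$(basename "$r1")
        r2="$1/$(printf '%s' "$name" | sed 's/_R1_/_R2_/')"
        [ -e "$r2" ] || printf '%s\n' "$r1"           # report unpaired R1 file
    done
)
```

# Usage: missing_mates IlluminaPairedReads (prints nothing when all pairs are complete)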

# merge forward and reverse reads (multiple samples)

multiple_join_paired_ends.py -i IlluminaPairedReads -o JoinedReads

# Result: one folder per sample (each containing a file: fastqjoin.join.fastq)

ls JoinedReads

SampleA_L001_R1_001

SampleB_L001_R1_001

2) Quality filter

# Filter out low-quality bases and rename the samples

split_libraries_fastq.py -i sequence-files --sample_ids new-sample-names -o SEQ/ -q 19 --barcode_type 'not-barcoded'

# Example (the input file list and the sample ID list are comma-separated, with no space after each comma, and must be in the same order)

split_libraries_fastq.py -i JoinedReads/SampleA_L001_R1_001/fastqjoin.join.fastq,JoinedReads/SampleB_L001_R1_001/fastqjoin.join.fastq --sample_ids SampleA,SampleB -o SEQ/ -q 19 --barcode_type 'not-barcoded'

-o SEQ/ - output: save results to folder "SEQ"

-q 19 - maximum unacceptable Phred quality score is 19, i.e. only bases with Phred quality >= Q20 are accepted

--barcode_type 'not-barcoded' - barcode not present in sequence (already removed)
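# With many samples, typing the two comma-separated lists by hand is error-prone. The sketch below builds both lists from the JoinedReads folder. joined_files and sample_ids are hypothetical helper names (not QIIME commands), and taking the sample ID as the part of the folder name before the first underscore is an assumption based on the file names above.

```shell
# Hypothetical helper: comma-separated list of all fastqjoin.join.fastq files
# found in the sample sub-folders of the given folder.
joined_files() (
    list=""
    for f in "$1"/*/fastqjoin.join.fastq; do
        [ -e "$f" ] || continue
        list="${list:+$list,}$f"                 # append with a comma, no space
    done
    printf '%s\n' "$list"
)

# Hypothetical helper: comma-separated sample IDs in the same (glob) order.
# Assumption: the sample ID is the folder name up to the first underscore,
# e.g. SampleA_L001_R1_001 -> SampleA.
sample_ids() (
    list=""
    for f in "$1"/*/fastqjoin.join.fastq; do
        [ -e "$f" ] || continue
        name=$(basename "$(dirname "$f")")
        list="${list:+$list,}${name%%_*}"
    done
    printf '%s\n' "$list"
)
```

# Usage: split_libraries_fastq.py -i "$(joined_files JoinedReads)" --sample_ids "$(sample_ids JoinedReads)" -o SEQ/ -q 19 --barcode_type 'not-barcoded'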

# Result: all sequences in a single file seqs.fna

SEQ/seqs.fna

>SampleA_1

CCTACGGGAG...

>SampleA_2

CCTACGGGAG...

# Check the total number of sequences in the file seqs.fna (one header line starting with '>' per sequence)

grep -c '^>' SEQ/seqs.fna

12517932
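# The total can also be broken down per sample, which helps to spot samples that lost most of their reads during quality filtering. A minimal sketch, assuming headers of the form >SampleID_N as in the seqs.fna excerpt above; count_per_sample is a hypothetical helper name, not a QIIME command.

```shell
# Hypothetical helper: print "SampleID<TAB>count" for each sample in a FASTA
# file whose headers look like ">SampleID_N" (optionally followed by a space
# and further text).
count_per_sample() {
    awk '/^>/ {
        id = substr($1, 2)          # first field without the leading ">"
        sub(/_[0-9]+$/, "", id)     # drop the trailing _N sequence counter
        n[id]++
    }
    END { for (id in n) print id "\t" n[id] }' "$1" | sort
}
```

# Usage: count_per_sample SEQ/seqs.fna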

Next

→ QIIME OTU clustering