QIIME pre-processing
Raw data processing
1) Join forward and reverse reads
# Sequencing .fastq files are in folder IlluminaPairedReads/ (two files R1 and R2 per sample)
ls IlluminaPairedReads
SampleA_L001_R1_001.fastq.gz
SampleA_L001_R2_001.fastq.gz
SampleB_L001_R1_001.fastq.gz
SampleB_L001_R2_001.fastq.gz
# merge forward and reverse reads (multiple samples)
multiple_join_paired_ends.py -i IlluminaPairedReads -o JoinedReads
# Result: one folder per sample (each containing a file: fastqjoin.join.fastq)
ls JoinedReads
SampleA_L001_R1_001
SampleB_L001_R1_001
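Before filtering, it can help to check how many read pairs were actually joined per sample. The sketch below is not part of QIIME: count_joined_reads is a hypothetical helper name, and it assumes each FASTQ record occupies exactly four lines (the usual Illumina layout).

```shell
# Hypothetical helper (not a QIIME script): print "<sample folder><TAB><joined reads>"
# for every fastqjoin.join.fastq found below the given directory.
# Assumption: every FASTQ record is exactly 4 lines long.
count_joined_reads() (
    for fq in "$1"/*/fastqjoin.join.fastq; do
        [ -f "$fq" ] || continue                      # skip folders without a joined file
        reads=$(( $(wc -l < "$fq") / 4 ))             # 4 lines per record
        printf '%s\t%d\n' "$(basename "$(dirname "$fq")")" "$reads"
    done
)

# Usage:
#   count_joined_reads JoinedReads
```

A sample with a very low count here usually points to poor overlap between R1 and R2 rather than to a problem in the later filtering step.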
2) Quality filter
Filter out low-quality bases and rename the samples.
split_libraries_fastq.py -i sequence-files --sample_ids new-sample-names -o SEQ/ -q 19 --barcode_type 'not-barcoded'
# Example (the file list and the sample-ID list are comma-separated, with no space after the comma; both lists must be in the same order)
split_libraries_fastq.py -i JoinedReads/SampleA_L001_R1_001/fastqjoin.join.fastq,JoinedReads/SampleB_L001_R1_001/fastqjoin.join.fastq --sample_ids SampleA,SampleB -o SEQ/ -q 19 --barcode_type 'not-barcoded'
-o SEQ/ - output: save results to folder "SEQ"
-q 19 - maximum unacceptable Phred score is 19, so only bases with quality Phred >= Q20 are accepted
--barcode_type 'not-barcoded' - barcode not present in sequence (already removed)
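With many samples, typing the two comma-separated lists by hand is error-prone. A minimal sketch that builds both lists from the JoinedReads folder is shown here. build_split_args is a hypothetical helper name, and the sample ID is assumed to be the part of the folder name before the first underscore (SampleA_L001_R1_001 becomes SampleA); adjust that rule if your sample names contain underscores.

```shell
# Hypothetical helper (not a QIIME script): print two lines,
#   line 1: comma-separated paths to all fastqjoin.join.fastq files
#   line 2: comma-separated sample IDs in the same order
# Assumption: sample ID = folder name up to the first underscore.
build_split_args() (
    files=""
    ids=""
    for dir in "$1"/*/; do
        [ -f "${dir}fastqjoin.join.fastq" ] || continue   # skip folders without joined reads
        sample=$(basename "$dir")
        sample=${sample%%_*}                              # SampleA_L001_R1_001 -> SampleA
        files="${files:+$files,}${dir}fastqjoin.join.fastq"
        ids="${ids:+$ids,}$sample"
    done
    printf '%s\n%s\n' "$files" "$ids"
)

# Usage:
#   args=$(build_split_args JoinedReads)
#   files=$(printf '%s\n' "$args" | sed -n 1p)
#   ids=$(printf '%s\n' "$args" | sed -n 2p)
#   split_libraries_fastq.py -i "$files" --sample_ids "$ids" -o SEQ/ -q 19 --barcode_type 'not-barcoded'
```

Both lists are built in the same loop, so the order of files and sample IDs always matches.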
# Result: all sequences in a single file seqs.fna
SEQ/seqs.fna
>SampleA_1
CCTACGGGAG...
>SampleA_2
CCTACGGGAG...
# check total number of sequences in file seqs.fna
grep -c '^>' SEQ/seqs.fna
12517932
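The total can also be broken down per sample, which shows whether any sample lost most of its reads during filtering. This is a sketch, not a QIIME command: count_per_sample is a hypothetical helper name, and it assumes the headers follow the >SampleID_N pattern shown in the seqs.fna excerpt.

```shell
# Hypothetical helper (not a QIIME script): print "<sample ID><TAB><sequence count>"
# for each sample in a seqs.fna file, sorted by sample ID.
# Assumption: headers look like ">SampleA_1 ...", i.e. sample ID, underscore, running number.
count_per_sample() {
    awk '/^>/ {
        id = substr($1, 2)          # first token of the header without ">"
        sub(/_[0-9]+$/, "", id)     # drop the trailing _<number>
        n[id]++
    }
    END { for (s in n) printf "%s\t%d\n", s, n[s] }' "$1" | sort
}

# Usage:
#   count_per_sample SEQ/seqs.fna
```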