Raw data processing

1) Join forward and reverse reads
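The joining command is not listed in this step. A minimal sketch, assuming QIIME 1's multiple_join_paired_ends.py was used: this is an assumption, chosen because that script writes one subfolder per read pair containing fastqjoin.join.fastq, which matches the result listed in this step. The variable names IN_DIR, OUT_DIR and JOIN_CMD are illustrative only.

```shell
#!/bin/sh
# Assumption: QIIME 1 is installed and provides multiple_join_paired_ends.py.
IN_DIR="IlluminaPairedReads"   # folder holding the R1/R2 .fastq.gz files
OUT_DIR="JoinedReads"          # one subfolder per sample is expected here

# Assemble the command and print it, so it can be checked before it runs.
JOIN_CMD="multiple_join_paired_ends.py -i $IN_DIR -o $OUT_DIR"
echo "$JOIN_CMD"

# Run only when the script is actually available on this machine.
if command -v multiple_join_paired_ends.py >/dev/null 2>&1; then
    $JOIN_CMD
fi
```

Printing the command first makes it easy to confirm the input and output folders before any files are written.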
# Sequencing .fastq files are in folder IlluminaPairedReads/ (two files, R1 and R2, per sample)
ls IlluminaPairedReads
SampleA_L001_R1_001.fastq.gz
SampleA_L001_R2_001.fastq.gz
SampleB_L001_R1_001.fastq.gz
SampleB_L001_R2_001.fastq.gz

# Result: one folder per sample (each containing a file: fastqjoin.join.fastq)
ls JoinedReads
SampleA_L001_R1_001
SampleB_L001_R1_001

2) Quality filter

Filter out low base quality and rename samples

split_libraries_fastq.py -i sequence-files --sample_ids new-sample-names -o SEQ/ -q 19 --barcode_type 'not-barcoded'

# Example (the file list and the sample list are comma-separated, with no space after the comma)
split_libraries_fastq.py -i JoinedReads/SampleA_L001_R1_001/fastqjoin.join.fastq,JoinedReads/SampleB_L001_R1_001/fastqjoin.join.fastq --sample_ids SampleA,SampleB -o SEQ/ -q 19 --barcode_type 'not-barcoded'

-o SEQ/ - output: save results to folder "SEQ"
-q 19 - highest unacceptable Phred score is 19, i.e. accept base quality Phred >= Q20
--barcode_type 'not-barcoded' - barcode not present in sequence (already removed)

# Result: all sequences in a single file seqs.fna
SEQ/seqs.fna
>SampleA_1
CCTACGGGAG...
>SampleA_2
CCTACGGGAG...

# check total number of sequences in file seqs.fna
cat SEQ/seqs.fna | grep '>' | wc -l
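The total can also be broken down per sample, which helps spot samples that lost most of their reads during quality filtering. The sketch assumes headers of the form >SampleID_N, as in the seqs.fna excerpt. It runs on a small made-up stand-in file (DEMO) so it can be tried without real data; to use it on real output, point it at SEQ/seqs.fna instead.

```shell
#!/bin/sh
# Build a tiny stand-in for SEQ/seqs.fna (made-up reads, for illustration only).
DEMO=$(mktemp)
printf '>SampleA_1\nACGT\n>SampleA_2\nACGT\n>SampleB_1\nACGT\n' > "$DEMO"

# Total number of sequences: count the header lines, which start with '>'.
TOTAL=$(grep -c '^>' "$DEMO")
echo "total: $TOTAL"

# Sequences per sample: take the first header field, drop the leading '>'
# and the trailing _<number>, then tally the remaining sample IDs.
PER_SAMPLE=$(awk '/^>/ { id = substr($1, 2); sub(/_[0-9]+$/, "", id); n[id]++ }
                  END  { for (s in n) print s "\t" n[s] }' "$DEMO" | sort)
echo "$PER_SAMPLE"

rm -f "$DEMO"
```

Anchoring the pattern with ^ counts only header lines, so a '>' appearing anywhere else in the file would not inflate the total.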