Extract sequence subset
How to extract or remove sequences from a FASTA or FASTQ file
# Get a list of all sequence IDs
# Example: get all gene IDs from a FASTA file (header text up to the first space, without the leading '>')
grep '^>' genes.fasta | cut -f 1 -d ' ' | sed 's/^>//' > list_of_geneIDs.txt
# Get subset IDs: create a text file with the selected sequence IDs, one per line
# Example: select the first 3 genes in the list as the subset
head -n 3 list_of_geneIDs.txt > subsetIDs.txt
# subsetIDs.txt then contains one ID per line, for example:
gene_001
gene_002
gene_003
# extract subset of gene sequences based on list of sequence IDs in .txt file
seqtk subseq genes.fasta subsetIDs.txt > gene_subset.fasta
# install Seqtk (Linux/Ubuntu)
sudo apt-get install seqtk
2) Using Python
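The same extraction, and the inverse (removing the listed sequences), can be sketched in plain Python. This is a minimal sketch, not a replacement for seqtk: it assumes uncompressed FASTA input and takes the sequence ID to be the header text up to the first whitespace, matching the cut/sed step above. The function names read_ids and filter_fasta are illustrative, not part of any library, and FASTQ input is not handled.

```python
def read_ids(path):
    """Read one sequence ID per line from a text file, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_fasta(fasta_in, fasta_out, ids, remove=False):
    """Copy FASTA records from fasta_in to fasta_out, filtered by ID.

    With remove=False, only records whose ID is in `ids` are written.
    With remove=True, records whose ID is in `ids` are dropped instead.
    Returns the number of records written.
    """
    written = 0
    keep = False  # whether the record currently being read is written out
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Assumption: ID = header text up to the first whitespace
                fields = line[1:].split()
                seq_id = fields[0] if fields else ""
                keep = (seq_id in ids) != remove
                if keep:
                    written += 1
            if keep:
                # Header and all following sequence lines of a kept record
                dst.write(line)
    return written
```

For example, filter_fasta("genes.fasta", "gene_subset.fasta", read_ids("subsetIDs.txt")) would write only the listed genes, while adding remove=True would write every gene except the listed ones. The output file names here are only placeholders.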