Extract sequence subset

How to extract or remove sequences from a FASTA or FASTQ file

1) Using seqtk

# get list of subset IDs

# get all geneIDs from a fasta file

grep '^>' genes.fasta | cut -f 1 -d ' ' | sed 's/^>//' > list_of_geneIDs.txt

# edit the gene list to get the subset you need, e.g. keep the first 3 genes

head -n 3 list_of_geneIDs.txt > subsetIDs.txt

# contents of subsetIDs.txt (one ID per line):

gene_001

gene_002

gene_003

# extract subset of gene sequences based on list of IDs in .txt file

seqtk subseq genes.fasta subsetIDs.txt > genes_subset.fasta

# install Seqtk (Linux/Ubuntu)

sudo apt-get install seqtk

2) Using Python

https://gist.github.com/brentp/477969
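
The gist above is not reproduced here. As a rough, dependency-free sketch of the same idea (not the gist's code), the snippet below keeps only the FASTA records whose ID appears in a list, or drops them when `invert=True`, which covers the "remove" case. The function names `read_ids`, `filter_fasta` and `extract_subset` are made up for illustration, and the sketch assumes plain FASTA input where the ID is the header text up to the first whitespace. FASTQ is not handled.

```python
def read_ids(path):
    """Read one sequence ID per line (first whitespace-separated token), skipping blank lines."""
    with open(path) as handle:
        return {line.split()[0] for line in handle if line.strip()}


def filter_fasta(lines, ids, invert=False):
    """Yield the lines of FASTA records whose ID is in `ids`.

    With invert=True, yield the records whose ID is NOT in `ids` instead.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The ID is the header text after '>' up to the first whitespace.
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            keep = (record_id in ids) != invert
        if keep:
            yield line


def extract_subset(fasta_path, ids_path, out_path, invert=False):
    """Write the selected records from fasta_path to out_path."""
    ids = read_ids(ids_path)
    with open(fasta_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, ids, invert))
```

With the file names used earlier, `extract_subset("genes.fasta", "subsetIDs.txt", "genes_subset.fasta")` would write the listed genes, and passing `invert=True` would write every gene except those listed.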