
Extract sequence subset

Extract a subset of sequences from a FASTA or FASTQ file

Seqtk


# get list of subset IDs
# get all gene IDs from a FASTA file (header lines start with '>'; the ID is the text before the first space)
grep '^>' genes.fasta | cut -f 1 -d ' ' | sed 's/^>//' > list_of_geneIDs.txt
# edit the gene list to get the subset you need; example: take the first 3 genes as the subset
head -n 3 list_of_geneIDs.txt > subsetIDs.txt
# subsetIDs.txt now contains:
 gene_001
 gene_002
 gene_003
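
To try the ID-list step without a real dataset, here is a small sketch on a toy FASTA file. The file names (toy_genes.fasta, toy_geneIDs.txt, toy_subsetIDs.txt) and the sequences are made up for illustration; only grep, cut, sed and head are needed.

```shell
# toy FASTA with four records (hypothetical data, for illustration only)
cat > toy_genes.fasta <<'EOF'
>gene_001 hypothetical protein
ATGGCGTACGTTAG
>gene_002 kinase
ATGCCCGGGTTTAA
>gene_003 transporter
ATGAAATTTGGGCC
>gene_004 unknown
ATGTTTCCCAAAGG
EOF

# keep header lines only, take the ID before the first space, drop the leading '>'
grep '^>' toy_genes.fasta | cut -f 1 -d ' ' | sed 's/^>//' > toy_geneIDs.txt

# the first three IDs become the subset list
head -n 3 toy_geneIDs.txt > toy_subsetIDs.txt
cat toy_subsetIDs.txt
```

The descriptions after the IDs (for example "kinase") are dropped, so the list holds one bare ID per line, which is the form seqtk subseq reads as its name list.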



# extract subset of gene sequences based on list of IDs in .txt file
seqtk subseq genes.fasta subsetIDs.txt > genes_subset.fasta
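
After extraction it can be worth checking that every requested ID was found. As far as I know, seqtk subseq does not warn about IDs that match no record. The sketch below uses toy stand-in files with hypothetical names (check_subsetIDs.txt, check_subset.fasta) in place of the real ID list and the seqtk output, so it runs without seqtk installed.

```shell
# toy stand-ins for the ID list and the extracted FASTA (hypothetical data);
# gene_002 is deliberately missing from the extracted file
printf 'gene_001\ngene_002\ngene_003\n' > check_subsetIDs.txt
printf '>gene_001 desc\nATGC\n>gene_003 desc\nATGG\n' > check_subset.fasta

# IDs actually present in the extracted file, sorted for comm
grep '^>' check_subset.fasta | cut -f 1 -d ' ' | sed 's/^>//' | sort > found_ids.txt

# IDs that were requested but not found (comm -23 keeps lines unique to the first input)
sort check_subsetIDs.txt | comm -23 - found_ids.txt > missing_ids.txt
cat missing_ids.txt
```

An empty missing_ids.txt means every requested ID is present. On real data, swap in subsetIDs.txt and the file written by seqtk.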


# install Seqtk (Linux/Ubuntu)
sudo apt-get install seqtk



Python