Download short-read sequencing data from NCBI Sequence Read Archive (SRA)
Install SRA tools
→ Install SRA-toolkit (fasterq-dump, prefetch, ...)
Pre-download using prefetch (optional)
To separate download and conversion tasks, SRA files can be downloaded in advance by using prefetch.
→ SRA pre-download (using prefetch)
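A minimal sketch of the two-step workflow, assuming SRA-toolkit is on the PATH. The function name fetch_then_convert and the directory SRA_files/ are hypothetical names chosen for this example. It relies on prefetch storing the run in a sub-directory named after the accession, which fasterq-dump can then read as a local path.

```shell
# Sketch: download first with prefetch, convert later with fasterq-dump.
# fetch_then_convert is a hypothetical helper, not part of SRA-toolkit.
fetch_then_convert() {
    acc="$1"               # SRA run accession, e.g. SRR649944
    sra_dir="${2%/}"       # where prefetch stores the .sra data (trailing / removed)
    fastq_dir="$3"         # where the FASTQ files are written

    mkdir -p "$sra_dir" "$fastq_dir"

    # Step 1: download only; data is expected in $sra_dir/$acc/
    prefetch "$acc" -O "$sra_dir/" || return 1

    # Step 2: convert the local copy to FASTQ
    fasterq-dump --split-3 "$sra_dir/$acc" -O "$fastq_dir"
}

# Usage (needs network access):
# fetch_then_convert SRR649944 SRA_files/ FASTQ_files/
```

Keeping the two steps apart allows the download to run on a node with network access and the conversion to run later, for example on a compute node.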
Download SRA samples as FASTQ using fasterq-dump
Example: download NCBI SRA run SRR649944 and save the sequence data in FASTQ_files/
fasterq-dump writes uncompressed .fastq files, so gzip is used afterwards to compress them
fasterq-dump --split-3 SRR649944 -O FASTQ_files/
gzip FASTQ_files/*.fastq
ls FASTQ_files/
SRR649944_1.fastq.gz
SRR649944_2.fastq.gz
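As a quick sanity check after the download, the two paired files should contain the same number of reads. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines). The helper names count_reads and check_pairs are hypothetical.

```shell
# Print the number of reads in a gzip-compressed FASTQ file
# (assumes exactly 4 lines per read).
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Compare the read counts of the _1 and _2 files of a paired-end run.
check_pairs() {
    r1=$(count_reads "$1")
    r2=$(count_reads "$2")
    if [ "$r1" -eq "$r2" ]; then
        echo "OK: $r1 read pairs"
    else
        echo "MISMATCH: $r1 vs $r2 reads" >&2
        return 1
    fi
}

# Usage:
# check_pairs FASTQ_files/SRR649944_1.fastq.gz FASTQ_files/SRR649944_2.fastq.gz
```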
Options
# filter read-length of SRA samples
fasterq-dump --split-3 SRR649944 -O FASTQ_files/ --min-read-len 80
The option --min-read-len 80 extracts only reads of at least 80 bp from the SRA file.
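To see the effect of the length filter, the shortest read in the resulting FASTQ file can be reported with awk. This is a small sketch assuming four-line FASTQ records. The helper name min_read_len is hypothetical and it reads FASTQ text from standard input.

```shell
# Print the length of the shortest read in a FASTQ stream
# (the sequence is line 2 of every 4-line record).
min_read_len() {
    awk 'NR % 4 == 2 {
             n = length($0)
             if (NR == 2 || n < min) min = n
         }
         END { print min }'
}

# Usage (uncompressed and gzip-compressed input):
# min_read_len < FASTQ_files/SRR649944_1.fastq
# gzip -dc FASTQ_files/SRR649944_1.fastq.gz | min_read_len
```

With --min-read-len 80 the reported value is expected to be 80 or larger.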
# use a large and fast temporary directory (RAM disk or SSD) for the temporary files created during .sra to .fastq conversion
fasterq-dump --split-3 SRR649944 -O FASTQ_files/ -t /tmp/scratch/
# change temporary SRA download location
fasterq-dump downloads a temporary SRA file into the default directory $TMPDIR (usually TMPDIR=/tmp/). To prevent SRA files from exceeding the space limit of the local temporary directory, a different download location can be defined as CACHE in the → SRA-toolkit configuration.
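Before starting a large download, it can help to check how much space is available in the temporary directory. The sketch below uses the POSIX df output format. The helper name free_kb is hypothetical, and the fallback to /tmp when TMPDIR is unset is an assumption of this example.

```shell
# Print the available space (in KB) of the file system holding directory $1.
# df -P gives one output line per file system; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

tmp_dir="${TMPDIR:-/tmp}"
echo "free space in $tmp_dir: $(free_kb "$tmp_dir") KB"
```

If the reported value looks too small for the run to be downloaded, a different location can be chosen before fasterq-dump is started.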
Read more
https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump
https://hpc.nih.gov/apps/sratoolkit.html
fasterq-dump --help