Download short-read sequencing data from NCBI Sequence Read Archive (SRA)

Install SRA tools

→ Install SRA-toolkit (fasterq-dump, prefetch, ...)


Pre-download using prefetch (optional)

To separate download and conversion tasks, SRA files can be downloaded in advance by using prefetch.

→ SRA pre-download (using prefetch)
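The two-step workflow can be sketched as follows. This is a minimal sketch, assuming a recent SRA-toolkit in which prefetch stores each accession in its own subdirectory of the output directory given with -O; older versions may lay the files out differently. The directory name SRA_files/ and the DRY_RUN switch are illustrative choices and not part of the toolkit. With DRY_RUN=1 (the default here) the commands are only printed; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: download first (prefetch), convert later (fasterq-dump).
ACC=SRR649944
SRA_DIR=SRA_files      # hypothetical location for the prefetched SRA data
FASTQ_DIR=FASTQ_files

# Print the command when DRY_RUN=1 (default), execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: network-bound download of the SRA data.
run prefetch "$ACC" -O "$SRA_DIR"

# Step 2: local conversion; pointing fasterq-dump at the prefetched
# accession directory is meant to avoid a second download.
run fasterq-dump --split-3 "$SRA_DIR/$ACC" -O "$FASTQ_DIR"
```

Keeping the two steps separate means the download can run on a node with network access, while the conversion can run later where fast local disk is available.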



Download SRA samples as FASTQ using fasterq-dump

Example: download NCBI-SRA sample SRR649944 and save the sequence data in FASTQ_files/, then compress the resulting .fastq files with gzip.

fasterq-dump --split-3 SRR649944 -O FASTQ_files/

gzip FASTQ_files/*.fastq

ls FASTQ_files/

SRR649944_1.fastq.gz

SRR649944_2.fastq.gz
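After the download, a quick sanity check on the paired files can catch truncated or incomplete output. The sketch below is not part of SRA-toolkit; the helper names count_reads and check_pair are hypothetical, and it assumes the standard FASTQ layout of four lines per read. For properly paired output the two files are expected to contain the same number of reads.

```shell
#!/bin/sh
# Sketch: sanity-check a pair of gzip-compressed FASTQ files.

# Print the number of reads in a gzip-compressed FASTQ file
# (assumes 4 lines per read).
count_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Report the read counts of both mates; return non-zero if they differ.
check_pair() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads"
    echo "$2: $n2 reads"
    [ "$n1" = "$n2" ]
}

# Run the check only if the example files are present.
if [ -f FASTQ_files/SRR649944_1.fastq.gz ] && [ -f FASTQ_files/SRR649944_2.fastq.gz ]; then
    check_pair FASTQ_files/SRR649944_1.fastq.gz FASTQ_files/SRR649944_2.fastq.gz
fi
```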


Options

# filter SRA samples by read length

fasterq-dump --split-3 SRR649944 -O FASTQ_files/ --min-read-len 80

The option --min-read-len 80 extracts only reads of at least 80 bp from the SRA file.
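To confirm that the length filter had the intended effect, the shortest read in the output can be reported with awk. This is a small sketch with a hypothetical helper name (min_read_len), assuming four lines per read with the sequence on the second line of each record; it reads a file argument or standard input.

```shell
#!/bin/sh
# Sketch: print the length of the shortest read in a FASTQ file.
# Reads from the given file(s), or from stdin if none is given.
min_read_len() {
    awk 'NR % 4 == 2 { n = length($0); if (min == "" || n < min) min = n }
         END { if (min != "") print min }' "$@"
}

# Run on the example output only if it is present (before gzip compression).
if [ -f FASTQ_files/SRR649944_1.fastq ]; then
    min_read_len FASTQ_files/SRR649944_1.fastq
fi
```

For already compressed files, the same helper can be fed through a pipe, for example gzip -dc FASTQ_files/SRR649944_1.fastq.gz | min_read_len.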


# set a large, fast location (RAM disk or SSD) for the temporary files written during .sra to .fastq conversion

fasterq-dump --split-3 SRR649944 -O FASTQ_files/ -t /tmp/scratch/


# change temporary SRA download location

fasterq-dump downloads a temporary SRA file into the default directory $TMPDIR (usually TMPDIR=/tmp/). To keep SRA files from exceeding the space limit of the local temporary directory, define another download location as CACHE in → SRA-toolkit configuration.


Read more

https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump

https://hpc.nih.gov/apps/sratoolkit.html

fasterq-dump --help