How to split large files

a) Using head and tail

To split a big text file into two smaller files at a selected line number:

head -n 1000 large_file.txt > part_1.txt # get top 1000 lines

tail -n +1001 large_file.txt > part_2.txt # get all lines from line 1001 to the end of the file
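
As a quick sanity check, the two parts should add up to the original. The sketch below builds a small 2500-line sample file so it can be run anywhere; the file names (demo_large.txt, demo_part_1.txt, demo_part_2.txt) and the variable split_at are chosen here for illustration only.

```shell
# Build a small sample file (hypothetical name) so the example is self-contained
seq 1 2500 > demo_large.txt

split_at=1000   # last line that goes into the first part

head -n "$split_at" demo_large.txt > demo_part_1.txt
tail -n +"$((split_at + 1))" demo_large.txt > demo_part_2.txt

# Concatenating the parts should reproduce the original byte for byte
cat demo_part_1.txt demo_part_2.txt | cmp - demo_large.txt && echo "parts match original"
```

Keeping the split point in one variable avoids the off-by-one mistake of passing the same number to both head and tail.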

b) Using csplit

To split a large text file into two parts at a selected line number (1001 is the first line of the second part):

csplit -sf part_ large_file.txt 1001

part_00 # contains lines 1..1000

part_01 # contains lines 1001..end
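
csplit also accepts more than one line number, which gives more than two parts in a single call. A minimal sketch, assuming a sample file built on the spot (demo_large.txt and the prefix demo_csplit_ are illustrative names):

```shell
# Build a 2500-line sample file (hypothetical name)
seq 1 2500 > demo_large.txt

# Two split points give three parts: lines 1..1000, 1001..2000, 2001..end
csplit -s -f demo_csplit_ demo_large.txt 1001 2001

# Show the line count of each part
wc -l demo_csplit_*
```

Each line number names the first line of the next part, so the parts are written to demo_csplit_00, demo_csplit_01 and demo_csplit_02.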

c) Using split

To split a large file into several parts of a maximum size:

split <size option> <filename> <prefix> # size option: -l <lines> or -b <bytes>

# split a large text file into several parts of 1000 lines each (--numeric-suffixes is a GNU split option)

split -l 1000 --numeric-suffixes=1 large_file.txt part_

# split a large data file into several parts of at most 2 GiB each

split -b 2G --numeric-suffixes=1 large_file.fastq.gz part_

Alternative byte-size options (in GNU split, K, M and G are powers of 1024; KB, MB and GB are powers of 1000):

-b 500K # 500 KiB (500 x 1024 bytes)

-b 10M # 10 MiB

-b 1G # 1 GiB
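
Parts produced with -b are plain byte ranges of the original, so they can be joined again with cat. This matters for compressed files such as .fastq.gz: the individual parts generally cannot be decompressed on their own and have to be reassembled first. A minimal sketch of the round trip, assuming GNU split and using small illustrative names and sizes (demo_data.bin, demo_chunk_, 30K):

```shell
# Sample data file (hypothetical name), 100000 random bytes
head -c 100000 /dev/urandom > demo_data.bin

# Split into chunks of at most 30 KiB: demo_chunk_01, demo_chunk_02, ...
split -b 30K --numeric-suffixes=1 demo_data.bin demo_chunk_

# The shell expands the glob in sorted order, which matches the split order
cat demo_chunk_* > demo_restored.bin

# Verify that the reassembled file is identical to the original
cmp demo_data.bin demo_restored.bin && echo "restored file is identical"
```

The sorted glob only matches the original order because all suffixes have the same length; if a split needs more parts than the default suffix length allows, a longer suffix can be requested with -a.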

d) Using pyfasta (FASTA files)

To split a large FASTA file into several parts of roughly equal size, keeping each sequence record intact:

# split the large file "uniref100.fasta" into 8 parts

pyfasta split -n 8 uniref100.fasta

# install pyfasta

pip install pyfasta
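
If pyfasta is not available, or an even count of sequences per part is what matters, a similar split can be sketched with awk. This is not pyfasta's method: it simply deals the records out to the parts in turn. The file names (demo.fasta, demo.part0.fasta, ...) and the awk variables n and count are illustrative, and the sketch assumes the file starts with a ">" header line.

```shell
# Tiny sample FASTA (hypothetical content) with 5 sequences
printf '>s1\nACGT\n>s2\nGG\nCC\n>s3\nTTTT\n>s4\nA\n>s5\nCGCG\n' > demo.fasta

# Every header line (">") starts a new record; records go to the parts in turn
awk -v n=2 '
  /^>/ { out = sprintf("demo.part%d.fasta", count % n); count++ }
  { print > out }
' demo.fasta

# Count the sequences in each part
grep -c '^>' demo.part0.fasta demo.part1.fasta
```

Multi-line sequences stay together because the output file only changes on a header line. For a large n, awk keeps n files open at once, which may run into the system's open-file limit.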