How to split large files
a) Using head and tail
to split a large text file into two smaller files at a selected line number
head -n 1000 large_file.txt > part_1.txt # first 1000 lines
tail -n +1001 large_file.txt > part_2.txt # all lines from line 1001 to the end of the file
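The two commands above can be parameterized and checked. A minimal sketch, assuming a generated test file (sample.txt in a scratch directory) and a hypothetical variable split_at holding the last line of the first part; joining both parts again should reproduce the original:

```shell
# Scratch directory and sample file are illustrative assumptions
workdir=$(mktemp -d)
seq 1 2500 > "$workdir/sample.txt"   # 2500 numbered lines

split_at=1000                        # last line that goes into the first part
head -n "$split_at" "$workdir/sample.txt" > "$workdir/part_1.txt"
tail -n "+$((split_at + 1))" "$workdir/sample.txt" > "$workdir/part_2.txt"

# Concatenating both parts must give back the original file
cat "$workdir/part_1.txt" "$workdir/part_2.txt" | cmp - "$workdir/sample.txt" \
  && echo "split is lossless"
```

The scratch directory can be deleted once the check has passed.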
b) Using csplit
to split a large text file into two parts at a selected line number (line 1001 is the first line of the second part)
csplit -sf part_ large_file.txt 1001
part_00 # contains lines 1..1000
part_01 # contains lines 1001..end
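csplit also accepts more than one line number, and each number starts a new part. A small sketch on a generated sample file (the file name and scratch directory are assumptions for illustration):

```shell
# Scratch directory and sample file are illustrative assumptions
workdir=$(mktemp -d)
seq 1 2500 > "$workdir/sample.txt"   # 2500 numbered lines

# -s: do not print byte counts, -f: prefix for the output files
csplit -s -f "$workdir/part_" "$workdir/sample.txt" 1001 2001

# part_00 holds lines 1..1000, part_01 lines 1001..2000, part_02 lines 2001..end
wc -l "$workdir"/part_*
```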
c) Using split
to split a large file into several parts of a given maximum size
split <size-option> <filename> <prefix>
# split large text file into several parts of at most 1000 lines each
split -l 1000 --numeric-suffixes=1 large_file.txt part_
# split large data file into several parts of at most 2 GiB each (--numeric-suffixes is a GNU split option)
split -b 2G --numeric-suffixes=1 large_file.fastq.gz part_
Alternative byte-size options (in GNU split, K/M/G are powers of 1024; KB/MB/GB are powers of 1000):
-b 500K # 500 KiB (500 * 1024 bytes)
-b 10M # 10 MiB
-b 1G # 1 GiB
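Parts created by split can be joined again with cat. A minimal sketch, assuming GNU split and a generated sample file in a scratch directory (both names are illustrative):

```shell
# Scratch directory and sample file are illustrative assumptions
workdir=$(mktemp -d)
seq 1 2500 > "$workdir/sample.txt"   # 2500 numbered lines

# GNU split: --numeric-suffixes=1 names the parts part_01, part_02, ...
split -l 1000 --numeric-suffixes=1 "$workdir/sample.txt" "$workdir/part_"

# The shell expands part_* in sorted order, so cat restores the original order
cat "$workdir"/part_* > "$workdir/rejoined.txt"
cmp "$workdir/sample.txt" "$workdir/rejoined.txt" && echo "parts rejoin to the original"
```

The same cat step applies to parts made with -b. Pieces of a compressed file such as large_file.fastq.gz are generally not usable on their own and need to be rejoined before decompressing.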
https://en.wikipedia.org/wiki/Split_(Unix)
d) Using pyfasta (FASTA files)
to split a large FASTA file into several parts of roughly equal size
# split the large file "uniref100.fasta" into 8 parts
pyfasta split -n 8 uniref100.fasta
# install pyfasta
pip install pyfasta
https://pypi.python.org/pypi/pyfasta/#command-line-interface
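Where pyfasta is not available, a comparable split can be sketched with plain awk. This is not pyfasta's method: it hands out records round-robin, so the parts hold nearly equal numbers of sequences rather than nearly equal numbers of bytes. The sample file, the scratch directory, the part count n=3 and the output names part_NN.fasta are all assumptions for illustration:

```shell
# Scratch directory and sample FASTA file are illustrative assumptions
workdir=$(mktemp -d)
for k in 1 2 3 4 5 6 7; do
  printf '>seq%s\nACGT\nTTGA\n' "$k"   # header line plus two sequence lines
done > "$workdir/sample.fasta"

# Each header line (starting with ">") moves on to the next output file;
# the sequence lines that follow go to the same file as their header.
awk -v n=3 -v dir="$workdir" '
  /^>/      { i = (i % n) + 1; out = sprintf("%s/part_%02d.fasta", dir, i) }
  out != "" { print > out }
' "$workdir/sample.fasta"

# Sequences per part for 7 records and n=3: 3, 2 and 2
grep -c '^>' "$workdir"/part_*.fasta
```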