How to split large files

a) Using head and tail

to split a big text file into two smaller files at a selected line number

head -n 1000  large_file.txt > part_1.txt   # get the first 1000 lines

tail -n +1001 large_file.txt > part_2.txt   # get all lines from line 1001 to the end of the file
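A quick sanity check, using placeholder file names like the ones above: the two parts should add up to the original, line for line.

```shell
# Sketch: verify that the head/tail split is lossless.
seq 1 2500 > large_file.txt                  # sample input: 2500 numbered lines
head -n 1000  large_file.txt > part_1.txt    # first 1000 lines
tail -n +1001 large_file.txt > part_2.txt    # remaining 1500 lines
cat part_1.txt part_2.txt > rejoined.txt     # concatenate the parts again
cmp large_file.txt rejoined.txt && echo "parts reassemble cleanly"
```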

b) Using csplit

to split a large text file into two parts at a selected line number ( 1001 is the first line of the second part )

csplit -sf part_ large_file.txt 1001

  part_00 # contains lines 1..1000

  part_01 # contains lines 1001..end
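csplit also accepts several line numbers at once, producing one extra part per number; a small sketch with generated input:

```shell
# Sketch: split one file into three parts at lines 1001 and 2001.
seq 1 3000 > large_file.txt            # sample input: 3000 numbered lines
csplit -sf part_ large_file.txt 1001 2001
wc -l part_00 part_01 part_02          # each part holds 1000 lines
```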

c) Using split

to split a large file into several parts of at most a given size

   split <options> <filename> <prefix>

# split a large text file into several parts of 1000 lines each

split --numeric-suffixes=1 -l 1000 large_file.txt part_

# split a large data file into several parts of 2 gigabytes each
# (a compressed file split by bytes must be reassembled with cat before it can be decompressed)

split --numeric-suffixes=1 -b 2G large_file.fastq.gz part_

Alternative byte-size options: 

 -b 500K    # 500 kilobytes

 -b  10M    # 10 megabytes

 -b   1G    # 1 gigabyte
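Parts produced by split carry no headers of their own, so the original file can be restored with a plain cat; a sketch with a generated sample file:

```shell
# Sketch: split by byte size, then reassemble and compare with the original.
seq 1 100000 > large_file.txt                        # sample input (~575 KB)
split --numeric-suffixes=1 -b 64K large_file.txt part_
cat part_* > restored.txt                            # lexical order matches the suffix order
cmp large_file.txt restored.txt && echo "restored file is identical"
```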

d) Split FASTA files using pyfasta

to split a large FASTA file into several parts with the same number of sequences

# split the large file "uniref100.fasta"  into 8 parts

pyfasta split -n 8 uniref100.fasta

# install pyfasta

pip install pyfasta
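If pyfasta is not available, plain awk can do a similar job by assigning whole FASTA records to output files round-robin; the file names and part count below are illustrative:

```shell
# Sketch: split a FASTA file into 4 parts, one whole record at a time.
printf '>seq%d\nACGT\n' 1 2 3 4 5 6 7 8 > sample.fasta   # toy input: 8 records
awk '/^>/ { out = sprintf("sample_part_%d.fasta", n++ % 4) } { print > out }' sample.fasta
grep -c '^>' sample_part_*.fasta                          # 2 records per part
```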

See also

→ Extract columns from file