N50 statistics

N50 is a statistic used to describe the quality (more precisely, the contiguity) of an assembled genome that is fragmented into contigs of different lengths.

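The N50 is defined as the minimum contig length needed to cover 50% of the genome: take the longest contigs, one after another, until their combined length reaches at least half of the total; the length of the shortest contig in that set is the N50.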

In other words, half of the genome sequence is contained in contigs whose length is greater than or equal to the N50 contig size. Equivalently, the contigs of size N50 or longer together contain at least 50 percent of the total genome sequence.

Intuitively, to get the N50 contig length, sort all contigs of a genome by length from longest to shortest, go to the base in the center (at 50% of the total genome length), and take the size of the contig that contains this base. That size is the N50 contig length.
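
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `n50` and the contig lengths in the usage line are made up for this example, and the total is assumed to be the sum of the contig lengths (the assembly size).

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (same unit in and out)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    ordered = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2                   # position of the "center" base
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to the halfway
        # point is the contig containing the center base.
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (total 290, half 145).
print(n50([80, 70, 50, 40, 30, 20]))  # prints 70, since 80 + 70 = 150 >= 145
```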

N50 is not simply the median of all contig lengths. It is a length-weighted median, which gives a more robust quality value than a simple median; see the explanation by Keith Bradnam:
http://www.acgt.me/blog/2013/7/8/why-is-n50-used-as-an-assembly-metric.html
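
To illustrate the difference between the two statistics, the following sketch compares the plain median with the N50 before and after many very short contigs are added to an assembly. The contig lengths and the helper name `n50` are invented for this illustration and are not taken from the linked post.

```python
from statistics import median


def n50(contig_lengths):
    """Length-weighted median: shortest contig among the longest contigs
    that together cover at least half of the total length."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


assembly = [100, 80, 60, 40]        # hypothetical contig lengths in kb
with_debris = assembly + [1] * 10   # same contigs plus ten 1 kb fragments

print(median(assembly), n50(assembly))        # prints: 70.0 80
print(median(with_debris), n50(with_debris))  # prints: 1.0 80
```

In this made-up case the ten short fragments pull the simple median from 70 kb down to 1 kb, while the N50 stays at 80 kb because the fragments add very little to the total length.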


Example
For an assembly fragmented into 4 contigs with lengths of 1, 2, 4, and 5 kb (total size = 12 kb), half of the genome length is 6 kb. The largest contig alone (5 kb) does not reach 6 kb, but the two largest contigs together (5 + 4 = 9 kb) do. Hence N50 = 4 kb is the minimum contig length required to cover 50 percent of the assembled genome sequence.
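
The arithmetic of this example can be traced step by step with running sums. The variable names below are chosen for this sketch only.

```python
from itertools import accumulate

lengths_kb = [1, 2, 4, 5]

ordered = sorted(lengths_kb, reverse=True)  # [5, 4, 2, 1]
cumulative = list(accumulate(ordered))      # [5, 9, 11, 12]
half = sum(ordered) / 2                     # 6.0

# The first contig whose running sum reaches half of the total gives the N50.
n50_kb = next(
    length for length, running in zip(ordered, cumulative) if running >= half
)
print(n50_kb)  # prints 4
```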


Other percentages are defined in the same way:
N10 is the minimum contig length needed to cover 10 percent of the genome.
N90 is the minimum contig length needed to cover 90 percent of the genome.
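
Since N10, N50 and N90 differ only in the percentage, one function can compute all of them. The sketch below is one possible way to write it; the name `nx` and its `percent` parameter are assumptions made for this illustration, and the contig lengths are those of the example above (1, 2, 4 and 5 kb).

```python
def nx(contig_lengths, percent):
    """Return the minimum contig length needed to cover `percent` percent
    of the total length, taking the longest contigs first."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    ordered = sorted(contig_lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare running/total with percent/100 without dividing.
        if running * 100 >= total * percent:
            return length


lengths_kb = [1, 2, 4, 5]
print(nx(lengths_kb, 10))  # prints 5
print(nx(lengths_kb, 50))  # prints 4
print(nx(lengths_kb, 90))  # prints 2
```

With this definition N10 >= N50 >= N90 for any assembly, because a larger percentage requires including shorter contigs.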