RPKM calculation

RPKM - Reads per kilobase per million mapped reads

A normalization for comparing gene coverage values. RPKM corrects for differences in both sample sequencing depth and gene length.

Formula
RPKM = numReads / ( (geneLength / 1000) * (totalNumReads / 1,000,000) )

numReads - number of reads mapped to a gene sequence
geneLength - length of the gene sequence in bases
totalNumReads - total number of mapped reads of a sample
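
Example
The RPKM formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name rpkm, the snake_case parameter names, and the example numbers are assumptions chosen for this sketch, and gene_length is assumed to be given in bases.

```python
def rpkm(num_reads, gene_length, total_num_reads):
    """Reads per kilobase per million mapped reads.

    num_reads       -- reads mapped to the gene
    gene_length     -- gene length in bases (assumed unit)
    total_num_reads -- total mapped reads in the sample
    """
    if gene_length <= 0 or total_num_reads <= 0:
        raise ValueError("gene_length and total_num_reads must be positive")
    # Scale length to kilobases and sequencing depth to millions of reads.
    return num_reads / ((gene_length / 1000) * (total_num_reads / 1_000_000))


# Hypothetical example: 500 reads on a 2,000-base gene,
# in a sample with 10 million mapped reads.
print(rpkm(num_reads=500, gene_length=2000, total_num_reads=10_000_000))  # 25.0
```

In this example the gene is 2 kilobases long and the sample has 10 million mapped reads, so the 500 reads are divided by 2 * 10 = 20.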


CPM - counts per million

Formula
CPM = readsMappedToGene * (1 / totalNumReads) * 10^6

totalNumReads - total number of mapped reads of a sample
readsMappedToGene - number of reads mapped to a selected gene
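
Example
A minimal Python sketch of the CPM formula above. The function name cpm, the snake_case parameter names, and the example numbers are assumptions made for illustration only.

```python
def cpm(reads_mapped_to_gene, total_num_reads):
    """Counts per million: gene read count scaled to one million mapped reads."""
    if total_num_reads <= 0:
        raise ValueError("total_num_reads must be positive")
    # Multiply before dividing to keep the arithmetic exact for integer counts.
    return reads_mapped_to_gene * 1_000_000 / total_num_reads


# Hypothetical example: 500 reads on a gene,
# in a sample with 10 million mapped reads.
print(cpm(reads_mapped_to_gene=500, total_num_reads=10_000_000))  # 50.0
```

CPM corrects only for sequencing depth. Dividing a CPM value by the gene length in kilobases gives the corresponding RPKM value, so for a 2,000-base gene the 50 CPM here corresponds to 50 / 2 = 25 RPKM.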



Read more
Note: when applied to RNA-seq gene expression data,
RPKM can be biased for lowly expressed genes.
http://bib.oxfordjournals.org/content/early/2012/09/15/bib.bbs046.long#sec-2

TPM (transcripts per million) is an alternative to RPKM when comparing the same genes across samples.
http://blog.nextgenetics.net/?e=51#body-anchor
http://www.ncbi.nlm.nih.gov/pubmed/22872506
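
Example
This page does not give a TPM formula. The sketch below uses the commonly cited definition: divide each gene's read count by its length in kilobases, then scale these per-gene rates so that they sum to one million within the sample. The function name tpm, the parameter names, and the example numbers are assumptions made for illustration.

```python
def tpm(read_counts, gene_lengths):
    """Transcripts per million for all genes of one sample.

    read_counts  -- reads mapped to each gene
    gene_lengths -- length of each gene in bases (assumed unit), same order
    """
    if len(read_counts) != len(gene_lengths):
        raise ValueError("read_counts and gene_lengths must have equal length")
    # Step 1: length-normalize each count (reads per kilobase).
    rates = [count / (length / 1000)
             for count, length in zip(read_counts, gene_lengths)]
    total_rate = sum(rates)
    if total_rate == 0:
        raise ValueError("no reads mapped to any gene")
    # Step 2: scale so the values of one sample sum to one million.
    return [rate / total_rate * 1_000_000 for rate in rates]


# Hypothetical sample with three genes.
values = tpm(read_counts=[100, 200, 300], gene_lengths=[1000, 2000, 3000])
print(values)       # three equal values, since reads per kilobase are equal
print(sum(values))  # approximately 1,000,000
```

Because the TPM values of every sample sum to the same total, a gene's TPM can be read as its share of that sample's length-normalized reads, which is why it is suggested for comparing the same gene across samples.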