RPKM calculation

Quality control

Normalization for comparing gene coverage values. RPKM corrects for differences in both sample sequencing depth and gene length.

RPKM - reads per kilobase per million mapped reads

Formula

RPKM = numReads / ( geneLength/1000 * totalNumReads/1,000,000 )

numReads - number of reads mapped to a gene sequence

geneLength - length of the gene sequence in bases

totalNumReads - total number of mapped reads of a sample
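
The RPKM formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `rpkm`, its snake_case parameter names, and the example numbers are assumptions made for this sketch, and the gene length is assumed to be given in bases.

```python
def rpkm(num_reads, gene_length, total_num_reads):
    """Reads per kilobase per million mapped reads for one gene.

    num_reads       -- reads mapped to the gene sequence
    gene_length     -- length of the gene sequence in bases (assumed unit)
    total_num_reads -- total mapped reads of the sample
    """
    if gene_length <= 0 or total_num_reads <= 0:
        raise ValueError("gene_length and total_num_reads must be positive")
    kilobases = gene_length / 1000                   # length correction
    millions_of_reads = total_num_reads / 1_000_000  # depth correction
    return num_reads / (kilobases * millions_of_reads)


# Hypothetical gene: 500 reads, 2,000 bases long, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by both factors means that a gene twice as long, or a sample sequenced twice as deeply, does not receive a higher value from that fact alone.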


CPM - counts per million

Formula

CPM = readsMappedToGene * 1/totalNumReads * 1,000,000

totalNumReads - total number of mapped reads of a sample

readsMappedToGene - number of reads mapped to a selected gene
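
CPM scales the read count of a gene by the sample's total mapped reads and a factor of one million, without any gene length correction. A minimal Python sketch follows; the function name `cpm`, its parameter names, and the example numbers are assumptions for illustration only.

```python
def cpm(reads_mapped_to_gene, total_num_reads):
    """Counts per million for one gene.

    reads_mapped_to_gene -- reads mapped to the selected gene
    total_num_reads      -- total mapped reads of the sample
    """
    if total_num_reads <= 0:
        raise ValueError("total_num_reads must be positive")
    # Same as reads * (1 / total) * 1,000,000, with the multiplication done first.
    return reads_mapped_to_gene * 1_000_000 / total_num_reads


# Hypothetical gene: 500 reads in a sample with 10 million mapped reads.
print(cpm(500, 10_000_000))  # 50.0
```

Unlike RPKM, this value still depends on gene length, so it is best suited to comparing the same gene between samples.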


Read more

RPKM was introduced in

http://www.ncbi.nlm.nih.gov/pubmed/18516045

Note: when applied to gene expression RNA-seq data, RPKM can show a bias for lowly expressed genes.

http://bib.oxfordjournals.org/content/early/2012/09/15/bib.bbs046.long#sec-2

TPM (transcripts per million) is an alternative to RPKM (when comparing identical genes across samples)

http://blog.nextgenetics.net/?e=51#body-anchor

http://www.ncbi.nlm.nih.gov/pubmed/22872506
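
This page gives no formula for TPM. As it is commonly defined, each gene's read count is first divided by its length, and the resulting rates are then scaled so that they sum to one million within a sample. The sketch below assumes that definition; the function name `tpm`, its parameters, and the example numbers are hypothetical.

```python
def tpm(read_counts, gene_lengths):
    """Transcripts per million for all genes of one sample.

    read_counts  -- reads mapped to each gene
    gene_lengths -- length of each gene in bases, in the same order
    """
    if len(read_counts) != len(gene_lengths):
        raise ValueError("read_counts and gene_lengths must have equal length")
    # Length-normalized rate per gene.
    rates = [count / length for count, length in zip(read_counts, gene_lengths)]
    total_rate = sum(rates)
    if total_rate <= 0:
        raise ValueError("at least one gene must have mapped reads")
    # Scale so that the values of one sample sum to one million.
    return [rate / total_rate * 1_000_000 for rate in rates]


# Hypothetical sample with three genes.
values = tpm([100, 200, 300], [1000, 2000, 1500])
print(values)
```

Because the values of every sample sum to the same total, a gene's TPM can be read as its share of that sample's length-normalized reads.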