Normalization for comparing gene coverage values. RPKM corrects for differences in both sample sequencing depth and gene length.
RPKM - reads per kilobase per million mapped reads
Formula
RPKM = numReads / ( geneLength/1000 * totalNumReads/1,000,000 )
numReads - number of reads mapped to a gene sequence
geneLength - length of the gene sequence in base pairs
totalNumReads - total number of mapped reads of a sample
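As a minimal sketch, the RPKM formula above can be written as a small Python function. The function name rpkm and the example numbers are illustrative assumptions and do not come from any particular tool; geneLength is assumed to be given in base pairs.

```python
def rpkm(num_reads, gene_length, total_num_reads):
    """Reads per kilobase per million mapped reads (hypothetical helper).

    num_reads       -- number of reads mapped to the gene sequence
    gene_length     -- length of the gene sequence, assumed to be in base pairs
    total_num_reads -- total number of mapped reads of the sample
    """
    if gene_length <= 0 or total_num_reads <= 0:
        raise ValueError("gene_length and total_num_reads must be positive")
    # Same arrangement as the formula: divide by kilobases and by millions of reads.
    return num_reads / ((gene_length / 1000) * (total_num_reads / 1_000_000))


# Made-up example: 500 reads on a 2,000 bp gene in a sample with 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
print(example_rpkm)  # 500 / (2 * 10) = 25.0
```

Doubling the gene length or the total number of mapped reads halves the result, which is the correction for gene length and sequencing depth described above.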
CPM - counts per million
Formula
CPM = readsMappedToGene * 1/totalNumReads * 10^6
totalNumReads - total number of mapped reads of a sample
readsMappedToGene - number of reads mapped to a selected gene
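The CPM formula can be sketched the same way. The function name cpm, the gene names, and the counts below are illustrative assumptions only. The multiplication by one million is done before the division, which is algebraically the same as the formula and keeps the arithmetic exact for these example values.

```python
def cpm(reads_mapped_to_gene, total_num_reads):
    """Counts per million (hypothetical helper).

    reads_mapped_to_gene -- number of reads mapped to the selected gene
    total_num_reads      -- total number of mapped reads of the sample
    """
    if total_num_reads <= 0:
        raise ValueError("total_num_reads must be positive")
    # readsMappedToGene * 1/totalNumReads * 10^6, reordered to multiply first.
    return reads_mapped_to_gene * 1_000_000 / total_num_reads


# Made-up counts for two genes in a sample with 10 million mapped reads.
gene_counts = {"geneA": 500, "geneB": 1500}
total_mapped = 10_000_000
gene_cpm = {gene: cpm(count, total_mapped) for gene, count in gene_counts.items()}
print(gene_cpm)  # {'geneA': 50.0, 'geneB': 150.0}
```

Unlike RPKM, this value does not depend on gene length, so it corrects only for sequencing depth.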
Read more
RPKM was introduced in
http://www.ncbi.nlm.nih.gov/pubmed/18516045
Note that when applied to gene expression RNA-seq data,
RPKM can show a bias for lowly expressed genes.
http://bib.oxfordjournals.org/content/early/2012/09/15/bib.bbs046.long#sec-2
TPM (transcripts per million) is an alternative to RPKM when comparing identical genes across samples.