CSS
Cumulative Sum Scaling (CSS)
Cumulative Sum Scaling (CSS) is a median-like quantile normalization that corrects for differences in sampling depth (library size). Whereas standard relative abundance (fraction/percentage) normalization rescales every sample to the same total (100%), CSS preserves variation in total counts between samples. CSS rescales each sample using only a subset of lower-abundance taxa, namely those up to a chosen quantile of the sample's count distribution, which are assumed to be relatively constant and independent. This excludes the influence of highly abundant taxa that dominate a study.
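The difference between the two approaches can be illustrated with a minimal base R sketch on a made-up count table. The helper `css_scale` is hypothetical and is not the metagenomeSeq implementation: it assumes a fixed quantile `p = 0.5` (metagenomeSeq estimates this value from the data) and an arbitrary scaling constant of 1000.

```r
# Made-up count table: rows are taxa, columns are samples.
# taxon1 dominates sampleA; the remaining taxa are sequenced
# twice as deeply in sampleB as in sampleA.
counts <- matrix(
  c(5000, 10, 8, 6, 4, 2,
     200, 20, 16, 12, 8, 4),
  nrow = 6,
  dimnames = list(paste0("taxon", 1:6), c("sampleA", "sampleB"))
)

# Relative abundance: every sample is rescaled to a total of 100%.
relative <- sweep(counts, 2, colSums(counts), "/") * 100

# Hypothetical CSS-style helper (illustration only):
# for each sample, sum the counts that do not exceed the p-th quantile
# of its non-zero counts and divide the sample by that sum.
css_scale <- function(counts, p = 0.5, scale = 1000) {
  factors <- apply(counts, 2, function(x) {
    q <- quantile(x[x > 0], probs = p, names = FALSE)
    sum(x[x <= q])
  })
  sweep(counts, 2, factors / scale, "/")
}

css <- css_scale(counts)

print(round(relative, 2))
print(round(css, 2))
```

In this toy table the CSS values of the five low-abundance taxa coincide across the two samples, while their relative abundances differ strongly because a single dominant taxon inflates the total of sampleA.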
References
Paulson JN, Stine OC, Bravo HC, Pop M. Differential abundance analysis for microbial marker-gene surveys. Nature Methods (2013).
http://cbcb.umd.edu/software/metagenomeSeq
https://bioconductor.org/packages/release/bioc/html/metagenomeSeq.html
Install the R package metagenomeSeq, which contains the CSS functions
R
# With R versions older than 3.5, install Bioconductor packages using biocLite (legacy method)
source("http://bioconductor.org/biocLite.R")
biocLite("metagenomeSeq")
# With R version 3.5 or greater, install Bioconductor packages using BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("metagenomeSeq")
https://www.bioconductor.org/packages/release/bioc/html/metagenomeSeq.html
Example of CSS normalization (applied to dada2 output)
Normalizing the 16S OTU table (read count output) of dada2 into CSS normalized read counts
R
# import OTU table of dada2
# load dada2 output including the OTU table "seqtab.nochim" (read count data of the OTU-like amplicon sequence variants, ASV)
load("dada2_output.RData")
# transpose OTU table and convert into data.frame-format
OTU_read_count = as.data.frame(t(seqtab.nochim))
# OTU read count table: rows are taxa and columns are samples
dim(OTU_read_count)
# [1] 210  60   (example: 210 taxa and 60 samples)
class(OTU_read_count)
# [1] "data.frame"
# CSS normalization
# import package metagenomeSeq, containing CSS functions
library(metagenomeSeq)
# convert OTU table into package format
metaSeqObject = newMRexperiment(OTU_read_count)
# estimate the data-driven quantile with cumNormStatFast() and apply CSS normalization
metaSeqObject_CSS = cumNorm(metaSeqObject, p = cumNormStatFast(metaSeqObject))
# convert CSS normalized data into data.frame-formatted OTU table (log=TRUE returns log2-transformed data)
OTU_read_count_CSS = data.frame(MRcounts(metaSeqObject_CSS, norm=TRUE, log=TRUE))
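To inspect what the normalization did, it can help to look at the per-sample scaling factors and at the CSS values without the log transformation. The following is a minimal sketch on a made-up count table, assuming metagenomeSeq is installed; the object names (`toy_counts`, `toy_object`, `css_linear`, `css_log`) and the fixed quantile `p = 0.5` are assumptions chosen for this small example and are not part of the workflow above.

```r
library(metagenomeSeq)

# Made-up count table: rows are taxa, columns are samples.
toy_counts <- matrix(
  c(5000, 10, 8, 6, 4, 2,
     200, 20, 16, 12, 8, 4),
  nrow = 6,
  dimnames = list(paste0("taxon", 1:6), c("sampleA", "sampleB"))
)

# Convert the table into the package format.
toy_object <- newMRexperiment(toy_counts)

# Fixed quantile used only because the toy table is tiny (assumption);
# real data would use p = cumNormStatFast(toy_object) as shown above.
toy_object <- cumNorm(toy_object, p = 0.5)

# Per-sample scaling factors computed by cumNorm().
toy_factors <- normFactors(toy_object)
print(toy_factors)

# CSS normalized counts without and with the log transformation.
css_linear <- MRcounts(toy_object, norm = TRUE, log = FALSE)
css_log    <- MRcounts(toy_object, norm = TRUE, log = TRUE)
print(css_linear)
print(css_log)
```

Keeping both the untransformed and the log-transformed table makes it possible to tell apart the effect of the scaling from the effect of the log transformation in downstream analyses.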
Comments
The improved performance of CSS normalized data might be partly caused by the log transformation applied afterwards; see:
Costea PI, Zeller G, Sunagawa S, Bork P. A fair comparison. Nature Methods (2014).