CSS

Cumulative Sum Scaling (CSS)

Cumulative Sum Scaling (CSS) is a median-like quantile normalization that corrects for differences in sampling depth (library size). Standard relative abundance (fraction/percentage) normalization rescales every sample to the same total (100%), whereas CSS retains variation in total counts between samples. CSS rescales each sample by the cumulative sum of its counts up to a chosen quantile, i.e. a subset of lower-abundance taxa that is assumed to be relatively constant and independent across samples. This excludes the influence of highly abundant taxa that dominate a study.
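As a rough illustration of the idea described above, the following base-R sketch computes a CSS-style scaling by hand. It is not the metagenomeSeq implementation: the function name `css_sketch`, the fixed quantile `p = 0.5`, the scaling constant of 1000 and the toy counts are all assumptions chosen for illustration (metagenomeSeq estimates the quantile from the data).

```r
# Minimal sketch of CSS-style scaling in base R (illustrative, not the metagenomeSeq code).
# counts: matrix with taxa in rows and samples in columns
# p:      quantile of the non-zero counts used as cut-off (assumed fixed here)
# scale:  constant the scaled counts are multiplied by (assumed 1000)
css_sketch <- function(counts, p = 0.5, scale = 1000) {
  counts <- as.matrix(counts)
  # per sample: sum of all non-zero counts at or below the p-th quantile
  norm_factors <- apply(counts, 2, function(x) {
    nonzero <- x[x > 0]
    cutoff <- quantile(nonzero, probs = p, names = FALSE)
    sum(nonzero[nonzero <= cutoff])
  })
  # divide every sample (column) by its own scaling factor
  sweep(counts, 2, norm_factors / scale, "/")
}

# Toy data: sampleA is dominated by one highly abundant taxon
counts <- matrix(
  c(1, 2, 3, 4, 1000,
    2, 4, 6, 8, 10),
  nrow = 5,
  dimnames = list(paste0("taxon", 1:5), c("sampleA", "sampleB"))
)

css <- css_sketch(counts)
rel <- sweep(counts, 2, colSums(counts), "/")  # relative abundance for comparison

round(css, 1)
round(rel, 4)
```

In this toy example, taxa 1 to 4 receive identical CSS values in both samples, because the dominant taxon5 in sampleA lies above the cut-off and does not enter the scaling factor. With relative abundances the same taxa differ strongly between the two samples, and unlike the relative abundance table, the column totals of the CSS table remain unequal.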

References

Paulson JN, Stine OC, Bravo HC, Pop M. Differential abundance analysis for microbial marker-gene surveys. Nature Methods (2013).

http://cbcb.umd.edu/software/metagenomeSeq

https://bioconductor.org/packages/release/bioc/html/metagenomeSeq.html

Install the R package metagenomeSeq, which contains the CSS functions

R

# R versions before 3.5: legacy biocLite installer (superseded by BiocManager)

source("http://bioconductor.org/biocLite.R")

biocLite("metagenomeSeq")

# With R version 3.5 or greater, install Bioconductor packages using BiocManager

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("metagenomeSeq")


Example of CSS normalization (applied to dada2 output)

Normalize the 16S OTU table (the read count output of dada2) into CSS-normalized read counts.

R

# import OTU table of dada2

# load dada2 output including the OTU table "seqtab.nochim" (read count data of the OTU-like amplicon sequence variants, ASV)

load("dada2_output.RData")

# transpose OTU table and convert into data.frame-format

OTU_read_count = as.data.frame(t(seqtab.nochim))

# OTU read count table: rows are taxa and columns are samples

dim(OTU_read_count)

# [1] 210  60   (example: 210 taxa and 60 samples)

class(OTU_read_count)

# [1] "data.frame"

# CSS normalization

# import package metagenomeSeq, containing CSS functions

library(metagenomeSeq)

# convert OTU table into package format

metaSeqObject = newMRexperiment(OTU_read_count)

# estimate the scaling quantile from the data and compute the CSS normalization factors

metaSeqObject_CSS = cumNorm(metaSeqObject, p = cumNormStatFast(metaSeqObject))

# convert CSS normalized data into a data.frame-formatted OTU table (norm=TRUE: scaled counts, log=TRUE: log-transformed)

OTU_read_count_CSS = data.frame(MRcounts(metaSeqObject_CSS, norm=TRUE, log=TRUE))

Comments

Part of the improvement seen with CSS-normalized data may be due to the log transformation applied afterwards rather than to the scaling itself; see

Costea PI, Zeller G, Sunagawa S, Bork P. A fair comparison. Nature Methods (2014).
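To judge how much of an effect comes from the scaling and how much from the log transformation, the two steps can be looked at separately. The base-R sketch below uses made-up CSS-scaled values (hypothetical, not real data) and applies log2(x + 1), which appears to be the transformation applied by MRcounts when log=TRUE (an assumption worth checking against the package documentation). In metagenomeSeq itself, the untransformed scaled counts can be requested with MRcounts(metaSeqObject_CSS, norm=TRUE, log=FALSE).

```r
# Hypothetical CSS-scaled counts (taxa in rows, samples in columns); values are made up
css_scaled <- matrix(
  c(0, 500, 166666.7,
    166.7, 0, 833.3),
  nrow = 3,
  dimnames = list(paste0("taxon", 1:3), c("sampleA", "sampleB"))
)

# log2(x + 1): zeros stay zero, large values are strongly compressed
# (assumed to mirror log=TRUE in MRcounts)
css_log <- log2(css_scaled + 1)

# compare the spread of values before and after the log transformation
range(css_scaled)
range(css_log)
```

Running downstream analyses on both css_scaled and css_log is one simple way to see which of the two steps drives a given result.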