CSS

Cumulative Sum Scaling (CSS)

Cumulative Sum Scaling (CSS) is a median-like quantile normalization that corrects for differences in sampling depth (library size). Standard relative abundance normalization (fraction/percentage) rescales all samples to the same total sum (100%). CSS instead retains variation in total counts between samples. For each sample, CSS computes a scaling factor as the cumulative sum of counts up to a chosen quantile of that sample's count distribution. The factor therefore depends only on the lower-abundance taxa, which are assumed to be relatively constant and independent across samples. This excludes the influence of the few highly abundant taxa that can dominate a study.
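The idea can be made concrete with a minimal base-R sketch of how CSS scaling factors could be computed. This illustrates the principle only and is not metagenomeSeq's implementation. The function name css_factors, the fixed quantile p = 0.5 (instead of the data-driven value from cumNormStatFast) and the toy counts are assumptions for this example.

```r
# Minimal sketch of CSS scaling factors (illustration only, not metagenomeSeq code).
# Assumption: a fixed quantile p instead of the data-driven choice of cumNormStatFast().
# Assumption: every sample has at least one nonzero count.
css_factors <- function(counts, p = 0.5) {
  # counts: matrix with taxa in rows and samples in columns
  apply(counts, 2, function(x) {
    nonzero <- x[x > 0]                                # zeros are ignored for the quantile
    q <- quantile(nonzero, probs = p, names = FALSE)   # sample-specific quantile
    sum(x[x <= q])                                     # cumulative sum up to that quantile
  })
}

# Hypothetical toy data: 5 taxa x 2 samples. Sample B is sequenced twice as deeply
# as sample A and additionally contains one dominant taxon (taxon5).
counts <- matrix(c(1, 2, 3, 4, 10,
                   2, 4, 6, 8, 500),
                 ncol = 2,
                 dimnames = list(paste0("taxon", 1:5), c("A", "B")))

sf <- css_factors(counts)
sf

# divide each sample by its scaling factor
normalized <- sweep(counts, 2, sf, "/")
```

The CSS factors (6 and 12) differ by the factor of 2 in sequencing depth. The total sums (20 and 520) would differ by a factor of 26 because of the single dominant taxon. After scaling, taxon1 to taxon4 receive identical values in both samples.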





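Install the Bioconductor R package metagenomeSeq, which contains the CSS functions (current Bioconductor releases are installed with BiocManager; the older biocLite.R script is deprecated)

R
 if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
 BiocManager::install("metagenomeSeq")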


Example of CSS normalization (applied to dada2 output)

Normalizing the 16S OTU table (read counts) produced by dada2 into CSS-normalized read counts


R

# import OTU table of dada2

   # load dada2 output including the OTU table "seqtab.nochim" (read counts of the OTU-like amplicon sequence variants, ASVs)
 load("dada2_output.RData")

   # transpose the OTU table (dada2 stores samples in rows) and convert it into a data.frame
 OTU_read_count = as.data.frame(t(seqtab.nochim))

   # OTU read count table: rows are taxa and columns are samples
 dim(OTU_read_count)      # e.g. 210 60 (210 taxa and 60 samples)
 class(OTU_read_count)    # "data.frame"
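The transpose is needed because dada2's sequence table has samples in rows and sequence variants in columns. metagenomeSeq expects features (taxa) in rows and samples in columns. The small sketch below uses a hypothetical stand-in for seqtab.nochim (the names seqtab_toy and OTU_toy and the counts are made up) to show the reorientation. It also shows a quick look at the per-sample library sizes that CSS is meant to correct for.

```r
# Hypothetical stand-in for dada2's "seqtab.nochim":
# samples in rows, ASVs (sequence variants) in columns
seqtab_toy <- matrix(c(10L, 0L, 5L,
                        3L, 7L, 0L),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("sample1", "sample2"),
                                     c("ASV1", "ASV2", "ASV3")))

# transpose so that taxa are rows and samples are columns (metagenomeSeq layout)
OTU_toy <- as.data.frame(t(seqtab_toy))

dim(OTU_toy)      # 3 taxa x 2 samples
colSums(OTU_toy)  # library size (total reads) per sample
```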



# CSS normalization
  # load package metagenomeSeq, which provides the CSS functions
    library(metagenomeSeq)
  # convert the OTU table into an MRexperiment object (taxa in rows, samples in columns)
    metaSeqObject      = newMRexperiment(as.matrix(OTU_read_count))
  # CSS normalization; the quantile p is estimated from the data with cumNormStatFast()
    metaSeqObject_CSS  = cumNorm(metaSeqObject, p = cumNormStatFast(metaSeqObject))
  # convert CSS normalized data into a data.frame-formatted OTU table (log transformed data);
  # check.names = FALSE keeps sample names unchanged
    OTU_read_count_CSS = data.frame(MRcounts(metaSeqObject_CSS, norm = TRUE, log = TRUE), check.names = FALSE)
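MRcounts(..., norm = TRUE, log = TRUE) returns normalized, log-transformed counts. As we understand the metagenomeSeq defaults, this amounts to two steps. First, each sample's counts are divided by its normalization factor divided by sl = 1000. Second, log2(x + 1) is applied. The base-R sketch below reproduces that arithmetic under this assumption and shows that the log transformation is a separate step from the CSS scaling. The function name css_log_counts and the toy factors are hypothetical. With a real object, the factors would come from normFactors(metaSeqObject_CSS).

```r
# Sketch of the arithmetic assumed behind MRcounts(obj, norm = TRUE, log = TRUE)
# (hypothetical helper, not part of metagenomeSeq)
css_log_counts <- function(counts, norm_factors, sl = 1000) {
  scaled <- sweep(counts, 2, norm_factors / sl, "/")  # per-sample CSS rescaling
  log2(scaled + 1)                                     # pseudocount 1 avoids log2(0)
}

# hypothetical toy data: sample B has twice the depth of sample A
counts <- matrix(c(0, 6, 12,
                   0, 12, 24),
                 ncol = 2,
                 dimnames = list(paste0("taxon", 1:3), c("A", "B")))
nf <- c(A = 6, B = 12)   # hypothetical CSS factors, e.g. from normFactors()

res <- css_log_counts(counts, nf)
res
```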





Comments

Part of the improvement seen with CSS-normalized data might be caused by the subsequently applied log transformation (log = TRUE in MRcounts), see