
MEGAHIT

Genome assembly

"An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph"

Microbial community assembly (metagenomics)

http://www.ncbi.nlm.nih.gov/pubmed/25609793

Install

https://github.com/voutcn/megahit

Example

Input: one metagenomic sample as paired-end FASTQ files (SAMPLE_R1.fastq.gz and SAMPLE_R2.fastq.gz)

megahit -1 SAMPLE_R1.fastq.gz -2 SAMPLE_R2.fastq.gz -t 12 -o megahit_result

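-t 12: use 12 CPU threads (at least 2; default: all available CPU threads)

-m 0.5 (optional, not used in the command above): use at most 50% of the machine's total memory for graph (SdBG) construction (default: -m 0.9, i.e. 90%)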
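-1 and -2 also accept comma-separated lists of files (see -1 <pe1> and -2 <pe2> in the megahit -h output). This lets several samples be co-assembled in one run. The i-th file given to -1 must be the mate of the i-th file given to -2. The Bash sketch below is a minimal helper that builds both lists. It assumes a hypothetical naming scheme: SAMPLE_R1.fastq.gz / SAMPLE_R2.fastq.gz, all in one directory. The function name build_pe_lists and the variables PE1/PE2 are also hypothetical. File names containing commas would break the lists.

```shell
# Hypothetical helper: collect mate pairs named *_R1.fastq.gz / *_R2.fastq.gz
# in one directory and store them as comma-separated lists in PE1 and PE2,
# in the same (glob-sorted) order, for megahit -1 "$PE1" -2 "$PE2".
build_pe_lists() {
  local dir=$1 r1 r2 list1="" list2=""
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue               # glob matched nothing
    r2=${r1%_R1.fastq.gz}_R2.fastq.gz      # expected mate file name
    if [ ! -e "$r2" ]; then
      echo "build_pe_lists: no mate found for $r1" >&2
      return 1
    fi
    list1=${list1:+$list1,}$r1             # append with comma separator
    list2=${list2:+$list2,}$r2
  done
  if [ -z "$list1" ]; then
    echo "build_pe_lists: no *_R1.fastq.gz files in $dir" >&2
    return 1
  fi
  PE1=$list1                               # only set on success
  PE2=$list2
}
```

Usage (reads/ and megahit_coassembly are example names):

build_pe_lists reads && megahit -1 "$PE1" -2 "$PE2" -t 12 -o megahit_coassembly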

Result: the assembled contigs are written to the FASTA file:

megahit_result/final.contigs.fa
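A quick first check of an assembly is to count the contigs in final.contigs.fa and compute their total length and N50. N50 is the contig length L such that contigs of length >= L together contain at least half of all assembled bases. The Bash/awk function below is a minimal sketch. The name contig_stats is hypothetical. It assumes an uncompressed FASTA file and also handles sequences that span several lines.

```shell
# Hypothetical helper: print contig count, total length (bp) and N50 (bp)
# for a FASTA file such as megahit_result/final.contigs.fa.
contig_stats() {
  local fasta=$1
  # Step 1: one sequence length per contig (multi-line sequences are summed).
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$fasta" |
    sort -nr |                              # longest contigs first
    # Step 2: walk down the sorted lengths until half of the bases are covered.
    awk '{ l[NR] = $1; total += $1 }
         END {
           if (NR == 0) { print "contigs=0 total=0 N50=0"; exit }
           half = total / 2; cum = 0
           for (i = 1; i <= NR; i++) {
             cum += l[i]
             if (cum >= half) { n50 = l[i]; break }
           }
           printf "contigs=%d total=%d N50=%d\n", NR, total, n50
         }'
}
```

Usage:

contig_stats megahit_result/final.contigs.fa

The output is a single line of the form contigs=<number> total=<bp> N50=<bp>.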

Intro & Tutorial

https://github.com/voutcn/megahit

https://github.com/voutcn/megahit/wiki/An-example-of-real-assembly

Memory settings

https://github.com/voutcn/megahit/wiki/MEGAHIT-Memory-setting
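According to the megahit -h output, -m/--memory takes either a fraction between 0 and 1 of the machine's total memory or an absolute number of bytes. The two hypothetical Bash helpers below are a sketch for picking a value. The first converts a budget in GiB to bytes. The second estimates how many bytes a fraction such as 0.5 corresponds to on a Linux machine, using MemTotal from /proc/meminfo. MEGAHIT's own memory detection may give a slightly different figure.

```shell
# Hypothetical helpers for choosing a value for megahit -m/--memory.
# Values between 0 and 1 = fraction of total memory; larger values = bytes.

# Convert a whole number of GiB (1024^3 bytes) to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Approximate bytes for a fraction (e.g. 0.5) on Linux, using MemTotal
# from /proc/meminfo (reported in kB, i.e. 1024-byte units).
fraction_to_bytes() {
  awk -v f="$1" '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 * f }' /proc/meminfo
}
```

Usage, for example limiting graph construction to 32 GiB:

megahit -1 SAMPLE_R1.fastq.gz -2 SAMPLE_R2.fastq.gz -m "$(gib_to_bytes 32)" -t 12 -o megahit_result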

Help

megahit -h

MEGAHIT v1.0.2
Copyright (c) The University of Hong Kong & L3 Bioinformatics Limited
contact: Dinghua Li <dhli@cs.hku.hk>

Usage:
  megahit [options] {-1 <pe1> -2 <pe2> | --12 <pe12> | -r <se>} [-o <out_dir>]

  Input options that can be specified for multiple times (supporting plain text and gz/bz2 extensions)
    -1 <pe1>                    comma-separated list of fasta/q paired-end #1 files, paired with files in <pe2>
    -2 <pe2>                    comma-separated list of fasta/q paired-end #2 files, paired with files in <pe1>
    --12 <pe12>                 comma-separated list of interleaved fasta/q paired-end files
    -r/--read <se>              comma-separated list of fasta/q single-end files

  Input options that can be specified for at most ONE time (not recommended):
    --input-cmd <cmd>           command that outputs fasta/q reads to stdout; taken by MEGAHIT as SE reads

Optional Arguments:
  Basic assembly options:
    --min-count <int>           minimum multiplicity for filtering (k_min+1)-mers, default 2
    --k-min <int>               minimum kmer size (<= 127), must be odd number, default 21
    --k-max <int>               maximum kmer size (<= 127), must be odd number, default 99
    --k-step <int>              increment of kmer size of each iteration (<= 28), must be even number, default 20
    --k-list <int,int,..>       comma-separated list of kmer size (all must be odd, in the range 15-127, increment <= 28); override `--k-min', `--k-max' and `--k-step'

  Advanced assembly options:
    --no-mercy                  do not add mercy kmers
    --no-bubble                 do not merge bubbles
    --merge-level <l,s>         merge complex bubbles of length <= l*kmer_size and similarity >= s, default 20,0.98
    --prune-level <int>         strength of local low depth pruning (0-2), default 2
    --low-local-ratio <float>   ratio threshold to define low local coverage contigs, default 0.2
    --max-tip-len <int>         remove tips less than this value; default 2*k for iteration of kmer_size=k
    --no-local                  disable local assembly
    --kmin-1pass                use 1pass mode to build SdBG of k_min

  Presets parameters:
    --presets <str>             override a group of parameters; possible values:
        meta            '--min-count 2 --k-list 21,41,61,81,99' (generic metagenomes, default)
        meta-sensitive  '--min-count 2 --k-list 21,31,41,51,61,71,81,91,99' (more sensitive but slower)
        meta-large      '--min-count 2 --k-list 27,37,47,57,67,77,87' (large & complex metagenomes, like soil)
        bulk            '--min-count 3 --k-list 31,51,71,91,99 --no-mercy' (experimental, standard bulk sequencing with >= 30x depth)
        single-cell     '--min-count 3 --k-list 21,33,55,77,99,121 --merge_level 20,0.96' (experimental, single cell data)

  Hardware options:
    -m/--memory <float>         max memory in byte to be used in SdBG construction; default 0.9 (if set between 0-1, fraction of the machine's total memory)
    --mem-flag <int>            SdBG builder memory mode, default 1
                                0: minimum; 1: moderate; others: use all memory specified by '-m/--memory'.
    --use-gpu                   use GPU
    --gpu-mem <float>           GPU memory in byte to be used. Default: auto detect to use up all free GPU memory.
    -t/--num-cpu-threads <int>  number of CPU threads, at least 2. Default: auto detect to use all CPU threads.

  Output options:
    -o/--out-dir <string>       output directory, default ./megahit_out
    --out-prefix <string>       output prefix (the contig file will be OUT_DIR/OUT_PREFIX.contigs.fa)
    --min-contig-len <int>      minimum length of contigs to output, default 200
    --keep-tmp-files            keep all temporary files

  Other Arguments:
    --continue                  continue a MEGAHIT run from its last available check point. please set the output directory correctly when using this option.
    -h/--help                   print the usage message
    -v/--version                print version
    --verbose                   verbose mode
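A custom --k-list can be checked before a long run starts. The megahit -h output states the rules: every k must be odd and between 15 and 127, and consecutive values may differ by at most 28. The Bash function below is a minimal sketch of that check. The name valid_k_list is hypothetical. Requiring the list to be in ascending order is an extra assumption of this sketch, not a documented MEGAHIT rule.

```shell
# Hypothetical helper: check a --k-list value against the rules in `megahit -h`:
# every k odd and within 15-127, consecutive increments <= 28.
# Requiring ascending order is an extra assumption of this sketch.
valid_k_list() {
  local list=$1 k prev="" IFS=,
  for k in $list; do                        # split on commas (IFS=,)
    case $k in
      ''|0*|*[!0-9]*) return 1 ;;           # empty, leading zero or non-numeric
    esac
    (( k % 2 == 1 && k >= 15 && k <= 127 )) || return 1
    if [ -n "$prev" ]; then
      (( k > prev && k - prev <= 28 )) || return 1
    fi
    prev=$k
  done
  [ -n "$prev" ]                            # an empty list is invalid
}
```

Usage, here with the k-list of the meta-large preset:

valid_k_list 27,37,47,57,67,77,87 && echo "k-list OK"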

https://github.com/voutcn/megahit

www.metagenomics.wiki

Author: Matthias Scholz
