MEGAHIT: "An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph"

Microbial community assembly (metagenomics)

http://www.ncbi.nlm.nih.gov/pubmed/25609793

Install

https://github.com/voutcn/megahit

Example

Input: a metagenomics sample as paired-end FASTQ files (SAMPLE_R1.fastq.gz and SAMPLE_R2.fastq.gz)

megahit -1 SAMPLE_R1.fastq.gz -2 SAMPLE_R2.fastq.gz -t 12 -o megahit_result

-t 12 use 12 CPU threads

-m 0.5 (optional, not used in the command above) limit memory to 50% of the machine's total memory (default: 90%, -m 0.9)

Result: the assembled contigs are written to a FASTA file:

megahit_result/final.contigs.fa
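A quick sanity check on the resulting contig file can be sketched as follows. This is a minimal illustration, not part of MEGAHIT: the helper names contig_lengths and n50 are made up for this example, and the path is the one from the example run above.

```python
from pathlib import Path


def contig_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # a header line starts a new contig
        elif line and lengths:
            lengths[-1] += len(line)   # a sequence may span several lines
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Path taken from the example above; nothing happens if it does not exist.
    path = Path("megahit_result/final.contigs.fa")
    if path.exists():
        with path.open() as handle:
            lengths = contig_lengths(handle)
        print(f"contigs={len(lengths)} total_bp={sum(lengths)} N50={n50(lengths)}")
```

The parsing works on any iterable of lines, so it can be tried on a few hand-written records before pointing it at a real assembly.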

Intro & Tutorial

https://github.com/voutcn/megahit

https://github.com/voutcn/megahit/wiki/An-example-of-real-assembly

Memory settings

https://github.com/voutcn/megahit/wiki/MEGAHIT-Memory-setting

Help

megahit -h

MEGAHIT v1.0.2

Copyright (c) The University of Hong Kong & L3 Bioinformatics Limited

contact: Dinghua Li <dhli@cs.hku.hk>

Usage:

megahit [options] {-1 <pe1> -2 <pe2> | --12 <pe12> | -r <se>} [-o <out_dir>]

Input options that can be specified for multiple times (supporting plain text and gz/bz2 extensions)

-1 <pe1> comma-separated list of fasta/q paired-end #1 files, paired with files in <pe2>

-2 <pe2> comma-separated list of fasta/q paired-end #2 files, paired with files in <pe1>

--12 <pe12> comma-separated list of interleaved fasta/q paired-end files

-r/--read <se> comma-separated list of fasta/q single-end files

Input options that can be specified for at most ONE time (not recommended):

--input-cmd <cmd> command that outputs fasta/q reads to stdout; taken by MEGAHIT as SE reads
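When several libraries are assembled together, the comma-separated lists for -1 and -2 have to stay in step. The sketch below builds such a command line without running it. The function megahit_command and the file names are hypothetical, and it assumes that the n-th file given to -1 is paired with the n-th file given to -2, which the help text implies but does not spell out.

```python
import shlex


def megahit_command(pe_pairs=(), interleaved=(), single=(), out_dir=None, threads=None):
    """Build a MEGAHIT argument list from lists of input files (nothing is executed)."""
    if not (pe_pairs or interleaved or single):
        raise ValueError("at least one input file is required")
    cmd = ["megahit"]
    if pe_pairs:
        # Assumption: files in -1 and -2 are paired by their position in the list.
        cmd += ["-1", ",".join(r1 for r1, _ in pe_pairs)]
        cmd += ["-2", ",".join(r2 for _, r2 in pe_pairs)]
    if interleaved:
        cmd += ["--12", ",".join(interleaved)]
    if single:
        cmd += ["-r", ",".join(single)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    return cmd


# Two hypothetical paired-end libraries assembled in one run.
cmd = megahit_command(
    pe_pairs=[("A_R1.fastq.gz", "A_R2.fastq.gz"), ("B_R1.fastq.gz", "B_R2.fastq.gz")],
    out_dir="megahit_result",
    threads=12,
)
print(shlex.join(cmd))
```

Keeping each pair as one tuple makes it hard to get the two lists out of order. The resulting list can be passed to subprocess.run once the files exist.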

Optional Arguments:

Basic assembly options:

--min-count <int> minimum multiplicity for filtering (k_min+1)-mers, default 2

--k-min <int> minimum kmer size (<= 127), must be odd number, default 21

--k-max <int> maximum kmer size (<= 127), must be odd number, default 99

--k-step <int> increment of kmer size of each iteration (<= 28), must be even number, default 20

--k-list <int,int,..> comma-separated list of kmer size (all must be odd, in the range 15-127, increment <= 28); override `--k-min', `--k-max' and `--k-step'
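The k-mer options interact, so it can help to expand and check them before starting a long run. The sketch below is an illustration only. It assumes that k-max is appended when the stepping does not land on it, which reproduces the list 21,41,61,81,99 shown for the 'meta' preset, and it assumes that a k list must be strictly increasing. Both function names are hypothetical.

```python
def k_list_from_range(k_min=21, k_max=99, k_step=20):
    """Expand k-min/k-max/k-step into an explicit list of k-mer sizes.

    Assumption: k_max is appended if the last step does not reach it exactly.
    """
    if k_min > k_max:
        raise ValueError("k_min must not exceed k_max")
    ks = list(range(k_min, k_max + 1, k_step))
    if ks[-1] != k_max:
        ks.append(k_max)
    return ks


def check_k_list(ks):
    """Return the rule violations found in a k list (an empty list means valid)."""
    problems = []
    for k in ks:
        if k % 2 == 0:
            problems.append(f"{k} is even")
        if not 15 <= k <= 127:
            problems.append(f"{k} is outside 15-127")
    for prev, cur in zip(ks, ks[1:]):
        if cur <= prev:
            # Assumption: the list has to be strictly increasing.
            problems.append(f"{cur} does not increase after {prev}")
        elif cur - prev > 28:
            problems.append(f"step {prev}->{cur} exceeds 28")
    return problems


print(k_list_from_range())       # defaults: k-min 21, k-max 99, k-step 20
print(check_k_list([21, 51]))    # one step is larger than 28
```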

Advanced assembly options:

--no-mercy do not add mercy kmers

--no-bubble do not merge bubbles

--merge-level <l,s> merge complex bubbles of length <= l*kmer_size and similarity >= s, default 20,0.98

--prune-level <int> strength of local low depth pruning (0-2), default 2

--low-local-ratio <float> ratio threshold to define low local coverage contigs, default 0.2

--max-tip-len <int> remove tips less than this value; default 2*k for iteration of kmer_size=k

--no-local disable local assembly

--kmin-1pass use 1pass mode to build SdBG of k_min
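Two of the defaults above scale with the k-mer size of the current iteration: the bubble length limit (l*kmer_size, with l = 20 by default) and the tip length limit (2*k). The small worked example below tabulates both for the default k list. The function name and dictionary keys are made up for illustration.

```python
def per_k_thresholds(ks, merge_len_factor=20, tip_factor=2):
    """Compute the default bubble and tip length limits for each k-mer size."""
    return [
        {
            "k": k,
            "max_bubble_len": merge_len_factor * k,  # --merge-level l * kmer_size
            "max_tip_len": tip_factor * k,           # --max-tip-len default 2*k
        }
        for k in ks
    ]


for row in per_k_thresholds([21, 41, 61, 81, 99]):
    print(f"k={row['k']:3d}  bubbles <= {row['max_bubble_len']:4d} bp  tips < {row['max_tip_len']:3d} bp")
```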

Presets parameters:

--presets <str> override a group of parameters; possible values:

meta '--min-count 2 --k-list 21,41,61,81,99' (generic metagenomes, default)

meta-sensitive '--min-count 2 --k-list 21,31,41,51,61,71,81,91,99' (more sensitive but slower)

meta-large '--min-count 2 --k-list 27,37,47,57,67,77,87' (large & complex metagenomes, like soil)

bulk '--min-count 3 --k-list 31,51,71,91,99 --no-mercy' (experimental, standard bulk sequencing with >= 30x depth)

single-cell '--min-count 3 --k-list 21,33,55,77,99,121 --merge_level 20,0.96' (experimental, single cell data)
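Since each k in the list corresponds to one assembly iteration, counting the k values gives a rough feel for why some presets are slower than others. The sketch below copies the preset strings verbatim from the help text above and extracts the k list from each. The PRESETS table and preset_k_list are illustrative names, not something MEGAHIT exposes.

```python
import shlex

# Preset strings copied verbatim from the help output above.
PRESETS = {
    "meta": "--min-count 2 --k-list 21,41,61,81,99",
    "meta-sensitive": "--min-count 2 --k-list 21,31,41,51,61,71,81,91,99",
    "meta-large": "--min-count 2 --k-list 27,37,47,57,67,77,87",
    "bulk": "--min-count 3 --k-list 31,51,71,91,99 --no-mercy",
    "single-cell": "--min-count 3 --k-list 21,33,55,77,99,121 --merge_level 20,0.96",
}


def preset_k_list(name):
    """Return the k-mer sizes a preset would iterate over."""
    tokens = shlex.split(PRESETS[name])
    value = tokens[tokens.index("--k-list") + 1]
    return [int(k) for k in value.split(",")]


for name in PRESETS:
    ks = preset_k_list(name)
    print(f"{name:15s} {len(ks)} iterations, k from {ks[0]} to {ks[-1]}")
```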

Hardware options:

-m/--memory <float> max memory in byte to be used in SdBG construction; default 0.9 (if set between 0-1, fraction of the machine's total memory)

--mem-flag <int> SdBG builder memory mode, default 1 0: minimum; 1: moderate; others: use all memory specified by '-m/--memory'.
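The -m value is read either as a fraction or as a byte count, depending on its size. The sketch below shows one way to turn it into bytes for planning purposes. It assumes that values strictly between 0 and 1 are fractions and that everything from 1 upward is a byte count; the help text does not say how the boundaries are treated. The function name resolve_memory_bytes is hypothetical.

```python
def resolve_memory_bytes(m, total_bytes):
    """Translate a -m/--memory value into a byte budget.

    Assumption: 0 < m < 1 is a fraction of total memory, m >= 1 is bytes.
    """
    if m <= 0:
        raise ValueError("memory must be positive")
    if m < 1:
        return int(m * total_bytes)
    return int(m)


GIB = 2 ** 30
total = 64 * GIB  # a hypothetical 64 GiB machine

print(resolve_memory_bytes(0.5, total) // GIB, "GiB")   # fraction of total memory
print(resolve_memory_bytes(16e9, total), "bytes")       # explicit byte count
```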

--use-gpu use GPU

--gpu-mem <float> GPU memory in byte to be used. Default: auto detect to use up all free GPU memory.

-t/--num-cpu-threads <int> number of CPU threads, at least 2. Default: auto detect to use all CPU threads.

Output options:

-o/--out-dir <string> output directory, default ./megahit_out

--out-prefix <string> output prefix (the contig file will be OUT_DIR/OUT_PREFIX.contigs.fa)

--min-contig-len <int> minimum length of contigs to output, default 200
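The output options determine where the contig file ends up and which contigs it contains. The sketch below derives the expected path and applies a stricter length cut-off after the run. It assumes that the default prefix is "final", inferred from the final.contigs.fa file in the example above, and that the minimum length is inclusive. Both helper names are hypothetical.

```python
from pathlib import Path


def contigs_path(out_dir="./megahit_out", out_prefix="final"):
    """Expected location of the contig file: OUT_DIR/OUT_PREFIX.contigs.fa.

    Assumption: the default prefix is "final".
    """
    return Path(out_dir) / f"{out_prefix}.contigs.fa"


def drop_short(contigs, min_len=200):
    """Keep contigs (name -> sequence) that are at least min_len bases long."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


print(contigs_path("megahit_result"))
print(sorted(drop_short({"c1": "A" * 1500, "c2": "A" * 300}, min_len=1000)))
```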

--keep-tmp-files keep all temporary files

Other Arguments:

--continue continue a MEGAHIT run from its last available check point. please set the output directory correctly when using this option.
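A wrapper script can decide between a fresh run and a resumed one by looking at the output directory. The sketch below only builds the argument list. It treats an existing output directory as a sign of an interrupted run, which is an assumption, and it repeats the original input options because the help text does not state whether they are needed with --continue. The function name assembly_args is hypothetical.

```python
from pathlib import Path


def assembly_args(base_args, out_dir):
    """Return arguments for a fresh run, adding --continue if out_dir already exists."""
    args = list(base_args) + ["-o", str(out_dir)]
    if Path(out_dir).is_dir():
        # Assumption: an existing output directory belongs to an interrupted run.
        args.append("--continue")
    return args


base = ["megahit", "-1", "SAMPLE_R1.fastq.gz", "-2", "SAMPLE_R2.fastq.gz", "-t", "12"]
print(assembly_args(base, "megahit_result"))
```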

-h/--help print the usage message

-v/--version print version

--verbose verbose mode
