Metagenomics

FASTQ file format

Full genome shotgun sequencing


FASTQ format

Four lines define each entry in a FASTQ file:

@AY01234       # sequence ID (and optional description)
GCATCTCGA...   # nucleotide sequence, as in FASTA
+              # separator line (optionally repeats the sequence ID, usually empty)
BF7'F<0&F...   # quality values, one character per sequence position
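The four-line layout above can be read with a few lines of Python. This is a minimal sketch, not a replacement for an established parser: the names `FastqRecord`, `read_fastq` and `phred_scores` are made up for illustration, and the quality decoding assumes the Phred+33 (Sanger / Illumina 1.8+) ASCII offset, which this page does not state explicitly.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ entry (hypothetical helper type for this sketch)."""
    seq_id: str      # header line without the leading '@'
    sequence: str    # nucleotide sequence
    quality: str     # quality string, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ text stream, four lines per entry."""
    stripped = (line.rstrip("\r\n") for line in handle)
    # Leniency of this sketch: blank lines between entries are skipped.
    lines = (line for line in stripped if line)
    while True:
        header = next(lines, None)
        if header is None:
            return  # end of file
        sequence = next(lines, None)
        separator = next(lines, None)
        quality = next(lines, None)
        if quality is None:
            raise ValueError("truncated FASTQ entry: " + header)
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ entry: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string to Phred scores (assumes Phred+33 by default)."""
    return [ord(char) - offset for char in quality]


# Usage with the short entry shown above (without the '...' elisions).
example = "@AY01234\nGCATCTCGA\n+\nBF7'F<0&F\n"
for record in read_fastq(io.StringIO(example)):
    print(record.seq_id, record.sequence, phred_scores(record.quality))
```

For real data, an established library such as Biopython is usually the safer choice; the sketch only makes the four-line structure explicit.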


Illumina FASTQ format

FASTQ example 

Three entries of forward read sequences (Illumina R1 FASTQ file):

@A0216:173:HJNFKDSX:3:1101:2745:1016 1:N:0:TTACGCAC+AGATGGTC
GGGAGACGAGCGTCACGTTCATGAGCGCCTCCTCGACCTCGTCGACGGAGTCGAAATTACGCGCCACCCACAGCGGAAATTCGAACGTGGCGACGTTTTCC
+
BBBFBFFFFFFFBF<FBB0B<BF<FFFFFIIII'<<FF7BF<'7<BBBBB<<BBBBBBBBBBBBBBBBBBBBBB<7BB<<B<BFB<BBBB77<BB<BBBBB
@A0216:173:HJNFKDSX:3:1101:3884:1016 1:N:0:TTACGCAC+AGATGGTC
GCTTCGCCTTGCCGCCCGCACCAAACACCGTGTCATAGTGGTAGCCGCGCGGAGTAACCAAGATGGTCTCCCCATATGAGAAACTCCAGTCGAGATTACGG
+
BBBFFFFFFFFFFFFFIBFFIIIFFIIBBF0BFFF0BF<BF<BBFFBFFB<BBFBBBF0<BBFFFBBBFBBFFFBBBF<<BBBBFFFFB<BBFFFBBBB07
@A0216:173:HJNFKDSX:3:1101:4010:1016 1:N:0:TTACGCAC+AGATGGTC
TATTTTTATGCATTTGCTTAATTTTAATAGCTGCAAAAGTTAATAAAATAAAACCAATTGCTACAAAAAAAGAAGTGCAGCTTCAAGAAGATGTAAAACTG
+
<<<<<0<00000<BB7BBBB000BB0<7BB000B<0BB'<<<'0'0'<0<00'07BB'77''07<7''0000'7<<0'0<<''7<'7'77'000'77'''7

Meaning of Illumina sequence names

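@instrumentID:run:flowcellID:lane:tile:xPos:yPos  read(1 or 2):filtered(Y=filtered out, N=passed filter):control number:index sequences

In the first example entry above, A0216 is the instrument ID, 173 the run number, HJNFKDSX the flowcell ID, 3 the lane, 1101 the tile, and 2745:1016 the x/y position of the cluster. The read is the forward read (1), passed filtering (N), has control number 0, and carries the index pair TTACGCAC+AGATGGTC.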

→ Illumina FASTQ file format
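As an illustration, an Illumina sequence name can be split into its fields as follows. This is a sketch under assumptions: it expects the layout seen in the example entries on this page (seven colon-separated fields including a flowcell ID, then a space, then read number, filter flag, control number and index sequences), and the names `IlluminaReadName` and `parse_illumina_name` are hypothetical. Older Illumina pipelines used a different layout that this function rejects.

```python
from typing import NamedTuple


class IlluminaReadName(NamedTuple):
    """Fields of an Illumina sequence name (hypothetical helper type)."""
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int             # 1 = forward (R1), 2 = reverse (R2)
    is_filtered: bool     # True for 'Y' (read was filtered out), False for 'N'
    control_number: int
    index: str            # index (barcode) sequence(s), e.g. 'TTACGCAC+AGATGGTC'


def parse_illumina_name(header: str) -> IlluminaReadName:
    """Parse one '@...' header line, assuming the layout described above."""
    if header.startswith("@"):
        header = header[1:]
    name, _, comment = header.partition(" ")
    name_fields = name.split(":")
    comment_fields = comment.strip().split(":", 3)
    if len(name_fields) != 7 or len(comment_fields) != 4:
        raise ValueError("unexpected Illumina sequence name: " + header)
    instrument, run, flowcell, lane, tile, x_pos, y_pos = name_fields
    read, filtered, control, index = comment_fields
    if filtered not in ("Y", "N"):
        raise ValueError("unexpected filter flag: " + filtered)
    return IlluminaReadName(
        instrument, int(run), flowcell, int(lane), int(tile),
        int(x_pos), int(y_pos), int(read), filtered == "Y",
        int(control), index,
    )


# Usage with the first example entry from this page.
name = parse_illumina_name(
    "@A0216:173:HJNFKDSX:3:1101:2745:1016 1:N:0:TTACGCAC+AGATGGTC"
)
print(name.instrument, name.flowcell, name.lane, name.read, name.index)
```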


Read more

→ Wikipedia: FASTQ format

→ Phred quality scores

→ Paired-end reads explained by Torsten Seemann

→ Sanger FASTQ file format

→ awk: How to add "/1" and "/2" to FASTQ sequence-IDs

→ FASTA file format


www.metagenomics.wiki

Author: Matthias Scholz
