9. Third Party Tools
9.1. Assembly
IDBA-UD
Citation: Peng, Y., et al. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth, Bioinformatics, 28, 1420-1428.
Version: 1.1.1
License: GPLv2
SPAdes
Citation: Prjibelski et al. (2020) Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics. 2020;70(1):e102.
Version: 3.15.5
License: GPLv2
MEGAHIT
Citation: Li D. et al. (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015 May 15;31(10):1674-6
Version: 1.2.9
License: GPLv3
LRASM: Long Read Assembler
Citation:
Version: 0.1.0
License: GPLv3
RACON
Citation: Vaser R et al. (2017) Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017 May;27(5):737-746.
Version: 1.4.13
License: MIT
Unicycler
Citation: Wick RR et al. (2017) Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017 Jun 8;13(6):e1005595.
Version: 0.5.0
License: GPLv3
9.2. Annotation
RATT
Citation: Otto, T.D., et al. (2011) RATT: Rapid Annotation Transfer Tool, Nucleic acids research, 39, e57.
Version:
License: GPLv3
Note: The original RATT program does not handle annotation transfer on the reverse-complement strand. We edited the source code to fix this.
Prokka
Citation: Seemann, T. (2014) Prokka: rapid prokaryotic genome annotation, Bioinformatics, 30,2068-2069.
Site: http://www.vicbioinformatics.com/software.prokka.shtml
Version: 1.14.5
License: GPLv2
Note: The NCBI tool tbl2asn bundled with Prokka can be very slow (up to several hours) when it has to process numerous contigs, as is the case with metagenomic input. We modified the code so that tbl2asn runs in parallel.
tRNAscan
Citation: Lowe, T.M. and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucleic acids research, 25, 955-964.
Version: 1.3.1
License: GPLv2
Barrnap
Citation:
Site: http://www.vicbioinformatics.com/software.barrnap.shtml
Version: 0.9
License: GPLv3
BLAST+
Citation: Camacho, C., et al. (2009) BLAST+: architecture and applications, BMC bioinformatics, 10, 421.
Site: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.29/
Version: 2.10.0
License: Public domain
blastall
Citation: Altschul, S.F., et al. (1990) Basic local alignment search tool, Journal of molecular biology, 215, 403-410.
Site: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.26/
Version: 2.2.26
License: Public domain
Phage_Finder
Citation: Fouts, D.E. (2006) Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences, Nucleic acids research, 34, 5839-5851.
Version: 2.1
License: GPLv3
Glimmer
Citation: Delcher, A.L., et al. (2007) Identifying bacterial genes and endosymbiont DNA with Glimmer, Bioinformatics, 23, 673-679.
Version: 302b
License: Artistic License
ARAGORN
Citation: Laslett, D. and Canback, B. (2004) ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences, Nucleic acids research, 32, 11-16.
Version: 1.2.36
License: GPLv2
Prodigal
Citation: Hyatt, D., et al. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification, BMC bioinformatics, 11, 119.
Version: 2_60
License: GPLv3
tbl2asn
Citation:
Version: 25.8 (2022 Jun 13)
License: Public Domain
Warning
tbl2asn must have been compiled within the past year to function. We attempt to recompile it roughly every 6 months. The most recent compilation was 27 Feb 2018.
AntiSmash
Citation: Kai Blin et al. (2021) antiSMASH 6.0: improving cluster detection and comparison capabilities, Nucleic Acids Research, Volume 49, Issue W1, 2 July 2021, Pages W29–W35
Version: 6.1.1
License: AGPL-3.0
9.3. Alignment
HMMER3
Citation: Eddy, S.R. (2011) Accelerated Profile HMM Searches, PLoS computational biology, 7, e1002195
Version: 3.1b1
License: GPLv3
Infernal
Citation: Nawrocki, E.P. and Eddy, S.R. (2013) Infernal 1.1: 100-fold faster RNA homology searches, Bioinformatics, 29, 2933-2935.
Version: 1.1rc4
License: GPLv3
Bowtie 2
Citation: Langmead, B. and Salzberg, S.L. (2012) Fast gapped-read alignment with Bowtie 2, Nature methods, 9, 357-359.
Version: 2.5.1
License: GPLv3
BWA
Citation: Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics, 25, 1754-1760.
Version: 0.7.12
License: GPLv3
MUMmer3
Citation: Kurtz, S., et al. (2004) Versatile and open software for comparing large genomes, Genome biology, 5, R12.
Version: 3.23
License: GPLv3
RAPSearch2
Citation: Zhao et al. (2012) RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics. 2012 Jan 1;28(1):125-6
Version: 2.23
License: GPL
minimap2
Citation: Li, H. (2018) Minimap2: fast pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100.
Version: 2.24
License: MIT
diamond
Citation: Buchfink B., Xie C., Huson D.H. (2015) Fast and sensitive protein alignment using DIAMOND, Nature Methods 12, 59-60
Version: v0.9.22.123
License: GPLv3
9.4. Taxonomy Classification
Kraken2
Citation: Wood, D.E. and Salzberg, S.L. (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments, Genome biology, 15, R46.
Version: 2.0.7-beta
License: MIT
Metaphlan
Citation: Blanco-Míguez A., et al. (2023) Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol. 2023 Feb 23.
Version: 4.0.6
License: MIT License
GOTTCHA
Citation: Tracey Allen K. Freitas, Po-E Li, Matthew B. Scholz, Patrick S. G. Chain (2015) Accurate Metagenome characterization using a hierarchical suite of unique signatures. Nucleic Acids Research (DOI: 10.1093/nar/gkv180)
Version: 1.0c
License: GPLv3
GOTTCHA2
Citation:
Version: 2.1.6 BETA
License: BSD 3-Clause
9.5. Phylogeny
FastTree
Citation: Morgan N. Price, Paramvir S. Dehal, and Adam P. Arkin. 2009. FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix. Mol Biol Evol (2009) 26 (7): 1641-1650
Version: 2.1.9
License: GPLv2
RAxML
Citation: Stamatakis,A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30:1312-1313
Site: http://sco.h-its.org/exelixis/web/software/raxml/index.html
Version: 8.0.26
License: GPLv2
Bio::Phylo
Citation: Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, (2011). Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63.
Version: 0.58
License: GPLv3
PhaME
Citation: Sanaa Afroz Ahmed, Chien-Chi Lo, Po-E Li, Karen W Davenport, Patrick S.G. Chain. From raw reads to trees: Whole genome SNP phylogenetics across the tree of life. bioRxiv doi: http://dx.doi.org/10.1101/032250
Version: 1.0
License: GPLv3
9.6. Specialty Genes
ShortBRED
Citation: Kaminski J, et al. (2015) High-specificity targeted functional profiling in microbial communities with ShortBRED. PLoS Comput Biol. 11(12):e1004557.
Version: 0.9.4M
License: MIT
RGI (Resistance Gene Identifier)
Citation: McArthur & Wright. (2015) Bioinformatics of antimicrobial resistance in the age of molecular epidemiology. Current Opinion in Microbiology, 27, 45-50.
Version: 5.2.1
License: Apache Software License
9.7. Metagenome
MaxBin2
Citation: Wu YW, et al. (2016) MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets, Bioinformatics, 32(4): 605-607.
Site: https://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html
Version: 2.2.6
License: BSD
CheckM
Citation: Parks DH, et al. (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043–1055.
Version: 1.2.2
License: GPLv3
9.8. Visualization and Graphical User Interface
jsPhyloSVG
Citation: Smits SA, Ouverney CC, (2010) jsPhyloSVG: A Javascript Library for Visualizing Interactive and Vector-Based Phylogenetic Trees on the Web. PLoS ONE 5(8): e12267.
Version: 1.55
License: GPL
JBrowse
Citation: Skinner, M.E., et al. (2009) JBrowse: a next-generation genome browser, Genome research, 19, 1630-1638.
Site: http://jbrowse.org
Version: 1.16.8
License: Artistic License 2.0/LGPLv2.1
KronaTools
Citation: Ondov, B.D., Bergman, N.H. and Phillippy, A.M. (2011) Interactive metagenomic visualization in a Web browser, BMC bioinformatics, 12, 385.
Version: 2.8.1
License: BSD
JQuery
Site: http://jquery.com/
Version: 1.10.2
License: MIT
JQuery Mobile
Site: http://jquerymobile.com
Version: 1.4.3
License: CC0
DataTables
Site: https://datatables.net
Version: 1.10.11
License: MIT
jQuery File Tree
Version: 1.01
License: GPL and MIT
Raphael - JavaScript Vector Library
Version: 1.4.3
License: MIT
Tooltipster
Version: 3.2.6
License: MIT
Lazy Load XT
Version: 1.0.6
License: MIT
Plupload
Site: http://www.plupload.com
Version: 2.1.7
License: GPLv2 and OEM
hello.js
Version: 1.8.1
License: MIT
bokeh
Citation: Bokeh Development Team (2014). Bokeh: Python library for interactive visualization
Version: 0.12.10
License: BSD 3-Clause
9.9. Utility
Chromium
Citation:
Site: https://www.chromium.org
Version: 95.0.4615.0
License: Google-authored portion is released under the BSD license.
BEDTools
Citation: Quinlan, A.R. and Hall, I.M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features, Bioinformatics, 26, 841-842.
Version: 2.19.1
License: GPLv2
Pilon
Citation: Walker BJ et al. (2014) Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014 Nov 19;9(11):e112963.
Version: 1.23
License: GPLv2 & MIT
R
Citation: R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Version: 3.6.3
License: GPLv2
R_Packages
Custom-built directory containing all the packages required to install R packages offline.
The majority of the packages were downloaded automatically using the following R commands:
# Function to get dependencies and imports for a given list of packages.
getPackages <- function(packs){
  packages <- unlist(
    tools::package_dependencies(packs, available.packages(),
                                which=c("Depends", "Imports"), recursive=TRUE)
  )
  packages <- union(packs, packages)
  packages
}
# Use the function by providing the names of the desired packages.
packages <- getPackages(c("packageName", "packageName2"))
# For example:
# packages <- getPackages(c("MetaComp","gtable","gridExtra","devtools","phyloseq","webshot","plotly","shiny","DT","ape", "igraph", "vegan","BH","plogr","dplyr","ade4","codetools","iterators","foreach","gplots"))
# Download packages to current/desired directory.
download.packages(packages, destdir="./", type="source")
The Bioconductor-specific packages ('phyloseq', 'Biobase', 'biomformat', 'rhdf5', 'BiocGenerics', 'Biostrings', 'multtest', 'S4Vectors', 'IRanges', 'XVector', 'Rhdf5lib', 'zlibbioc') had to be downloaded manually from the Bioconductor site.
By default, stringi downloads icudt55l.zip from the internet during installation. To avoid connecting to the internet, the following method from the stringi documentation was used to build a custom stringi package:
1. Run the `git clone https://github.com/gagolews/stringi.git` command.
2. Edit the `.Rbuildignore` file and remove the `^src/icu55/data` line.
3. Run `R CMD build stringi_dir_name`.
# Index the downloaded packages into PACKAGES files.
require(tools)
write_PACKAGES('.')
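Once the directory has been indexed, it can serve as a local repository. The following is a minimal sketch of an offline install from such a directory, not the procedure EDGE itself uses. The helper names `local_contrib_url` and `install_offline` are hypothetical, and the sketch assumes a Unix-like system where the directory already holds the source tarballs and the PACKAGES files written by `write_PACKAGES`.

```r
# Build a file:// contrib URL for a local directory that holds source
# tarballs plus the PACKAGES index written by tools::write_PACKAGES().
# Assumption: a Unix-like absolute path, so the result starts with "file:///".
local_contrib_url <- function(repo_dir = ".") {
  paste0("file://", normalizePath(repo_dir, mustWork = TRUE))
}

# Install packages from the local directory only, without contacting CRAN.
# `install_offline` is a hypothetical helper name used for illustration.
install_offline <- function(pkgs, repo_dir = ".") {
  contrib <- local_contrib_url(repo_dir)

  # Read the local index and confirm every requested package is listed.
  avail <- available.packages(contriburl = contrib, type = "source")
  missing <- setdiff(pkgs, rownames(avail))
  if (length(missing) > 0) {
    stop("Not found in local repository: ", paste(missing, collapse = ", "))
  }

  # Dependencies are resolved against the same local index.
  install.packages(pkgs, contriburl = contrib, type = "source")
}
```

A call would look like `install_offline(c("ape", "vegan"), repo_dir = "/path/to/R_Packages")`, where the path is a placeholder. Checking the index first makes a missing tarball fail early with a clear message instead of partway through an install.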
MetaComp: EDGE Taxonomy Assignments Visualization
Citation:
Version: 1.0.2
License: BSD 3-Clause
GNU_parallel
Citation: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47
Version: 20190422
License: GPLv3
tabix
Citation:
Version: 0.2.6
License: MIT/Expat License
Primer3
Citation: Untergasser, A., et al. (2012) Primer3–new capabilities and interfaces, Nucleic acids research, 40, e115.
Version: 2.3.5
License: GPLv2
SAMtools
Citation: Li, H., et al. (2009) The Sequence Alignment/Map format and SAMtools, Bioinformatics, 25, 2078-2079.
Site: http://www.htslib.org/
Version: 1.16.1
License: MIT
FaQCs
Citation: Chienchi Lo, Patrick S.G. Chain (2014) Rapid evaluation and Quality Control of Next Generation Sequencing Data with FaQCs. BMC Bioinformatics. 2014 Nov 19;15
Version: 2.08
License: GPLv3
Seqtk
Citation: Heng Li https://github.com/lh3/seqtk
Version: 1.3
License: MIT
NanoPlot
Citation: De Coster W, et al. (2018) NanoPack: visualizing and processing long read sequencing data, Bioinformatics. 2018 Mar 14.
Version: 1.40.0
License: GPLv3
Porechop
Citation:
Version: 0.2.4
License: GPLv3
wigToBigWig
Citation: Kent, W.J., et al. (2010) BigWig and BigBed: enabling browsing of large distributed datasets, Bioinformatics, 26, 2204-2207.
Site: https://genome.ucsc.edu/goldenPath/help/bigWig.html#Ex3
Version: 4
License: Free for academic, nonprofit, and personal use. A license is required for commercial usage.
sratoolkit
Citation:
Version: 3.0.0
License: Public Domain
ea-utils
Citation: Erik Aronesty (2011) ea-utils : “Command-line tools for processing biological sequencing data”
Version: 1.1.2-537
License: MIT License
Mambaforge (Python 3)
Citation:
Version: 22.11.1-4
License: 3-clause BSD
9.10. Amplicon Analysis
QIIME2
Citation: Caporaso et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010 May;7(5):335-6
Site: http://qiime2.org/
Version: 2023.5
License: BSD 3-Clause
DETEQT: Diagnostic targeted sequencing adjudication
Citation: Conrad TA et al. (2019) Diagnostic targETEd seQuencing adjudicaTion (DETEQT): Algorithms for Adjudicating Targeted Infectious Disease Next-Generation Sequencing Panels.
Version: 0.3.0
License: GPLv3
9.11. RNA-Seq Analysis
PyPiReT: Pipeline for Reference based Transcriptomics.
Citation:
Version: 0.3.2
License: GPLv3