Installation
Ubuntu/Debian/Mint
# Installation of main system dependencies
sudo apt install ncbi-blast+ git libglpk-dev r-base-core bc curl libcurl4-openssl-dev libssl-dev libsbml5-dev diamond-aligner mmseqs2
# installation of required R-packages
R -e 'install.packages(c("data.table", "stringr", "getopt", "R.utils", "stringi", "jsonlite", "httr", "pak"))'
R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("Biostrings")'
R -e 'pak::pkg_install("Waschina/cobrar")'
# Download latest gapseq version from github
git clone https://github.com/jotech/gapseq && cd gapseq
# Download reference sequence database (Bacteria and Archaea)
bash ./gapseq update-sequences -t Bacteria
bash ./gapseq update-sequences -t Archaea
Test your installation with:
./gapseq test
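If the test reports problems, it can help to first check which command-line dependencies are actually on your PATH. The following is a minimal sketch and not part of gapseq: `missing_tools` is a hypothetical helper, and the list of command names (blastp, git, R, bc, curl) is an assumption about the binaries provided by the packages installed above.

```shell
# Hypothetical helper (not part of gapseq): print every given command
# name that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed binary names provided by the packages installed above.
missing=$(missing_tools blastp git R bc curl)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All checked tools were found."
fi
```

Any name printed under "Missing tools" points to a package that may need to be (re)installed before running the test again.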
Centos/Fedora/RHEL
Note
The recommended aligners diamond and mmseqs2 are not installed by the following commands because these tools are currently not available through the YUM package manager. To use diamond/mmseqs2 with gapseq, please follow the installation instructions provided by the respective projects (diamond, mmseqs2).
# Installation of main system dependencies
sudo yum install ncbi-blast+ git glpk-devel hmmer bc libcurl-devel curl openssl-devel libsbml-devel
# installation of required R-packages
R -e 'install.packages(c("data.table", "stringr", "getopt", "R.utils", "stringi", "jsonlite", "httr", "pak"))'
R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("Biostrings")'
R -e 'pak::pkg_install("Waschina/cobrar")'
# Download latest gapseq version from github
git clone https://github.com/jotech/gapseq && cd gapseq
# Download reference sequence database (Bacteria and Archaea)
bash ./gapseq update-sequences -t Bacteria
bash ./gapseq update-sequences -t Archaea
Test your installation with:
./gapseq test
MacOS
Using homebrew. Please note: Some Mac users have reported difficulties installing gapseq on MacOS with the following commands. The issues are mainly due to Mac-specific behavior of core programs such as sed, awk, and grep. If you experience issues, we recommend installing gapseq in its own conda environment using the steps described below.
# Installation of main system dependencies
brew install coreutils binutils git glpk blast r grep bc gzip curl brewsci/bio/libsbml diamond mmseqs2
# installation of required R-packages
R -e 'install.packages(c("data.table", "stringr", "getopt", "R.utils", "stringi", "jsonlite", "httr", "pak"))'
R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("Biostrings")'
R -e 'pak::pkg_install("Waschina/cobrar")'
# Download latest gapseq version from github
git clone https://github.com/jotech/gapseq && cd gapseq
# Download reference sequence database (Bacteria and Archaea)
bash ./gapseq update-sequences -t Bacteria
bash ./gapseq update-sequences -t Archaea
Test your installation with:
./gapseq test
Some additional discussion and troubleshooting can be found here: 1, 2, 3.
conda
Install Mini-/Anaconda: Follow the instructions provided by conda to install Anaconda/Miniconda.
Using conda, you can either install a specific release of gapseq or the latest development version of gapseq:
Stable gapseq release using conda
Thanks to @cmkobel, a gapseq conda package is available for linux and osx platforms:
conda create -c conda-forge -c bioconda -n gapseq gapseq
# activate gapseq environment
conda activate gapseq
# Download reference sequence database
gapseq update-sequences -t Bacteria
gapseq update-sequences -t Archaea
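A common source of "command not found" errors is running gapseq in a shell where the environment has not been activated. The following minimal sketch checks this; `in_conda_env` is a hypothetical helper, and it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```shell
# Hypothetical helper: succeed only when the named conda environment
# is the currently active one (CONDA_DEFAULT_ENV is set by `conda activate`).
in_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_conda_env gapseq; then
    echo "conda environment 'gapseq' is active"
else
    echo "run 'conda activate gapseq' first"
fi
```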
Development version using conda
The following commands create a conda environment for gapseq (named gapseq-dev) and install gapseq along with all its dependencies.
# Cloning the development version of gapseq
git clone https://github.com/jotech/gapseq
cd gapseq
# Create and activate a conda environment "gapseq-dev"
conda env create -n gapseq-dev --file gapseq_env.yml
conda activate gapseq-dev
# Download reference sequence database (Bacteria and Archaea)
bash ./gapseq update-sequences -t Bacteria
bash ./gapseq update-sequences -t Archaea
Optional dependencies
Alternative aligners
In addition, gapseq can use the sequence alignment tools diamond and mmseqs2. Using one of these tools can reduce the runtime of the modules gapseq find and gapseq find-transport. On Ubuntu/Debian/Mint Linux systems, these tools can be installed via apt:
sudo apt install mmseqs2 diamond-aligner
For other installation options, please follow the installation instructions provided on the websites of the two tools. Once installed, you can select the alignment tool with the option -A, e.g.:
# diamond
gapseq find -p all -A diamond genome.faa.gz
gapseq find-transport -p all -A diamond genome.faa.gz
# mmseqs2
gapseq find -p all -A mmseqs2 genome.faa.gz
gapseq find-transport -p all -A mmseqs2 genome.faa.gz
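In scripts that run on several machines, the aligner can be chosen based on what is installed. The following is a minimal sketch under stated assumptions: `first_available` is a hypothetical helper, the mmseqs2 package is assumed to install its binary as `mmseqs`, and the final command is only printed, not executed.

```shell
# Hypothetical helper: print the first command from the argument list
# that exists on the PATH; return non-zero if none is found.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Map the detected binary to the value passed to -A.
# Assumption: the mmseqs2 package installs its binary as `mmseqs`.
case "$(first_available diamond mmseqs)" in
    diamond) aligner_opt="-A diamond" ;;
    mmseqs)  aligner_opt="-A mmseqs2" ;;
    *)       aligner_opt="" ;;  # neither found: leave -A unset
esac

# Print the resulting command instead of running it.
echo "gapseq find -p all $aligner_opt genome.faa.gz"
```

When neither tool is found, the `-A` option is omitted so that gapseq falls back to its default aligner.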
Nucleotide genome to protein genome translation
gapseq expects the input genome as protein (amino acid) sequences. If the input is a genomic nucleotide fasta file, gapseq can automatically predict open reading frames (ORFs) and translate them to the respective amino acid sequences. For this, gapseq uses pyrodigal, which can be installed from PyPI:
pip install pyrodigal
For other installation options, please follow these instructions.
SBML support
The Systems Biology Markup Language (SBML) can be used to exchange model files between gapseq and other programs.
The installation instructions above for Linux, MacOS, and conda should already include SBML support. If there were no errors during the installation, you should be all set to use gapseq with SBML format exports.
Occasionally, the installation causes issues, which is why SBML is listed as an optional dependency.
There should be a libsbml package (version 5.18.0 or later) available for most linux distributions:
sudo apt install libsbml5-dev # debian/ubuntu
sudo yum install libsbml-devel # fedora/centos
For MacOS, libsbml is part of homebrew from the brewsci/bio tap.
If you want to install libsbml manually, please make sure that it is installed together with its extensions “fbc” and “groups”, which cobrar requires.
cplex solver support
We recommend using cplex as the LP solver, as it is usually faster than glpk. The cplex solver is included in the IBM ILOG CPLEX Optimization Studio, which may be available at no charge to students and academics through the IBM Academic Initiative program (see here). Please follow the installation instructions for cplex provided by IBM. The R package that interfaces R with the cplex solver can be obtained from GitHub (Waschina/cobrarCPLEX). For the cobrarCPLEX installation, please refer to the instructions here.
Troubleshooting
NCBI blast version 2.2.30 (10/2014) or newer is needed. If your distribution only provides an older version, try downloading a binary directly from NCBI.
Older blast versions can cause the following error:
Error: Unknown argument: "qcov_hsp_perc"
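To check the installed blast version against this minimum, something like the following can be used. This is a minimal sketch: `version_ge` is a hypothetical helper, it assumes a `sort` that supports version sorting (`-V`), and it assumes that the first line of `blastp -version` has the form "blastp: 2.12.0+".

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v blastp >/dev/null 2>&1; then
    # Assumed format of the first output line: "blastp: 2.12.0+".
    # Strip everything before the first digit and after the version number.
    installed=$(blastp -version | head -n 1 | sed 's/^[^0-9]*//; s/[^0-9.].*$//')
    if version_ge "$installed" 2.2.30; then
        echo "blast $installed is recent enough"
    else
        echo "blast $installed is older than 2.2.30"
    fi
else
    echo "blastp not found on PATH"
fi
```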
If you get the installation error
'lib = "../R/library"' is not writable
while installing the R packages, try running this command beforehand:
Rscript -e 'if( file.access(Sys.getenv("R_LIBS_USER"), mode=2) == -1 ) dir.create(path = Sys.getenv("R_LIBS_USER"), showWarnings = FALSE, recursive = TRUE)'