pan-draft: species-level models

Metagenome-Assembled Genomes (MAGs) recovered from environmental samples by genome-centric metagenomics are often incomplete and contaminated. Consequently, Genome-Scale Models (GEMs) derived from these MAGs lack a substantial portion of the metabolic potential of the corresponding species. To address this, the draft models reconstructed from a set of MAGs can be combined into a comprehensive model specific to the taxonomic group. This approach aims to fill the gaps present in individual models by combining homology-based searches and pan-reactome analysis.

Using gapseq pan, draft species-level models can be generated that better represent the metabolism of the taxon (species).

For a detailed scientific description of the panDraft module, please see De Bernardini et al. (2024), Genome Biology.

Basic usage

The main input files required by pan-draft are the draft metabolic network reconstructions and the pathway prediction tables generated by the gapseq modules gapseq find and gapseq draft.

Required parameters:

  • -m|--models_path – List of paths to the draft model files in RDS format (“…-draft.RDS”).

  • -w|--pathways.table.path – List of paths to the pathway prediction tables in tbl format (“…-all-Pathways.tbl”).

There are three options to provide the lists of input files:

  1. Lists of file paths separated by commas. E.g.:

gapseq pan -m MAG01-draft.RDS,MAG02-draft.RDS,MAG03-draft.RDS,MAG04-draft.RDS -w MAG01-all-Pathways.tbl,MAG02-all-Pathways.tbl,MAG03-all-Pathways.tbl,MAG04-all-Pathways.tbl
  2. Lists of file paths using wildcards. E.g.:

gapseq pan -m MAG*-draft.RDS -w MAG*-all-Pathways.tbl
  3. (Recommended) Path to folders containing the desired files. E.g.:

mkdir -p modfiles
mv MAG*-draft.RDS modfiles
mv MAG*-all-Pathways.tbl modfiles
gapseq pan -m modfiles/ -w modfiles/
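
Since the folder option picks up input files by name, it can be useful to confirm beforehand that every draft model has a matching pathway table. The sketch below assumes that the two files belonging to one MAG share the same prefix, as in the examples above. `check_pairs` is a hypothetical helper written for illustration; it is not part of gapseq.

```shell
# Hypothetical helper (not part of gapseq): report draft models in a folder
# that have no pathway table with the same prefix.
# Assumption: files are named <prefix>-draft.RDS and <prefix>-all-Pathways.tbl.
# The function body runs in a subshell, so its variables stay local.
check_pairs() (
    dir="$1"
    missing=0
    for model in "$dir"/*-draft.RDS; do
        [ -e "$model" ] || continue            # glob matched nothing
        prefix="${model%-draft.RDS}"           # strip the model suffix
        if [ ! -e "${prefix}-all-Pathways.tbl" ]; then
            echo "missing pathway table for $(basename "$model")"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]                       # non-zero status if any pair is incomplete
)
```

One way to use it is to chain it in front of the reconstruction, for example `check_pairs modfiles/ && gapseq pan -m modfiles/ -w modfiles/`, so that the run only starts when all pairs are complete.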

Optional parameters:

  • -h|--help – Help information for pan-draft reconstructions.

  • -t|--min.rxn.freq.in.mods – Minimum reaction frequency (mrf) required for a reaction to be included in the pan-draft model. Default: 0.06 (see details below).

  • -b|--only.binary.rxn.tbl – Only perform the model comparison, producing a binary table that summarizes reaction presence/absence.

  • -f|--output.dir – Path to the directory where output files will be saved (default: current directory).

  • -s|--sbml.no.output – Do not save the model as an SBML file.

Suggested number of MAGs and mrf threshold value

We recommend using at least 30 MAGs to reconstruct a pan-draft model, although no lower limit is enforced. The minimum reaction frequency threshold (mrf), which determines whether a reaction is included in the species-level model, is only meaningful when enough MAGs are used. The mrf can take values between 0 and 1 (default: 0.06). A value of 0 means that every reaction present in any of the input models is included in the pan-draft model, while a value of 1 means that only reactions present in all input models are included.

The mrf threshold can be modified using the option --min.rxn.freq.in.mods.

./gapseq pan -m toy/M*-draft.RDS -c toy/M*-rxnWeights.RDS -g toy/M*-rxnXgenes.RDS -w toy/M*-all-Pathways.tbl --min.rxn.freq.in.mods 0.15
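
To make the effect of the threshold more concrete, the following sketch computes reaction frequencies from a toy presence/absence table and lists the reactions that would pass a given mrf. The table layout (one reaction per row, one model per column, tab-separated with a header line), the reaction and model names, and the inclusive comparison (frequency >= mrf) are assumptions made for illustration. They are not taken from the gapseq source, and both helper functions are hypothetical.

```shell
# Toy presence/absence table (assumed layout: rows = reactions, columns = models).
toy_table() {
    printf 'rxn\tM1\tM2\tM3\tM4\n'
    printf 'rxnA\t1\t1\t1\t1\n'
    printf 'rxnB\t1\t0\t0\t0\n'
    printf 'rxnC\t1\t1\t0\t0\n'
}

# Read such a table on stdin and print each reaction whose frequency across
# models reaches the threshold given as first argument.
# Assumption: the comparison is inclusive (frequency >= mrf).
filter_by_mrf() {
    awk -F '\t' -v mrf="$1" '
        NR == 1 { next }                        # skip the header line
        {
            present = 0
            for (i = 2; i <= NF; i++) present += $i
            freq = present / (NF - 1)           # fraction of models with the reaction
            if (freq >= mrf) printf "%s\t%.2f\n", $1, freq
        }'
}
```

Running `toy_table | filter_by_mrf 0.3` keeps rxnA (frequency 1.00) and rxnC (0.50) and drops rxnB (0.25). By the same arithmetic, with 30 MAGs and the default of 0.06, a reaction found in a single model (1/30 ≈ 0.03) would be excluded, whereas one found in two models (2/30 ≈ 0.07) would be kept.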

Output

The tool generates several outputs, including the draft model of the taxon (panModel-draft.RDS), an updated pathway table (panModel-tmp-Pathways.tbl), statistics on the species reactome features (pan-reactome_stat.tsv), and a binary matrix summarizing the presence/absence of reactions in each input model (rxnXmod.tsv).
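
Before moving on, a quick check that the expected files were written can catch a failed or partial run. The sketch below uses the output file names listed above and assumes a full run (that is, without --only.binary.rxn.tbl). `check_pan_outputs` is a hypothetical helper, not part of gapseq.

```shell
# Hypothetical helper (not part of gapseq): report which of the pan-draft
# output files named in this section are missing or empty in a directory.
# The function body runs in a subshell, so its variables stay local.
check_pan_outputs() (
    dir="${1:-.}"                               # default: current directory
    missing=0
    for f in panModel-draft.RDS panModel-tmp-Pathways.tbl \
             pan-reactome_stat.tsv rxnXmod.tsv; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]                        # non-zero status if anything is missing
)
```

Called without an argument it inspects the current directory; otherwise pass the directory that was given to --output.dir.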

After reconstructing the pan-draft species-level models of the taxa of interest, the gapseq pipeline can proceed with the gap filling step.

Full example

For a full example workflow that includes the pan-draft module, please see the tutorial.