# pan-draft: species-level models

Metagenome-Assembled Genomes (MAGs) obtained from genome-centric metagenomics of environmental samples are commonly incomplete and contaminated. Consequently, Genome-Scale Models (GEMs) derived from these incomplete MAGs lack a substantial portion of the metabolic potential of the corresponding species. To address this, once a set of draft models has been reconstructed, they can be combined into a comprehensive model specific to the taxonomic group. This approach aims to fill the gaps present in individual models by combining homology-based searches and pan-reactome analysis. Using `gapseq pan`, draft species-level models can be generated that better represent the metabolism of the taxon (species).

*For a detailed scientific description of the panDraft module, please check the publication [De Bernardini et al. (2024) Genome Biology](https://doi.org/10.1186/s13059-024-03425-1).*

### Basic usage

The main input files required by pan-draft are the draft metabolic network reconstructions and the pathway prediction tables generated by the gapseq modules `gapseq find` and `gapseq draft`.

**Required parameters:**

- `-m|--models_path` – List of paths to the draft model files in RDS format ("...-draft.RDS").
- `-w|--pathways.table.path` – List of paths to the pathway prediction tables in tbl format ("...-all-Pathways.tbl").

There are three options to provide the lists of input files:

1. Lists of file paths separated by commas. E.g.:
   ```sh
   gapseq pan -m MAG01-draft.RDS,MAG02-draft.RDS,MAG03-draft.RDS,MAG04-draft.RDS -w MAG01-all-Pathways.tbl,MAG02-all-Pathways.tbl,MAG03-all-Pathways.tbl,MAG04-all-Pathways.tbl
   ```
2. Lists of file paths using wildcards. E.g.:
   ```sh
   gapseq pan -m MAG*-draft.RDS -w MAG*-all-Pathways.tbl
   ```
3. (Recommended) Path to folders containing the desired files.
   E.g.:
   ```sh
   mkdir -p modfiles
   mv MAG*-draft.RDS modfiles
   mv MAG*-all-Pathways.tbl modfiles
   gapseq pan -m modfiles/ -w modfiles/
   ```

**Optional parameters:**

- `-h|--help` – Help information for pan-draft reconstructions.
- `-t|--min.rxn.freq.in.mods` – Minimum reaction frequency (***mrf***) required to include a reaction in the pan-draft. Default: 0.06 (see details below).
- `-b|--only.binary.rxn.tbl` – Only compare the models and produce a binary table summarizing reaction presence/absence.
- `-f|--output.dir` – Path to the directory where output files will be saved (default: current directory).
- `-s|--sbml.no.output` – Do not save the model as an SBML file.

### Suggested number of MAGs and mrf threshold value

We recommend using at least 30 MAGs to reconstruct a pan-draft model, although no lower limit is enforced. The minimum reaction frequency threshold (***mrf***), which determines whether a reaction is included in the species-level model, is only meaningful when enough MAGs are used. For example, with four input models a reaction's frequency can only be 0.25, 0.5, 0.75 or 1, so any threshold below 0.25 keeps every reaction.

The ***mrf*** parameter can take values between 0 and 1 (default 0.06). A value of 0 means that all reactions present in any of the input models will be included in the draft model, while a value of 1 means that only reactions present in all input models will be included. The mrf threshold can be modified using the option `--min.rxn.freq.in.mods`.

```sh
./gapseq pan -m toy/M*-draft.RDS -c toy/M*-rxnWeights.RDS -g toy/M*-rxnXgenes.RDS -w toy/M*-all-Pathways.tbl --min.rxn.freq.in.mods 0.15
```

### Output

The tool generates several outputs, including the draft model of the taxon (`panModel-draft.RDS`), an updated pathway table (`panModel-tmp-Pathways.tbl`), statistics on the species reactome features (`pan-reactome_stat.tsv`), and a binary matrix summarizing the presence/absence of reactions in each input model (`rxnXmod.tsv`).
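
The effect of the mrf threshold can be illustrated on a toy presence/absence matrix. The sketch below is not part of gapseq. It assumes a tab-separated table with a header row, one reaction per row and one input model per column (the actual layout of `rxnXmod.tsv` may differ), and it assumes that a reaction is kept when its frequency is greater than or equal to the threshold. The reaction IDs, the data, and the helper names `toy_matrix` and `filter_by_mrf` are hypothetical.

```sh
#!/bin/sh
# Hypothetical toy matrix: rows are reactions, columns are input models
# (1 = present, 0 = absent). The real rxnXmod.tsv layout is assumed here.
toy_matrix() {
  printf 'rxn\tMAG01\tMAG02\tMAG03\tMAG04\n'
  printf 'rxn00001\t1\t1\t1\t1\n'
  printf 'rxn00002\t1\t0\t0\t0\n'
  printf 'rxn00003\t0\t1\t1\t0\n'
}

# Read a matrix on stdin and print the reactions whose frequency across
# models is >= the threshold given as $1 (assumed inclusion rule).
filter_by_mrf() {
  awk -F'\t' -v mrf="$1" '
    NR == 1 { next }                      # skip the header row
    {
      n = 0
      for (i = 2; i <= NF; i++) n += $i   # number of models with the reaction
      freq = n / (NF - 1)                 # fraction of models with the reaction
      if (freq >= mrf + 0) print $1
    }'
}

kept=$(toy_matrix | filter_by_mrf 0.5)
echo "$kept"
```

With this toy data, a threshold of 0.5 keeps `rxn00001` (present in 4 of 4 models) and `rxn00003` (2 of 4) and drops `rxn00002` (1 of 4). A threshold of 1 keeps only `rxn00001`, and a threshold of 0 keeps all three reactions.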

After reconstructing the pan-draft species-level models of the taxa of interest, the gapseq pipeline can proceed with the gap-filling step.

### Full example

For a full example workflow that includes the pan-draft module, please see the tutorial.