Prediction of Pseudomonas lipopeptide structure

Potential Pseudomonas lipopeptide biosynthetic gene clusters (BGCs) in annotated genomes can be readily located using antiSMASH (Blin, 2021). The identified BGCs are inspected for a distinct organization similar to that of CLiPs of the Mycin family (Girard, 2020) or, for most other lipopeptide families, for the presence of transport (pleABC) and regulatory (luxR) genes flanking the predicted NRPS genes (Girard, 2021). The possibility that two non-contiguous genomic regions constitute a single NRPS system needs to be considered.

The NRPSs feature a typical modular organization, with each module consisting of three domains ordered as C (condensation), A (adenylation), and T (thiolation). Epimerization domains, such as those present in pyoverdine NRPSs, are absent from the lipopeptide NRPSs. With the exception of Mycins, the terminating NRPS carries a tandem of TE (thioesterase) domains for peptide release, often with concomitant cyclization. Lipopeptide family membership can be provisionally inferred from the number of modules and their distribution across the NRPSs. Further comparison of the investigated NRPS sequences with those of known family members can be used to verify this assignment and to assess overall homology with known or predicted NRPS systems. Phylogeny-based prediction of the peptide sequence and its tentative stereochemistry involves scrutiny of the A-domains and C-domains, respectively.
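The module count and its distribution across the NRPSs can be tallied programmatically once the domain order of each protein is known. The following is a minimal sketch, assuming the domain annotation has already been exported as ordered lists of labels ("C", "A", "T", "TE"); the function names and the example system (XxxA, XxxB, XxxC) are hypothetical and do not correspond to any tool output format.

```python
from typing import Dict, List


def count_modules(domains: List[str]) -> int:
    """Count consecutive C-A-T triplets in an ordered list of domain labels."""
    count = 0
    i = 0
    while i + 3 <= len(domains):
        if domains[i:i + 3] == ["C", "A", "T"]:
            count += 1
            i += 3  # skip past the complete module
        else:
            i += 1  # e.g. a TE domain or an incomplete module
    return count


def summarize_system(system: Dict[str, List[str]]) -> dict:
    """Summarize an NRPS system given as {protein name: ordered domain labels}.

    The proteins must be supplied in collinear order (A, B, C, ...).
    """
    per_nrps = {name: count_modules(doms) for name, doms in system.items()}
    last_domains = list(system.values())[-1]
    return {
        "modules_per_nrps": per_nrps,
        "total_modules": sum(per_nrps.values()),
        # True when the terminating NRPS ends with two TE domains
        "tandem_te": last_domains[-2:] == ["TE", "TE"],
    }


# Hypothetical three-protein system with a 2 + 4 + 3 module distribution
example_system = {
    "XxxA": ["C", "A", "T"] * 2,
    "XxxB": ["C", "A", "T"] * 4,
    "XxxC": ["C", "A", "T"] * 3 + ["TE", "TE"],
}
summary = summarize_system(example_system)
```

The resulting module distribution can then be compared with that of known family members; it only supports a provisional assignment and does not replace the sequence-based comparison.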

A. Lipopeptide classification by NRPS sequence analysis

Amino acid sequences of NRPSs or their domains can be aligned with tools like MUSCLE (Edgar, 2004). Alignments are trimmed to remove short unaligned ends, if any, to avoid phylogenetic noise (Adamek, 2019). A phylogenetic tree can then be inferred using IQ-TREE (Minh, 2020) or equivalent phylogeny software such as MEGA. An advantage of IQ-TREE is the option of using ultrafast (UF) bootstraps (Hoang, 2018), which significantly reduces computation time. Online tools such as iTOL (Letunic, 2021) support the visualization and annotation of phylogenetic trees.
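Trimming of unaligned ends can be done manually in an alignment editor, but a small script makes the step reproducible. The sketch below is one possible approach, assuming the alignment is already loaded as a dictionary of equal-length gapped strings; the function name and the 50% gap threshold are illustrative assumptions, not values prescribed by the cited work.

```python
from typing import Dict


def trim_alignment_ends(aln: Dict[str, str], max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Remove leading and trailing columns that consist mostly of gaps.

    aln maps sequence names to aligned sequences of equal length.
    Internal gap-rich columns are left untouched; only the ends are trimmed.
    """
    if not aln:
        return {}
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()

    def column_ok(col: int) -> bool:
        gaps = sum(1 for seq in aln.values() if seq[col] == "-")
        return gaps / len(aln) <= max_gap_fraction

    start = 0
    while start < length and not column_ok(start):
        start += 1
    end = length
    while end > start and not column_ok(end - 1):
        end -= 1
    return {name: seq[start:end] for name, seq in aln.items()}


# Toy alignment with ragged ends
toy = {"a": "--MKLV-", "b": "-AMKLVT", "c": "--MKLV-"}
trimmed = trim_alignment_ends(toy)
```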

Statistical method: Maximum Likelihood
Substitution model: Jones-Taylor-Thornton (JTT)
Distribution among sites: Gamma Distributed with Invariant Sites (G+I)
Ultrafast bootstrap replicates: 1000
Table: Parameters suitable for tree inference with amino acid sequences
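The parameters above map onto an IQ-TREE command line. The following sketch builds such a command, assuming IQ-TREE 2 is installed and that its executable is named iqtree2 on the local system (the name may differ per installation); the helper name and file names are hypothetical. In IQ-TREE 2, -B requests ultrafast bootstrap replicates and -m JTT+I+G selects the JTT model with invariant sites and gamma-distributed rates.

```python
from typing import List


def build_iqtree_command(alignment: str, prefix: str,
                         model: str = "JTT+I+G",
                         ufboot: int = 1000,
                         executable: str = "iqtree2") -> List[str]:
    """Return an IQ-TREE 2 command as an argument list (nothing is executed)."""
    if ufboot < 1000:
        # IQ-TREE requires at least 1000 ultrafast bootstrap replicates
        raise ValueError("ultrafast bootstrap needs at least 1000 replicates")
    return [
        executable,
        "-s", alignment,      # input alignment (e.g. trimmed FASTA)
        "-m", model,          # substitution model and rate heterogeneity
        "-B", str(ufboot),    # ultrafast bootstrap replicates
        "-T", "AUTO",         # let IQ-TREE choose the number of threads
        "--prefix", prefix,   # prefix for all output files
    ]


cmd = build_iqtree_command("nrps_trimmed.fasta", "nrps_tree")
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

The resulting tree file can then be uploaded to a viewer such as iTOL for annotation.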

The NRPS systems for different lipopeptides are specified by the names of the gene products. Individual NRPS genes/proteins are denoted with an extension (A, B or C) according to their BGC organization and involvement in collinear biosynthesis.

  • Mycins: Fst, syringotoxin; Nun, nunamycin; Syr, syringomycin; Tha, thanamycin
  • Peptins: Cip, cichopeptin; Fus, fuscopeptin; Jes, jessenipeptin; Nup, nunapeptin; Pdf, sclerosin; Syp, syringopeptin
  • Other CLiPs: Ams, amphisin; Ani, anikasin; Arf, arthrofactin; Asp, asplenin; Ban, bananamide; Coc, cocoyamide; Etl, entolysin; Gam, gacamide; Lok, lokisin; Mass, massetolide; Mdn, MDN-0066; Mlk, milkisin; Nep, nepenthesin; Oak, oakridgin; Ofa, orfamide; Pdm/Pse, pseudodesmin; Pmn, pseudophomin; Poa, poaeamide; Ppz, PPZPM; Pek, prosekin; Pso, putisolvin; Ses, sessilin; Ste, stechlisin; Taa, tolaasin F; Ten, tensin; Tol, tolaasin I; Vis/Visc, viscosin; Vsa/Vsm, viscosinamide; Wip/Wlc/Wlm/Wlp/Wly/Wpp, WLIP; Xtl, xantholysin
  • Factins: Cif, cichofactin; Syf, syringafactin; Vif, virginiafactin
  • Thanafactin: Thf, thanafactin

A.1. Comparative analysis of NRPS systems within a lipopeptide family

To confirm the affiliation of an NRPS system with a particular lipopeptide family, a phylogenetic analysis based on multiple amino acid sequence alignments of collinearly concatenated NRPSs can be performed. Platforms such as Geneious (Kearse, 2012) facilitate the concatenation of NRPS sequences.
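Concatenation can also be scripted when a dedicated platform is not at hand. The sketch below assumes the individual NRPS protein sequences of one system are available in FASTA format and that the collinear order (for example A, B, C) is known; the function names and toy sequences are hypothetical.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first word of the header."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate_system(records: Dict[str, str], order: List[str]) -> str:
    """Join the NRPS proteins of one system in collinear order."""
    missing = [n for n in order if n not in records]
    if missing:
        raise KeyError(f"missing NRPS sequences: {missing}")
    return "".join(records[n] for n in order)


# Toy input: the file order does not need to match the collinear order
fasta_text = """>XxxB second NRPS
GGHH
>XxxA first NRPS
MKLV
AA
>XxxC third NRPS
TTSS
"""
concatenated = concatenate_system(parse_fasta(fasta_text), ["XxxA", "XxxB", "XxxC"])
```

The concatenated sequences of the query and reference systems are then aligned together before tree inference.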

Below, we provide the accession numbers of NRPS sequences from known CLiP producers as an archive of Excel files. You can download an overview of all Pseudomonas NRPS systems, or those of selected families (version: October 2021).

A.2. Peptide sequence prediction by A-domain analysis

The peptide sequence synthesized by an NRPS system can be predicted, based on the collinearity rule, by performing A-domain phylogenies (Li, 2013). NRPS domain sequences can be retrieved with the PKS/NRPS analysis tool (Bachmann, 2009). A-domain comparison involves multiple amino acid sequence alignments of the extracted A-domains with those of known substrate specificity. The most likely substrate of an A-domain is inferred from co-clustering with previously validated A-domains.
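As a quick sanity check alongside the phylogeny (not as a substitute for it), the closest reference A-domain in the alignment can be identified by pairwise distance. The sketch below swaps tree-based co-clustering for a much cruder proxy, the uncorrected p-distance between aligned sequences; the function names, toy sequences, and reference labels are hypothetical.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 1.0


def nearest_reference(query: str,
                      references: Dict[str, Tuple[str, str]]) -> Tuple[str, str, float]:
    """Return (reference name, its known substrate, distance) for the closest reference.

    references maps a name to (aligned sequence, validated substrate).
    """
    best = min(references, key=lambda name: p_distance(query, references[name][0]))
    seq, substrate = references[best]
    return best, substrate, p_distance(query, seq)


# Toy aligned A-domain fragments (not real sequences)
refs = {
    "ref_Leu": ("MKLVDG", "Leu"),
    "ref_Ser": ("MRTVEA", "Ser"),
}
hit = nearest_reference("MKLVDA", refs)
```

A nearest-neighbour hit that disagrees with the clade placement in the tree is a reason to inspect that A-domain more closely.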

Below, we provide A-domain sequences from producers of known lipopeptides as an archive of FASTA-formatted files. You can download an overview of all Pseudomonas NRPS systems, or those of selected families (version: October 2021).

A.3. Peptide stereochemistry prediction by C-domain analysis

Phylogenetic analysis makes it possible to distinguish the three C-domain types present in the Pseudomonas lipopeptide modules: the Cstart-domain (acylation of the first peptide residue), the LCL-domain (connecting two L-amino acids), and the more abundant E/C-domain. The E/C-domain has dual catalytic activity: it epimerizes the L-amino acid incorporated by the previous module to its D-configuration and then couples it to the next L-amino acid recruited by its own module. When the nth module is proficient in epimerization, this results in a D-configured amino acid at the (n-1)th position in the peptide, which allows the stereochemistry of the peptide to be predicted. However, it appears that not all E/C-type domains are functional in epimerization. Co-clustering of a novel E/C-domain with epimerization-deficient E/C-domains may suggest a lack of L-to-D conversion activity, but this inference should be interpreted with care and still requires validation by chemical structure elucidation.
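The (n-1) rule lends itself to a short worked example. The sketch below assumes the C-domain type of each module has already been assigned by phylogeny and that the residues were predicted from the A-domains; the function name, the labels "Cstart", "LCL" and "E/C", and the example peptide are hypothetical. Modules whose E/C-domain is suspected to be epimerization-deficient can be listed explicitly, and achiral residues such as Gly are not treated specially here.

```python
from typing import FrozenSet, List


def predict_configuration(c_domain_types: List[str],
                          residues: List[str],
                          inactive_modules: FrozenSet[int] = frozenset()) -> List[str]:
    """Apply the rule: an active E/C-domain in module n gives a D-residue at position n-1.

    Modules and positions are numbered from 1. inactive_modules lists modules
    whose E/C-domain is assumed to lack epimerization activity.
    """
    if len(c_domain_types) != len(residues):
        raise ValueError("one C-domain type is needed per residue/module")
    config = ["L"] * len(residues)
    for idx, c_type in enumerate(c_domain_types):
        module = idx + 1
        if c_type == "E/C" and idx > 0 and module not in inactive_modules:
            config[idx - 1] = "D"  # residue loaded by the previous module
    return [f"{c}-{r}" for c, r in zip(config, residues)]


# Hypothetical four-module system
types = ["Cstart", "E/C", "E/C", "LCL"]
residues = ["Leu", "Glu", "Thr", "Val"]
all_active = predict_configuration(types, residues)
module3_inactive = predict_configuration(types, residues, frozenset({3}))
```

Under this rule the last residue always stays L, because no downstream module is available to epimerize it; the output is a tentative prediction that still needs chemical confirmation.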

Below, we provide C-domain sequences from producers of CLiPs with resolved stereochemistry as an archive of FASTA-formatted files (version: October 2021). The E/C-domains that were shown to lack epimerization activity are marked with an asterisk. You can download an overview of all Pseudomonas NRPS systems, or those of selected families.

B. Lipopeptide family classification by transporter system

Comparative analysis of PleB transporters

The sequence of the inner-membrane component PleB of the tripartite transport system PleABC has diagnostic value for the chemical structure of its exported lipopeptide. This correlation makes it possible to provisionally assign an NRPS system to a particular chemical family, or to a small subset of closely related chemical families, by phylogenetic analysis of PleB sequences (Girard, 2021). Co-clustering of a PleB sequence from an unknown lipopeptide BGC with a set of PleB sequences linked to a particular lipopeptide family provides a strong indication of structural similarity of the lipopeptide. This approach can be particularly useful when only incomplete or fragmented NRPS sequences are initially available.

Below, we provide PleB sequences from producers of known CLiPs as an archive of FASTA-formatted files (version: October 2021).

Useful links

antiSMASH bacterial version: https://antismash.secondarymetabolites.org (Blin, 2021)
EMBOSS Transeq (Madeira, 2022)
Geneious: https://www.geneious.com (Kearse, 2012)
GenBank (Clark, 2016)
IQ-TREE: http://www.iqtree.org (Minh, 2020)
iTOL: https://itol.embl.de (Letunic, 2021)
MAFFT (Katoh, 2013)
MEGA: https://www.megasoftware.net (Tamura, 2021)
MUSCLE (Edgar, 2004)
NORINE (Flissi, 2020)
Phylo.io (Robinson, 2016)
PKS/NRPS analysis: http://nrps.igs.umaryland.edu (Bachmann, 2009)
RAST: https://rast.nmpdr.org (Brettin, 2015)
Table: Useful external links


Adamek, et al. “Applied evolution: phylogeny-based approaches in natural products research.” Natural Product Reports 36, 9 (2019):

Bachmann, et al. “Chapter 8. Methods for in silico prediction of microbial polyketide and nonribosomal peptide biosynthetic pathways from DNA sequence data.” Methods Enzymol 458 (2009):

Blin, et al. “antiSMASH 6.0: improving cluster detection and comparison capabilities.” Nucleic Acids Res 49, W1 (2021):

Brettin, et al. “RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.” Sci Rep 5 (2015):

Clark, et al. “GenBank.” Nucleic Acids Res 44, D1 (2016):

Edgar. “MUSCLE: multiple sequence alignment with high accuracy and high throughput.” Nucleic Acids Res 32, 5 (2004):

Flissi, et al. “Norine: update of the nonribosomal peptide resource.” Nucleic Acids Res 48, D1 (2020):

Girard, et al. “Transporter gene-mediated typing for detection and genome mining of lipopeptide-producing Pseudomonas.” Applied and Environmental Microbiology (2021):

Girard, et al. “Lipopeptide families at the interface between pathogenic and beneficial Pseudomonas-plant interactions.” Critical Reviews in Microbiology 46, 4 (2020):

Hoang, et al. “UFBoot2: Improving the Ultrafast Bootstrap Approximation.” Mol Biol Evol 35, 2 (2018):

Katoh, et al. “MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability.” Molecular Biology and Evolution 30, 4 (2013):

Kearse, et al. “Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.” Bioinformatics 28, 12 (2012):

Letunic, et al. “Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation.” Nucleic Acids Research 49, W1 (2021):

Li, et al. “The antimicrobial compound xantholysin defines a new group of Pseudomonas cyclic lipopeptides.” PLoS One 8, 5 (2013):

Madeira, et al. “Search and sequence analysis tools services from EMBL-EBI in 2022.” Nucleic Acids Res 50, W1 (2022):

Minh, et al. “IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era.” Mol Biol Evol 37, 5 (2020):

Robinson, et al. “Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web.” Mol Biol Evol 33, 8 (2016):

Tamura, et al. “MEGA11: Molecular Evolutionary Genetics Analysis Version 11.” Mol Biol Evol 38, 7 (2021):