Genome mining and phylogenetic analysis
Genome mining uses the DNA sequences of organisms to discover biosynthetic gene clusters (BGCs) involved in the production of secondary metabolites such as CLPs, based on sequence patterns and similarities detected by predictive software. A major advantage of genome mining is that it is relatively easy and much faster than traditional experimental methods. It is further aided by rapid developments in genome sequencing, which have led to an ever-increasing number of available genomes. Another advantage is that it avoids the problem of “silent” BGCs. Organisms carrying such clusters give false-negative results in experimental screens unless they are tested under the specific conditions that trigger expression, whereas the cluster itself can still be detected in the genome sequence.
Phylogenetic analysis enables comparison of newly identified BGCs with known ones, as well as assessment of the genetic relationship between the organisms carrying them. When applied to a novel CLP cluster, the domains responsible for incorporating particular amino acids can be compared systematically with known domains. This allows the amino acid composition and configuration of the CLP to be predicted.
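The idea of comparing a domain from a novel cluster with known domains can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and assigns the amino acid of the most similar reference by simple percent identity. This is a stand-in for a real phylogenetic analysis, which would typically use a multiple sequence alignment and tree inference. The function names (`percent_identity`, `predict_substrate`), the reference names, and all sequences are hypothetical placeholders invented for illustration, not real domain signatures.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned, non-gap positions at which two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def predict_substrate(query: str, references: dict[str, tuple[str, str]]):
    """Return (reference name, amino acid, identity) for the closest reference."""
    name, (seq, amino_acid) = max(
        references.items(),
        key=lambda item: percent_identity(query, item[1][0]),
    )
    return name, amino_acid, percent_identity(query, seq)


# Hypothetical toy data: invented sequences paired with the amino acid
# that each (imaginary) reference domain incorporates.
REFERENCES = {
    "ref_Leu": ("DAWFLGNVVK", "Leu"),
    "ref_Ser": ("DVWHLSLIDK", "Ser"),
    "ref_Glu": ("DAKDLGVVDK", "Glu"),
}

QUERY = "DAWFLGNVVR"  # hypothetical domain from a novel cluster

name, amino_acid, identity = predict_substrate(QUERY, REFERENCES)
print(f"closest reference: {name}, predicted amino acid: {amino_acid}, "
      f"identity: {identity:.2f}")
```

A nearest-reference rule of this kind only ranks candidates by similarity; like the predictions described here in general, its output is a hypothesis that still requires experimental confirmation.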
The predictions obtained with this methodology are not absolute and have to be verified experimentally; however, the acquired data greatly aid this process. Knowing the type of CLP and whether it is novel allows a more focused and efficient approach to experimental verification, including the subsequent determination of its chemical structure.