
MUFOLD-CL for Protein Structure

Current protein structure prediction methods often generate a large population of candidate structures (models) and then select near-native models through clustering. Existing methods for clustering structural models are time-consuming because they compute pairwise distances between models. We developed a novel method for fast model clustering without loss of clustering accuracy. Instead of the commonly used pairwise RMSD and TM-score values, we propose two new distance measures, Dscore1 and Dscore2, which are based on comparing protein distance matrices and describe the difference and the similarity among models, respectively. Our analysis indicates that Dscore1 correlates strongly with RMSD and that Dscore2 correlates strongly with TM-score. Our Dscore1-based clustering runs in time linearly proportional to the number of models while achieving almost the same accuracy in near-native model selection as existing methods, whose running time is quadratic in the number of models. By using Dscore2 to select cluster representatives, we can further improve the quality of the representatives with little increase in computing time. In addition, for large sets of models (~500k), we can produce a fast visualization of the data based on the Dscore distribution in seconds to minutes. Our method has been implemented in a package named MUFOLD-CL. The executables and some examples can be downloaded from the links below.
Files                         Specification
MUFOLD_CL_ProteinStructure    Linux version. Tools for clustering structural decoys
Decoys                        Decoys for the examples
Natives                       Native structures for the examples
Readme                        Help information
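
The idea of comparing models through their distance matrices, described above, can be illustrated with a small sketch. The exact definitions of Dscore1 and Dscore2 are not given on this page, so the measure below (`matrix_difference`, a root-mean-square difference of distance-matrix entries) is only a hypothetical stand-in, and `rank_by_centroid` is an assumed way of obtaining a linear number of comparisons by scoring each model against an average distance matrix. Neither is presented as the MUFOLD-CL implementation. The coordinates are toy values standing in for C-alpha positions.

```python
import math

def distance_matrix(coords):
    """Flattened upper triangle of pairwise distances between 3D points."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]

def matrix_difference(dm_a, dm_b):
    """Hypothetical difference measure (NOT the published Dscore1):
    root-mean-square difference between corresponding distances."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(dm_a, dm_b)) / len(dm_a))

def rank_by_centroid(models):
    """Score each model against the average distance matrix.

    Assumption for illustration: one comparison per model, so the number
    of comparisons grows linearly with the number of models instead of
    quadratically as in all-against-all RMSD.
    """
    dms = [distance_matrix(m) for m in models]
    size = len(dms[0])
    mean_dm = [sum(dm[k] for dm in dms) / len(dms) for k in range(size)]
    scores = [matrix_difference(dm, mean_dm) for dm in dms]
    order = sorted(range(len(models)), key=lambda i: scores[i])
    return order, scores

# Toy models: a unit square, the same square rotated and translated,
# and a stretched square acting as an outlier.
model_a = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
model_b = [(5, 2, 1), (5, 3, 1), (4, 3, 1), (4, 2, 1)]
model_c = [(0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0)]

same_shape = matrix_difference(distance_matrix(model_a), distance_matrix(model_b))
order, scores = rank_by_centroid([model_a, model_b, model_c])
```

Because internal distances do not change under rotation or translation, `model_a` and `model_b` compare as identical without any structural superposition, which is the step that makes pairwise RMSD expensive. The stretched `model_c` receives the largest score and is ranked last.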
Reference: Jingfen Zhang, Dong Xu. Fast algorithm for population-based protein structural model analysis. Proteomics. 2013 Jan 1;13(2):221-229.