Automated function prediction in protein structures
The FEATURE system

A central goal of molecular biology is the determination of macromolecular structure, and the analysis of how structural elements produce an observed function. Structural genomics projects are now greatly increasing the number of available protein structures, with a focus on those with low sequence similarity to known proteins, which may contain novel folds. Although structure determination is fast becoming high-throughput, our understanding of the relationship between structure and function, and the functional annotation of these newly derived structures, have not kept pace. Manual inspection of the structures, while informative, is too time-consuming to be realistically applicable to this problem, and risks missing important relationships that would be revealed by an exhaustive correlation. There is therefore a need for automated tools that produce useful representations of the key biophysical and biochemical features of biological structures and that can be used to understand function, especially when the sequence and structure have no close homologues among existing molecules.

Results of a FEATURE scan for metal binding sites in ribosomal protein S2. Putative sites are shown as red spheres within a 7 Angstrom radius. Charged, acidic residues are highlighted in blue.

To address this problem, we developed a system for modeling functional sites in protein structures, called FEATURE. FEATURE represents the local 3D environment around sites of interest using many physicochemical properties (at the atomic, molecular, residue, and secondary structure level) collected in radial, spherical volumes centered on the site. The environments of positive and negative examples of the functional site are then compared to reveal those properties that are relatively abundant or absent in the positive examples as opposed to the negative examples. The result of this comparison is a model of the site. WebFEATURE 4.0 employs both naive Bayes and Support Vector Machine scoring functions to evaluate the likelihood that a potential site has a particular function based on that model.
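The idea of describing a site by property counts in radial volumes, and then scoring it against positive and negative examples, can be illustrated with a minimal Python sketch. This is not the FEATURE implementation: the function names, the property labels, the shell count and shell width, and the simplified presence/absence naive Bayes score with Laplace smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shell_features(site, atoms, n_shells=6, shell_width=1.25):
    """Count properties in concentric shells around `site`.

    `atoms` is an iterable of ((x, y, z), [property names]).
    The shell count and width are illustrative defaults (assumed).
    Returns a Counter keyed by (shell_index, property).
    """
    counts = Counter()
    for coords, props in atoms:
        shell = int(math.dist(site, coords) // shell_width)
        if shell < n_shells:  # ignore atoms beyond the outermost shell
            for prop in props:
                counts[(shell, prop)] += 1
    return counts


def train_log_odds(positives, negatives, keys):
    """Simplified naive Bayes model over feature presence/absence.

    For each feature, store the log-odds contribution when the feature
    is present and when it is absent, using Laplace smoothing.
    """
    model = {}
    for key in keys:
        p_pos = (sum(1 for c in positives if c[key] > 0) + 1) / (len(positives) + 2)
        p_neg = (sum(1 for c in negatives if c[key] > 0) + 1) / (len(negatives) + 2)
        model[key] = (
            math.log(p_pos / p_neg),
            math.log((1 - p_pos) / (1 - p_neg)),
        )
    return model


def score(model, counts):
    """Sum per-feature log-odds; higher means more like the positive sites."""
    return sum(
        present if counts[key] > 0 else absent
        for key, (present, absent) in model.items()
    )


# Toy data (hypothetical): positive sites have a charged oxygen nearby,
# negative sites have a hydrophobic carbon at the same position.
origin = (0.0, 0.0, 0.0)
pos_env = [((2.0, 0.0, 0.0), ["negative_charge", "oxygen"])]
neg_env = [((2.0, 0.0, 0.0), ["hydrophobic", "carbon"])]
positives = [shell_features(origin, pos_env) for _ in range(3)]
negatives = [shell_features(origin, neg_env) for _ in range(3)]
keys = set().union(*positives, *negatives)
model = train_log_odds(positives, negatives, keys)

print(score(model, positives[0]) > 0)  # positive-like environment
print(score(model, negatives[0]) < 0)  # negative-like environment
```

In this sketch a candidate site scores above zero when its environment resembles the positive examples more than the negative ones. A real scoring function would work on binned property values rather than simple presence or absence.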

We have extended FEATURE to generate models automatically from 1D sequence motifs, a method we call SeqFEATURE, and implemented a web tool (WebFEATURE) to allow scanning and analysis of protein structures for over 250 functions. See our publications to read more about FEATURE and related projects.

Read brief descriptions of ongoing, related projects.