Automated function prediction in protein structures
On-going and related projects

FEATURE is a suite of automated tools that examine biological structures and produce useful representations of the key biophysical and biochemical features of these structures that are critical for understanding function. The utility of this system extends from medical/pharmaceutical applications (model-based drug design, comparing pharmacological activities) to industrial applications (understanding structural stability, protein engineering).

Some improvements we are actively working on include:

  • Built-in SVM and Random Forest learning and prediction
  • Faster run-time performance
  • Featurizing the PDB, including incremental updates, as a user-searchable and downloadable database

We continue to improve the WebFEATURE system, both in prediction accuracy and in overall usability. Some improvements we are actively working on include:

  • Functional prediction using Random Forests.
  • Ranking of FEATURE property variables by two Random Forest importance measures: mean decrease in Gini and mean decrease in accuracy.
  • Model suggestions from user-provided plain-English queries, using information retrieval techniques on relevant PubMed literature.
  • Metal cofactor binding domain models using hand-selected training data.
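
To make the "mean decrease in Gini" measure concrete, here is a minimal Python sketch of the quantity behind it: the drop in Gini impurity when a single property is used to split labeled sites. This is an illustration only, not WebFEATURE code. A real Random Forest averages such decreases over many trees and nodes, whereas this sketch scores one threshold split per property. The property names ("charge", "hydrophobicity"), the values, and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting samples on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_gini_decrease(values, labels):
    """Largest decrease over all candidate thresholds for one property."""
    return max(gini_decrease(values, labels, t) for t in set(values))


def rank_properties(table, labels):
    """Sort property names by best single-split Gini decrease, descending.

    Simplification: a Random Forest would average decreases over many
    trees; here each property is scored by one split only.
    """
    scores = {name: best_gini_decrease(vals, labels) for name, vals in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 1 = functional site, 0 = non-site.
labels = [1, 1, 0, 0]
table = {
    "hydrophobicity": [0.5, 0.1, 0.5, 0.1],  # uninformative in this toy set
    "charge": [0.9, 0.8, 0.1, 0.2],          # separates the two classes
}
ranking = rank_properties(table, labels)
```

In this toy set, "charge" ranks first because one threshold separates the two classes completely, while no split on "hydrophobicity" reduces impurity at all. Mean decrease in accuracy is a different measure (the loss in out-of-bag accuracy when a variable's values are permuted) and is not shown here.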

This study introduces an algorithm that seeks similar microenvironments within two binding sites, and assesses overall binding site similarity by the presence of multiple shared microenvironments. The method has relatively weak geometric requirements and uses multiple biophysical and biochemical measures to characterize the microenvironments (to allow for diverse modes of ligand binding). The method is able to recognize several proven distant relationships, and predicts unexpected shared ligand binding.
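
The idea of scoring two binding sites by the number of microenvironments they share can be sketched as follows. This is a loose illustration under stated assumptions, not the published algorithm. Each microenvironment is assumed to be a numeric property vector, similarity is assumed to be cosine similarity with an arbitrary cutoff, matching is greedy and one-to-one, and geometry is ignored entirely (the method described above has weak geometric requirements, not none). All names and values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length property vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def shared_microenvironments(site_a, site_b, cutoff=0.9):
    """Greedily pair microenvironments whose similarity is >= cutoff.

    Each microenvironment is used in at most one pair. The cutoff value
    is an arbitrary assumption chosen for illustration.
    """
    candidates = sorted(
        ((cosine(a, b), i, j)
         for i, a in enumerate(site_a)
         for j, b in enumerate(site_b)),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for score, i, j in candidates:
        if score < cutoff:
            break  # remaining candidates are all less similar
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matches.append((i, j, score))
    return matches


def site_similarity(site_a, site_b, cutoff=0.9):
    """Overall similarity: the number of shared microenvironments."""
    return len(shared_microenvironments(site_a, site_b, cutoff))


# Hypothetical sites, each a list of 3-property microenvironment vectors.
site_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
site_b = [[0.0, 2.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.1, 0.0]]
score = site_similarity(site_a, site_b)
```

Counting matched pairs, rather than requiring a single global alignment, is what lets two sites score as similar even when the ligand binds in different modes.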

Collaboration with San Francisco State University

FEATURE and many of its related projects were originally created by the Stanford Helix Group and are developed in collaboration with the San Francisco State University Computer Science Department and the Center for Computing for Life Sciences.

The goals of this collaboration are to:

  • Apply software engineering practices for extensibility and maintenance
  • Explore possible performance improvements through machine learning, GPU, and cloud computing