Automated function prediction in protein structures
Model Training Data

SeqFEATURE is a method to:

  1. Identify active sites using PROSITE functional classes as the basis for ground truth
  2. Compute microenvironments at known active sites and known non-active sites to develop training data for a protein functional class
  3. Train predictive models on that training data using machine learning algorithms such as Naive Bayes and Support Vector Machines
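
The three steps above can be illustrated with a small sketch. It is not the SeqFEATURE implementation: it assumes each microenvironment has already been reduced to a numeric feature vector, uses invented toy vectors for known sites and non-sites, and fits a plain Gaussian Naive Bayes classifier as a stand-in for whichever Naive Bayes variant SeqFEATURE actually uses. The function names (`train_gaussian_nb`, `log_score`, `predict`) and the labels `"site"` and `"nonsite"` are hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(vectors, labels, var_floor=1e-6):
    """Fit a log prior plus per-feature mean and variance for each label."""
    by_label = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_label[label].append(vec)

    total = len(vectors)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def log_score(model, label, vec):
    """Log prior plus the sum of per-feature Gaussian log-likelihoods."""
    log_prior, means, variances = model[label]
    score = log_prior
    for x, m, v in zip(vec, means, variances):
        score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return score


def predict(model, vec):
    """Return the label with the highest log score for this vector."""
    return max(model, key=lambda label: log_score(model, label, vec))


# Toy microenvironment vectors (invented values, not real FEATURE properties).
site_vectors = [[4.0, 1.0, 0.0], [5.0, 2.0, 1.0], [4.0, 2.0, 0.0]]
nonsite_vectors = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [1.0, 1.0, 3.0]]

vectors = site_vectors + nonsite_vectors
labels = ["site"] * len(site_vectors) + ["nonsite"] * len(nonsite_vectors)

model = train_gaussian_nb(vectors, labels)
print(predict(model, [4.5, 1.5, 0.5]))
```

In this sketch one model corresponds to one set of site and non-site vectors; a functional class with several key reactive atoms would need one such model per atom.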

WebFEATURE 4.0 currently uses SeqFEATURE 2013, which can identify active sites for 251 functional classes. These classes are covered by 606 models, each built around a key reactive atom at an active site; there are more models than classes because some active sites have more than one key reactive atom. We anticipate releasing some custom-made models, including a calcium binding model, in the near future.

The training data for the SeqFEATURE 2013 models are based on PROSITE v20.81 and are available for download. Select a model below to view its model information page, then choose "Download Data" in the upper-right corner of that page to download the training data.

Choose a model for which to download data: