How to use WebFEATURE
WebFEATURE allows a user to scan a molecular structure for a particular functional site. First-time users must register.
Step 1. Pick a structure to scan
- If you know the PDB (Protein Data Bank) ID of the molecule, enter it into the text field labeled "PDB ID".
- If you do not know the PDB ID, click on the "PDB" link. The PDB will allow you to look up a structure using the molecule's name. Then enter its ID into the PDB ID text field in WebFEATURE.
- If you have a structure (in PDB format) on your local machine, you can upload it by clicking the "Browse" button next to "Upload a structure".
Step 2. Choose a type of site to scan
- Pre-made models are available in the drop-down menu. Choose from over 600 SeqFEATURE models (built from PROSITE patterns).
Step 3. Enter your E-mail (first-time users must register) and Optional Job Description
- If you have not registered, click on the register link.
- If you have registered, enter your e-mail address.
- The optional job description will be included in the e-mailed results to help you quickly recognize each job result in your e-mail.
Step 4. Click the "Scan it!" button.
- You will get results via e-mail in a few minutes.
INTERPRETING SEQFEATURE FILES
SeqFEATURE accepts input for calculating feature vectors in the form of "points files" (extension .points or .ptf). Points files contain a label for the protein, the X, Y, and Z coordinates, as well as a label for the residue ID, chain ID, and atom ID for each point. An example is shown below (the first column sometimes contains a unique identifier for the site, similar to "Env_1bqy_0"):
You might encounter these files if you run a full SeqFEATURE library scan on a structure, or if you download, install, and run Feature from your own machine.
SeqFEATURE calculates feature vectors for each point it is given and outputs them into feature files (extension .features or .ff), one feature file for each points file. Each line consists of one feature vector; each feature vector contains the values calculated for 480 features (see publications for details), tab-delimited, with the site identifier in the first column and the site description (residue ID, chain, and atom) in the last column. You might encounter these files if you run a full SeqFEATURE library scan on a structure, or if you download, install, and run Feature from your own machine.
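The feature-file layout described above can be read with a few lines of Python. This is a minimal sketch, not part of SeqFEATURE itself: it assumes exactly one identifier column, 480 numeric columns, and one description column per line. The function name parse_feature_line, the sample description string, and the all-zero values are made up for illustration.

```python
from typing import List, NamedTuple

N_FEATURES = 480  # number of feature values per vector, as stated in the text


class FeatureVector(NamedTuple):
    site_id: str
    values: List[float]
    description: str


def parse_feature_line(line: str, n_features: int = N_FEATURES) -> FeatureVector:
    """Split one tab-delimited line into identifier, values, and description."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != n_features + 2:
        raise ValueError(f"expected {n_features + 2} columns, got {len(fields)}")
    values = [float(v) for v in fields[1:-1]]
    return FeatureVector(fields[0], values, fields[-1])


# Hypothetical line: the description format and the zero values are assumptions.
sample = "\t".join(["Env_1bqy_0"] + ["0.0"] * N_FEATURES + ["ALA10:A@CA"])
vector = parse_feature_line(sample)
print(vector.site_id, len(vector.values), vector.description)
```

Checking the column count up front makes a truncated or differently formatted line fail loudly instead of silently producing a shorter vector.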
SeqFEATURE outputs results of its scans into score files (extension .scores or .zscores), usually one score file per feature file. Score files contain the site identifier in the first column, the score or z-score in the second column, the X, Y, and Z coordinates, and the site description (residue ID, chain, and atom). See example below:
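As an illustration of the column order just described, the following minimal Python sketch parses one score-file line. The sample line and all of its values are made up, and whitespace-delimited fields are an assumption (the text does not state the delimiter). The name parse_score_line is hypothetical, and real score files may differ in detail.

```python
from typing import NamedTuple, Tuple


class ScoreRecord(NamedTuple):
    site_id: str
    score: float
    coords: Tuple[float, float, float]
    description: str


def parse_score_line(line: str) -> ScoreRecord:
    """Parse: site identifier, score (or z-score), X, Y, Z, site description."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    x, y, z = (float(v) for v in fields[2:5])
    # Anything after the coordinates is kept together as the site description.
    description = " ".join(fields[5:])
    return ScoreRecord(fields[0], float(fields[1]), (x, y, z), description)


# Hypothetical line for illustration only; not real SeqFEATURE output.
sample = "Env_1bqy_0 12.5 1.0 2.0 3.0 ALA10:A@CA"
record = parse_score_line(sample)
print(record.site_id, record.score, record.coords, record.description)
```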
In the case of data from the PDB scan, which can be retrieved using either PDB ID or SeqFEATURE model, the second column contains the name of the model, and the site description is split into its three constituent parts. See below:
Additional information that can be used to evaluate score files is available for each model. The model statistics file contains the precision-recall curves, normalized score distributions for the positive and negative classes, and information about the original training data sources.
When evaluating the strength of a prediction, one should consider a number of factors:
- Model precision performance (How well does the model predict true positives as opposed to false positives?)
- Score distribution of training sites (How well does the model distinguish between real sites and background?)
- Model cutoff (How much higher than the cutoff is the score?)
- Multiple hits (Does the model predict multiple hits in the same region, or do other similar models predict hits in the same region?)
- Visual inspection (Does the local region contain features characteristic of the predicted function?)
- Corroboration with other methods
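Two of the factors above, the margin over the model cutoff and the presence of multiple hits in the same region, lend themselves to a quick calculation. The Python sketch below is one possible way to tabulate them and is not a WebFEATURE feature. The model names, scores, cutoffs, and the 5.0 radius (in the same units as the coordinates) are all made-up assumptions, and the remaining factors still require the model statistics and visual inspection.

```python
import math
from typing import Dict, List, Tuple

# (model name, score, (x, y, z)); all values below are hypothetical.
Hit = Tuple[str, float, Tuple[float, float, float]]


def summarize(hits: List[Hit], cutoffs: Dict[str, float],
              radius: float = 5.0) -> List[Tuple[str, float, int]]:
    """For each hit, report its margin over the cutoff and nearby-hit count."""
    rows = []
    for i, (model, score, xyz) in enumerate(hits):
        margin = score - cutoffs[model]
        # Count other hits (from any model) within `radius` of this hit.
        neighbors = sum(
            1
            for j, (_, _, other) in enumerate(hits)
            if j != i and math.dist(xyz, other) <= radius
        )
        rows.append((model, margin, neighbors))
    return rows


hits = [
    ("model_a", 30.0, (0.0, 0.0, 0.0)),
    ("model_b", 22.0, (1.0, 1.0, 1.0)),
    ("model_c", 25.0, (20.0, 0.0, 0.0)),
]
cutoffs = {"model_a": 20.0, "model_b": 20.0, "model_c": 20.0}
summary = summarize(hits, cutoffs)
for model, margin, neighbors in summary:
    print(f"{model}: margin={margin:.1f}, nearby hits={neighbors}")
```

A hit with a large margin and several neighbors from similar models is a stronger candidate than an isolated hit barely above its cutoff, but neither number replaces the precision and score-distribution checks.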
ROC and other performance plots can be found on each model's Info page, accessible through the model drop-down menu on the main WebFEATURE page.