Automated function prediction in protein structures

Usage  
How to use WebFEATURE

Contents:

ONLINE ANALYSIS
  1. Using the WebFEATURE Interface
  2. Viewing results interactively in Chime
OFFLINE ANALYSIS
  1. Downloading results for further analysis
  2. Viewing results in RasMol
  3. Viewing results in PyMOL
INTERPRETING SEQFEATURE FILES
  1. Points files
  2. Feature files
  3. Score files
  4. Interpreting results

ONLINE ANALYSIS

Using the WebFEATURE Interface

WebFEATURE scans a molecular structure for a particular functional site. Running a scan takes the four steps below.

Step 1. Pick a structure to scan
  • If you know the PDB (Protein Data Bank) identification number of the molecule, enter it into the text field labeled "PDB ID".
  • If you do not know the PDB ID, click on the link to "PDB". The PDB lets you look up a structure by the molecule's name. Then enter its ID into the "PDB ID" text field in WebFEATURE.
  • If you have a structure (in PDB format) on your local machine, you can upload it by clicking the "Browse" button next to "Upload a structure".
Step 2. Choose a type of site to scan
  • Pre-made models are available in the drop-down menu. Choose from hand-curated models (RNA_binding, ATP, Calcium, and Chloride), or SeqFEATURE models (all upper-case, built from PROSITE patterns).
Step 3. Choose how you want to receive your results.
    You can run WebFEATURE in either of two modes:
  • "Interactive" mode, where you wait for your results to return to the web browser. Choose this mode if your structure is not too large.
  • "E-mail" mode, where WebFEATURE e-mails you a URL link to your results once the scan of your structure is complete. Choose this mode if your structure is large (e.g., a ribosomal structure), as the scan may take around 10-15 minutes.
  • Choose a mode by clicking the corresponding option button. If you choose "E-mail" mode, enter your e-mail address in the "email" text field.
Step 4. Click the "Submit" button.


Viewing results interactively in Chime
Step 1. Make sure the Chime plug-in is installed properly in your browser. You can download Chime from http://www.mdlchime.com/chime/. Follow the installation instructions provided there.

Step 2. Turn on JavaScript support in your browser.

Step 3. Open a browser window.
  • If "interactive" mode was chosen when the WebFEATURE scan was started, the browser will already be open and the results will be loaded into it automatically.
  • If e-mail mode was chosen during the scan, open the results page from the URL link provided in the e-mail message sent to you by WebFEATURE.
Step 4. Play with your results!

Navigating the WebFEATURE Results Window

Below are screenshots of the results page. Each panel is labeled with its function.

Chime Viewer Window
  • The structure, het atoms, and hits above the cutoff are automatically opened in the Chime Viewer Window of the WebFEATURE results page. The model is displayed in "cartoon" representation, the het atoms in "dot" representation, and hits above the cutoff as red spheres. Use the Hits Panel and Representation buttons to change their representation and cutoff.
Hits Panel
  • Change the hits (red spheres) visualized in the Chime viewer by adjusting the cutoff in the Hits Panel: either enter a new score cutoff in the "Cutoff" text field, or click on a bar in the "Score Distribution" histogram.
  • Color the visualized hits by score by clicking the "By Score" color button in the Hits Panel. The lowest-scoring hits are colored toward the blue end of the color spectrum, and the highest-scoring hits toward the red end.
Model Info Panel
  • The Model Info Panel provides background information about the model used for scanning.
  • Click on the "More Info" button to view the 2-D plot of the statistical model used in scanning. Red squares represent property-volume pairs that are abundant in the sites relative to the nonsites. Green squares represent property-volume pairs that are deficient in the sites relative to the nonsites.
Manipulating Representations
(Novice Chime Users)
  • The buttons below the Chime Viewer provide basic manipulations on the molecule, hetero atoms, and WebFEATURE hits.
  • Click on a selection first, e.g., molecule, water, or het.
  • Then click on a "representation" button, e.g., spacefill, dot, cartoon, backbone, or sticks.
  • To make hits appear bigger, select the desired cutoff by either clicking on a bar in the Score Distribution histogram or entering a score cutoff in the "Cutoff" text field. Then click the "Spacefill" button under the Chime Viewer.
(Advanced Chime Users)
  • Go to http://www.umass.edu/microbio/chime for a tutorial on Chime.
  • WebFEATURE hits are represented as the residue type "HIT" when loaded into Chime. Use the right-button mouse functions to change the representations of the hits and molecule. Hit scores are stored as the B-factor (temperature), so coloring by temperature colors the hits according to score. The lowest-scoring hits are closer to blue in the color spectrum, while the highest-scoring hits are closer to red.
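The blue-to-red coloring by score can be pictured as an interpolation between the lowest and highest hit scores. The sketch below is illustrative only: the function name `score_to_rgb` and the straight two-color blend are assumptions, and Chime's own temperature palette may differ.

```python
def score_to_rgb(score, lowest, highest):
    """Map a hit score to an (r, g, b) tuple: blue at `lowest`, red at `highest`."""
    if highest == lowest:
        return (255, 0, 0)  # a single score is treated as the top of the scale
    # Clamp the fraction to [0, 1] so out-of-range scores stay on the scale.
    fraction = min(1.0, max(0.0, (score - lowest) / (highest - lowest)))
    red = int(255 * fraction)
    return (red, 0, 255 - red)


# Made-up scores for illustration only.
scores = [12.5, 50.0, 87.5]
colors = [score_to_rgb(s, min(scores), max(scores)) for s in scores]
```

Here the lowest score maps to pure blue and the highest to pure red, with intermediate scores blended between the two.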
Running Another WebFEATURE Scan
  • You can run another scan from the WebFEATURE results page by filling out the appropriate information in the "WebFEATURE Scan" Panel. WebFEATURE will return the results in the same mode as the previous scan, either "Interactive" or "E-mail".



OFFLINE ANALYSIS

Downloading results for further analysis
The results of a WebFEATURE scan can be downloaded for further analysis with the molecular modeling tools RasMol and PyMOL. These tools provide a more powerful command-line interface, allow WebFEATURE results to be integrated with other structural and bioinformatic analyses, and can generate publication-quality images. Installation instructions, usage information, and tutorials for these tools can be found at http://www.openrasmol.org/ and http://www.pymol.org.
  • Before you begin offline analysis, install either RasMol or PyMOL on your local machine, along with the Python scripts and modules provided on this site.
Step 1. Download results
  • For RasMol analysis, click on the link to "pdb-hit" in the "Files:" area of the Hits Panel (see screenshot).
    • Name and save your file.
    • If you are using Internet Explorer, you will be prompted to save the pdb-hit file to disk.
    • If you are using Netscape, right-click on the link "pdb-hit", choose "Save Link As...", and save the pdb-hit file with the .pdb extension. Otherwise, simply clicking on the link will open the pdb-hit file in a new Chime viewer window.
  • For analysis in PyMOL, click on the link to "hits" in the "Files:" area of the Hits Panel (see screenshot).
    • A list of X, Y, Z coordinates and scores will be shown as text in the browser.
    • Save this as a file, or copy and paste the text into a new file using any text editor.
    • Be sure to give the file the extension ".hit", e.g., "yourfilename.hit".
Step 2. Download Scripts for Visualization Tools
Step 3. Launch either RasMol or PyMOL and follow the directions in the corresponding section below.
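Once the hits file is saved, it can also be read by a short script for further analysis. The sketch below is a minimal reader under stated assumptions: the text above says only that the file lists X, Y, Z coordinates and scores, so the column order (x, y, z, score) and the whitespace or comma delimiter are assumptions, and `read_hits` is a hypothetical helper, not part of the scripts provided on this site.

```python
import io


def read_hits(handle):
    """Return a list of (x, y, z, score) tuples from an open hits file."""
    hits = []
    for line in handle:
        fields = line.replace(",", " ").split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        try:
            x, y, z, score = (float(value) for value in fields[:4])
        except ValueError:
            continue  # skip header or other non-numeric lines
        hits.append((x, y, z, score))
    return hits


# Made-up records for illustration only, not real WebFEATURE output.
example = io.StringIO("10.0 20.5 -3.25 55.1\n11.0 21.5 -2.25 12.0\n")
hits = read_hits(example)
strong = [hit for hit in hits if hit[3] > 50.0]  # keep hits above a cutoff of 50.0
```

For a saved file, the same function can be called as `read_hits(open("yourfilename.hit"))`. Check the column order against your own file before relying on the result.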



Viewing results in RasMol
Step 0. Install RasMol
  1. Download and install RasMol from http://www.openrasmol.org/
Step 1. Download results
  1. Download the pdb-hit file from the Files section of the Hits Panel (see screenshot)
    1. Right click on pdb-hit and select "Save Target As..." (Internet Explorer) or "Save Link As..." (Netscape) to save the pdb-hit file
Step 2. Load structure into RasMol
  1. Start up RasMol
  2. Go to File | Open and select the pdb-hit file downloaded from Step 1.
  3. Initialize the representation by typing the following RasMol commands:
         wireframe off; cartoon;
         select hetero and not water and not hit; dots;
         select hit; color atoms red;
    
         select hit; spacefill off;
         select hit and temperature > 5000; spacefill 100;
    
    These commands turn off the default wireframe representation of the structure and display it as a cartoon, then display the hetero atoms as dot spheres. Finally, they render the WebFEATURE hits scoring above a cutoff of 50.0 as red spheres of radius 100 RasMol units (each unit is 1/250 of an Angstrom, so 0.4 Angstroms). The hits are encoded as HETATM records with atom name HIT and residue name HIT, with the hit scores stored in the temperature field. Cutoff scores must be multiplied by 100 before being used as cutoff values for the temperature. For instance, for a cutoff of 50.0, use 5000 as the temperature threshold.
Step 3. Adjust cutoff and hit representation
  1. The cutoff can be adjusted by typing the following RasMol commands:
         select hit; spacefill off;
         select hit and temperature > cutoff_times_100;
         spacefill 100;
    
    This renders hits above the specified cutoff as spheres of radius 100 RasMol units (each unit is 1/250 of an Angstrom). The cutoff score must be multiplied by 100 before being used as the cutoff value for the temperature. For instance, for a cutoff of 50.0, use a temperature threshold of 5000.
  2. The color of the hits can be changed by:
         select hit; color atoms color_name;
    
  3. The radius of the hits can be changed by replacing:
         spacefill 100;
    
    with:
         spacefill hit_radius;
    
    when setting the cutoff. The radius can be given as an integer, which is read in RasMol units (1/250 of an Angstrom), or as a value containing a decimal point, which is read in Angstroms.
Step 4. Manipulate and interact with model
  1. Information on selecting, changing representation, and interacting with the model is available at http://www.openrasmol.org/ and http://www.umass.edu/microbio/RasMol/
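The encoding described above (hits stored as HETATM records with residue name HIT and the score in the temperature field) and the factor-of-100 cutoff scaling can both be handled in a short script. The sketch below assumes the pdb-hit file follows the standard fixed-column PDB layout. `read_hit_records` and `rasmol_cutoff_commands` are hypothetical helper names, and the example record is made up.

```python
def read_hit_records(lines):
    """Collect (x, y, z, score) from HETATM records whose residue name is HIT."""
    hits = []
    for line in lines:
        if line.startswith("HETATM") and line[17:20] == "HIT":
            # Standard PDB columns: x 31-38, y 39-46, z 47-54, temperature 61-66.
            x, y, z = (float(line[a:b]) for a, b in ((30, 38), (38, 46), (46, 54)))
            score = float(line[60:66])
            hits.append((x, y, z, score))
    return hits


def rasmol_cutoff_commands(cutoff, radius=100):
    """Build the RasMol commands that show only hits scoring above `cutoff`."""
    threshold = round(cutoff * 100)  # RasMol compares the temperature times 100
    return [
        "select hit; spacefill off;",
        f"select hit and temperature > {threshold};",
        f"spacefill {radius};",
    ]


# Made-up HETATM record, built with a format string so the columns line up.
record = "HETATM{:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
    1, "HIT", "HIT", "A", 1, 10.0, 20.5, -3.25, 1.0, 55.1)

hits = read_hit_records([record])
commands = rasmol_cutoff_commands(50.0)
```

The generated commands can be typed at the RasMol prompt exactly as in Step 3. For a cutoff of 50.0, the second command selects hits with temperature above 5000.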



Viewing results in PyMOL
Step 0. Install PyMOL and viewhits.py
  1. Download and install PyMOL from http://www.pymol.org/
  2. Download the viewhits module for PyMOL
  3. Save viewhits.py in a location that is easy to remember
Step 1. Download results
  1. Download the hits and pdb files from the Files section of the Hits Panel (see screenshot).
    1. Right click on pdb and select "Save Target As..." (Internet Explorer) or "Save Link As..." (Netscape) to save the pdb file
    2. Right click on hits and select "Save Target As..." (Internet Explorer) or "Save Link As..." (Netscape) to save the hits file
Step 2. Load structure into PyMOL
  1. Start PyMOL
  2. Load the pdb file previously downloaded in Step 1 by typing:
         load pdb_filename
    
Step 3. Load viewhits module
  1. Load the viewhits.py module by typing:
         run viewhits.py
    
    or
         run viewhits_sf.py
    
Step 4. Adjust cutoff and hit representation
  1. The hits can be viewed by typing (substitute "viewhits_sf" if you are using viewhits_sf.py):
         viewhits hits_filename
    
  2. The cutoff can be adjusted by adding the cutoff parameter:
         viewhits hits_filename, cutoff
    
  3. The color of the hits can be changed by:
         viewhits hits_filename, color=color_name
    
  4. The radius of the hits can be changed by:
         viewhits hits_filename, radius=radius
    
  5. The full usage for viewhits is:
         viewhits hits_filename [, cutoff=cutoff]
             [, radius=radius] [, color=color_name]
    
  6. More information is available by typing:
         help viewhits
    
Step 5. Manipulate and interact with model
  1. Information on selecting, changing representation, and interacting with the model is available at http://www.pymol.org/
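When many structures are processed, the viewhits command lines can be generated by a script and pasted into PyMOL. The sketch below assembles a command string from the usage shown in Step 4. `viewhits_command` is a hypothetical helper, not part of viewhits.py, and the file name and option values are placeholders.

```python
def viewhits_command(hits_filename, cutoff=None, radius=None, color=None,
                     command="viewhits"):
    """Assemble a viewhits command line from the documented keyword options."""
    parts = [f"{command} {hits_filename}"]
    for name, value in (("cutoff", cutoff), ("radius", radius), ("color", color)):
        if value is not None:
            parts.append(f"{name}={value}")  # only include options that were set
    return ", ".join(parts)


line = viewhits_command("yourfilename.hit", cutoff=50.0, color="red")
print(line)  # prints: viewhits yourfilename.hit, cutoff=50.0, color=red
```

Pass `command="viewhits_sf"` if you loaded viewhits_sf.py instead.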



INTERPRETING SEQFEATURE FILES


Points files

SeqFEATURE accepts input for calculating feature vectors in the form of "points files" (extension .points or .ptf). For each point, a points file contains a label for the protein, the X, Y, and Z coordinates, and labels for the residue ID, chain ID, and atom ID. An example is shown below (the first column sometimes contains a unique identifier for the site, similar to "Env_1bqy_0"):


You might encounter these files if you run a full SeqFEATURE library scan on a structure, or if you download, install, and run FEATURE on your own machine.

Feature files

SeqFEATURE calculates a feature vector for each point it is given and writes the vectors to feature files (extension .features or .ff), one feature file for each points file. Each line holds one feature vector. A feature vector contains the values calculated for 480 features (see publications for details), tab-delimited, with the site identifier in the first column and the site description (residue ID, chain, and atom) in the last column. You might encounter these files if you run a full SeqFEATURE library scan on a structure, or if you download, install, and run FEATURE on your own machine.
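The feature-file layout described above can be read one line at a time. The sketch below is a minimal parser under stated assumptions: it takes the first tab-delimited column as the site identifier, the next 480 columns as numeric feature values, and the last column as the site description. `parse_feature_line` is a hypothetical helper, and the sample line is synthetic.

```python
def parse_feature_line(line, n_features=480):
    """Split one tab-delimited line into (site_id, feature_values, description)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < n_features + 2:
        raise ValueError(
            f"expected at least {n_features + 2} columns, found {len(fields)}")
    site_id = fields[0]                                    # first column
    values = [float(v) for v in fields[1:1 + n_features]]  # the feature values
    description = fields[-1]                               # last column
    return site_id, values, description


# Synthetic line for illustration: zeros stand in for real feature values and
# "DESCRIPTION" stands in for the residue ID, chain, and atom.
line = "\t".join(["Env_1bqy_0"] + ["0"] * 480 + ["DESCRIPTION"]) + "\n"
site_id, values, description = parse_feature_line(line)
```

Inspect a real feature file before using this, since any extra columns or comment lines would need separate handling.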

Score files

SeqFEATURE writes the results of its scans to score files (extension .scores or .zscores), usually one score file per feature file. Score files contain the site identifier in the first column, the score or z-score in the second column, then the X, Y, and Z coordinates and the site description (residue ID, chain, and atom). See example below:


In the case of data from the PDB scan, which can be retrieved using either PDB ID or SeqFEATURE model, the second column contains the name of the model, and the site description is split into its three constituent parts. See below:



Interpreting results

Statistics for each model can be used to evaluate score files and are available here. The model statistics file contains:
- the AUC
- the partial AUC (calculated using only the top-scoring 100 negative sites as the background)
- the 100% specificity z-score cutoff (at which 100% of the negatives from the training set are predicted correctly) and the corresponding training set sensitivity at that cutoff
- the 99% specificity z-score cutoff and the sensitivity at that cutoff
- the 95% specificity z-score cutoff and the sensitivity at that cutoff
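To make the relationship between these statistics concrete, the sketch below computes a specificity cutoff, the sensitivity at that cutoff, and the AUC from two lists of training-set scores. It illustrates the definitions only and is not the procedure used to build the model statistics file. The function names and the z-scores are made up, and it assumes a site is predicted positive when it scores strictly above the cutoff.

```python
import math


def cutoff_at_specificity(negative_scores, specificity):
    """Lowest observed score that at least `specificity` of the negatives do not exceed."""
    ordered = sorted(negative_scores)
    # Number of negatives that must score at or below the cutoff. The inner
    # round() guards against floating-point noise in the product.
    needed = max(1, math.ceil(round(specificity * len(ordered), 9)))
    return ordered[needed - 1]


def sensitivity_at(positive_scores, cutoff):
    """Fraction of positive sites that score strictly above the cutoff."""
    return sum(score > cutoff for score in positive_scores) / len(positive_scores)


def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positive_scores
        for n in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))


# Synthetic z-scores for illustration only.
negatives = [-1.2, -0.8, -0.5, -0.1, 0.0, 0.3, 0.6, 0.9, 1.4, 2.0]
positives = [1.0, 1.9, 2.5, 3.1, 4.0]

cutoff_100 = cutoff_at_specificity(negatives, 1.0)  # the highest negative score
cutoff_90 = cutoff_at_specificity(negatives, 0.9)
print(cutoff_100, sensitivity_at(positives, cutoff_100))
print(cutoff_90, sensitivity_at(positives, cutoff_90))
print(auc(positives, negatives))
```

Lowering the required specificity lowers the cutoff, which recovers more true sites at the cost of admitting more negatives.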

When evaluating the strength of a prediction, one should consider a number of factors:
- Model AUC and ROC curve (How well does the model predict true positives as opposed to false positives?)
- Score distribution of training sites (How well does the model distinguish between real sites and background?)
- Model cutoff (How much higher than the cutoff is the score?)
- Multiple hits (Does the model predict multiple hits in the same region, or do other similar models predict hits in the same region?)
- Visual inspection (Does the local region contain features characteristic of the predicted function?)
- Corroboration with other methods
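The "multiple hits" check in the list above can be approximated by counting, for each hit above the cutoff, how many other such hits lie nearby. The sketch below is one possible way to do this, not a WebFEATURE feature. The 5.0 Angstrom radius is an arbitrary assumption, `neighbor_counts` is a hypothetical helper, and the hits are made up.

```python
import math


def neighbor_counts(hits, cutoff, radius=5.0):
    """For each hit scoring above `cutoff`, count the other such hits within `radius`."""
    strong = [hit for hit in hits if hit[3] > cutoff]
    counts = []
    for i, (x, y, z, score) in enumerate(strong):
        near = sum(
            1 for j, other in enumerate(strong)
            if j != i and math.dist((x, y, z), other[:3]) <= radius
        )
        counts.append(((x, y, z, score), near))
    return counts


# Made-up (x, y, z, score) hits for illustration only.
hits = [
    (0.0, 0.0, 0.0, 60.0),
    (1.0, 1.0, 1.0, 55.0),
    (20.0, 0.0, 0.0, 70.0),
    (0.5, 0.0, 0.0, 10.0),  # below the cutoff, so ignored
]
counts = neighbor_counts(hits, cutoff=50.0)
```

In this example the first two hits support each other, while the third stands alone despite its higher score. An isolated hit is not necessarily wrong, but it warrants closer visual inspection.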

ROC and other performance plots can be found on each model's Info page, accessible through the model drop-down menu on the main WebFEATURE page.
