First, a few definitions. InterPro is a database of protein families and domains that was developed by the European Bioinformatics Institute (EBI). InterPro integrates many of the most popular protein family and domain databases, such as PFAM and PRODOM, into one federated database. InterProScan is a tool, also developed by the EBI, that automatically annotates proteins with families and domains. It runs the search tools of the databases included in InterPro (hmmer, ProfileScan and others) and then maps the search results to entries in the InterPro database.
Geneious is a desktop computer program for DNA and protein sequence analysis made by Biomatters. They also maintain a public API that people such as myself can use to write plugins that extend the functionality of Geneious.
If you don’t know what any of this means then you probably don’t need this software.
- Scan all of the major protein family and domain databases at one time.
- Automatically submit your proteins “batch-wise” to the InterProScan server, increasing throughput up to tenfold.
- Export InterPro annotations to a table that can be opened in a spreadsheet. Then you can do all sorts of fun stuff like making pie charts, etc.
This plugin submits protein sequences to the InterProScan server at EBI, and then parses the results, creating annotations on the proteins. The annotations include protein domains from any of the 12 InterPro member databases as well as the InterPro terms themselves.
Please remember that I have no control over the InterProScan web service or the content of InterPro. More importantly, please also keep in mind that the EBI has not written this plugin and therefore cannot answer any questions about its use. If you have questions about using the plugin, contact me. If you have questions about the content of InterPro, then please contact EBI.
Simple. You can install the plugin from within Geneious by going to the Tools menu and selecting “Plugins…”. From there you can find InterProScan in the list of plugins and install it.
Using the plugin
After the plugin is installed you will see a new item in the “Annotate and Predict” menu. Select one or more protein sequences and then select “Find protein domains with InterProScan”.
Email address: This is required by the InterProScan service and is only used for reporting errors during the processing of your sequences. You will not be added to any mailing lists. Please don’t abuse the service by using bogus email addresses; you will probably only succeed in getting your computer banned from the service. If you get email messages from InterPro, then you might want to wait a few hours and try to submit your sequences again. If the errors continue, then you can forward the messages to me and I’ll try to figure out what’s going on.
Add Features to proteins without InterPro results…: When this is selected, proteins that have no InterPro terms will be given an annotation indicating that there are no InterPro terms for the protein. If an error was received while a protein was being processed, an error annotation is added. This might seem like a lot of extra cruft, but it allows you to know when a protein was not successfully processed by the web service.
Show InterPro Terms: InterPro terms can be added to the protein as their own feature or as qualifiers added to the protein domains returned by InterProScan. Separately: Add InterPro Terms as separate features, apart from the evidence (PFAM etc). This is useful if you want to export the InterPro terms as a table that can be imported into a spreadsheet. As Qualifiers: Include the InterPro terms as qualifiers (descriptions) on the features.
More Options: Clicking the More Options arrow in the lower left corner opens a panel that allows you to enable or disable the individual database search tools. This may be useful if you are only interested in seeing the results from one or a few of the search tools.
- Rerun proteins with errors. The InterProScan server sometimes throws errors for no apparent reason. This can result in your protein sequences having “InterProScan Error” features, if you selected the checkbox described above. This type of feature exists so that you can identify proteins that resulted in errors and rerun them using the plugin. If a sequence keeps returning errors, try waiting an hour or so before resubmitting it. If it still fails after that, then the sequence might be incompatible with the InterProScan server (it might have internal stop symbols).
- PFAM results might not exactly match the results obtained using the server on the PFAM home page. The same goes for the other search tools. The reason is that the InterProScan server does some filtering of the search results that are obtained with the individual search tools.
- Scan large batches of sequences in small chunks. The plugin and the InterProScan server are capable of processing thousands of sequences at a time, but processing can be rather slow. If you quit Geneious or cancel the plugin while it’s running, you will lose any results that have already been obtained. If you need to scan a large number of proteins, I recommend that you scan them in batches, perhaps a few hundred at a time.
- Export InterPro results to a spreadsheet. When you run the plugin on a batch of sequences, select Show InterPro Terms Separately. Then, when the analysis has finished, you can see a table view of the features by selecting the Annotations tab in the sequence viewer. Export this table and then import it into a spreadsheet program such as Excel. In the spreadsheet you can create a pivot table that shows the number of proteins annotated with each InterPro term. From there you can make pie charts or other figures. This is especially useful for comparing sets of proteins, for example, comparing the proteomes of different species.
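
If you would rather skip the spreadsheet, the pivot-table step from the last tip can also be sketched in a few lines of Python. Treat this as a rough sketch rather than a description of what Geneious actually writes out: it assumes the exported table is a CSV file with a header row, and the column names "Sequence Name" and "Name" are placeholders that I made up for illustration, so check the header of your own export and adjust them. The example data is invented as well.

```python
import csv
import io
from collections import defaultdict

# ASSUMPTION: these column names are hypothetical placeholders.
# Replace them with the header names found in your own exported table.
PROTEIN_COLUMN = "Sequence Name"
TERM_COLUMN = "Name"


def load_rows(handle):
    """Read an exported annotation table (assumed to be CSV) into a list of dicts."""
    return list(csv.DictReader(handle))


def count_proteins_per_term(rows, protein_col=PROTEIN_COLUMN, term_col=TERM_COLUMN):
    """Return {term: number of distinct proteins annotated with that term}."""
    proteins_by_term = defaultdict(set)
    for row in rows:
        term = (row.get(term_col) or "").strip()
        protein = (row.get(protein_col) or "").strip()
        if term and protein:
            # A set ensures a protein with the same term twice is counted once.
            proteins_by_term[term].add(protein)
    return {term: len(proteins) for term, proteins in proteins_by_term.items()}


# Invented example data, standing in for a real export.
EXAMPLE_CSV = """Sequence Name,Name
protA,Term X
protA,Term X
protB,Term X
protB,Term Y
"""

if __name__ == "__main__":
    counts = count_proteins_per_term(load_rows(io.StringIO(EXAMPLE_CSV)))
    # Print the most frequent terms first, ties broken alphabetically.
    for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{term}\t{n}")
```

To run it on a real export, open the file with `open("my_export.csv", newline="")` (the file name is just an example) and pass the handle to `load_rows`. Counting distinct proteins per term, rather than rows, matches what the pivot table is meant to show: a protein that carries the same term on two features is still counted once.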