Sequence database setup: Spectral library

Mascot Server uses NIST MSPepSearch for searching spectral libraries. Library files in MSP format can be downloaded or created using Database Manager. Several libraries from NIST and PRIDE / EBI are predefined in Database Manager. These can be enabled on your Mascot Server very easily:

  1. From the Library menu in Database Manager, choose Enable predefined definition
  2. Choose Enable for the library of interest
  3. Unless you wish to change the location for the local files, choose Next
  4. A default reference database will be suggested. In most cases, just choose Create
  5. The MSP file will be downloaded and converted to NIST binary format. Once it shows as In use in Database Status, the library is available for searching.

Libraries are also available from PeptideAtlas / ISB and GPM. The ISB libraries are in SpectraST *.sptxt format, which is very similar to MSP and may be supported in a future release. The GPM *.mgf libraries are equivalent to MSP files, and can be downloaded and configured using Database Manager. These files are not predefined because there are so many of them (some 197), and because the two most important organisms, human and mouse, are split by chromosome, so you have to download 20-odd files and merge them to get a useful library.
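The merge step mentioned above can be done with any tool that concatenates text files. The following is a minimal sketch in Python, assuming the per-chromosome files are plain text, carry no file-level header that must appear only once, and do not contain overlapping entries. The function name and file names are hypothetical and are not part of Mascot.

```python
from pathlib import Path


def merge_library_files(parts, merged_path):
    """Concatenate several library files into one, in the order given."""
    merged_path = Path(merged_path)
    with merged_path.open("w", encoding="utf-8") as out:
        for part in parts:
            text = Path(part).read_text(encoding="utf-8")
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return merged_path
```

For example, merge_library_files(sorted(Path("gpm_human").glob("*.mgf")), "human_merged.mgf") would merge every *.mgf file in a hypothetical gpm_human directory. Write the merged file outside that directory so that a later run does not pick it up as an input.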

If you wish to configure library files that are not predefined:

  1. From the Library menu in Database Manager, choose Create new
  2. Choose a suitable name (click on the question mark for advice on naming)
  3. Choose from Custom, Copy of, and Use predefined definition template
    • Custom: A new custom database definition from scratch.
    • Copy of: Copy an existing database. You will be required to enter a new name and will be given the choice of copying the existing database files.
    • Use predefined definition template: Start from a predefined definition. The differences between this and enabling a predefined definition are (i) you can make changes to the configuration, (ii) the definition will not be kept up-to-date automatically.
  4. Assuming you chose Custom, the next page of the wizard allows you to specify where the *.MSP file can be found. This can be a download URL or a file path, or you can copy and rename the file yourself. A further option, to create the library from Mascot search results, is described below.
  5. Once the MSP file has been copied or downloaded, you can review and modify the configuration. See the configuration notes below for information about these settings.
  6. The MSP file will be converted to NIST binary format. Once it shows as In use in Database Status, the library is available for searching.

Configuration notes

Parse rules

If the library entries include protein accessions, you must choose suitable parse rules to extract an accession and description. These accessions are not critical; most entries will get a more useful set of accessions from the reference database. In most cases, there will be an accession but no description, so you can choose \(.*\) for both. The MSP accession will only be used if the peptide fails to map to any entry in the reference database.
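To illustrate what the rule \(.*\) does, here is a small sketch in Python. It assumes that the escaped parentheses in the parse rule mark the captured group, as in POSIX basic regular expressions; the Python spelling of the same pattern is (.*). The field value is a made-up example, not taken from any real library.

```python
import re

# Python spelling of the parse rule \(.*\): one group capturing the whole field.
RULE = re.compile(r"(.*)")


def parse_field(field):
    """Return the text captured by the rule, here the entire field."""
    return RULE.match(field).group(1)


# Using the same rule for both means accession and description are identical.
accession = parse_field("P02768")    # hypothetical field from a library entry
description = parse_field("P02768")
```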

MS/MS tolerances

The default library tolerance is quite wide, 0.6 Da / 500 ppm, because the entries may come from any type of instrument, and having a tolerance that is too wide is much better than one that is too narrow. If you are creating a library from data acquired on a specific instrument, capable of high MS/MS accuracy, you may be able to use a much tighter fragment tolerance.
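Tolerances in Da and in ppm are related through the m/z value at which they are applied. The sketch below shows only that arithmetic; it makes no assumption about how Mascot or MSPepSearch apply the two default values, and the function names are illustrative.

```python
def ppm_to_da(ppm, mz):
    """Absolute width in Da of a ppm tolerance at a given m/z."""
    return mz * ppm / 1e6


def da_to_ppm(da, mz):
    """Relative width in ppm of a Da tolerance at a given m/z."""
    return da / mz * 1e6


# 500 ppm at m/z 1200 is 0.6 Da; at m/z 400 the same 500 ppm is only 0.2 Da.
wide = ppm_to_da(500, 1200)
narrow = ppm_to_da(500, 400)
```

A fixed Da tolerance is therefore relatively wider at low m/z, whereas a ppm tolerance scales with the mass of the ion.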

The reference database

Protein inference for library matches is accomplished by assigning a reference Fasta database to each library as part of the library configuration. A detailed description can be found on the relevant help page. The default reference database for predefined databases is SwissProt, usually with an appropriate taxonomy filter. If SwissProt is not available on your Mascot Server, you will need to choose an alternative protein Fasta file (cannot be NA). You can select any locally available Fasta file, but we advise against choosing a very large database, such as NCBIprot, even with a restricted taxonomy, because the huge number of proteins mapping to each peptide sequence will make compression, searching, and reporting very much slower than with a less redundant file, such as SwissProt or a UniProt complete proteome.

Taxonomy

In most cases, library files are compiled for an individual organism, so there is no requirement to identify the taxonomy of individual entries. Even if this were not the case, the entries in a library are peptides, not proteins, so taxonomy assignment would be tricky. Hence, Mascot allows taxonomy to be specified in the filter used to construct a library, but not as a filter when searching a library.

Create a library from Mascot search results

From the Library menu in Database Manager, choose Create new, Custom, Create from search results. You must then define filters that control which peptide matches will be added to the library. More information about spectral library filters can be found on the Spectral library search help page.

You must specify at least one filter, which must be a score or expect value threshold, typically expect < 0.01, because you only want high confidence matches in a library. It can be useful to apply quite narrow restrictions for an individual library. For example, you might want one library for human SILAC data and another for human phosphopeptides and another for human MHC peptides. The same peptide match may make an appearance in several libraries and, if you change your mind about the criteria, you can easily modify the filters and create a new library.

Within a library, a peptide sequence with a particular charge and set of modifications appears only once and is represented by the match with the highest score. That is, libraries built by Database Manager do not contain consensus spectra. Only matches from uninterpreted MS/MS data are considered; PMF and sequence query results are ignored, as are matches from the second pass of an error tolerant search. Modifications and Taxonomy apply to the match, not the search parameters. If the filter includes Phospho (ST), only matches containing Phospho (ST) pass, not all matches in a search where this was a variable mod. All taxonomies at or below the specified node will pass.
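The selection rules above can be sketched as follows. This is an illustration of the logic only, not the Database Manager implementation: the Match record, its field names, and the build_library function are hypothetical, modifications are reduced to a set of names without positions, and the taxonomy filter is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    sequence: str
    charge: int
    mods: frozenset   # modifications found on this particular match
    score: float
    expect: float


def build_library(matches, max_expect=0.01, required_mod=None):
    """Keep the highest-scoring match per (sequence, charge, modifications)."""
    best = {}
    for m in matches:
        # Mandatory confidence threshold: only high confidence matches pass.
        if m.expect >= max_expect:
            continue
        # A modification filter is tested against the match itself,
        # not against the variable mods of the search it came from.
        if required_mod is not None and required_mod not in m.mods:
            continue
        key = (m.sequence, m.charge, m.mods)
        if key not in best or m.score > best[key].score:
            best[key] = m
    return list(best.values())
```

Because each library entry is a single best-scoring match rather than an average of several spectra, running the same function with different filters over the same matches yields different libraries, and one match may appear in several of them.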