Mascot search parameters

Your name; Email

On the free, public Mascot server, your name and email address must be entered in these fields. This information will not be used by us or anyone else to send you "spam" or junk mail. The reason for requiring this information is to allow the results of a search to be returned by email. Usually, search results are returned promptly to your browser window. However, if your connection to the web site is broken before the search is complete, they will be emailed to the supplied address. If you become disconnected from the site after submitting a search, please do not resubmit the search, just check your email. This facility also means that you don’t have to wait for search results if you don’t want to, particularly during peak hours when the response may be slower than normal.

To save you having to type in this information for every search, your browser will attempt to save it as a local “cookie”. If you refuse to accept this cookie, or your browser doesn’t support cookies, the information cannot be saved and you will have to type it in for every search. If you change the contents of either of these fields, the new values will be saved when the search is submitted.

With an in-house Mascot Server, use of these fields is optional.

Search title

A text string which will be printed at the top of results report pages. Can be left blank.

Database

Select the sequence database(s) to be searched. These can be Fasta files containing amino acid (AA) or nucleic acid (NA) sequences or spectral library (SL) files. The databases available on the free, public Mascot server are:

Database	Type	Comment
EST	NA	EST divisions of EMBL (Environmental_EST, Fungi_EST, Human_EST, Invertebrates_EST, Mammals_EST, Mus_EST, Plants_EST, Prokaryotes_EST, Rodents_EST, Vertebrates_EST)
NCBIprot	AA	Comprehensive, non-identical protein database
SwissProt	AA	High quality, curated protein database
contaminants	AA	Common contaminants compiled by Max Planck Institute of Biochemistry, Martinsried
cRAP	AA	Common contaminants compiled by the Global Proteome Machine Organization
NIST_Mouse_IonTrap	SL	A collection of spectra from Mouse proteins (classic ion trap) from NIST.
NIST_S.cerevesiae_IonTrap	SL	A collection of spectra from baker’s yeast proteins (classic ion trap) from NIST.
PRIDE_Human	SL	A collection of spectra from human proteins from EBI.
PRIDE_Contaminants	SL	A collection of common contaminant spectra from EBI.

For a Peptide Mass Fingerprint, spectral libraries and NA databases are not available. It makes no sense to search a set of peptide masses against EST because the entries are just short stretches of sequence, not complete proteins.

For a Sequence Query or an MS/MS Ions Search, on the free, public Mascot server, you must search one of the protein databases before searching an EST database. If the protein database search fails to produce a positive match, the master results page will allow you to repeat the search against an EST database.

You can select more than one database for a search. This is useful when you want to search a single organism database and include the sequences of common contaminants, such as BSA and trypsin, in the search.

Taxonomy

The Taxonomy parameter allows searches to be limited to entries from particular species or groups of species. This can speed up a search, and ensures that the hit list will only contain entries from the selected species. If the search data are marginal, and you are completely confident of the origin of the protein, this can help bring a weak match to the top of the list.

The top level classification, All entries, is self-explanatory. Beneath this are a number of classifications representing taxa or species, such as Rodentia (Rodents). The three classifications below Rodentia are Mus, Rattus, and Other rodentia. Selecting Other rodentia would limit a search to Rodentia excluding Mus and Rattus.

The unclassified level contains database entries for which the species is undefined or is a species which doesn’t fit into any current classification. There are about 50,000 such sequences in the NCBIprot database.

The Species information unavailable level contains those database entries from which Mascot was unable to extract taxonomy information. Taxonomy information may be present in the entry, but Mascot was unable to find it. Thus, if a search limited to a more selective classification than All entries fails to give a result, it may be a wise precaution to repeat it against Species information unavailable.

For non-redundant databases, a single entry may represent identical sequences from multiple species. The accession string and title text from the FASTA entry, listed on the master results page, will usually describe just one of these entries. To see the equivalent entries, and to explore their taxonomy, follow the accession number link in the results list to the Protein View. If the hit is from a non-redundant database, and represents multiple entries with identical sequences, the Protein View will include links to NCBI Entrez and the NCBI Taxonomy Browser for all equivalent entries.

Monoisotopic or Average

Specify whether the experimental mass values are average or monoisotopic. If you are unsure which to choose, refer to the mass accuracy help page.

Modifications

Select any known or suspected modifications.

Mascot supports two types of modification. Fixed modifications are applied universally, to every instance of the specified residue(s) or terminus. There is no computational overhead associated with a fixed modification; it is simply equivalent to using a different mass for the modified residue(s) or terminus. For example, selecting Carboxymethyl (C) means that all calculations will use 161 Da as the mass of cysteine.
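
The Carboxymethyl (C) example can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function name and the two-entry mass table are hypothetical, and the values are standard monoisotopic residue masses rather than anything read from Mascot's configuration.

```python
# Monoisotopic residue masses in Da (standard values; only two shown here).
RESIDUE_MASS = {"C": 103.00919, "G": 57.02146}

# Monoisotopic mass added by carboxymethylation (C2H2O2), in Da.
CARBOXYMETHYL_DELTA = 58.00548


def apply_fixed_modification(masses, residue, delta):
    """Return a copy of the mass table with `delta` added to `residue`.

    A fixed modification is just a different mass for the residue, so no
    extra peptides need to be generated. (Hypothetical helper.)
    """
    modified = dict(masses)
    modified[residue] += delta
    return modified


modified_masses = apply_fixed_modification(RESIDUE_MASS, "C", CARBOXYMETHYL_DELTA)
print(round(modified_masses["C"], 2))  # 161.01
```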

Variable modifications are those which may or may not be present. Mascot tests all possible arrangements of variable modifications to find the best match. For example, if Oxidation (M) is selected, and a peptide contains 3 methionines, Mascot will test for a match with the experimental data for that peptide containing 0, 1, 2, or 3 oxidised methionine residues.

Variable modifications can be a very powerful means of finding a match, but there are also dangers to be aware of. Even a single variable modification will generate many possible additional peptides to be tested. More than one variable modification causes the number of arrangements to increase geometrically. This means that a search can take dramatically longer than the same search with fixed modifications. More importantly, testing all possible arrangements of modifications generates many more random matches, so that discrimination can be sharply reduced.
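
To get a feel for how quickly the number of arrangements grows, the sketch below enumerates every set of positions at which a single variable modification could sit. It is a rough illustration only: the function name and the peptide are hypothetical, and it is an assumption that counting positional arrangements in this way reflects how the search engine enumerates candidates.

```python
from itertools import combinations


def modification_arrangements(peptide, residue):
    """List every subset of `residue` positions that could be modified."""
    sites = [i for i, aa in enumerate(peptide) if aa == residue]
    arrangements = []
    for k in range(len(sites) + 1):
        # All ways of placing exactly k modifications on the available sites.
        arrangements.extend(combinations(sites, k))
    return arrangements


peptide = "MAMKMR"  # hypothetical peptide containing 3 methionines
forms = modification_arrangements(peptide, "M")
print(len(forms))                         # 8 positional arrangements
print(sorted({len(f) for f in forms}))    # [0, 1, 2, 3] oxidised residues
```

With n candidate sites, a single variable modification gives 2^n arrangements, and the counts for independent modifications multiply, which is why a long list of variable modifications becomes expensive so quickly.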

The best advice is to use variable modifications sparingly; never select a large number "just in case". Mascot allows up to 9 variable modifications to be specified but, in most cases, a better approach is to do a first pass search with a small number of variable modifications followed by an error tolerant second pass search to pick up additional matches to peptides containing unusual modifications.

If chemically inconsistent fixed modifications are combined, an error message will be generated by the search engine.

The ‘Show all mods.’ checkbox switches between a short list of the most common modifications and a complete list of all available modifications. The default state for this checkbox, and all search form fields, is set using the search form defaults page.

Precursor

Certain data file formats, SCIEX API III, PerSeptive (.PKS), and Bruker (.XML), do not include m/z information for the precursor peptide. For these formats only, the Precursor field is used to specify the m/z value of the parent peptide. The charge state is defined by the setting of the Peptide Charge field.

Protein Mass

The mass of the intact protein in Da applied as a sliding window. That is, the mass of the contiguous stretch of sequence which contains all of the matched peptide mass values. This will generally be less than the mass of the entire sequence entry. If this field is left blank, there is no restriction on protein mass.

Peptide tol. ±

The error window on experimental peptide mass values (not the error window for MS/MS fragment ion mass values, which is set using the MS/MS tol. ± parameter).

Units can be selected from:

%	fraction expressed as a percentage
mmu	absolute milli-mass units, i.e. units of 0.001 Da
ppm	fraction expressed as parts per million
Da	absolute units of Da
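
As a rough guide to how the four units compare, the sketch below converts a tolerance into Da for a given mass. The function name is hypothetical, and it is an assumption that the fractional units (% and ppm) are taken relative to the mass passed in.

```python
def tolerance_in_da(value, unit, mass):
    """Convert a tolerance in the given unit to an absolute window in Da."""
    if unit == "Da":
        return value
    if unit == "mmu":
        return value * 0.001          # 1 mmu = 0.001 Da
    if unit == "%":
        return mass * value / 100.0   # fraction of the mass, as a percentage
    if unit == "ppm":
        return mass * value / 1e6     # fraction of the mass, in parts per million
    raise ValueError("unknown tolerance unit: " + repr(unit))


# For a 2000 Da peptide, 10 ppm corresponds to a window of about 0.02 Da.
print(tolerance_in_da(10, "ppm", 2000.0))
```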

# ¹³C

Sometimes, peak detection chooses the ¹³C peak rather than the ¹²C. In extreme cases, it may pick the ¹³C₂ peak. The normal test for a precursor match is:
TOL > absolute(exp – calc)
Assuming the mass values and tolerance are in Da, if this field is set to 1, the test will also succeed for
TOL > absolute(exp – calc – 1)
If this field is set to 2, the test will succeed for the above two conditions, plus:
TOL > absolute(exp – calc – 2)

This means that you can use a tight mass tolerance and still get a match to a ¹³C peak. If you are using a very high accuracy instrument, note that the precise shifts are the carbon isotope spacings of 1.00335 and 2.00670, rather than 1 and 2.
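
The precursor test above can be written as a short function. This is a minimal sketch with hypothetical names; the `spacing` argument is an illustrative addition that lets the nominal shift of 1 be swapped for the more precise 1.00335.

```python
def precursor_matches(exp, calc, tol, max_13c=0, spacing=1.0):
    """Return True if TOL > absolute(exp - calc - n * spacing) for any
    n from 0 up to the # 13C setting. All mass values are in Da."""
    return any(abs(exp - calc - n * spacing) < tol for n in range(max_13c + 1))


# The peak picked is one 13C above the calculated mass, with a tight tolerance.
print(precursor_matches(1001.0, 1000.0, 0.01))             # False
print(precursor_matches(1001.0, 1000.0, 0.01, max_13c=1))  # True
```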

MS/MS tol. ±

Error window for MS/MS fragment ion mass values. Units can be ppm, Da or mmu, as above.

Mass values

Specifies whether experimental peptide mass values in a peptide mass fingerprint search include the mass of the charge carrier, MH⁺ or M-H⁻, or whether they correspond to neutral, Mᵣ values.

Peptide charge

Used to specify the precursor peptide charge state in a sequence query or an MS/MS ions search. The peptide mass value supplied in an MS/MS data file is usually an observed m/z value. The charge state field is used to calculate the relative molecular mass (Mᵣ) of the precursor from the observed m/z unless the data file explicitly specifies a different charge state.

N.B. The notation "1+", "2+", etc. is used to save space and because some HTML form fields do not support the use of superscripts and subscripts. "1+" always means MH⁺, "1-" always means M-H⁻, "2+" always means MH₂⁺⁺, etc.

For electrospray data, select "2+" if the peptide m/z data are known to be doubly charged. If the charge state is uncertain, select "2+ and 3+" to include both charge states in the search and see which most clearly discriminates the score of the top matched protein.

For MALDI-PSD, the precursor peptides will generally be MH⁺, so the charge state should be set to "1+".
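
The calculation of the relative molecular mass from an observed m/z and charge state can be sketched as follows. The function name is hypothetical, and the charge carrier is assumed to be a proton (1.007276 Da) in both polarities.

```python
PROTON_MASS = 1.007276  # Da


def mr_from_mz(mz, charge):
    """Return the neutral relative molecular mass for an observed m/z.

    `charge` is signed: 2 means "2+", -1 means "1-".
    """
    if charge == 0:
        raise ValueError("charge must be non-zero")
    z = abs(charge)
    if charge > 0:
        # Positive ions carry z extra protons, which are subtracted.
        return mz * z - z * PROTON_MASS
    # Negative ions have lost z protons, which are added back.
    return mz * z + z * PROTON_MASS


print(round(mr_from_mz(800.0, 2), 4))  # 1597.9854
```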

Missed Cleavages

Setting the number of allowed missed cleavage sites to zero simulates a limit digest. If you are confident that your digest is perfect, with no partial fragments present, this will give maximum discrimination and the highest score.

If experience shows that your digest mixtures usually include some partials, that is, peptides with missed cleavage sites, you should choose a setting of 1, or maybe 2 missed cleavage sites. Don’t specify a higher number without good reason, because each additional level of missed cleavages increases the number of calculated peptide masses to be matched against the experimental data. If the actual digest does not contain extended partials, this simply increases the number of random matches, and so reduces discrimination.
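
The effect of each extra level of missed cleavages on the number of calculated peptides can be seen with a small in-silico digest. This is a sketch under stated assumptions: the enzyme rule (cleave after K or R, but not before P) is the commonly quoted trypsin rule and is chosen here only for illustration, and the function name and sequence are hypothetical.

```python
def digest(sequence, max_missed=0):
    """Return the peptides from a trypsin-like digest of `sequence`,
    allowing up to `max_missed` missed cleavage sites per peptide."""
    # Positions at which the chain is cut, including both termini.
    cuts = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(cuts):
                break
            peptides.append(sequence[cuts[start]:cuts[end]])
    return peptides


protein = "AAKGGRCCKDD"  # hypothetical sequence with 3 cleavage sites
for n in range(3):
    print(n, len(digest(protein, n)))  # 4, 7, then 9 peptides
```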

Data file

Browse to a peak list file which will be uploaded when the search is submitted. Details of supported file formats can be found on the data file format help page.

Data file URL

Enter the URL to a peak list file. This option is only available if Mascot security is enabled, when the security settings specify which protocols are available (http, ftp, file) and the maximum permitted file size (MB).

Query

For a Peptide Mass Fingerprint, unless a peak list file is specified, the query window must contain a list of peptide mass values, one per line. An intensity value after the mass value is optional. Anything after the second numeric value on each line is ignored.

If intensity information is available, values will be selected according to their intensity so as to get the best score. This can be disabled by setting IteratePMFIntensities to 0 in mascot.dat.
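
The layout of a Peptide Mass Fingerprint query can be illustrated with a small parser. This is a sketch only: the function name is hypothetical, and it is an assumption that values on a line are separated by whitespace.

```python
def parse_pmf_query(text):
    """Parse a PMF query into a list of (mass, intensity) tuples.

    Intensity is None when the line holds only a mass value. Anything
    after the second value on a line is ignored.
    """
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        mass = float(fields[0])
        intensity = None
        if len(fields) > 1:
            try:
                intensity = float(fields[1])
            except ValueError:
                intensity = None  # second field is not numeric
        peaks.append((mass, intensity))
    return peaks


query = "1234.5 100 ignored text\n987.6\n"
print(parse_pmf_query(query))  # [(1234.5, 100.0), (987.6, None)]
```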

For a Sequence Query, each line entered into the query window must consist of one experimental peptide mass value, optionally followed by qualifiers for that peptide:

M seq(…) comp(…) ions(…) tag(…) etag(…)

M is an experimental mass value, seq(…) is AA sequence information, comp(…) is AA composition information, ions(…) contains MS/MS fragment mass and (optionally) intensity values, tag(…) is a sequence tag, etag(…) is an error tolerant sequence tag.

A line may contain zero, one, or many qualifiers. If there are multiple sequence tag qualifiers, and one or more is error tolerant, then all tags are treated as error tolerant.

N.B. ions(…), tag(…), and etag(…) qualifiers are scored probabilistically. That is, the more qualifiers that match, the higher the score, but not all qualifiers are required to match. In contrast, seq(…) and comp(…) are treated as filters. If a seq(…) or comp(…) qualifier fails to match, then the entire query is discarded. Hence, only include seq(…) or comp(…) qualifiers which are known with a high degree of confidence. Note that using a seq(…) qualifier in a Mascot search is not equivalent to performing a BLAST search.

If you re-Search a Sequence Query from the results page, you may notice two additional qualifiers which are used by Mascot internally: from(…) and title(…).
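
The structure of a Sequence Query line, and the distinction between filter qualifiers and scored qualifiers, can be sketched as below. The parser, its names, and the qualifier contents in the example are hypothetical and are not Mascot's own implementation; the sketch also assumes that qualifier contents never contain a closing parenthesis.

```python
import re

QUALIFIER = re.compile(r"(\w+)\(([^)]*)\)")
FILTER_QUALIFIERS = {"seq", "comp"}  # a failed match discards the query


def parse_sequence_query(line):
    """Split one query line into its mass, filter and scored qualifiers."""
    parts = line.split(None, 1)
    mass = float(parts[0])
    rest = parts[1] if len(parts) > 1 else ""
    qualifiers = QUALIFIER.findall(rest)

    # If any tag is error tolerant, all tags are treated as error tolerant.
    if any(name == "etag" for name, _ in qualifiers):
        qualifiers = [("etag" if name == "tag" else name, body)
                      for name, body in qualifiers]

    return {
        "mass": mass,
        "filters": [q for q in qualifiers if q[0] in FILTER_QUALIFIERS],
        "scored": [q for q in qualifiers if q[0] not in FILTER_QUALIFIERS],
    }


parsed = parse_sequence_query("1400.7 seq(n-AC) ions(345.2,512.3)")
print(parsed["filters"])  # [('seq', 'n-AC')]
print(parsed["scored"])   # [('ions', '345.2,512.3')]
```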

Report hits

This parameter determines the maximum number of hits displayed in a search results report. If your connection to the internet is slow, selecting a low number of hits will reduce the time taken to load and display a search report.

Choose AUTO to display only protein hits with significant scores. In a protein summary report, one additional hit is reported after the cutoff at the significant score. This is to ensure that the report provides some feedback, even though there may be no significant matches.

Precursor removal

The precursor peak can often have very high intensity relative to the fragment peaks, which may give rise to spurious fragment ion matches. It is usually best if the precursor is removed before the search.

With the default arguments of -1,-1, a smart filter is created. This removes peaks within the fragment ion tolerance window about each of the precursor isotope peaks. The number of isotopes is assumed to be as follows:

Mr	Number
< 1000	3
1000 – 1999	4
2000 – 2999	5
3000 – 3999	6
4000 – 4999	7
5000 – 5999	8
6000 – 6999	9
> 7000	10

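So, if the precursor m/z was 800, the charge was 2, and the fragment ion tolerance was +/- 0.1 Da, then Mr is approximately 1598, which gives 4 isotope peaks. The filter would remove 4 notches, each of width +/- 0.1, centred on the isotope peaks: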

m/z 800.0 +/- 0.1
m/z 800.5 +/- 0.1
m/z 801.0 +/- 0.1
m/z 801.5 +/- 0.1

At first sight, this may seem a strange mix of m/z and Da. The reason is that we need to avoid matches from 1+ fragment ions, whatever the charge on the precursor.

If the arguments are anything other than -1,-1, a single notch is used. The first argument is the mass offset of the beginning of the notch and the second is the mass offset of the end of the notch; each offset is divided by the precursor charge to give the offset in m/z. For the precursor in the last example, if the arguments were -1,4 then the notch would run from m/z 799.5 to m/z 802.0. However, if the precursor charge was 1, then the notch would be from m/z 799 to m/z 804.
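
Both forms of the precursor removal filter can be reproduced with a short sketch. The function names are hypothetical; the isotope spacing is taken as 1/charge in m/z, as in the worked example, and the charge carrier is assumed to be a proton.

```python
PROTON_MASS = 1.007276  # Da


def isotope_count(mr):
    """Number of isotope peaks assumed for a precursor of mass `mr`."""
    return min(10, 3 + int(mr // 1000))


def precursor_notches(mz, charge, frag_tol, args=(-1, -1)):
    """Return the (low, high) m/z ranges removed around the precursor."""
    if args == (-1, -1):
        # Smart filter: one notch per isotope peak, each +/- frag_tol wide.
        mr = (mz - PROTON_MASS) * charge
        centres = [mz + i / charge for i in range(isotope_count(mr))]
        return [(c - frag_tol, c + frag_tol) for c in centres]
    # Single notch: mass offsets are divided by the charge to give m/z.
    start, end = args
    return [(mz + start / charge, mz + end / charge)]


print(len(precursor_notches(800.0, 2, 0.1)))       # 4
print(precursor_notches(800.0, 2, 0.1, (-1, 4)))   # [(799.5, 802.0)]
print(precursor_notches(800.0, 1, 0.1, (-1, 4)))   # [(799.0, 804.0)]
```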

Instrument

For an MS/MS Ions Search, choose the description which best matches the type of instrument used to acquire the data. This setting determines which fragment ion series will be used for scoring, according to the following table. "Default" corresponds to the configuration used in Mascot version 1.7 and earlier.

	Default	ESI QUAD TOF	MALDI TOF PSD	ESI TRAP	ESI QUAD	ESI FTICR	MALDI TOF TOF	ESI 4 SECT	FTMS ECD	ETD TRAP	MALDI QUAD TOF	MALDI QIT TOF	MALDI ISD
1⁺ fragments	X	X	X	X	X	X	X	X	X	X	X	X	X
2⁺ fragments if precursor 2⁺ or higher	X	X		X	X	X		X	X	X	X
2⁺ fragments if precursor 3⁺ or higher
Immonium ions			X				X	X			X	X
a series ions	X		X				X	X				X	X
a-NH₃ if fragment includes RKNQ	X		X				X					X
a-H₂O if fragment includes STED			X				X					X
b series ions	X	X	X	X	X	X	X	X			X	X
b-NH₃ if fragment includes RKNQ	X	X	X	X	X	X	X	X			X	X
b-H₂O if fragment includes STED		X	X	X	X	X	X	X			X	X
c series ions									X	X			X
x series ions
y series ions	X	X	X	X	X	X	X	X	X	X	X	X	X
y-NH₃ if fragment includes RKNQ	X	X		X	X	X	X				X	X
y-H₂O if fragment includes STED		X		X	X	X	X				X	X
z series ions								X
z+H series ions									X	X
z+2H series ions									X	X			X
internal yb < 700 Da							X	X			X	X
internal ya < 700 Da							X	X			X	X
y or y⁺⁺ must be significant
y or y⁺⁺ must be top scoring series
d or d’ series ions							X
v series ions							X
w or w’ series ions							X

Other Parameters

There are a number of other search parameters, but their default settings should not be changed under normal circumstances. For this reason, they are not accessible from the browser interface. The defaults can be over-ridden by using embedded parameters, either in a data file or in the query window. But, be warned that you change them at your own risk!

Matrix Science