This chapter describes in detail the steps that the user goes through in a data exploration session. The general workflow is to start by looking at a protein identification results window and then to go into the details of the various identifications listed in it. This latter task entails looking into the peptides that provided the protein identification and then looking at the mass spectrum that provided each peptide identification. The mass spectrum, that is, the MS/MS spectrum, is displayed with features that help the user form an informed opinion on the validity of the peptide vs mass spectrum match (PSM) at hand. At any moment, it is possible to invalidate a PSM; the identification results are then recomputed automatically, taking into account the modification entered by the user.
When identification results files are loaded, X!TandemPipeline automatically performs the protein inference process using the configuration settings described in Section 3.4.1, “Configuration of the parameters”.
When the protein inference process is finished, X!TandemPipeline displays the protein identifications list in a table view, as pictured in Figure 4.1, “”.
The contents of the protein identifications list window are detailed below:
Checked: if checked, the identified protein on the table row is set to an “accepted” state. By default, all proteins are set to this accepted state. Unchecking a protein triggers a new run of the protein inference process, because disregarding a protein modifies the whole protein identification results set.
group: the protein group the protein belongs to.
accession: the accession number field of the protein database.
description: the description field in the protein database.
log(E-value): the log10 of the protein E-value.
E-value: the protein E-value.
spectra: the number of spectra that identified the protein.
specific spectra: the number of spectra that identified only this protein.
sequences: the number of peptidic sequences that can be assigned to this protein.
specific sequences: the number of peptidic sequences that can be assigned only to this protein.
coverage: the percentage of the protein sequence covered by the peptides that identified it.
MW: the molecular weight of the protein (Mr).
PAI: “Protein abundance index”. This index was defined as the “number of peptides identified divided by the number of theoretically observable tryptic peptides”. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC186633/.
emPAI: “Exponentially modified protein abundance index”. This index was defined as emPAI = 10^PAI − 1. See https://pubmed.ncbi.nlm.nih.gov/15958392/.
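The coverage, PAI and emPAI columns described above can be illustrated with a short Python sketch. It only restates the definitions given in the list; the function names, the peptide positions and the peptide counts are hypothetical and are not taken from the X!TandemPipeline source code.

```python
def coverage_percent(protein_length, peptide_spans):
    """Percentage of residues covered by at least one identified peptide.

    peptide_spans holds (start, end) positions, 0-based, end exclusive.
    Overlapping peptides are counted once.
    """
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end))
    return 100.0 * len(covered) / protein_length


def pai(identified_peptides, observable_peptides):
    """PAI: identified peptides / theoretically observable tryptic peptides."""
    return identified_peptides / observable_peptides


def empai(pai_value):
    """emPAI = 10^PAI - 1."""
    return 10 ** pai_value - 1


# Hypothetical protein of 200 residues identified by three peptides,
# two of which overlap.
cov = coverage_percent(200, [(0, 20), (10, 40), (150, 160)])

# Hypothetical counts: 5 identified peptides out of 10 observable ones.
pai_value = pai(5, 10)
empai_value = empai(pai_value)

print(f"coverage = {cov:.1f} %, PAI = {pai_value:.2f}, emPAI = {empai_value:.3f}")
```

Because of the exponent, emPAI grows faster than PAI as more of the observable peptides are identified.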
It is possible to select the columns that must be displayed in the table by checking or unchecking the corresponding item in the dedicated menu.
A second menu allows one to select the kind of protein items to be shown. Three settings are available:
One setting, when checked, makes the program show only valid proteins, that is, protein identifications that fulfill the restriction parameters (protein E-value, for example). These parameters were set when the protein identification results files were loaded, but they can be modified later.
Another setting shows only the proteins that were checked. This setting is useful when the user has unchecked a number of proteins and wants to keep an eye on them regularly. When proteins are unchecked, the protein inference process is run anew to compute a new grouping that does not take into account the proteins that were disregarded.
A last setting shows only the proteins that belong to a group.
The protein identifications list window pictured in the figure shows greyed-out protein identities. These are proteins that are considered not valid under the current filter parameters (E-value threshold, for example).
The protein identifications list window has a number of features that let the user scrutinize the protein identifications and also modify the results to suit either more or less stringent filtering parameters.
Searching data in the table view. The table's contents can be searched using the Search item at the bottom of the window. A number of fields of the protein record, that is, columns in the table view, can be searched.
Dynamic setting of the filter parameters. X!TandemPipeline provides a high level of flexibility: once a set of protein identification results files has been loaded and the protein inference process has completed, the resulting protein groups are displayed in the protein identifications list window. At this point, the grouping has been performed using the parameters set as pictured in Section 3.4.1, “Configuration of the parameters”. It is nonetheless possible to modify these parameters on the fly using the main program window's Filter parameters tab, as pictured in Figure 4.2, “Protein identification filter parameters tab of the main window”.
Real-time update of the false discovery rate. The false discovery rate (FDR) is recalculated each time the protein inference process is run. The data regarding this quality assessment criterion are shown in Figure 4.3, “False discovery rate (FDR) data after a protein inference process is run”.
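The text above does not state which formula the program applies. As a minimal sketch, assuming the common target-decoy estimate (number of decoy hits divided by number of target hits that pass the current filters), the value recalculated after each protein inference run could be computed as follows. The function name and the counts are hypothetical.

```python
def target_decoy_fdr(n_target, n_decoy):
    """Estimate the FDR as decoy hits / target hits (assumed formula).

    Returns 0.0 when no target hit passes the filters, to avoid a
    division by zero.
    """
    if n_target == 0:
        return 0.0
    return n_decoy / n_target


# Hypothetical counts obtained with the current filter parameters.
fdr = target_decoy_fdr(n_target=1980, n_decoy=20)
print(f"FDR = {100 * fdr:.2f} %")
```

Tightening the filter parameters typically removes decoy hits faster than target hits, which lowers this estimate.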
Distribution of mass errors on PSMs plotted in a histogram. It is possible to visualize the distribution of the mass errors over the whole dataset, as pictured in Figure 4.4, “Mass precision quality assessment”. The histogram plots the number of mass spectra that could achieve a PSM against the mass error (mass delta), that is, the difference between the experimental peptide mass and the calculated peptide mass.
The mass delta calculation involves only the peptides that successfully identified proteins that are currently checked in the protein identifications list and that satisfy the filter parameters. The proteins identified in the decoy database are not processed. The unit of the mass delta can be selected using the Unit drop-down list. Two units are available: ppm (parts per million) or Dalton.
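The mass delta and its two units can be sketched in Python as shown below. This is an illustration under stated assumptions, not the program's code: the ppm value is taken relative to the calculated (theoretical) mass, and the function names, the bin width and the sample values are hypothetical.

```python
import math
from collections import Counter


def mass_delta(experimental_mass, theoretical_mass, unit="ppm"):
    """Difference between experimental and calculated peptide mass."""
    delta_da = experimental_mass - theoretical_mass
    if unit == "dalton":
        return delta_da
    if unit == "ppm":
        # Assumption: ppm is expressed relative to the theoretical mass.
        return delta_da / theoretical_mass * 1e6
    raise ValueError(f"unknown unit: {unit}")


def histogram(deltas, bin_width):
    """Count the mass deltas falling in each bin of the given width.

    Each bin is keyed by its lower edge.
    """
    counts = Counter(math.floor(d / bin_width) * bin_width for d in deltas)
    return dict(sorted(counts.items()))


# Hypothetical PSM: calculated mass 1000.000 Da, measured mass 1000.005 Da.
delta_da = mass_delta(1000.005, 1000.0, unit="dalton")
delta_ppm = mass_delta(1000.005, 1000.0, unit="ppm")

# Hypothetical ppm deltas binned in 1 ppm wide bins.
bins = histogram([-1.2, 0.3, 0.4, 2.7], bin_width=1.0)
```

A distribution centred away from zero suggests a calibration offset, while a wide distribution suggests a loose mass tolerance.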
Exporting the final protein identifications list to a spreadsheet. Once all the proteins in the identifications list have been properly checked, the user can export the data set to an OpenDocument Format (ODF) spreadsheet file using the corresponding menu item of the main window's menu.
The protein identifications list table view, as pictured in Figure 4.1, “”, is actually an active matrix from which the user can easily display the data that yielded any protein identification in the table. This is done by clicking any cell of the table in the row matching the protein to be scrutinized.
Depending on the column in which the mouse click happens, one of two windows shows up:
The Protein details window, showing the sequence of the protein, the matching peptides, as pictured below:
When a cell in any of the remaining columns is clicked, the window that shows up is the Peptide list window, which shows the peptide identifications list and is described in the next section.
When a cell is clicked, the corresponding window shows up if it was not already open. If it was already open, no other window shows up, but the existing window has its data updated to match the protein row that was just clicked.
It is possible to have multiple windows open at the same time by clicking a new row while holding down the Ctrl key.
The peptide identifications list window displays all the data in a table view similar to the one used to display the protein identifications list described in the previous sections.
The peptide identifications list table view has a large number of columns to display all the data about each peptide that identified a given protein. These columns are described in the following figures.
The table's contents are well described by the column headers that are self-explanatory. When hovering over a column header with the mouse cursor, a tool-tip explanatory text is displayed.
Note that the table view might contain more columns, depending on the protein identification data that were loaded. Indeed, the data to be displayed vary with the database search engine that was used for the protein identification. The whole list of columns that might be displayed in the table view is pictured in Figure 4.8, “Columns that populate the peptide identifications list table view”.
The peptide identifications list window has a number of features that let the user scrutinize the peptide details.
Searching data in the table view. The table's contents can be searched using the Search item at the bottom of the window. A number of fields of the peptide record, that is, columns in the table view, can be searched.
Exporting the final protein identifications list to a spreadsheet. Once all the peptides in the identifications list have been properly checked, the user can export the data set to an OpenDocument Format (ODF) spreadsheet file using the corresponding menu item of the main window's menu.
The peptide identifications list table view, as pictured in Figure 4.6, “Peptide identifications list table view (1)”, is actually an active matrix from which the user can easily display the data that yielded any peptide identification in the table. This is done by clicking any cell of the table in the row matching the peptide to be scrutinized.
When clicking on any cell of the peptide identifications list table view, one window shows up that details the various data elements for the peptide documented in the table row. The window is pictured in Figure 4.9, “Peptide vs mass spectrum details”.
In Figure 4.9, “Peptide vs mass spectrum details”, the two graphs show the following:
The top graph displays the mass spectrum of this PSM. This MS/MS spectrum has its recognized peaks labelled according to the y and b ion series.
The bottom graph plots, for each matching MS/MS peak (that is, b or y series ions), the error (mass delta) compared to the theoretical ion mass. In this example, we see that the y ion series is almost perfectly matched (low error and also all the errors in the same value range).
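The bottom graph can be understood with the following Python sketch, which computes singly charged b and y ion m/z values for a peptide and the mass delta of each matched peak. It is a simplified illustration, not the program's implementation: it assumes unmodified residues, charge 1+, a fixed tolerance in Dalton and nearest-peak matching. The residue table is deliberately partial, and the peptide, the observed peaks and the function names are hypothetical.

```python
PROTON = 1.007276   # mass of a proton, in Da
WATER = 18.010565   # monoisotopic mass of H2O, in Da

# Monoisotopic residue masses (partial table, enough for the example).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}


def b_ions(peptide):
    """m/z of the singly charged b ions b1..b(n-1), N-terminal fragments."""
    mzs, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        mzs.append(total + PROTON)
    return mzs


def y_ions(peptide):
    """m/z of the singly charged y ions y1..y(n-1), C-terminal fragments."""
    mzs, total = [], WATER
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        mzs.append(total + PROTON)
    return mzs


def match_peaks(theoretical, observed, tolerance):
    """Return (theoretical m/z, mass delta) for each ion that has an
    observed peak within the tolerance. observed must not be empty."""
    matches = []
    for mz in theoretical:
        nearest = min(observed, key=lambda peak: abs(peak - mz))
        delta = nearest - mz
        if abs(delta) <= tolerance:
            matches.append((mz, delta))
    return matches


peptide = "GAK"                           # hypothetical peptide
observed = [147.1130, 218.1497, 300.0]    # hypothetical MS/MS peak list
b_series = b_ions(peptide)
y_series = y_ions(peptide)
y_matches = match_peaks(y_series, observed, tolerance=0.01)
b_matches = match_peaks(b_series, observed, tolerance=0.01)
```

In this made-up spectrum, both y ions are matched with deltas of about 0.0002 Da and no b ion is matched, which is the kind of pattern the bottom graph makes visible at a glance.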
It is possible to zoom in on a region of the graphs by positioning the mouse cursor on the region of interest and then rotating the mouse wheel. To unzoom, simply rotate the mouse wheel in the reverse direction.
The right-hand side margin provides a number of data items about the PSM, like the peptide E-value, the HyperScore, the ion charge, the theoretical and experimental masses, the retention time at which this ion was detected… These items are self-explanatory.
One interesting feature of the Peptide details window is the XIC button (top right) that triggers the calculation of an extracted ion current chromatogram, as pictured in Figure 4.10, “The extracted ion current (XIC) chromatogram viewer window”.
The notion of extracted ion current chromatogram is best explained by describing the computation that yields that chromatogram.
The user defines the m/z value for which the chromatogram is to be determined. The program iterates over each MS (that is, full scan) spectrum and checks whether an ion of that m/z value was encountered. If so, a variable holding the cumulated intensity of that ion is incremented for the retention time at which the mass spectrum was acquired. For example, if m/z value 1254.25 is searched for, and an ion of that m/z value is found in the mass spectrum acquired at retention time 2.5 min, then a tuple is stored like this: (2.5, intensity). Then, if another mass peak of that m/z value is found in the mass spectrum acquired at retention time 47 min, another tuple is created: (47, intensity).
If the data are from ion mobility-mass spectrometry (IM-MS) experiments, then there might be a large number of spectra acquired at a given retention time. For example, data from the Waters Synapt2 instrument have 200 spectra acquired for any given retention time value (the spectra are drift-related spectra). In Bruker timsTOF data, there are more than 700 spectra acquired at any given retention time. Thus, the searched m/z value might be found more than once for a retention time value. In this case, the tuple's intensity value is incremented by the intensity of each new peak of that m/z value at that specific retention time value.
When the program has finished iterating over all the mass spectra of the acquisition, it plots the XIC chromatogram as intensity = f(retention time). This is the reason why it is a chromatogram.
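The computation described above can be sketched in Python as follows. This is a minimal illustration, not the program's implementation: the in-memory layout of the spectra, the m/z tolerance used to decide that a peak has "that m/z value", and the function name are assumptions.

```python
from collections import defaultdict


def extract_xic(ms_spectra, target_mz, tolerance):
    """Build an XIC as a sorted list of (retention_time, intensity) tuples.

    ms_spectra is an iterable of (retention_time, peaks) pairs for MS
    (full scan) spectra, where peaks is a list of (mz, intensity) pairs.
    Several spectra may share one retention time (ion mobility data);
    their matching intensities are summed into the same tuple.
    """
    xic = defaultdict(float)
    for retention_time, peaks in ms_spectra:
        for mz, intensity in peaks:
            if abs(mz - target_mz) <= tolerance:
                xic[retention_time] += intensity
    return sorted(xic.items())


# Hypothetical acquisition: two spectra share retention time 2.5 min,
# as would happen with drift-related spectra.
spectra = [
    (2.5, [(1254.25, 100.0), (800.10, 50.0)]),
    (2.5, [(1254.26, 40.0)]),
    (47.0, [(1254.24, 10.0)]),
    (60.0, [(900.00, 5.0)]),
]

xic = extract_xic(spectra, target_mz=1254.25, tolerance=0.02)
for retention_time, intensity in xic:
    print(retention_time, intensity)
```

Retention times at which the ion was not found produce no tuple here, which follows the description above; a plotting routine could equally fill them with zero intensity.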
The extracted ion current (XIC) chromatogram viewer is useful to scrutinize the mass data at the very origin of a PSM. It is routinely used to check that the PSM is reliable. If it is not, the corresponding peptide can be unchecked in the peptide identifications list table view, which triggers a new run of the protein inference process.
The XIC viewer window displays the “guts” of the MS spectrum of the precursor ion that was fragmented and that yielded a PSM. The XIC chromatogram (left plot panel) is actually a set of XIC chromatograms that are superimposed in the plot widget (see Figure 4.11, “The extracted ion current (XIC) chromatogram viewer window (zoomed view)”). One of the traces (legend +0) is for the first peak of the isotopic cluster of the searched ion in the MS data of the acquisition; the second trace (legend +1) is for the second peak of the isotopic cluster. In the typical informatics-oriented style of numbering, the first isotopic peak (only light isotopes enter in the composition of the peptidic ion) is “isotope 0”; the second isotopic peak (one light isotope is substituted with a heavy one) is “isotope 1”.
The right panel is a bar plot showing the theoretical isotopic ratio between the first and the second peak of the isotopic cluster (blue) with, superimposed, the experimental ratio. In the example, the match between the experimental and the theoretical cluster shape is perfect.
Another interesting piece of information is the Fraction of Isotopic distribution number, which reflects the ratio of the plotted isotopic cluster peaks to the whole theoretically calculated isotopic cluster (which also includes the peaks in which more than one light isotope is substituted with a heavy isotope, that is, with a mass increment of +2 rather than +1). In the example, that ratio is 80 %.
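The two quantities shown in the right panel can be sketched in Python as follows, assuming the theoretical isotopic cluster is available as a list of relative abundances (isotope 0 first). How the program computes the theoretical cluster is not covered here; the abundances, the XIC areas and the function names are hypothetical values chosen to reproduce the 80 % figure quoted above.

```python
def isotopic_fraction(theoretical_abundances, plotted_peaks=2):
    """Share of the whole theoretical cluster held by the plotted peaks."""
    plotted = sum(theoretical_abundances[:plotted_peaks])
    return plotted / sum(theoretical_abundances)


def normalized(values):
    """Scale the values so that they sum to 1, to compare cluster shapes."""
    total = sum(values)
    return [value / total for value in values]


# Hypothetical theoretical cluster: isotopes 0, 1, 2 and 3.
theoretical = [0.50, 0.30, 0.15, 0.05]

# Hypothetical XIC areas measured for the +0 and +1 traces.
experimental_areas = [6250.0, 3750.0]

fraction = isotopic_fraction(theoretical)
theoretical_ratio = normalized(theoretical[:2])
experimental_ratio = normalized(experimental_areas)

print(f"Fraction of isotopic distribution: {100 * fraction:.0f} %")
print("theoretical :", theoretical_ratio)
print("experimental:", experimental_ratio)
```

When the normalized experimental and theoretical values agree, as in this made-up case, the two bar series overlap in the plot.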
To zoom in/out regions of the XIC chromatogram plot widget, hover the mouse cursor over the region of interest and rotate the mouse wheel.
X!TandemPipeline is able to cope with phospho-peptides. The mass spectrometric data are acquired exactly as usual with the mass spectrometer, but the sample preparation follows these steps:
Separate digestion of the samples (when there is more than one);
Labeling of the peptides (each sample gets a different label);
Pooling of the whole set of peptides into a single mixture;
Separation of the peptides on a strong cation exchange (SCX) resin and collection of the fractions;
Phospho-peptide enrichment using IMAC[7] for each SCX fraction. The SCX fraction is loaded onto the IMAC resin and, following a wash step, the phospho-peptides are eluted (pH-based elution). There is thus a one-to-one relation between an SCX fraction and an IMAC-based purification fraction.
Mass spectrometric analysis of each IMAC-based phospho-peptide-enriched fraction.
X!Tandem needs to be configured in such a manner that it can generate all the theoretical peptides (and fragments) that might bear the phosphoryl group. This process is described in the section below.