Applies to X!TandemPipeline 0.4.43

1 Generalities

In this chapter, I wish to introduce some general concepts around the X!TandemPipeline program, the reference to be used to cite the software in publications, and the building and installation procedures.

1.1 General concepts and terminologies

This section describes the general concepts at the basis of proteomics data analysis that one needs to understand in order to properly grasp the workings of the X!TandemPipeline software.

1.1.1 Bottom-up Proteomics or Top-down Proteomics?

Proteomics is a mass spectrometry-based field of endeavour aimed at characterizing the protein complement of a given genome. The protein complement of a genome is the set of proteins that are expressed at a given instant in the life of a cell, a tissue or an organ, for example. Characterizing that protein complement means identifying the proteins expressed by a given living cell, tissue or organ. Optionally, if feasible, the characterization of post-translational modifications might be desirable.

There are two main variants of proteomics: bottom-up proteomics and top-down proteomics:

  • The first variant—bottom-up proteomics—identifies proteins on the basis of the identification of all the peptides obtained by first digesting all the proteins of the sample using an enzyme of known specificity. In this variant, the sample that is injected in the mass spectrometer is the resulting peptide mixture (first resolved by high performance liquid chromatography). The identification of the proteins contained in the initial sample is performed in a number of steps that are actually the focus of X!TandemPipeline. Indeed the X!TandemPipeline software is a bottom-up-oriented software program.

  • The second variant—top-down proteomics—identifies proteins on the basis of intact proteins directly injected in the mass spectrometer. Of course, it might be necessary to fragment the proteins in the mass spectrometer and to use the fragments to actually identify the protein. However, the fact that the protein is first detected and analyzed as one entity (and not as a set of peptides) allows for some very useful discoveries, like the identity and number of post-translational modifications, for example.

Note

At the moment, X!TandemPipeline does not handle top-down proteomics data: it is a bottom-up proteomics software project.

1.1.2 Typical cycle of a mass spectrometer data acquisition

Once the initial sample, containing all the proteins to identify, has been digested using a protease of known cleavage specificity (trypsin, typically), the peptide mixture (which might be highly complex) needs to be resolved as much as possible using chromatography. In the vast majority of proteomics experimental settings, the chromatography setup is connected to the mass spectrometer so that, as the gradient develops, all the peptides are immediately injected on line into the mass spectrometer ion source.

The mass spectrometer runs an analysis cycle that can be summarized as follows:

  • Acquire a full scan mass spectrum of the whole set of ions at a given chromatography retention time. This kind of mass spectrum is called an MS spectrum;

  • Enter a loop during which ions having the most intense signal are subjected in turn to collision-induced dissociation (CID), that is, are fragmented by accelerating them against gas molecules in a fragmentation cell. The mass spectra that are collected at each one of these fragmentation acquisitions are called MS/MS spectra because they are obtained after two mass analysis events: the first event is the measurement of the intact peptide ion's m/z value (full scan mass spectrum) and the second event is the measurement of all the obtained fragments' m/z values (MS/MS scan).

Each instrument records all the MS and MS/MS spectra in a raw data format file that is specific to the vendor. Free Software developers cannot know the internal structure of these files. To use the mass spectrometric data, they need to rely on specific software that performs the conversion from the raw data format to an open data format (mzML). That program is called msconvert, from the ProteoWizard project.
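As a sketch, assuming ProteoWizard is installed and using a placeholder input file name, a typical msconvert conversion command might look like this:

```shell
# Convert a vendor raw data file to the open mzML format with msconvert
# (ProteoWizard). "sample.raw" is a placeholder file name; --zlib asks
# for compressed binary arrays, -o sets the output directory.
msconvert sample.raw --mzML --zlib -o converted/
```

The resulting converted/sample.mzML file can then be loaded by mzML-aware software such as X!TandemPipeline.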

Note

Mass spectrometrists used to call the ions analyzed in full scan mass spectra parent ions, and the fragment ions arising upon fragmentation of a parent ion daughter ions. This terminology has been deprecated and replaced with precursor ion and product ion, respectively. In this document, we thus use the new terminology.

1.1.3 Outline of an X!TandemPipeline working session

X!TandemPipeline loads mzXML- and mzML-formatted files and needs, for its operations, to have access to all the MS and MS/MS spectra. Once data files have been loaded, X!TandemPipeline allows the user to perform the following tasks, which will be detailed in later chapters:

  • Configure the X!Tandem database searching software (that is, the software, external to X!TandemPipeline, that actually performs the peptide-mass spectrum matches);

  • Run the X!Tandem software and load its results;

  • Display the results to the user so that they can be scrutinized and checked. The peptide identification results serve as the basis for another processing step that is integrally performed by X!TandemPipeline: the protein inference. That step aims at using the peptide identifications to actually craft a list of protein identities. The user is provided with various means to control that step.

1.2 Citing the X!TandemPipeline software

Please cite the software using the following reference: Olivier Langella, Benoît Valot, Thierry Balliau, Mélisande Blein-Nicolas, Ludovic Bonhomme, and Michel Zivy (2017) X!TandemPipeline: A Tool to Manage Sequence Redundancy for Protein Inference and Phosphosite Identification. J. Proteome Res. 16 (2), 494–503. https://doi.org/10.1021/acs.jproteome.6b00632.

1.3 Installation of the software

The installation material is available at http://pappso.inrae.fr/en/bioinfo/xtandempipeline/download/.

1.3.1 Installation on MS Windows and macOS systems

The installation of the software is extremely easy on the MS Windows and macOS platforms. In both cases, the installation programs are standard and require no explanation.

1.3.2 Installation on Debian- and Ubuntu-based systems

The installation on Debian- and Ubuntu-based GNU/Linux platforms is also extremely easy (even more than in the above situations). X!TandemPipeline is indeed packaged and released in the official repositories of these distributions and the only command to run to install it is:

$ sudo apt install <package_name>

In the command above, package_name is xtpcpp for the program package and xtpcpp-doc for the user manual package.

Once the package has been installed, the program shows up in the Science menu. It can also be launched from the shell using the following command:

$ xtpcpp

Tip

If the Debian system onto which the program is to be installed is older than testing, that is, older than Buster (Debian 10), then using the AppImage program bundle might be a solution. See below for the method to run mineXpert2 as an AppImage bundle.

1.3.3 Installation with an AppImage software bundle

The AppImage software bundle format allows one to easily run a software program on any GNU/Linux-based distribution. From https://appimage.org/:

 

"The key idea of the AppImage format is one app = one file. Every AppImage contains an app and all the files the app needs to run. In other words, each AppImage has no dependencies other than what is included in the targeted base operating system(s)."

 --Simon Peter

There are AppImage software bundles available for download for the various mineXpert2 versions. As of writing, the software bundle has been tested on CentOS version 8.3.2011 and on Fedora version 22. These are pretty old distribution versions, and thus mineXpert2 should also run on more recent versions of these computing platforms. The AppImage bundle of mineXpert2 was created on a rather current Debian version: the testing Debian 11-to-be distribution.

In order to run the mineXpert2 software AppImage bundle, download the latest version (like mineXpert2-0.7.4-x86_64.AppImage). Once the file has been downloaded to the desired directory, change to that directory and change the permissions to make it executable:

$ chmod a+x mineXpert2-0.7.4-x86_64.AppImage

Finally, execute the file that has become a normal program:

$ ./mineXpert2-0.7.4-x86_64.AppImage

Tip

If the program complains about a locale not being found, modify the command line to read:

$ LC_ALL="C" ./mineXpert2-0.7.4-x86_64.AppImage

1.4 Building the software from source

The mineXpert2 software build is driven by the CMake build system. There are a number of dependencies to install prior to trying to build the software, as described below.

1.4.1 The dependencies required to build X!TandemPipeline

The dependencies to be installed are listed below with package names matching the Debian/Ubuntu packages. On RPM-based distributions, the package names are most often similar, albeit with slight differences.

Dependencies:

  • The build system: cmake

  • Conversion of svg files to png files: graphicsmagick-imagemagick-compat

  • For the parallel computations: libgomp1

  • For the isotopic cluster calculations: libisospec++-dev

  • For all the raw mass calculations, like the data model, the mass spectral combinations…: libpappsomspp-dev, libpappsomspp-widget-dev

  • For all the plotting: libqcustomplot-dev

  • For the C++ objects (GUI and non-GUI): qtbase5-dev, libqt5svg5-dev, qttools5-dev-tools, qtchooser

  • For the man page: docbook-to-man

  • For the documentation (optional, with -DMAKE_USER_MANUAL=1 as a flag to the call of cmake, see below): daps, libjeuclid-core-java, libjeuclid-fop-java, docbook-mathml, libjs-jquery, libjs-highlight.js, libjs-mathjax, fonts-mathjax, fonts-mathjax-extras, texlive-fonts-extra, fonts-ebgaramond-extra
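On a Debian or Ubuntu system, the build dependencies listed above (without the optional documentation packages) can be installed in one go; this is a sketch assuming the package names are available in the release being used:

```shell
# Install the build dependencies in a single apt invocation.
sudo apt install cmake graphicsmagick-imagemagick-compat libgomp1 \
  libisospec++-dev libpappsomspp-dev libpappsomspp-widget-dev \
  libqcustomplot-dev qtbase5-dev libqt5svg5-dev qttools5-dev-tools \
  qtchooser docbook-to-man
```

Add the documentation packages (daps and the others listed above) only if the user manual is to be built.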

1.4.2 Getting the source tarball

In the example below, the version of the software to be installed is 7.3.0. Replace that version with the latest version of interest, which can be looked up at https://gitlab.com/msxpertsuite/minexpert2/-/releases.

1.4.2.1 Using git

The rather convoluted command below only downloads the branch of interest, because the whole git repository is very large:

$ git clone https://gitlab.com/msxpertsuite/minexpert2.git --branch master/7.3.0-1 --single-branch minexpert2-7.3.0

1.4.2.2 Using wget to download the tarball

$ wget https://gitlab.com/msxpertsuite/minexpert2/-/archive/7.3.0/minexpert2-7.3.0.tar.gz

Untar the tarball, which creates the minexpert2-7.3.0 directory:

$ tar xvzf minexpert2-7.3.0.tar.gz

1.4.3 Building of the software

  • Change directory:

    $ cd minexpert2-7.3.0

  • Create a build directory:

    $ mkdir build

  • Change directory:

    $ cd build

  • Configure the build:

    $ cmake ../ -DCMAKE_BUILD_TYPE=Release

  • Build the software:

    $ make
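The build steps above can be combined into a single shell session. The sketch below assumes the 7.3.0 tarball from the previous section has already been downloaded and unpacked; the parallel make flag and the commented-out user manual flag are optional additions:

```shell
#!/bin/sh
# Sketch of the full build sequence described above.
set -e                                # stop at the first error
cd minexpert2-7.3.0                   # directory created by untarring
mkdir -p build && cd build            # out-of-source build directory
cmake ../ -DCMAKE_BUILD_TYPE=Release  # configure the Release build
# To also build the user manual (requires the optional documentation
# dependencies), configure instead with:
# cmake ../ -DCMAKE_BUILD_TYPE=Release -DMAKE_USER_MANUAL=1
make -j"$(nproc)"                     # build, using all available cores
```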

Note that the prompt character might be % in some shells, like zsh.
