The final EuBIC Winter School 2019 programme is now available.
Talk titles and abstracts
Label-free Quantification of Complex Proteomes using Ion-Mobility-based DIA
Developing the tools for the personalized medicine revolution: Using mass spectrometry for longitudinal molecular profiling
Bioinformatics for Proteomics – any open questions?
Proteomics, especially mass spectrometry-based proteomics, has reached many milestones. Several challenges once considered show-stoppers have been addressed: identification with limited false positives, quantification, finding "all" gene-coded proteins, modifications (plus localization) and usable standard formats. In parallel, instruments and algorithms have become more sensitive and more accurate, and data more sustainable. But there are still unexplained phenomena, everyday questions to solve and closed doors to open. For example, increasing mass accuracy creates new challenges for false discovery rate estimation, and shared peptides could be used for better quantification. To open Pandora's box: all our method development in mass spectrometry for proteomics may become obsolete some day.
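For readers less familiar with the false discovery rate estimation mentioned above, here is a sketch of the classic target-decoy approach. It is a generic illustration and not any speaker's method. It assumes that each peptide-spectrum match (PSM) carries a single score where higher is better. It estimates the FDR at a threshold as decoys divided by targets above that threshold, and it ignores score ties.

```python
from typing import List, Tuple

def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign target-decoy q-values to PSMs given as (score, is_decoy) pairs.

    Assumption: higher scores are better. Returns (score, is_decoy, q) sorted by score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # estimated FDR if the score threshold were set at this PSM
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at which this PSM would still be accepted
    qs = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]
```

Filtering at 1% FDR would then keep the target PSMs with q ≤ 0.01.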
ProteomeTools & Prosit: Accessing high-quality reference spectra via ProteomicsDB
In mass spectrometry-based proteomics, the identification and quantification of peptides rely heavily on sequence database searching. However, the lack of accurate predictive models for fragment ion intensities prevents this approach from reaching its full potential. Via ProteomicsDB, reference spectra from the ProteomeTools project and predictions from Prosit are now available, allowing their integration into various proteomic workflows. This talk will showcase some of the many applications in DDA, DIA and PRM workflows.
Debate topic: If and to what degree can we rely on predicted spectra and what consequences does this have?
Novel DIA Data Analysis Workflow: Integration of De Novo Sequencing and Database Search
Using a new feature-based identification approach, PEAKS X supports full DIA analysis without the need for a spectrum library. With this solution, DIA spectra are associated with their respective LC-MS peptide features and analyzed directly to determine the peptide sequences. By combining de novo sequencing with sequence database searching, PEAKS removes biases found in commonly used approaches and provides accurate estimation of the false discovery rate.
STRING — Large-scale integration of data and text
In recent years, methodological advances have given us unprecedented information on the molecular details of living cells. However, it remains a challenge to collect all the available data on individual genes and to integrate this highly heterogeneous evidence with what is described in the scientific literature. The STRING database aims to address this by consolidating known and predicted protein–protein association data for a large number of organisms. In my presentation, I will give an overview of the STRING database and describe the general approach we use to unify heterogeneous data, provide comparable quality scores for all evidence types, and automatically mine associations from the biomedical literature.
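Here is one way that per-channel evidence can be merged into a single comparable score. This sketch follows the probabilistic scheme described in STRING publications:

1. Evidence channels are treated as independent.
2. A random-association prior is removed from each channel.
3. The channels are combined.
4. The prior is added back once.

Two details are assumptions here: the prior value of 0.041, and clamping channel scores that fall below the prior to zero. The STRING documentation describes the exact procedure for a given release.

```python
PRIOR = 0.041  # assumed prior probability of a random association

def combine_channel_scores(channel_scores, prior=PRIOR):
    """Combine per-channel scores in [0, 1] into one score, assuming independent channels."""
    remaining = 1.0  # probability that no channel reports a true association
    for score in channel_scores:
        # strip the prior from each channel; scores below the prior contribute nothing
        score_noprior = max((score - prior) / (1.0 - prior), 0.0)
        remaining *= 1.0 - score_noprior
    combined_noprior = 1.0 - remaining
    # add the prior back once for the combined score
    return combined_noprior + prior * (1.0 - combined_noprior)

# e.g. two channels scored 900 and 500 on STRING's 0-1000 scale
print(round(combine_channel_scores([0.9, 0.5]), 3))
```

For a single channel, this scheme returns that channel's score unchanged. Each additional independent channel can only raise the combined score.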
Debate: Challenges in network visualization of proteomics data
While protein networks are commonly used for visualization of proteomics data, they are often, and for good reason, referred to as “hairballs” or “ridiculograms”. When retrieving a protein network for the significantly regulated proteins in a proteomics study, for example from the STRING database, the result is typically a network of 100–1000 proteins and 1000–10000 interactions. Visualizing such a big network in a meaningful manner is inherently very difficult. To make matters worse, proteomics researchers often want to map time-course and/or site-specific data for each protein onto the network. The debate will focus on common challenges encountered when visualizing proteomics data on networks, as well as on how we can simplify and visualize the data to produce more meaningful figures.
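One common way to tame a “hairball” is to keep only high-confidence edges and then keep only the largest connected component. This sketch does that for an edge list of (protein, protein, score) tuples. The 0.7 cutoff mirrors STRING's "high confidence" level. The cutoff and the choice to keep a single component are illustrative assumptions, not recommendations from the debate.

```python
from collections import defaultdict

def prune_network(edges, min_score=0.7):
    """Keep edges with score >= min_score, then return only those in the largest component.

    edges: iterable of (protein_a, protein_b, score) with scores in [0, 1].
    """
    kept = [(a, b, s) for a, b, s in edges if s >= min_score]
    neighbours = defaultdict(set)
    for a, b, _ in kept:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, largest = set(), set()
    for start in list(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) > len(largest):
            largest = component
    # both endpoints of a kept edge lie in the same component, so checking one suffices
    return [(a, b, s) for a, b, s in kept if a in largest]
```

With real data, this usually shrinks the figure drastically. It also discards smaller modules, which may themselves be of interest.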
Using phosphoproteomics data to study context-specific signalling.
Phosphoproteomics data provide a snapshot of the phosphorylation-based signaling state of cells. They can therefore be used to dissect the dynamic networks active in a cell under a given condition. Several methods infer context-specific signalling networks from phosphoproteomics data by using either existing protein-protein interaction networks or networks from pathway databases as informative priors. These priors suffer from severe study bias; data-driven analyses could therefore provide more scope for novel discoveries and an improved understanding of context-specific cell signalling.
To allow non-bioinformaticians to perform data-driven analyses on phosphoproteomics datasets, I developed SELPHI, which performs automated data integration and correlation-based network inference. We applied SELPHI to phosphoproteomics data from B cells under different inhibitor and stimulation conditions and identified a novel substrate recognition motif for the Fes kinase. Our follow-up study showed that the motif is recognized by the CSK kinase, which helped explain the dual oncogenic and tumor-suppressive function of Fes.
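Here is a concrete version of correlation-based network inference. This sketch links phosphosites whose intensity profiles across conditions are strongly correlated. It is a generic illustration, not SELPHI's actual algorithm, which also performs automated data integration. The input format (a dictionary mapping each site to its intensity profile) and the |r| ≥ 0.8 cutoff are assumptions. The site names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_network(profiles, min_abs_r=0.8):
    """profiles: dict site -> intensities across conditions. Returns (site_a, site_b, r) edges."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= min_abs_r:  # keep strong positive and negative correlations
            edges.append((a, b, round(r, 3)))
    return edges
```

Edges with negative r can point to opposing regulation, for example a kinase and a phosphatase acting on related sites. Correlation alone does not establish direction or causality.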
More recently, we have used published phosphoproteomics datasets to generate a data-driven kinase signalling network that can serve as an informative prior for network inference. I will also present preliminary results on a new method that uses phosphoproteomics data to derive context-specific cell signalling networks.
Debate topic: Are we currently getting the most out of our proteomics data analyses? If not, what else could we get, and how could we go about it?
Ionbot: a novel, fully data-driven search engine for open modification and mutation searches.
Modern shotgun proteomics depends entirely on accurate search engines to match observed spectra to the peptide sequences that generated them. Here we focus on the widely applied approach based on a target database that contains all proteins (peptides) expected to be in the sample under consideration. To accommodate the high computational cost of matching tens of thousands of MS/MS spectra, typically only a few of the most common modifications are considered for the peptides in the target database.
We present Ionbot, a new and powerful search engine that matches MS/MS spectra against extremely large target databases (allowing thousands of potential protein modifications, including mutations). It achieves high processing speeds through a new data-driven approach for selecting candidate peptides for a given MS/MS spectrum. Further, we present a novel PSM scoring function based on predicted MS/MS spectra as a means to maintain high sensitivity (at fixed FDR) when handling very large target databases. We will demonstrate that Ionbot performs very well in open modification and mutation searches.
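A naive baseline shows why open modification searches are computationally costly. Every peptide is compared with the neutral precursor mass once per allowed mass offset, so the work grows with database size times the number of modifications. This sketch implements that baseline with a ppm tolerance. It is not Ionbot's data-driven candidate selection. The offset table is a tiny hypothetical example, and the sketch ignores which residues or positions each offset may apply to.

```python
# Monoisotopic amino acid residue masses (Da)
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
        "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
        "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056  # added once per peptide for the termini

# Hypothetical offset table (Da); a real open search would allow thousands of entries
OFFSETS = {"unmodified": 0.0, "Oxidation": 15.99491, "Phospho": 79.96633, "V->L": 14.01565}

def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER

def candidate_peptides(precursor_mass, peptides, offsets=OFFSETS, tol_ppm=10.0):
    """Return (peptide, offset name) pairs whose shifted mass matches the neutral precursor mass."""
    hits = []
    for sequence in peptides:
        base = peptide_mass(sequence)
        for name, delta in offsets.items():  # one comparison per peptide and offset
            theoretical = base + delta
            if abs(precursor_mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((sequence, name))
    return hits
```

The nested loop is what makes very large target databases expensive. Any smarter candidate selection aims to avoid visiting most of those peptide-offset pairs.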
Topic for discussion: “The future of MS/MS peptide identification?”
Insights into the multi-functioning proteome.
It is a well-accepted paradox in biology that genome size does not correlate with organismal complexity. In terms of proteins, it could be argued that in higher organisms the proteome is simply too small for the complex functions it has to perform.
The chemical space that the proteome occupies in higher organisms is vastly expanded by post-translational modification, but the number and roles of differently functioning proteoforms in a cell are currently uncertain.
In this talk I will review methods that shed light on different functional roles of proteins, from establishing multiple subcellular locations of proteins, to determining additional nucleic acid binding properties of metabolic enzymes. I will also discuss difficulties in trying to determine alternative functional roles for proteins.
Debate: “I would hope the debate takes in thoughts about structural informatics, protein quantitation, determination of PTMs, top down proteomics etc....”
XCorDIA: a new database search engine to detect genetic variants from DIA data
Single nucleotide polymorphisms and other genomic sequence variants can have a profound impact on susceptibility to disease. Even so, most shotgun proteomics workflows focus on detecting canonical protein sequences found in FASTA databases. While proteogenomics methods that combine customized exome sequencing with mass spectrometry are emerging for data-dependent acquisition (DDA), data-independent acquisition (DIA) approaches frequently rely on curated spectrum libraries that lack sequence variants. Moreover, because most variants cause only small retention time and m/z shifts, these peptides often co-isolate and fragment together in wide DIA precursor isolation windows. Variant peptides produce many of the same fragment ions as canonical peptides, so confidently distinguishing the different forms is challenging. In addition, peptide-centric search engines can produce undetectable false positives from these shared ions when searching for low-mass PTMs or for sequence variants caused by SNPs, paralogs or orthologs. We present XCorDIA, a new database search engine that detects and statistically validates PTMs and peptide variants in PEFF databases from DIA data. XCorDIA searches for PTMs and sequence variants by batching peptides that share fragment ions and confirming the presence of specific variants with a PTM/variant detection algorithm similar to PTM site-localization algorithms. We validate XCorDIA using methionine-oxidized peptides. Oxidation shifts the precursor mass by only +16 Da, similar to the mass shifts of most SNPs (e.g. +14 Da for V→L). Without variant-specific scoring, we find that, based on shared fragment ions, approximately one third of oxidized peptides are incorrectly detected at the retention time corresponding to the unmodified form. Finally, we demonstrate how XCorDIA detects sequence variants from ClinVar in clinical amyloidosis samples.
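A few lines of arithmetic show why shared fragment ions are a problem. This sketch computes singly charged b and y ions for a peptide with and without a +15.995 Da oxidation on one methionine. It then reports the fraction of the unmodified form's ions that are unchanged. The peptide GAMSK is a hypothetical example. This is not XCorDIA's scoring, only an illustration of the ambiguity it addresses.

```python
# Monoisotopic amino acid residue masses (Da)
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
        "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
        "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
PROTON = 1.007276
WATER = 18.01056
OXIDATION = 15.99491

def by_ions(sequence, mod_index=None, mod_delta=0.0):
    """Set of singly charged b and y ion m/z values, rounded to 4 decimals."""
    masses = [MONO[aa] for aa in sequence]
    if mod_index is not None:
        masses[mod_index] += mod_delta  # place the modification on one residue
    n = len(sequence)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]          # b1 .. b(n-1)
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, n)]  # y(n-1) .. y1
    return {round(mz, 4) for mz in b_ions + y_ions}

def shared_fraction(sequence, mod_index, mod_delta=OXIDATION):
    """Fraction of the unmodified peptide's b/y ions that also occur in the modified form."""
    plain = by_ions(sequence)
    modified = by_ions(sequence, mod_index, mod_delta)
    return len(plain & modified) / len(plain)

print(shared_fraction("GAMSK", 2))
```

For GAMSK, half of the eight b/y ions are identical in both forms. Only fragments that contain the methionine can tell the two forms apart.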
Trapped ion mobility spectrometry: a new dimension for mass spectrometry-based proteomics
The fast scan speed of time-of-flight analyzers allows ion mobility spectrometry to be added as a third dimension of separation. Trapped ion mobility spectrometry (TIMS) is particularly attractive due to its compact design and highly efficient ion utilization. We have recently introduced a novel scan mode termed parallel accumulation – serial fragmentation (PASEF), which synchronizes the release of peptide ions from the TIMS device with the precursor selection in the quadrupole (PMID: 26538118). In data-dependent acquisition, PASEF increases the sequencing speed more than 10-fold without any loss in sensitivity (PMID: 30385480). Transferring the PASEF principle to data-independent acquisition could, in principle, capture a much larger proportion of the available ion current than classical DIA, thereby improving sensitivity and acquisition speed several-fold. We further demonstrate that peptide collisional cross sections can be readily measured with high precision for hundreds of thousands of peptides. We conclude that TIMS in combination with PASEF is an exciting addition to the technological toolbox in proteomics, with many unique operating modes and applications still left to be explored.
Debate: In my talk, I will introduce trapped ion mobility spectrometry and how it benefits DDA and DIA. This is a very exciting field of research with many challenges left. Depending on the audience, the discussion could cover general questions about ion mobility, the value of this additional dimension, novel scan modes, how to analyze 4D data and potential applications in proteomics.
Workshop titles and abstracts
Introduction to computational mass spectrometry using OpenMS (educational)
We will use the OpenMS library to explore mass spectrometry raw data and basic data processing concepts in mass spectrometry. We will learn about the visualization tools in OpenMS, its Python scripting capabilities, and the internal algorithms and data structures available. We will also talk about the community and how you can write your own tools in OpenMS and contribute to the project.
Computational introduction into DIA (educational)
Label-free quantification: concepts and algorithms (educational)
Quantitative proteomics, statistics, clustering and complexes (educational)
Basic guidelines and methods for the visual inspection of quantitative proteomics data, the application of statistical tests, the clustering of multivariate data and the quantitative assessment of the behavior of protein complexes
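Here is a minimal example of one such statistical test for a single protein in two groups. The sketch log2-transforms the raw intensities, then computes the log2 fold change, Welch's t statistic and its degrees of freedom. The intensities are hypothetical and assumed to be positive. For p-values, `scipy.stats.ttest_ind` with `equal_var=False` performs the same test.

```python
from math import log2, sqrt
from statistics import mean, variance

def welch_test(raw_a, raw_b):
    """Return (log2 fold change, Welch t, degrees of freedom) for two groups of raw intensities."""
    a = [log2(x) for x in raw_a]  # log2 makes intensity distributions more symmetric
    b = [log2(x) for x in raw_b]
    va = variance(a) / len(a)  # squared standard error of the group A mean
    vb = variance(b) / len(b)
    fold_change = mean(a) - mean(b)
    t = fold_change / sqrt(va + vb)
    # Welch–Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return fold_change, t, df
```

Across thousands of proteins, the resulting p-values would still need multiple-testing correction before calling proteins significantly regulated.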
The essentials before and after spectrum identification
In this workshop, participants will learn more about two fundamental pillars of most proteomics studies: protein databases and protein inference. We will discuss, and show in hands-on tutorials, which databases are suitable for which analyses and what needs to be considered to make the right choice. Furthermore, workflows for performing protein inference using PIA will be explained and taught hands-on.
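Protein inference is often introduced through the parsimony principle: report the smallest set of proteins that explains all identified peptides. This sketch uses a greedy set-cover heuristic for that. It is a generic textbook approach, not the algorithm implemented in PIA. It breaks ties alphabetically rather than reporting protein groups. The protein and peptide names are hypothetical.

```python
def parsimonious_proteins(peptide_map):
    """Greedy parsimony: pick proteins until every identified peptide is explained.

    peptide_map: dict protein -> set of identified peptides.
    """
    unexplained = set().union(*peptide_map.values())
    selected = []
    while unexplained:
        # protein explaining the most still-unexplained peptides; ties go to the first name
        best = max(sorted(peptide_map), key=lambda p: len(peptide_map[p] & unexplained))
        selected.append(best)
        unexplained -= peptide_map[best]  # always shrinks: each peptide maps to some protein
    return selected
```

In the test case, P3 and P4 explain the last peptide equally well. Real inference tools typically report such ambiguities as protein groups instead of choosing one.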
Attendees should bring their own laptop for the workshop.
Quality Control and Benchmarking of Label-Free Quantification Workflows with LFQBench
Discovering the open-source Proline software suite, a new, efficient and user-friendly solution for label-free quantification
DDA label-free quantification based on precursor ion intensity is a widely used method for quantifying differentially expressed proteins across conditions or samples. An ideal software solution should produce reliable and comprehensive results and be flexible enough to integrate existing tools without compromising ease of use. To meet these objectives, we developed Proline, a next-generation tool based on a modular data processing toolbox. This tool is an attractive alternative to competing solutions, combining robustness, performance, modularity and user-friendliness.
This workshop will be a good opportunity to discover the data processing functionality of Proline as well as the various visualization tools integrated in the Proline-Zero desktop application. After a short introduction to the main software features, we will follow several tutorials aimed at providing a global overview of the tool. During this hands-on session, we will run the data analysis of a standard dataset composed of an equimolar mixture of 48 human proteins (UPS1, Sigma) spiked at different concentrations into a yeast cell lysate background.
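For a spike-in benchmark like the UPS1-in-yeast dataset described above, a simple sanity check works in two steps:

1. Normalize two runs on the background proteins, which should not change.
2. Compare the remaining protein ratios with the known spike-in ratio.

This sketch does that with median centering on hypothetical protein intensities. It is not Proline's processing pipeline.

```python
from math import log2
from statistics import median

def spike_in_log2_ratios(run_a, run_b, background):
    """log2(B/A) for spiked proteins after median-centering on background proteins.

    run_a, run_b: dict protein -> intensity; background: proteins expected at a 1:1 ratio.
    """
    shared = [p for p in run_a if p in run_b]  # proteins quantified in both runs
    # run-level offset (e.g. unequal loading), estimated from the background only
    offset = median([log2(run_b[p] / run_a[p]) for p in shared if p in background])
    return {p: log2(run_b[p] / run_a[p]) - offset for p in shared if p not in background}
```

A spiked protein with a known 5:1 ratio between the two runs should then come out near log2(5), about 2.32, even if one run was loaded at twice the amount.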
SELPHI: using data-driven approaches for analysis of phosphoproteomics datasets
Current phosphoproteomics data analysis pipelines focus mostly on identifying differentially regulated peptides and mapping them onto known pathways. This limits our insight to pathways that are well studied and annotated. SELPHI aims to take a data-driven approach to help biologists explore the less-studied space in their datasets.
In this workshop I will explain the aim of SELPHI and how it works. I will also describe how to generate files for use with SELPHI and walk through all the different results you can obtain with this tool.
Proteome Discoverer 2.3 Workshop
· New Features in PD 2.3
· Statistics and quantification roll-up strategies for precursor- and reporter-based quantification
· Advanced features and nodes (cross-linking, top-down)
· Node programming
· Q & A
Advanced data acquisition methods with MaxQuant.Live
MaxQuant.Live (www.maxquant.live) is a freely available software framework for real-time monitoring of mass spectrometric data and control of data acquisition. It enables advanced data acquisition strategies on Q Exactive mass spectrometers, such as BoxCar (Meier et al., Nat. Methods 2018) and EASI-tag quantification (Virreira Winter et al., Nat. Methods 2018), via a user-friendly graphical interface. Furthermore, it recognizes thousands of peptide precursors in real time through live recalibration in three dimensions. In this workshop, you will get familiar with the MaxQuant.Live app store and start generating your own methods, for example a global targeting method for over 20,000 peptides in a single run.
Validation of peptide identifications
1) Get predicted spectra from ProteomicsDB
2) Compare results to [proteogenomics/sORF] data using R
3) Investigate effects of pre-processing on spectra similarity
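Item 3 can be explored with a small similarity function. This sketch compares experimental and predicted fragment intensities keyed by fragment annotation (e.g. "y3"). It returns the normalized spectral contrast angle, a similarity measure commonly used for predicted spectra. An optional intensity transform, such as a square root, serves as the pre-processing step. The workshop itself uses R; the data layout, function name and example intensities here are assumptions.

```python
from math import acos, pi, sqrt

def spectral_angle(experimental, predicted, transform=None):
    """Normalized spectral contrast angle in [0, 1]; 1 means identical relative intensities.

    Both inputs map fragment annotations (e.g. "y3") to non-negative intensities.
    """
    keys = sorted(set(experimental) | set(predicted))
    a = [experimental.get(k, 0.0) for k in keys]  # missing fragments count as zero
    b = [predicted.get(k, 0.0) for k in keys]
    if transform is not None:  # optional pre-processing, e.g. sqrt to damp dominant peaks
        a = [transform(x) for x in a]
        b = [transform(x) for x in b]
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside acos's domain
    return 1.0 - 2.0 * acos(cosine) / pi
```

Calling `spectral_angle(exp, pred)` and `spectral_angle(exp, pred, sqrt)` on the same pair shows how much a pre-processing step changes the score. A square-root transform typically raises similarity when one dominant peak differs between spectra.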
Network visualization with Cytoscape and stringApp
The workshop will first provide a quick introduction to the Cytoscape network analysis and visualization tool and to the Cytoscape stringApp, which makes it easy to import networks from STRING into Cytoscape. Afterwards, we will move on to hands-on exercises, which will teach you how to:
· retrieve networks for proteins or small-molecule compounds of interest
· retrieve networks for a disease or an arbitrary topic in PubMed
· lay out and visually style the resulting networks
· import external data and map them onto a network
· perform enrichment analyses and visualize the results
· merge and compare networks
· select proteins by attributes
· identify functional modules through network clustering
If time permits, I will also try to demonstrate how Cytoscape can be used to address some of the challenges that came up during the debate.
A Complete Solution for Discovery Proteomics with DDA and DIA Support
PEAKS is a complete, vendor-neutral software package for analyzing both data-dependent (DDA) and data-independent (DIA) acquisition shotgun proteomics data. The software offers the full workflow in one package, from de novo sequencing, PEAKS DB (database searching)-based protein identification and PEAKS PTM (post-translational modification) analysis to SPIDER homology search, to maximize the number of identifications and the accuracy of sequencing results. Relative quantification by label-free, isobaric labeling (e.g. TMT or iTRAQ) or metabolic labeling (e.g. SILAC) approaches can also be performed. Intuitive result visualization tools are provided at every stage of analysis, and analysis results can be exported.