Flow and mass cytometry analysis:
the practical summary
Updated the 7th of March 2021
Mass cytometry is a powerful technique for investigating the phenotype of cells by detecting more than 40 intra- and extracellular cell markers. It is a great exploratory tool, which we aim to explain here:
👉🏽We created a training center for flow and mass cytometry users who would like to start analysing their own data with the free software R and stop using manual gating strategies. It is suitable for flow, mass and spectral cytometry.
👉🏽We can also provide an analysis service for your mass cytometry data. You will receive a full customised report from us within one day.
❓Still have a question about our services? Simply book some time with us and we will try to help you.
A workflow for mass and flow cytometry data analysis
Mass cytometry analysis is the interpretation of the FCS files acquired by mass cytometry (CyTOF). It can be divided into pre-processing, clustering analysis, dimensionality reduction and visualisation. There are several ways to analyse your mass cytometry data, and we present here a common and useful workflow for an efficient visualisation of cytometry data. This workflow can also be applied to flow cytometry data.
Pre-processing of the FCS files
Mass cytometry data needs to be cleaned before you can analyse it. Depending on how you stained and acquired your samples, it might be necessary to:
- Debarcode your samples
- Use the compensation matrix to remove spillover between channels. This step is not compulsory, and the first publications on CyTOF were not compensated, but it improves sample quality.
- Apply an arcsinh transformation with a cofactor of 5 (arcsinh5) to all your samples before proceeding to further analysis. This step is explained in our training course.
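The arcsinh transformation with a cofactor of 5 can be sketched in a few lines. The example below uses Python purely for illustration (in R, the same operation is `asinh(x / 5)`); the function name and the intensity values are hypothetical.

```python
import math

def arcsinh_transform(intensities, cofactor=5.0):
    """Return arcsinh(x / cofactor) for each raw intensity."""
    return [math.asinh(x / cofactor) for x in intensities]

# Made-up raw ion counts for a single marker.
raw_counts = [0.0, 5.0, 50.0, 500.0]
transformed = arcsinh_transform(raw_counts)
print([round(v, 3) for v in transformed])
```

The transformation is close to linear near zero and close to logarithmic for large values, which is why it copes well with the many zero counts found in mass cytometry data.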
First, make sure that your mass cytometry run was performed successfully and that the staining is consistent across all your samples.
There are several ways to improve the reproducibility of your data, such as freezing several aliquots of the antibody mix or avoiding waiting times between your samples so that the voltage of the detector stays within the same range. Some tools are available online to remove batch effects between several experiments.
Another method to check your antibody staining quality is the Average Overlap Frequency, measuring the separation between the negative and positive peak.
✏️ In a nutshell:
Apply compensation matrix (optional)
Pre-process your FCS files with arcsinh5 transformation
Perform a quality check by running the Average Overlap Frequency algorithm.
Cells are phenotypically analysed with more than 40 markers by mass cytometry. Plotting 40 markers two at a time would result in 780 possible pairwise combinations (40 × 39 / 2). This is of course not achievable and would still not capture the global picture of all the markers together. Manual gating is therefore not feasible.
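The figure of 780 can be checked quickly. This is a small Python sketch for illustration only; the variable names are arbitrary.

```python
import math
from itertools import combinations

n_markers = 40

# Number of unordered marker pairs: 40 * 39 / 2.
n_pairs = math.comb(n_markers, 2)

# Same count obtained by listing every pair explicitly.
n_pairs_listed = sum(1 for _ in combinations(range(n_markers), 2))

print(n_pairs, n_pairs_listed)  # 780 780
```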
Clustering analysis is the best alternative to a manual gating strategy: it removes user bias by automatically defining subsets of cells sharing a similar phenotype. The computer itself defines the different cell populations present in all the samples. A subset, or cluster, is a group of cells sharing a similar phenotype.
There are many clustering analysis tools available, and making your choice can be hard. The three most popular ones, according to a recent publication, are SPADE, PhenoGraph and FlowSOM. However, SPADE is used less and less, and DensVM has been cited more than it has been used. FlowSOM is increasingly present and can be used for flow or mass cytometry analysis. It offers a very efficient clustering strategy. In our training, we showcase a workflow to compare these algorithms, and others such as ClusterX, and to see the differences at a glance.
While FlowSOM and PhenoGraph can be applied directly to your cytometry data, this is not the case with DensVM, which requires a dimensionality reduction step beforehand.
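To make the idea of "a group of cells sharing a similar phenotype" concrete, here is a minimal sketch using plain k-means as a stand-in. It is not FlowSOM, PhenoGraph or DensVM, which rely on different and more elaborate strategies; the marker names, expression values and function names are hypothetical, and Python is used only for illustration.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def kmeans(cells, centroids, n_iter=10):
    """Plain k-means: assign each cell to its nearest centroid, then update."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        labels = [nearest(cell, centroids) for cell in cells]
        for k in range(len(centroids)):
            members = [cell for cell, lab in zip(cells, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical arcsinh-transformed expression of (CD4, CD8) for six cells.
cells = [(4.1, 0.2), (3.9, 0.1), (4.3, 0.3),
         (0.2, 4.0), (0.1, 3.8), (0.3, 4.2)]

# Starting centroids are picked by hand here so the example is deterministic.
labels, centroids = kmeans(cells, centroids=[cells[0], cells[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each label is a cluster: cells 1 to 3 (high CD4, low CD8) end up together, and so do cells 4 to 6, without any gate being drawn by the user.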
Cells are phenotypically analysed with more than 40 markers by mass cytometry. To represent all the samples and visualise their differences, we would need to plot them in a 40-dimensional space, which is not possible. Dimensionality reduction allows visualisation of the parameters studied in mass cytometry by reducing them, usually to two new parameters, enabling a two-dimensional representation. These two new parameters can be seen as a summary of the 40 studied parameters.
In mass cytometry, there are three main dimensionality reduction techniques.
Historically, PCA was the method of choice. It has since been outperformed by t-SNE (a paper cited more than 17 000 times!), which preserves local similarities, whereas PCA is better at preserving large distances. This means that PCA preserves the global structure and may fail to highlight differences between samples that are fairly similar. Let's take a simple example and assume you would like to compare Belgian, Dutch, Italian, Canadian and United States citizens based on many parameters, including spoken language, food habits, location of the country, etc. PCA would probably group the Dutch, Italians and Belgians together against the Canadians and US citizens. This is a rough comparison that captures only global differences: European versus American style. t-SNE, however, would also separate the Italians from the Dutch-speaking citizens (the Dutch and Belgians being quite similar to each other compared with the Italian lifestyle). In other words, t-SNE preserves local similarities.
t-SNE was developed by a Dutch scientist from Tilburg University, who is currently a Research Director at Facebook AI Research (FAIR), leading FAIR's New York site.
These three algorithms share a similar purpose: enabling the user to see patterns in their samples. In our example, we can conclude that Canadians and US citizens share a similar phenotype (they are Americans) and that the Dutch, Italians and Belgians share another one (they are Europeans). Now imagine that each flag is one cell: you can then define populations based on their location in the t-SNE map.
Most of these tools are available in R. t-SNE has also been taken to another level with HSNE (Hierarchical Stochastic Neighbor Embedding). You can define clusters in Cytosplore based on this algorithm. After the dimensionality reduction step, it is also possible to perform a DensVM clustering analysis.
✏️ In a nutshell:
Historically, the most widely used dimensionality reduction algorithm was PCA, which has been replaced over time by t-SNE. Nowadays, other alternatives have been developed, such as HSNE (Cytosplore) or UMAP.
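As an illustration of how many markers can be summarised by two new parameters, the sketch below implements PCA, the simplest of the methods named here, with a basic power iteration. It is not t-SNE, HSNE or UMAP, and it is only a teaching sketch in Python under stated assumptions: the three-marker data are made up and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centre(data):
    """Subtract each marker's mean so the cloud of cells sits on the origin."""
    means = [sum(col) / len(data) for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance(centred):
    """Marker-by-marker covariance matrix of centred data (rows are cells)."""
    n = len(centred)
    cols = list(zip(*centred))
    return [[dot(a, b) / (n - 1) for b in cols] for a in cols]

def leading_eigenvector(matrix, n_iter=500):
    """Power iteration; the uneven start vector makes an orthogonal start unlikely."""
    v = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(n_iter):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v

def pca_2d(data):
    """Project every cell onto the first two principal components."""
    centred = centre(data)
    cov = covariance(centred)
    pc1 = leading_eigenvector(cov)
    eigenvalue1 = dot(pc1, [dot(row, pc1) for row in cov])
    # Deflation: remove the variance explained by PC1, then find the next axis.
    size = len(cov)
    deflated = [[cov[i][j] - eigenvalue1 * pc1[i] * pc1[j] for j in range(size)]
                for i in range(size)]
    pc2 = leading_eigenvector(deflated)
    return [(dot(row, pc1), dot(row, pc2)) for row in centred]

# Made-up transformed expression of three markers for six cells.
cells = [(4.0, 0.5, 1.0), (4.2, 0.3, 1.1), (3.8, 0.6, 0.9),
         (0.5, 4.1, 1.0), (0.4, 3.9, 1.2), (0.6, 4.0, 0.8)]
coords = pca_2d(cells)
for x, y in coords:
    print(round(x, 2), round(y, 2))
```

In this toy dataset the first new parameter separates the two groups of cells, which is the kind of global difference PCA is good at capturing.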
Visualisation of mass cytometry data
All the previous tools are great, but one step is still missing before you can interpret your mass cytometry data: how do you visualise the results generated so far?
The Cytofast package has been developed for this purpose. It lets you draw several heatmaps in no time, so you can visualise your flow or mass cytometry data from both a qualitative (phenotype) and a quantitative (abundance) point of view. The output can be divided into three parts.
In our example, we present the heatmap of the identified regulatory CD4+ T cells. The top blue-to-red heatmap represents the expression of the different cell surface markers (one per row) in the different subsets (one per column). Blue and red represent the absence and the presence of the marker, respectively.
The middle green-to-purple heatmap represents the abundance of each subset (one per column) in every sample (one per row). For example, on average, "cluster Treg1" is more abundant in Group 1 than in Groups 2, 3 and 4. Likewise, a purple square at the left of the last row means that "cluster Treg1" is highly abundant in sample S32. The dendrogram on the left gives an idea of how your samples cluster together.
Finally, the bottom part shows the abundance of each subset per group as a bar graph.
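The abundance values behind such a heatmap are simply the percentage of each sample's cells that fall into each cluster. The sketch below shows that calculation in Python for illustration; it is not the Cytofast code, and the sample names, cluster names and function name are hypothetical.

```python
from collections import Counter

def cluster_abundance(assignments):
    """Percentage of each sample's cells falling in each cluster.

    assignments: list of (sample_id, cluster_id) pairs, one pair per cell.
    """
    cells_per_sample = Counter(sample for sample, _ in assignments)
    cells_per_pair = Counter(assignments)
    return {
        (sample, cluster): 100.0 * count / cells_per_sample[sample]
        for (sample, cluster), count in cells_per_pair.items()
    }

# Eight made-up cells from two samples, already assigned to clusters.
cells = [("S1", "Treg1"), ("S1", "Treg1"), ("S1", "Treg2"), ("S1", "Treg1"),
         ("S2", "Treg2"), ("S2", "Treg2"), ("S2", "Treg1"), ("S2", "Treg2")]

abundance = cluster_abundance(cells)
print(abundance[("S1", "Treg1")])  # 75.0
print(abundance[("S2", "Treg1")])  # 25.0
```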
✏️ In a nutshell:
To visualise your flow or mass cytometry data, Cytofast is an easy tool to rapidly identify immunological patterns.
This visual part gives an overview of the mass cytometry data analysed, but it says nothing about the statistical significance of the differences observed between clusters. One possible way to perform statistics on mass cytometry data is Dunnett's test. This test is very common in medical experiments: it compares each group against a single control group using a t-statistic adjusted for multiple comparisons. Dunnett's multiple comparison test is used in several well-known papers.
Another powerful test is the globaltest. It was originally used in genetics and can also be applied to mass or flow cytometry data.
✏️ In a nutshell:
Statistics can be done in R using the globaltest, or using conventional statistical tests with a correction for multiple testing. All the tools explained here are meant to be used in R, a software you can learn by following our training.
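Correcting for multiple testing matters because one test is run per cluster. As one possible illustration, the sketch below applies the Benjamini-Hochberg procedure to a list of p-values. The choice of this particular correction is an assumption (the text does not name one), the p-values are made up, and Python is used only for illustration; in R, `p.adjust(p, method = "BH")` performs the same adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values, one per cluster.
cluster_pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(cluster_pvalues)
print([round(p, 3) for p in adjusted])  # [0.02, 0.04, 0.04, 0.02]
```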
I work with flow cytometry: is analysing my data really different from analysing mass cytometry data?
✏️ In a nutshell:
Manual gating is best suited to checking the presence or absence of an already known population. It should only be used to validate previous findings and cannot be used in an exploratory manner.
I have flow cytometry data, what should I do to analyse it like mass cytometry data?
Flow cytometry data can be analysed in much the same way as mass cytometry data. The only difference lies in the pre-processing. While mass cytometry data are usually arcsinh5 transformed, this is rarely the case for flow cytometry data. A workflow to pre-process flow cytometry data and make it look like mass cytometry data exists and is explained in our online training. Once each marker has been specifically transformed, the same process as described above can be performed.
✏️ In a nutshell:
Transform your mass cytometry data by arcsinh5
Transform your flow cytometry data with a marker-specific arcsinh, chosen according to the marker intensity.
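A marker-specific arcsinh transformation could look like the sketch below. It assumes that "specific" means a different cofactor for each marker; the cofactor values, marker names and function name are hypothetical and are not the ones taught in the training. Python is used only for illustration.

```python
import math

# Hypothetical cofactors: in practice each one is tuned to the marker's intensity range.
FLOW_COFACTORS = {"CD3": 150.0, "CD4": 500.0, "CD8": 1000.0}

def transform_event(event, cofactors):
    """Apply arcsinh(value / cofactor) to every marker of one cell."""
    return {marker: math.asinh(value / cofactors[marker])
            for marker, value in event.items()}

# One made-up flow cytometry event (fluorescence intensities).
event = {"CD3": 150.0, "CD4": 0.0, "CD8": 20000.0}
scaled = transform_event(event, FLOW_COFACTORS)
print({marker: round(value, 3) for marker, value in scaled.items()})
```

A marker missing from the cofactor table raises a `KeyError`, which is deliberate here: every marker should be given its own cofactor explicitly.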
I have completed my mass cytometry analysis and found an interesting subset: how do I check it by flow cytometry?
If you think you have found an interesting subset by mass cytometry and would like to confirm it by flow cytometry, you can use HyperGate, which will define the best gating strategy to isolate your population of interest. Its author is also behind the implementation of UMAP in mass cytometry.
✏️ In a nutshell:
To specifically investigate a cell subset, HyperGate will define the best gating strategy.
The 6 reasons why traditional gating strategies are not enough to analyse your cytometry data (and no: a few plug-ins in FlowJo won't make up for it).
1. You will discover new subsets
If you define immune populations by manual gating strategies, you will always find the same cell phenotypes and the same clusters. If you subtract the sum of all the gated cells from the total population, you are left with 'ungated' cells. These cells represent the uncharted territory you are missing in your data and voluntarily ignoring. With an unsupervised gating strategy, there are no 'ungated' cells: every single cell of your samples is identified.
That is how many scientists today can claim to have found new interesting subsets in their field: they freed themselves from the fixed path of pre-made gating strategies.
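The 'ungated' fraction is easy to quantify. The sketch below, in Python for illustration, assumes the gates are mutually exclusive end-point populations (no cell counted twice); the gate names and cell counts are made up.

```python
def ungated_cells(total_cells, gated_counts):
    """Cells left over once every (non-overlapping) gated population is removed."""
    gated = sum(gated_counts.values())
    if gated > total_cells:
        raise ValueError("gated populations exceed the total: gates overlap?")
    return total_cells - gated

# Made-up counts from a manual gating strategy on one sample.
total = 10000
gates = {"CD4 T cells": 4200, "CD8 T cells": 2100, "B cells": 1500, "NK cells": 900}

ungated = ungated_cells(total, gates)
percent_ungated = 100.0 * ungated / total
print(ungated, percent_ungated)  # 1300 13.0
```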
2. The FlowJo plug-ins: can they really make it?
FlowJo has added new functions to its software, such as t-SNE, UMAP and others. While these can add cosmetic value, such plug-ins are not the core business of the software. They cannot replace a complete unsupervised method established in R, and they present significant drawbacks. Their algorithms need to downsample your data, sometimes removing 90% of the cells. The tools are computationally demanding and, in the end, you won't be able to draw clinical or experimental patterns from them. No suitable visualisation solution is offered if you are running a multi-colour panel of 20 colours or more. Moreover, these plug-ins are all adaptations of algorithms freely available in R. So you had better go to the source, free of charge.
3. You will save money (for your project and yourself)
Save money on your research projects and learn R. You can analyse your flow data more efficiently, more cheaply and more easily than with traditional gating, which is very costly. Flow cytometry analysis is not limited to FlowJo and you can try alternatives. They are free!
4. You will save time and be more efficient in your analysis.
Learning to analyse your data by yourself, and being able to draw any graph you like from it, is a great investment. Moreover, once you have your data, you just need to write your script (or understand an already written one) and apply it to your data. You can always improve your pipeline. You decide how to analyse and how to represent your data. If you only need a one-off analysis, you can also request our service.
5. You will always be up-to-date in cytometry analysis (and other fields)
By relying on FlowJo plug-ins, you are waiting for others to port already existing tools to FlowJo: you are therefore always a step behind. You will stay at the forefront of research by going directly to the source: R. Moreover, R is not only used for flow cytometry analysis but for almost every kind of data analysis. In our field, you can perform epitope prediction, RNA-sequencing and single-cell RNA-seq analysis (mainly via the Seurat package) and much more. Find more at Bioconductor.
6. You will find patterns in your flow data in no time
What if, after leaving the flow cytometry unit or the CyTOF machine, you only needed to click one button to analyse your data? This is now possible in R. No limit on the number of users, no limit on the period of use. Learn how by visiting our Training Centre.
✏️ In a nutshell:
To analyse your mass cytometry data, avoid gating strategies, allow yourself to find new subsets and automate your process. Save money and time, and get the most out of your data.
Conclusion on flow and mass cytometry data analysis
You now know everything about flow and mass cytometry analysis and you can even apply it to your flow data. You can train yourself with our own training and request an analysis from us. You will receive a report within 2 days, depending on the workload.
🚧 Our website is continuously evolving, visit us regularly!
Meet the team
Madeleine was responsible for the administration
(MD PhD candidate)
Guillaume wrote the content of the website and created the online training
Maurits checked the website and gave his voice to the online training
Jana checked the website and the videos and gave her voice to the online training
Carine, Principal Investigator
I needed a one-off mass cytometry analysis. Being assisted by VisuaLyte helped me get straight to what I wanted to highlight in the data I generated.
I am using mass cytometry for a time course and group comparison. I could see immunological patterns in my data rapidly.
John, PhD candidate
VisuaLyte helped me identify the clusters and significant patterns I was looking for in my mass cytometry data.