Flow cytometry bioinformatics

Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells.

It is also possible to characterize data in more comprehensive ways, such as by the density-guided binary space partitioning technique known as probability binning, or by combinatorial gating.
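
As a rough illustration of probability binning, the sketch below follows one common formulation: each bin is split at the median of the marker with the largest variance in that bin, and the split is repeated until the requested number of bins is reached, so every bin holds a near-equal share of the cells. The function name `probability_bins`, the `depth` parameter and the synthetic two-marker data are assumptions made for this example, not part of any published implementation.

```python
import random
from statistics import pvariance


def probability_bins(events, depth):
    """Split events (one tuple of marker values per cell) into 2**depth bins
    holding near-equal numbers of cells.

    Assumes len(events) >= 2**depth so that no bin ends up empty.
    """
    bins = [list(events)]
    for _ in range(depth):
        next_bins = []
        for members in bins:
            n_dims = len(members[0])
            # Choose the marker with the largest variance inside this bin.
            dim = max(range(n_dims),
                      key=lambda d: pvariance([e[d] for e in members]))
            # Sorting and halving amounts to a median split along that marker.
            ordered = sorted(members, key=lambda e: e[dim])
            half = len(ordered) // 2
            next_bins.extend([ordered[:half], ordered[half:]])
        bins = next_bins
    return bins


# Synthetic two-marker data, for illustration only.
rng = random.Random(0)
events = [(rng.gauss(0, 1), rng.gauss(0, 3)) for _ in range(1024)]
bins = probability_bins(events, depth=4)
print(len(bins), sorted({len(b) for b in bins}))
```

Because the bin boundaries follow the data density, dense regions are covered by many small bins and sparse regions by a few large ones, which is what makes per-bin cell counts comparable across samples.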

[9] The rapid increase in the dimensionality of flow cytometry data, coupled with the development of high-throughput robotic platforms capable of assaying hundreds to thousands of samples automatically, has created a need for improved computational analysis methods.

A simplified, if not strictly accurate, way of considering flow cytometry data is as a matrix of M measurements by N cells, where each element corresponds to the amount of a measured molecule.

One approach is to visualize summary statistics, such as the empirical distribution functions of single dimensions of technical or biological replicates, to ensure they are similar.
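
A minimal sketch of such a comparison for a single channel is shown below. It builds the empirical distribution function of each replicate and reports the largest vertical gap between two of them (the two-sample Kolmogorov-Smirnov statistic), used here only as a convenient numerical stand-in for visually overlaying the curves. The helper names and the simulated replicates are assumptions made for this example.

```python
import random
from bisect import bisect_right


def ecdf(values):
    """Return F, where F(x) is the fraction of values less than or equal to x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_ecdf_gap(a, b):
    """Largest vertical distance between the ECDFs of two samples."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs are step functions that change only at observed values,
    # so it is enough to compare them at those values.
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


# Simulated single-channel intensities, for illustration only.
rng = random.Random(1)
replicate_1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
replicate_2 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]

gap_similar = max_ecdf_gap(replicate_1, replicate_2)
gap_shifted = max_ecdf_gap(replicate_1, shifted)
print(gap_similar, gap_shifted)
```

A replicate whose gap to the others is much larger than the gaps among the rest would be a candidate for closer inspection.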

[23] With this method, higher standard deviation can indicate outliers, although this is a relative measure as the absolute value depends partly on the number of bins.

Normalization methods to remove technical variance, frequently derived from image registration techniques, are thus a critical step in many flow cytometry analyses.

[24] The complexity of raw flow cytometry data (dozens of measurements for thousands to millions of cells) makes answering questions directly using statistical tests or supervised learning difficult.

Thus, a critical step in the analysis of flow cytometric data is to reduce this complexity to something more tractable while establishing common features across samples.

The regions on these plots can be sequentially separated, based on fluorescence intensity, by creating a series of subset extractions, termed "gates".

These gates can be produced using software such as FlowJo,[28] FCS Express,[29] WinMDI,[30] CytoPaint (aka Paint-A-Gate),[31] VenturiOne, Cellcion, CellQuest Pro, Cytospec,[32] and Kaluza.
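
Independently of any particular software package, the idea of gating as a series of subset extractions can be sketched in a few lines. In the example below each gate is a simple interval on one channel, and each gate is applied to the output of the previous one. The channel names (FSC, SSC, CD3, CD4), the thresholds and the four events are hypothetical values chosen for illustration; real gates are usually drawn on two-dimensional plots and may be polygons or ellipses rather than intervals.

```python
def rectangle_gate(events, channel, low, high):
    """Keep the events whose value on `channel` lies in [low, high)."""
    return [e for e in events if low <= e[channel] < high]


# Hypothetical events: one dict of channel intensities per cell.
events = [
    {"FSC": 520, "SSC": 310, "CD3": 880, "CD4": 910},
    {"FSC": 510, "SSC": 300, "CD3": 860, "CD4": 120},
    {"FSC": 495, "SSC": 290, "CD3": 90, "CD4": 100},
    {"FSC": 60, "SSC": 40, "CD3": 850, "CD4": 900},  # low scatter, debris-like
]

# Gate 1: scatter gate on FSC and SSC.
scatter_gated = rectangle_gate(
    rectangle_gate(events, "FSC", 400, 700), "SSC", 200, 400
)
# Gate 2: CD3-positive events within the scatter gate.
cd3_positive = rectangle_gate(scatter_gated, "CD3", 500, 1024)
# Gate 3: CD4-positive events within the CD3-positive subset.
cd4_positive = rectangle_gate(cd3_positive, "CD4", 500, 1024)

print(len(scatter_gated), len(cd3_positive), len(cd4_positive))
```

Each gate only ever sees the cells that passed the gates before it, which is why the order and placement of manual gates affect the final population counts.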

In datasets with a low number of dimensions and limited cross-sample technical and biological variability (e.g., clinical laboratories), manual analysis of specific cell populations can produce effective and reproducible results.

[36] To address this issue, principal component analysis has been used to summarize the high-dimensional datasets using a combination of markers that maximizes the variance of all data points.
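
To make the phrase "a combination of markers that maximizes the variance" concrete, the sketch below approximates the first principal component by power iteration on the sample covariance matrix. It uses only the standard library to stay self-contained; a numerical library would normally be used in practice. The function name, the iteration count and the simulated three-marker data are assumptions for this example, and the method as written assumes the starting vector is not orthogonal to the leading component.

```python
import math
import random


def first_principal_component(rows, iterations=200):
    """Approximate the unit vector of marker weights with maximum variance.

    `rows` holds one list of marker values per cell.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the markers.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    vec = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


# Simulated data: markers 0 and 1 vary together, marker 2 is low-variance noise.
rng = random.Random(2)
rows = []
for _ in range(500):
    t = rng.gauss(0, 5)
    rows.append([t + rng.gauss(0, 0.5), t + rng.gauss(0, 0.5), rng.gauss(0, 0.5)])

pc1 = first_principal_component(rows)
print([round(w, 3) for w in pc1])
```

In this simulated case the leading component weights the two co-varying markers about equally and gives the noise marker a weight near zero, so projecting each cell onto it summarizes most of the variance in a single number.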

Density-based down-sampling and clustering were used to better represent rare populations and control the time and memory complexity of the minimum spanning tree construction process.

The FlowCAP (Flow Cytometry: Critical Assessment of Population Identification Methods) project, with active participation from most academic groups with research efforts in the area, is providing a way to objectively cross-compare state-of-the-art automated analysis approaches.

[54] After identification of the cell population of interest, a cross-sample analysis can be performed to identify phenotypic or functional variations that are correlated with an external variable (e.g., a clinical outcome).
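
In its simplest form, such a cross-sample analysis reduces each sample to one number (for example, the fraction of cells falling in the population of interest) and relates that number to the external variable. The sketch below computes a Pearson correlation for this purpose. The per-sample fractions and outcome values are invented for illustration, and a real analysis would also need an appropriate statistical test and correction for multiple comparisons.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical values, one entry per sample.
population_fraction = [0.12, 0.15, 0.09, 0.22, 0.18, 0.25]
clinical_outcome = [3.1, 3.8, 2.5, 5.9, 4.4, 6.3]

r = pearson(population_fraction, clinical_outcome)
print(round(r, 3))
```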

flowType and RchyOptimyx (as discussed above) expand this technique by adding the ability to explore the impact of independent markers on the overall correlation with the external outcome.

ISAC is considering replacing FCS with a flow cytometry specific version of the Network Common Data Form (netCDF) file format.

[66] It captures relations among data, metadata, analysis files and other components, and includes support for audit trails, versioning and digital signatures.

The lack of gating interoperability has traditionally been a bottleneck preventing reproducibility of flow cytometry data analysis and the usage of multiple analytical tools.

To address this shortcoming, ISAC developed Gating-ML, an XML-based mechanism to formally describe gates and related data (scale) transformations.

[10] The draft recommendation version of Gating-ML was approved by ISAC in 2008, and it is partially supported by tools like FlowJo, the flowUtils and CytoML libraries in R/Bioconductor, and FlowRepository.

This new version offers slightly less flexibility in terms of the power of gating description; however, it is also significantly easier to implement in software tools.

Although it was originally designed for the field of flow cytometry, it is applicable in any domain that needs to capture either fuzzy or unambiguous classifications of virtually any kind of object.

AutoGate[68] performs compensation, gating, preview of clusters, exhaustive projection pursuit (EPP), multidimensional scaling and phenogram, and produces a visual dendrogram to express HiD readiness.

A web-based interface provides easy access to these tools and allows the creation of automated analysis pipelines enabling reproducible research.

Firstly, CytoBank, which is a complete web-based flow cytometry data storage and analysis platform, has been made available to the public in a limited form.

A representative of this class of datasets is a study which includes analysis of two bone marrow samples using more than 30 surface or intracellular markers under a wide range of different stimulations.

[86] This was echoed by another group of Pacific Biosciences and Stanford University researchers, who suggested that cloud computing could enable centralized, standardized, high-throughput analysis of flow cytometry experiments.

[87] This article was adapted from the following source under a CC BY 4.0 license (2013) (reviewer reports): Kieran O'Neill; Nima Aghaeepour; Josef Spidlen; Ryan Brinkman (5 December 2013).