[1] Single-cell transcriptomics makes it possible to unravel heterogeneous cell populations, reconstruct cellular developmental pathways, and model transcriptional dynamics — all previously masked in bulk RNA sequencing.
[2] The development of high-throughput RNA sequencing (RNA-seq) and microarrays has made gene expression analysis routine.
Higher throughput and speed allow researchers to frequently characterize the expression profiles of populations of thousands of cells.
Data from bulk assays have led to the identification of genes that are differentially expressed in distinct cell populations, and to biomarker discovery.
[3] These studies are limited as they provide measurements for whole tissues and, as a result, show an average expression profile for all the constituent cells.
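The averaging effect can be illustrated with a minimal sketch. The cell-type labels and expression values below are invented for illustration and do not come from any dataset; the point is only that a tissue-level mean can match neither of the populations it summarises.

```python
from statistics import mean

# Hypothetical expression of a single gene in six cells from a mixed tissue.
# Cell types "A" and "B" and all values are illustrative assumptions.
cell_types = ["A", "A", "A", "B", "B", "B"]
expression = [0.0, 1.0, 0.0, 19.0, 20.0, 20.0]


def bulk_average(values):
    """Mimic a bulk assay: one mean value for the whole tissue."""
    return mean(values)


def per_type_average(labels, values):
    """Mimic a single-cell assay: one mean value per cell type."""
    return {
        label: mean(v for l, v in zip(labels, values) if l == label)
        for label in sorted(set(labels))
    }


bulk = bulk_average(expression)
by_type = per_type_average(cell_types, expression)
print("bulk:", bulk)
print("per type:", by_type)
```

In this toy case the bulk value lies between the two populations, so it describes neither the cells that barely express the gene nor those that express it strongly.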
Lastly, when the goal is to study cellular progression through differentiation, average expression profiles can only order cells by time rather than by developmental stage.
Still, many new computational approaches have had to be designed for this data type to facilitate a complete and detailed study of single-cell expression profiles.
[5] There is so far no standardized technique to generate single-cell data: all methods must include cell isolation from the population, lysate formation, amplification through reverse transcription and quantification of expression levels.
The most commonly used housekeeping genes include GAPDH and α-actin, although the reliability of normalisation through this process is questionable, as there is evidence that their level of expression can vary significantly.
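Normalisation to a housekeeping gene can be sketched as follows, assuming each cell is represented as a mapping from gene name to count. The function name `normalise_to_reference`, the gene `GENE_X`, and all counts are hypothetical and chosen only to show why variation in the reference gene is a concern.

```python
def normalise_to_reference(counts, reference_gene):
    """Divide every gene's count by the reference gene's count in the same cell."""
    reference = counts[reference_gene]
    if reference == 0:
        # A reference gene that was not detected cannot be used as a denominator.
        raise ValueError("reference gene not detected in this cell")
    return {gene: value / reference for gene, value in counts.items()}


# Two hypothetical cells with the same GENE_X count but different GAPDH counts.
cell_a = {"GAPDH": 100, "GENE_X": 50}
cell_b = {"GAPDH": 200, "GENE_X": 50}

norm_a = normalise_to_reference(cell_a, "GAPDH")
norm_b = normalise_to_reference(cell_b, "GAPDH")
print(norm_a["GENE_X"], norm_b["GENE_X"])
```

Although `GENE_X` has the same raw count in both cells, its normalised value differs by a factor of two. That difference reflects a real change only if the housekeeping gene is expressed at a constant level, which is the assumption questioned above.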
Evidence suggests that this is not the case given fundamental differences in size and features, such as the lack of a polyadenylated tail in spike-ins and therefore shorter length.
[15] Single-cell data analysis assumes that the input is a matrix of normalised gene expression counts, generated by the approaches outlined above, and can provide insights that are not obtainable from bulk assays.
Three main insights provided:[18]
The techniques outlined have been designed to help visualise and explore patterns in the data in order to facilitate the revelation of these three features.
Dimensionality reduction is frequently used before clustering as cells in high dimensions can wrongly appear to be close due to distance metrics behaving non-intuitively.
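The non-intuitive behaviour of distances in high dimensions can be illustrated with a small simulation. This is a sketch on uniformly random points rather than expression data, and the function name `relative_contrast` and its parameters are assumptions made for the example. It measures how much farther the farthest point is from a query point than the nearest one.

```python
import math
import random


def relative_contrast(n_dims, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for Euclidean distances
    from one random query point to n_points random points."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


for dims in (2, 20, 1000):
    print(dims, round(relative_contrast(dims), 3))
```

As the number of dimensions grows, the contrast shrinks: nearest and farthest neighbours become almost equally distant. This is one reason clustering is usually run on a reduced representation rather than on thousands of genes directly.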
Specialised methods have been designed for single-cell data that consider single-cell features such as technical dropouts and the shape of the distribution, e.g. Bimodal vs.
The trajectory, therefore, enables the inference of gene expression dynamics and the ordering of cells by their progression through differentiation or response to external stimuli.
The method relies on the assumptions that the cells follow the same path through the process of interest and that their transcriptional state correlates with their progression.
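Under these two assumptions, the simplest conceivable ordering projects each cell onto the main axis of variation and sorts cells along it. The sketch below does this with the first principal component, found by power iteration. It is a deliberately naive stand-in and not any particular published trajectory method; the function names, the toy expression matrix, and the choice of root cell are all hypothetical.

```python
import math


def center(matrix):
    """Subtract each gene's mean so that columns have zero mean."""
    n_cells = len(matrix)
    means = [sum(column) / n_cells for column in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def first_principal_axis(centered, iterations=100):
    """Approximate the leading principal axis by power iteration on X^T X."""
    n_genes = len(centered[0])
    # Uneven starting vector, to avoid starting orthogonal to the leading axis.
    axis = [float(j + 1) for j in range(n_genes)]
    for _ in range(iterations):
        scores = [sum(x * a for x, a in zip(row, axis)) for row in centered]
        axis = [sum(s * row[j] for s, row in zip(scores, centered))
                for j in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in axis))
        if norm == 0:
            raise ValueError("no variation found along the starting direction")
        axis = [v / norm for v in axis]
    return axis


def pseudotime_order(matrix, root_index):
    """Return cell indices sorted along the first principal axis,
    oriented so that the chosen root cell falls in the early half."""
    centered = center(matrix)
    axis = first_principal_axis(centered)
    scores = [sum(x * a for x, a in zip(row, axis)) for row in centered]
    if scores[root_index] > 0:  # centred scores have zero mean
        scores = [-s for s in scores]
    return sorted(range(len(matrix)), key=lambda i: scores[i])


# Toy matrix: rows are cells in arbitrary order; columns are an early marker,
# a late marker, and a gene with little variation (all values are invented).
cells = [
    [2, 8, 5],
    [10, 0, 5],
    [6, 4, 6],
    [0, 10, 4],
    [8, 2, 4],
    [4, 6, 5],
]
order = pseudotime_order(cells, root_index=1)
print(order)
```

The `root_index` argument plays the role of the prior information about starting cells that many methods require: without it, the axis has no preferred direction. A linear projection of this kind can only represent a single unbranched path, which is one reason dedicated methods differ in the topologies they can detect.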
More than 50 methods for pseudo-temporal ordering have been developed, and each has its own requirements for prior information (such as starting cells or time course data), detectable topologies, and methodology.
[33] Another class of methods (e.g., scDREAMER[34]) uses deep generative models such as variational autoencoders for learning batch-invariant latent cellular representations which can be used for downstream tasks such as cell type clustering, denoising of single-cell gene expression vectors and trajectory inference.