Chapter 8 Defining Cluster Identity with scRNA-seq

In addition to allowing cluster identity assignment with gene scores, ArchR also enables integration with scRNA-seq. This can help with cluster identity assignment because you can directly use clusters called in scRNA-seq space or use the gene expression measurements after integration. This integration works by directly aligning cells from scATAC-seq with cells from scRNA-seq, comparing the scATAC-seq gene score matrix with the scRNA-seq gene expression matrix. Under the hood, this alignment is performed using the FindTransferAnchors() function from the Seurat package, which allows you to align data across two datasets. However, to scale this procedure to hundreds of thousands of cells, ArchR parallelizes it by dividing the total cells into smaller groups and performing a separate alignment for each group.
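
To make the grouping idea concrete, here is a minimal Python sketch. It is not ArchR's implementation: ArchR delegates each alignment to Seurat's FindTransferAnchors(), whereas this sketch substitutes a plain nearest-neighbor search by Pearson correlation. It assumes both matrices have already been restricted to the same genes in the same order, and the names pearson, align_group, align_in_groups, and group_size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt

# Hypothetical sketch: nearest-neighbor matching by Pearson correlation stands in
# for Seurat's anchor-based alignment. Vectors are assumed to cover the same genes
# in the same order (scATAC gene scores vs. scRNA expression).


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def align_group(atac_group, rna_cells):
    """Align one group of scATAC cells: pick the most correlated scRNA cell for each."""
    matches = {}
    for atac_id, scores in atac_group:
        best = max(rna_cells, key=lambda rna_id: pearson(scores, rna_cells[rna_id]))
        matches[atac_id] = best
    return matches


def align_in_groups(atac_cells, rna_cells, group_size=2, workers=4):
    """Split scATAC cells into groups and align each group independently in parallel."""
    items = list(atac_cells.items())
    groups = [items[i:i + group_size] for i in range(0, len(items), group_size)]
    matches = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in group order, so the merged dict is deterministic.
        for group_matches in pool.map(align_group, groups, [rna_cells] * len(groups)):
            matches.update(group_matches)
    return matches


# Toy data: three genes per cell, values invented for illustration only.
rna = {"rna_T": [5.0, 0.0, 1.0], "rna_B": [0.0, 4.0, 1.0]}
atac = {"atac1": [3.0, 0.5, 1.0], "atac2": [0.2, 2.5, 0.8], "atac3": [2.0, 0.1, 0.9]}
print(align_in_groups(atac, rna))
# → {'atac1': 'rna_T', 'atac2': 'rna_B', 'atac3': 'rna_T'}
```

Because each group is aligned independently, the result does not depend on group_size here; only the work is split. Threads keep the example simple, but in CPython a ProcessPoolExecutor would be needed for a real CPU speedup.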

Effectively, for each cell in the scATAC-seq data, this integration process finds the cell in the scRNA-seq data that looks most similar and assigns the gene expression data from that scRNA-seq cell to the scATAC-seq cell. At the end, each cell in scATAC-seq space has been assigned a gene expression signature, which can be used for many downstream analyses. This chapter illustrates how to use this information for assigning clusters, while later chapters show how to use the linked scRNA-seq data for more complex analyses such as identifying predicted cis-regulatory elements. We believe these integrative analyses will become increasingly relevant as multi-omic single-cell profiling becomes commercially available. In the meantime, using publicly available scRNA-seq data from matched cell types, or scRNA-seq data that you have generated from your sample of interest, can bolster the scATAC-seq analyses performed in ArchR.
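
The transfer step and its use for cluster naming can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not ArchR's code: it assumes the alignment has already produced, for each scATAC-seq cell, the ID of its best-matching scRNA-seq cell, and it names each scATAC-seq cluster by a simple majority vote over the transferred labels. The functions transfer_from_matches and label_clusters are hypothetical, and the cells, clusters, and values are toy data (CD3E and MS4A1 are used only as familiar T- and B-cell marker names).

```python
from collections import Counter

# Hypothetical sketch: `matches` is assumed to come from a prior alignment step
# (scATAC cell ID -> best-matching scRNA cell ID).


def transfer_from_matches(matches, rna_expression, rna_labels):
    """Give each scATAC cell the expression profile and label of its matched scRNA cell."""
    expression = {atac_id: rna_expression[rna_id] for atac_id, rna_id in matches.items()}
    labels = {atac_id: rna_labels[rna_id] for atac_id, rna_id in matches.items()}
    return expression, labels


def label_clusters(atac_clusters, transferred_labels):
    """Name each scATAC cluster after the most common transferred scRNA label in it."""
    per_cluster = {}
    for atac_id, cluster in atac_clusters.items():
        per_cluster.setdefault(cluster, Counter())[transferred_labels[atac_id]] += 1
    # most_common(1) breaks ties by first-seen order, so ties are resolved arbitrarily.
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in per_cluster.items()}


# Toy inputs, invented for illustration.
matches = {"atac1": "rna_T", "atac2": "rna_B", "atac3": "rna_T"}
rna_expression = {"rna_T": {"CD3E": 5.0, "MS4A1": 0.0}, "rna_B": {"CD3E": 0.0, "MS4A1": 4.0}}
rna_labels = {"rna_T": "T cell", "rna_B": "B cell"}
atac_clusters = {"atac1": "C1", "atac2": "C2", "atac3": "C1"}

expression, labels = transfer_from_matches(matches, rna_expression, rna_labels)
print(expression["atac2"])  # → {'CD3E': 0.0, 'MS4A1': 4.0}
print(label_clusters(atac_clusters, labels))  # → {'C1': 'T cell', 'C2': 'B cell'}
```

A majority vote is the simplest way to turn per-cell transferred labels into cluster identities; in practice it is worth also inspecting how mixed each cluster's labels are before trusting the assigned name.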