ATAC-seq allows for the unbiased identification of TFs that exhibit large changes in chromatin accessibility at sites containing their DNA binding motifs. However, families of TFs (for ex. GATA factors) share similar features in their binding motifs when looking in aggregate through position weight matrices (PWMs).
This motif similarity makes it challenging to identify the specific TFs that might be driving observed changes in chromatin accessibility at their predicted binding sites. To circumvent this challenge, we have previously ATAC-seq and RNA-seq to identify TFs whose gene expression is positively correlated to changes in the accessibility of their corresponding motif. We term these TFs “positive regulators”. However, this analysis relies on matched gene expression data which may not be readily available in all experiments. To overcome this dependency, ArchR can identify TFs whose inferred gene scores are correlated to their chromVAR TF deviation z-scores. To achieve this, ArchR correlates chromVAR deviation z-scores of TF motifs with gene activity scores of TF genes from the low-overlapping cell aggregates. When using scRNA-seq integration with ArchR, gene expression of the TF can be used instead of inferred gene activity score.
15.4.1 Step 1. Identify Deviant TF Motifs
The first part of identifying positive TF regulators is identification of deviant TF motifs. We performed this analysis in a previous chapter, creating a MotifMatrix of chromVAR deviations and deviation z-scores for all motifs. We can obtain this data, averaged by clusters, by using the getGroupSE() function which returns a SummarizedExperiment.
## 2020-04-15 12:03:08 : Successfully Created Group Matrix, 0.133 mins elapsed.
Because this SummarizedExperiment object comes from the MotifMatrix is has two seqnames - “deviations” and “z” - corresponding to the raw deviations and deviation z-scores from chromVAR.
Then we can identify the maximum delta in z-score between all clusters. This will be helpful in stratifying motifs based on the degree of variation observed across clusters.
To identify TFs whose motif accessibility is correlated with with their own gene activity (either by gene score or gene expression), we use the correlateMatrices() function and provide the two matrices that we are interested in, in this case the GeneScoreMatrix and the MotifMatrix. As mentioned previously, these correlations are determined across many low-overlapping cell aggregates determined in the lower dimension space specified in the reducedDims parameter.
## ArchR logging to : ArchRLogs/ArchR-correlateMatrices-11a7038ad0163-Date-2020-04-15_Time-12-03-11.log
## If there is an issue, please report to github with logFile!
## When accessing features from a matrix of class Sparse.Assays.Matrix it requires 1 seqname!
## Continuing with first seqname ‘z’!
## If confused, try getFeatures(ArchRProj, ‘MotifMatrix’) to list out available seqnames for input!
## 2020-04-15 12:03:16 : Testing 825 Mappings!, 0.091 mins elapsed.
## 2020-04-15 12:03:16 : Computing KNN, 0.091 mins elapsed.
## 2020-04-15 12:03:16 : Identifying Non-Overlapping KNN pairs, 0.095 mins elapsed.
## 2020-04-15 12:03:19 : Identified 491 Groupings!, 0.138 mins elapsed.
## 2020-04-15 12:03:21 : Getting Group Matrix 1, 0.17 mins elapsed.
## 2020-04-15 12:03:33 : Getting Group Matrix 2, 0.365 mins elapsed.
## Some entries in groupMat2 are less than 0, continuing without Log2 Normalization.
## Most likely this assay is a deviations matrix.
## Getting Correlations…
## 2020-04-15 12:03:40 :
## Computing Correlation (250 of 825)
## Computing Correlation (500 of 825)
## Computing Correlation (750 of 825)
## ArchR logging successful to : ArchRLogs/ArchR-correlateMatrices-11a7038ad0163-Date-2020-04-15_Time-12-03-11.log
This function returns a DataFrame object that that contains the elements from each matrix and the correlation across the low-overlapping cell aggregates.
## ArchR logging to : ArchRLogs/ArchR-correlateMatrices-11a70642ae62f-Date-2020-04-15_Time-12-03-41.log
## If there is an issue, please report to github with logFile!
## When accessing features from a matrix of class Sparse.Assays.Matrix it requires 1 seqname!
## Continuing with first seqname ‘z’!
## If confused, try getFeatures(ArchRProj, ‘MotifMatrix’) to list out available seqnames for input!
## 2020-04-15 12:03:44 : Testing 798 Mappings!, 0.062 mins elapsed.
## 2020-04-15 12:03:44 : Computing KNN, 0.062 mins elapsed.
## 2020-04-15 12:03:45 : Identifying Non-Overlapping KNN pairs, 0.066 mins elapsed.
## 2020-04-15 12:03:47 : Identified 491 Groupings!, 0.108 mins elapsed.
## 2020-04-15 12:03:50 : Getting Group Matrix 1, 0.147 mins elapsed.
## 2020-04-15 12:04:03 : Getting Group Matrix 2, 0.368 mins elapsed.
## Some entries in groupMat2 are less than 0, continuing without Log2 Normalization.
## Most likely this assay is a deviations matrix.
## Getting Correlations…
## 2020-04-15 12:04:10 :
## Computing Correlation (250 of 798)
## Computing Correlation (500 of 798)
## Computing Correlation (750 of 798)
## ArchR logging successful to : ArchRLogs/ArchR-correlateMatrices-11a70642ae62f-Date-2020-04-15_Time-12-03-41.log
We can use all of this information to identify positive TF regulators. In the examples below, we consider positive regulators as those TFs whose correlation between motif and gene score (or gene expression) is greater than 0.5 with an adjusted p-value less than 0.01 and a maximum inter-cluster difference in deviation z-score that is in the top quartile.
We apply these selection criteria and do a little text juggling to isolate the TF names.