15.4 Identification of Positive TF-Regulators
ATAC-seq allows for the unbiased identification of TFs that exhibit large changes in chromatin accessibility at sites containing their DNA binding motifs. However, families of TFs (e.g., GATA factors) have very similar binding motifs, and these motifs look nearly alike when summarized in aggregate as position weight matrices (PWMs).
This motif similarity makes it challenging to identify the specific TFs that might be driving observed changes in chromatin accessibility at their predicted binding sites. To circumvent this challenge, we have previously used ATAC-seq and RNA-seq to identify TFs whose gene expression is positively correlated with changes in the accessibility of their corresponding motif. We term these TFs “positive regulators”. However, this analysis relies on matched gene expression data, which may not be available in all experiments. To overcome this dependency, ArchR can identify TFs whose inferred gene scores are correlated with their chromVAR TF deviation z-scores. To do this, ArchR correlates the chromVAR deviation z-scores of TF motifs with the gene activity scores of the corresponding TF genes across the low-overlapping cell aggregates. When scRNA-seq data has been integrated with ArchR, the TF's gene expression can be used instead of its inferred gene activity score.
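The core of this test is a per-TF correlation between two vectors measured over the same cell aggregates: the motif's deviation z-scores and the TF gene's score. The sketch below is a minimal Python/NumPy illustration of that idea, not ArchR's R implementation. It assumes the two matrices are already paired row-by-row (motif i matched to gene i). The function name `positive_regulators` and the `min_r = 0.5` cutoff are hypothetical choices for illustration.

```python
import numpy as np

def positive_regulators(motif_z, gene_score, min_r=0.5):
    """Hypothetical helper: row-wise Pearson correlation between motif
    deviation z-scores and TF gene scores across cell aggregates.

    motif_z, gene_score: arrays of shape (n_tfs, n_aggregates); row i of
    each array is assumed to refer to the same TF.
    min_r: illustrative correlation cutoff, not an ArchR default.
    """
    motif_z = np.asarray(motif_z, dtype=float)
    gene_score = np.asarray(gene_score, dtype=float)
    # Center each row so the dot product below becomes a covariance.
    mz = motif_z - motif_z.mean(axis=1, keepdims=True)
    gs = gene_score - gene_score.mean(axis=1, keepdims=True)
    num = (mz * gs).sum(axis=1)
    den = np.sqrt((mz ** 2).sum(axis=1) * (gs ** 2).sum(axis=1))
    # Rows with zero variance give NaN, which never passes the cutoff.
    with np.errstate(invalid="ignore", divide="ignore"):
        r = num / den
    return r, np.flatnonzero(r > min_r)

# Toy values (not real data): TF A's motif z-score and gene score rise
# together; TF B's gene score rises while its motif z-score falls.
motif_z = [[1.0, 2.0, 3.0, 4.0],
           [4.0, 3.0, 2.0, 1.0]]
gene_score = [[0.1, 0.2, 0.3, 0.5],
              [0.1, 0.2, 0.3, 0.4]]
r, hits = positive_regulators(motif_z, gene_score)
print(hits.tolist())  # [0]
```

Only TF A is flagged as a candidate positive regulator. TF B is anti-correlated, so it would not be called a positive regulator under this criterion.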
15.4.1 Step 1. Identify Deviant TF Motifs
The first part of identifying positive TF regulators is identification of deviant TF motifs. We performed this analysis in a previous chapter, creating a MotifMatrix of chromVAR deviations and deviation z-scores for all motifs. We can obtain this data, averaged by clusters, by using the getGroupSE() function, which returns a SummarizedExperiment.
## 2020-04-15 12:03:08 : Successfully Created Group Matrix, 0.133 mins elapsed.
Because this SummarizedExperiment object comes from the MotifMatrix, it has two seqnames, “deviations” and “z”, corresponding to the raw deviations and deviation z-scores from chromVAR.
## class: SummarizedExperiment
## dim: 1740 11
## assays(1): MotifMatrix
## rownames(1740): f1 f2 … f1739 f1740
## rowData names(3): seqnames idx name
## colnames(11): B CD4.M … PreB Progenitor
## colData names(16): TSSEnrichment ReadsInTSS … FRIP nCells
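Averaging by clusters collapses a motifs × cells matrix into a motifs × clusters matrix, which matches the 11 cluster columns in the summary above. Below is a minimal NumPy sketch of that operation, assuming a simple per-cluster mean. It does not reproduce getGroupSE() itself, and the helper name `group_means` is hypothetical.

```python
import numpy as np

def group_means(matrix, labels):
    """Hypothetical helper: average a features x cells matrix within each
    group of cells (e.g. clusters), returning features x groups."""
    matrix = np.asarray(matrix, dtype=float)
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    # One column per group: the mean over the cells carrying that label.
    means = np.column_stack(
        [matrix[:, labels == g].mean(axis=1) for g in groups]
    )
    return groups, means

# Toy input (not real data): 2 motifs measured in 3 cells from 2 clusters.
per_cell = [[1.0, 3.0, 10.0],
            [2.0, 4.0, 6.0]]
cell_clusters = ["B", "B", "T"]
groups, cluster_avg = group_means(per_cell, cell_clusters)
print(groups)               # ['B', 'T']
print(cluster_avg.tolist())  # [[2.0, 10.0], [3.0, 6.0]]
```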
We can subset this SummarizedExperiment to just the deviation z-scores.
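Because each motif appears twice in the grouped matrix, once under “deviations” and once under “z”, subsetting amounts to keeping the rows whose seqnames value is “z”. The NumPy sketch below assumes a hypothetical stacked layout with toy motif names and values (not real data), and uses a boolean mask in place of the R subsetting step.

```python
import numpy as np

# Hypothetical stand-in for the grouped MotifMatrix: each motif appears once
# under "deviations" and once under "z"; columns are clusters. Toy values.
motif_names = np.array(["motifA", "motifB", "motifA", "motifB"])
seqnames = np.array(["deviations", "deviations", "z", "z"])
values = np.array([[0.10, -0.05, 0.02],
                   [0.08, -0.02, 0.01],
                   [5.20, -1.30, 0.40],
                   [4.10, -0.60, 0.20]])

# Boolean mask selecting only the deviation z-score rows.
is_z = seqnames == "z"
z_names = motif_names[is_z]
z_values = values[is_z]
print(z_names.tolist())  # ['motifA', 'motifB']
```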
Then, for each motif, we can identify the maximum delta in z-score between any two clusters. This will be helpful in stratifying motifs based on the degree of variation observed across clusters.
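For one motif, the largest difference between any two clusters is simply that motif's maximum z-score minus its minimum z-score. The NumPy sketch below checks this against a brute-force comparison of every cluster pair. It then applies a hypothetical top-quartile cutoff to show how max delta could be used to stratify motifs. The helper names, the toy values, and the 0.75 quantile are illustrative assumptions, not values taken from ArchR.

```python
import numpy as np

def max_delta(z):
    """Largest z-score difference between any two clusters, per motif.
    z: motifs x clusters array of cluster-averaged deviation z-scores."""
    z = np.asarray(z, dtype=float)
    # The largest pairwise difference within a row equals max - min.
    return z.max(axis=1) - z.min(axis=1)

def max_delta_pairwise(z):
    """Brute-force check that compares every ordered pair of clusters."""
    z = np.asarray(z, dtype=float)
    n = z.shape[1]
    return np.array(
        [max(row[i] - row[j] for i in range(n) for j in range(n)) for row in z]
    )

# Toy motifs x clusters z-scores (not real data).
z = np.array([[5.2, -1.3, 0.4],
              [4.1, -0.6, 0.2],
              [0.3, 0.1, 0.2]])
deltas = max_delta(z)
assert np.allclose(deltas, max_delta_pairwise(z))

# Hypothetical stratification: flag motifs in the top quartile of max delta.
high_variation = deltas >= np.quantile(deltas, 0.75)
print(high_variation.tolist())  # [True, False, False]
```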