16.1 Motif Footprinting

Note that the footprints generated from this tutorial data are not as clean as would be desired. This is a consequence of the small size of the tutorial dataset; footprints generated from larger datasets would be expected to show less variation.

When footprinting, the first thing we need to do is obtain the positions of the relevant motifs. To do this, we call the getPositions() function. This function has an optional parameter called name, which accepts the name of the peakAnnotation object from which we would like to obtain the positions. If name = NULL, then ArchR will use the first entry in the peakAnnotation slot. In the example shown below, we do not specify name, so ArchR uses the first entry, which is our CIS-BP motifs.

motifPositions <- getPositions(projHeme5)

This creates a GRangesList object where each TF motif is represented by a separate GRanges object.

motifPositions
## GRangesList object of length 870:
## $TFAP2B_1
## GRanges object with 17409 ranges and 1 metadata column:
##           seqnames              ranges strand |     score
##              <Rle>           <IRanges>  <Rle> | <numeric>
##       [1]     chr1       852468-852479      + |   8.17077
##       [2]     chr1       873916-873927      + |   8.31842
##       [3]     chr1       873916-873927      - |   8.31842
##       [4]     chr1       896671-896682      + |   9.95541
##       [5]     chr1       896671-896682      - |   8.91854
##       ...      ...                 ...    ... .       ...
##   [17405]     chrX 154004258-154004269      - |   9.05749
##   [17406]     chrX 154299568-154299579      + |   8.89420
##   [17407]     chrX 154664929-154664940      - |   8.15963
##   [17408]     chrX 154807684-154807695      + |   9.57083
##   [17409]     chrX 154807684-154807695      - |  10.60491
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths
## 
## ...
## <869 more elements>

We can subset this GRangesList to a few TF motifs that we are interested in. Because grep() matches substrings, searching for “EBF1” also returns the SREBF1 motif. We therefore explicitly remove it from the downstream analyses below using the %ni% helper function, which provides the opposite functionality of %in% from base R.

motifs <- c("GATA1", "CEBPA", "EBF1", "IRF4", "TBX21", "PAX5")
markerMotifs <- unlist(lapply(motifs, function(x) grep(x, names(motifPositions), value = TRUE)))
markerMotifs <- markerMotifs[markerMotifs %ni% "SREBF1_22"]
markerMotifs
## [1] "GATA1_383" "CEBPA_155" "EBF1_67"   "IRF4_632"  "TBX21_780" "PAX5_709"
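
The substring behavior that pulls in SREBF1 can be reproduced in base R without ArchR. The sketch below is only an illustration: motifNames is a short stand-in for names(motifPositions) containing just the names shown above, and %ni% is defined locally as a stand-in because the real helper ships with ArchR. It also shows an anchored pattern as one possible alternative to filtering afterwards, under the assumption that motif names always follow the NAME_number convention seen in the output.

```r
# Stand-in for names(motifPositions); only a handful of names for illustration.
motifNames <- c("GATA1_383", "CEBPA_155", "EBF1_67", "SREBF1_22",
                "IRF4_632", "TBX21_780", "PAX5_709")

# Local stand-in for ArchR's %ni% helper (negation of %in%).
`%ni%` <- function(x, table) !(x %in% table)

motifs <- c("GATA1", "CEBPA", "EBF1", "IRF4", "TBX21", "PAX5")

# Unanchored search: "EBF1" is a substring of "SREBF1_22", so both match.
hits <- unlist(lapply(motifs, function(x) grep(x, motifNames, value = TRUE)))

# Approach used in the text: drop the unwanted match afterwards.
filtered <- hits[hits %ni% "SREBF1_22"]

# Alternative (assumes the NAME_number naming convention): anchor the pattern
# so that only names starting with the TF name followed by "_" match.
anchored <- unlist(lapply(motifs, function(x) {
  grep(paste0("^", x, "_"), motifNames, value = TRUE)
}))
```

With this toy vector, filtered and anchored contain the same six names. On a full motif set the anchored pattern may behave differently from the substring search, so it is worth inspecting the result either way.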

To accurately profile TF footprints, a large number of reads is required. Therefore, cells are grouped to create pseudo-bulk ATAC-seq profiles that can then be used for TF footprinting. These pseudo-bulk profiles are stored as group coverage files, which we originally created in a previous chapter to perform peak calling. If you haven’t already added group coverages to your ArchRProject, let’s do that now.

if(is.null(projHeme5@projectMetadata$GroupCoverages$Clusters2)){
  projHeme5 <- addGroupCoverages(ArchRProj = projHeme5, groupBy = "Clusters2")
}
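
To make the idea of a pseudo-bulk profile concrete, the sketch below sums a toy per-cell count matrix by cluster label using base R. It only illustrates the grouping idea and should not be read as how addGroupCoverages() works internally; every value and name in it (counts, clusters, pseudoBulk) is made up for the example.

```r
# Toy data: 6 cells x 4 genomic bins of insertion counts (made-up values).
counts <- matrix(
  c(1, 0, 2, 0,
    0, 1, 1, 0,
    2, 0, 0, 1,
    0, 0, 1, 1,
    1, 1, 0, 0,
    0, 2, 1, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("cell", 1:6), paste0("bin", 1:4))
)

# Hypothetical cluster label for each cell.
clusters <- c("A", "A", "A", "B", "B", "B")

# Sum the rows (cells) within each cluster to get one profile per group.
pseudoBulk <- rowsum(counts, group = clusters)
pseudoBulk
```

Each row of pseudoBulk pools the signal of all cells in one cluster, which is why grouped profiles have enough reads for footprinting even when individual cells do not.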

With group coverages calculated, we can now compute footprints for the subset of marker motifs that we previously selected using the getFootprints() function. Even though ArchR implements a highly optimized footprinting workflow, it is recommended to perform footprinting on a subset of motifs rather than all motifs. As such, we provide the subset of motifs to footprint via the positions parameter.

seFoot <- getFootprints(
  ArchRProj = projHeme5, 
  positions = motifPositions[markerMotifs], 
  groupBy = "Clusters2"
)
## ArchR logging to : ArchRLogs/ArchR-getFootprints-91455ab47-Date-2025-02-06_Time-02-39-53.799919.log
## If there is an issue, please report to github with logFile!
## 2025-02-06 02:39:54.353679 : Computing Kmer Bias Table, 0.009 mins elapsed.
## 2025-02-06 02:40:00.975188 : Finished Computing Kmer Tables, 0.11 mins elapsed.
## 2025-02-06 02:40:00.976127 : Computing Footprints, 0.12 mins elapsed.
## 2025-02-06 02:40:17.403599 : Computing Footprints Bias, 0.393 mins elapsed.
## 2025-02-06 02:40:32.376185 : Summarizing Footprints, 0.643 mins elapsed.
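
Conceptually, a footprint is an aggregate of insertion counts at each position relative to the motif center, pooled across all instances of a motif. The base R sketch below illustrates that idea on made-up coordinates. It is a simplified stand-in rather than the getFootprints() implementation: it ignores motif strand, the k-mer bias tables mentioned in the log above, and any normalization, and all names in it (insertions, motifCenters, flank) are hypothetical.

```r
# Made-up Tn5 insertion positions on a single chromosome.
insertions <- c(95, 98, 98, 104, 110, 195, 197, 203, 205, 205)

# Made-up motif centers on the same chromosome.
motifCenters <- c(100, 200)

flank <- 10  # number of bases on each side of the motif center

# For every motif, compute the offset of each insertion from the center,
# keep offsets within the window, and pool them across motifs.
offsets <- unlist(lapply(motifCenters, function(center) {
  d <- insertions - center
  d[abs(d) <= flank]
}))

# Count insertions at each offset from -flank to +flank.
footprint <- table(factor(offsets, levels = -flank:flank))
footprint
```

With only two motif instances the resulting profile is sparse, which mirrors why footprints from small datasets look noisy and why pooling many motif positions and many cells helps.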

Once we have retrieved these footprints, we can plot them using the plotFootprints() function. This function can simultaneously normalize the footprints in various ways. The normalization and the actual plotting of the footprints are discussed in the next section.