14.2 Normalization of Footprints for Tn5 Bias

One major challenge with TF footprinting using ATAC-seq data is the insertion sequence bias of the Tn5 transposase, which can lead to misclassification of TF footprints. To account for Tn5 insertion bias, ArchR identifies the k-mer sequences (user-defined length, default 6) surrounding each Tn5 insertion site. To do this, ArchR identifies single-base resolution Tn5 insertion sites for each pseudo-bulk, resizes these 1-bp sites to k-bp windows (-k/2 to +(k/2 - 1) bp from the insertion), and then creates a k-mer frequency table using the oligonucleotideFrequency(width = k, simplify.as = "collapsed") function from the Biostrings package. ArchR then calculates the expected k-mer frequencies genome-wide using the same function with the BSgenome-associated genome file.

To calculate the insertion bias for a pseudo-bulk footprint, ArchR creates a k-mer frequency matrix that represents all possible k-mers across a window +/- N bp (user-defined, default 250 bp) from the motif center. Then, iterating over each motif site, ArchR fills in the positioned k-mers into the k-mer frequency matrix. This is then calculated for each motif position genome-wide. Using the sample’s k-mer frequency table, ArchR can then compute the expected Tn5 insertions by multiplying the k-mer position frequency matrix by the observed/expected Tn5 k-mer frequency.
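The steps above can be sketched in base R on a toy example. This is only an illustration of the idea, not ArchR's internal code: the genome string, insertion positions, motif centers, the window size N = 4, and all variable names are made up for the example, and base R string functions stand in for Biostrings and BSgenome.

```r
# Toy inputs (all hypothetical): a short "genome", 1-based Tn5 insertion
# positions, and motif centers.
genome       <- "ACGTACGTTTGACCAGTACGATCGATTGACCA"
insertions   <- c(5, 9, 12, 20, 27)
motifCenters <- c(10, 22)
k <- 6   # k-mer length
N <- 4   # window of +/- N bp around each motif center (tiny, for illustration)

# Expected k-mer counts: every k-mer present in the genome.
n <- nchar(genome) - k + 1
expected <- table(substring(genome, seq_len(n), seq_len(n) + k - 1))

# Observed k-mers: resize each 1-bp insertion to a k-bp window,
# from -k/2 to +(k/2 - 1) bp around the insertion.
starts <- insertions - k / 2
ends   <- insertions + k / 2 - 1
keep   <- starts >= 1 & ends <= nchar(genome)
observed <- table(substring(genome, starts[keep], ends[keep]))

# Observed/expected k-mer frequency (0 for k-mers that were never cut).
obsCounts <- setNames(rep(0, length(expected)), names(expected))
obsCounts[names(observed)] <- as.numeric(observed)
tn5Bias <- (obsCounts / sum(obsCounts)) / (as.numeric(expected) / sum(expected))

# Positional k-mer matrix: rows are k-mers, columns are offsets from the
# motif center; each motif site adds one count per offset.
offsets <- -N:N
posMat <- matrix(0, nrow = length(expected), ncol = length(offsets),
                 dimnames = list(names(expected), offsets))
for (center in motifCenters) {
  for (j in seq_along(offsets)) {
    s <- center + offsets[j] - k / 2
    e <- s + k - 1
    if (s >= 1 && e <= nchar(genome)) {
      kmer <- substring(genome, s, e)
      posMat[kmer, j] <- posMat[kmer, j] + 1
    }
  }
}

# Expected Tn5 insertions per position: weight each positioned k-mer
# by its observed/expected frequency and sum over k-mers.
expectedInsertions <- as.numeric(tn5Bias %*% posMat)
```

In this sketch, expectedInsertions has one value per offset in the motif-centered window, which is the shape of the bias track that the normalization methods below compare against the observed footprint.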

All of this happens under the hood within the plotFootprints() function.

14.2.1 Subtracting the Tn5 Bias

One normalization method subtracts the Tn5 bias from the footprinting signal. This normalization is performed by setting normMethod = "Subtract" when calling plotFootprints().
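As a rough illustration of what subtraction means here, the sketch below uses made-up per-position values for a footprint and a Tn5 bias track. It assumes that both signals are first scaled by the mean of their flanking positions (the logs report "Normalizing by flanking regions") and that the bias is then subtracted position by position. The numbers, the flank definition, and the variable names are hypothetical; this is not ArchR's internal implementation.

```r
# Hypothetical per-position signals across a motif-centered window.
position  <- -5:5
footprint <- c(1.0, 1.1, 1.3, 0.9, 0.5, 0.4, 0.5, 0.9, 1.3, 1.1, 1.0)
tn5Bias   <- c(1.0, 1.0, 1.2, 1.0, 0.9, 0.9, 0.9, 1.0, 1.2, 1.0, 1.0)

# Scale each signal by the mean of its flanking positions
# (here, the two outermost positions on each side; an assumption).
flank         <- abs(position) >= 4
footprintNorm <- footprint / mean(footprint[flank])
biasNorm      <- tn5Bias / mean(tn5Bias[flank])

# Bias-subtracted footprint: values near 0 match the Tn5 expectation,
# negative values indicate fewer insertions than the bias predicts.
subtracted <- footprintNorm - biasNorm
```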

plotFootprints(
  seFoot = seFoot,
  ArchRProj = projHeme5, 
  normMethod = "Subtract",
  plotName = "Footprints-Subtract-Bias",
  addDOC = FALSE,
  smoothWindow = 5
)

## ArchR logging to : ArchRLogs/ArchR-plotFootprints-10b7a2038428-Date-2020-04-15_Time-11-16-59.log
## If there is an issue, please report to github with logFile!
## 2020-04-15 11:16:59 : Plotting Footprint : GATA1_383 (1 of 6), 0.007 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:02 : Plotting Footprint : CEBPA_155 (2 of 6), 0.065 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:05 : Plotting Footprint : EBF1_67 (3 of 6), 0.109 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:08 : Plotting Footprint : IRF4_632 (4 of 6), 0.155 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:11 : Plotting Footprint : TBX21_780 (5 of 6), 0.199 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:13 : Plotting Footprint : PAX5_709 (6 of 6), 0.245 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-10b7a2038428-Date-2020-04-15_Time-11-16-59.log

By default, these plots are saved in the outputDirectory of the ArchRProject. If you requested plots for all motifs and returned them as a ggplot object, that object would be extremely large. Examples of motif footprints from bias-subtracted analyses are shown below.

14.2.2 Dividing by the Tn5 Bias

A second strategy for normalization divides the footprinting signal by the Tn5 bias signal. This normalization is performed by setting normMethod = "Divide" when calling plotFootprints().
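The division strategy can be illustrated with a toy example. The values, the flank definition, and the variable names below are hypothetical, and the flank scaling step is an assumption based on the "Normalizing by flanking regions" log message; this is a sketch of the idea rather than ArchR's internal code.

```r
# Hypothetical per-position signals across a motif-centered window.
position  <- -5:5
footprint <- c(1.0, 1.1, 1.3, 0.9, 0.5, 0.4, 0.5, 0.9, 1.3, 1.1, 1.0)
tn5Bias   <- c(1.0, 1.0, 1.2, 1.0, 0.9, 0.9, 0.9, 1.0, 1.2, 1.0, 1.0)

# Scale each signal by the mean of its flanking positions
# (here, the two outermost positions on each side; an assumption).
flank         <- abs(position) >= 4
footprintNorm <- footprint / mean(footprint[flank])
biasNorm      <- tn5Bias / mean(tn5Bias[flank])

# Bias-divided footprint: values near 1 match the Tn5 expectation,
# values below 1 indicate fewer insertions than the bias predicts.
divided <- footprintNorm / biasNorm
```

Unlike subtraction, division yields a ratio, so it is undefined wherever the bias track is zero; the toy bias values here are all positive.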

plotFootprints(
  seFoot = seFoot,
  ArchRProj = projHeme5, 
  normMethod = "Divide",
  plotName = "Footprints-Divide-Bias",
  addDOC = FALSE,
  smoothWindow = 5
)

## ArchR logging to : ArchRLogs/ArchR-plotFootprints-10b7a703225b0-Date-2020-04-15_Time-11-17-23.log
## If there is an issue, please report to github with logFile!
## 2020-04-15 11:17:23 : Plotting Footprint : GATA1_383 (1 of 6), 0.008 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:27 : Plotting Footprint : CEBPA_155 (2 of 6), 0.073 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:30 : Plotting Footprint : EBF1_67 (3 of 6), 0.13 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:34 : Plotting Footprint : IRF4_632 (4 of 6), 0.189 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:38 : Plotting Footprint : TBX21_780 (5 of 6), 0.248 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:41 : Plotting Footprint : PAX5_709 (6 of 6), 0.307 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-10b7a703225b0-Date-2020-04-15_Time-11-17-23.log

Examples of motif footprints from bias-divided analyses are shown below.

14.2.3 Footprinting Without Normalization for Tn5 Bias

While we highly recommend normalizing footprints for Tn5 sequence insertion bias, it is possible to perform footprinting without normalization by setting normMethod = "None" in the plotFootprints() function.

plotFootprints(
  seFoot = seFoot,
  ArchRProj = projHeme5, 
  normMethod = "None",
  plotName = "Footprints-No-Normalization",
  addDOC = FALSE,
  smoothWindow = 5
)

## ArchR logging to : ArchRLogs/ArchR-plotFootprints-10b7a22669572-Date-2020-04-15_Time-11-16-30.log
## If there is an issue, please report to github with logFile!
## 2020-04-15 11:16:31 : Plotting Footprint : GATA1_383 (1 of 6), 0.009 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:35 : Plotting Footprint : CEBPA_155 (2 of 6), 0.077 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:38 : Plotting Footprint : EBF1_67 (3 of 6), 0.125 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:41 : Plotting Footprint : IRF4_632 (4 of 6), 0.173 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:44 : Plotting Footprint : TBX21_780 (5 of 6), 0.221 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:46 : Plotting Footprint : PAX5_709 (6 of 6), 0.27 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-10b7a22669572-Date-2020-04-15_Time-11-16-30.log

Examples of motif footprints without normalization are shown below.