## 14.2 Normalization of Footprints for Tn5 Bias

One major challenge with TF footprinting using ATAC-seq data is the insertion sequence bias of the Tn5 transposase, which can lead to misclassification of TF footprints. To account for Tn5 insertion bias, ArchR identifies the k-mer sequences (user-defined length, default 6) surrounding each Tn5 insertion site. To do this, ArchR identifies single-base-resolution Tn5 insertion sites for each pseudo-bulk and resizes these 1-bp sites to k-bp windows (from -k/2 to +(k/2 - 1) bp relative to the insertion). It then builds a k-mer frequency table with the `oligonucleotideFrequency(width = k, simplify.as = "collapse")` function from the `Biostrings` package. ArchR then calculates the expected k-mer frequencies genome-wide by applying the same function to the `BSgenome`-associated genome.

To calculate the insertion bias for a pseudo-bulk footprint, ArchR creates a k-mer position frequency matrix. Its rows are all possible k-mers and its columns are the positions in a window of ± N bp (user-defined, default 250 bp) around the motif center. Iterating over every motif site genome-wide, ArchR adds the k-mer found at each position of the window to this matrix. Using the sample's k-mer frequency table, ArchR then computes the expected Tn5 insertions at each position by multiplying the k-mer position frequency matrix by the observed/expected Tn5 k-mer frequency ratio.
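The two steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not ArchR's implementation: the genome is a plain character string instead of a `BSgenome`, `substring()` stands in for `Biostrings` k-mer counting, and the helper names (`allKmers`, `kmerAt`, `tn5BiasRatio`, `expectedInsertions`) are hypothetical.

```r
# Minimal base-R sketch of the k-mer Tn5 bias model (illustration only).
# Assumptions: `genome` is one character string of A/C/G/T, positions are
# 1-based, and k is even. All helper names are hypothetical, not ArchR API.

# Every possible k-mer, in a fixed (sorted) order.
allKmers <- function(k) {
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), k), stringsAsFactors = FALSE)
  sort(unname(apply(grid, 1, paste, collapse = "")))
}

# k-bp window around each insertion: from -k/2 to +(k/2 - 1) bp.
kmerAt <- function(genome, pos, k) {
  substring(genome, pos - k / 2, pos + k / 2 - 1)
}

# Observed (around insertions) / expected (genome-wide) k-mer frequency ratio.
tn5BiasRatio <- function(genome, insertions, k = 6) {
  n <- nchar(genome)
  kmers <- allKmers(k)
  ok <- insertions - k / 2 >= 1 & insertions + k / 2 - 1 <= n  # drop windows off the ends
  obs <- table(factor(kmerAt(genome, insertions[ok], k), levels = kmers))
  starts <- seq_len(n - k + 1)
  expTab <- table(factor(substring(genome, starts, starts + k - 1), levels = kmers))
  obsFreq <- as.numeric(obs) / sum(obs)
  expFreq <- as.numeric(expTab) / sum(expTab)
  setNames(ifelse(expFreq > 0, obsFreq / expFreq, 0), kmers)
}

# Expected Tn5 insertions at each offset (-N..N) around the motif centers.
expectedInsertions <- function(genome, motifCenters, ratio, k = 6, N = 250) {
  offsets <- -N:N
  kmers <- names(ratio)
  n <- nchar(genome)
  # Rows: all k-mers; columns: positions relative to the motif center.
  posMat <- matrix(0, nrow = length(kmers), ncol = length(offsets),
                   dimnames = list(kmers, offsets))
  for (center in motifCenters) {
    pos <- center + offsets
    ok <- pos - k / 2 >= 1 & pos + k / 2 - 1 <= n
    idx <- match(kmerAt(genome, pos[ok], k), kmers)
    cols <- which(ok)[!is.na(idx)]
    rows <- idx[!is.na(idx)]
    posMat[cbind(rows, cols)] <- posMat[cbind(rows, cols)] + 1
  }
  # Weight each positioned k-mer count by its observed/expected ratio.
  drop(ratio %*% posMat)
}

# Usage on a random toy genome (k = 4 keeps the k-mer table small).
set.seed(1)
genome <- paste(sample(c("A", "C", "G", "T"), 5000, replace = TRUE), collapse = "")
insertions <- sample(100:4900, 2000, replace = TRUE)
ratio <- tn5BiasRatio(genome, insertions, k = 4)
profile <- expectedInsertions(genome, c(1000, 2500, 4000), ratio, k = 4, N = 50)
head(profile)
```

The matrix product in the last step means every motif site contributes, at each offset, the bias weight of the k-mer it carries there; a k-mer that Tn5 prefers (ratio above 1) raises the expected insertion count at that offset.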

All of this happens under the hood within the `plotFootprints()` function.

### 14.2.1 Subtracting the Tn5 Bias

One normalization method subtracts the Tn5 bias from the footprinting signal. This normalization is performed by setting `normMethod = "Subtract"` when calling `plotFootprints()`.
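As a rough numerical illustration, the sketch below assumes (as the "Normalizing by flanking regions" log message suggests) that the footprint and the bias profile are each first scaled by their mean signal in the outer flanks, and then subtracted. The helpers `flankNormalize` and `subtractBias`, the flank cutoff rule, and the values `flank = 250` and `flankNorm = 50` are assumptions for this sketch, not confirmed ArchR internals.

```r
# Minimal sketch of normMethod = "Subtract" (illustration, not ArchR code).
# Assumes flank normalization divides each profile by its mean over the outer
# positions, |pos| >= flank - flankNorm; flank = 250 and flankNorm = 50 are
# assumed values for this sketch.
flankNormalize <- function(profile, pos, flank = 250, flankNorm = 50) {
  background <- abs(pos) >= flank - flankNorm
  profile / mean(profile[background])
}

# Flank-normalize the footprint and the Tn5 bias profile, then subtract.
subtractBias <- function(foot, bias, pos, ...) {
  flankNormalize(foot, pos, ...) - flankNormalize(bias, pos, ...)
}

# Toy profiles over -250..250 bp: the bias has a shallow central dip from
# Tn5 sequence preference; the footprint adds deeper protection on top of it.
pos <- -250:250
bias <- 10 - 3 * exp(-(pos / 5)^2)
foot <- 2 * bias - 8 * exp(-(pos / 10)^2)
residual <- subtractBias(foot, bias, pos)
residual[pos %in% c(-250, 0, 250)]  # ~0 in the flanks, about -0.4 at the center
```

A value near 0 means the observed signal matches what Tn5 sequence preference alone predicts; a negative value at the motif center indicates protection beyond that expectation.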

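```
plotFootprints(
  seFoot = seFoot,                        # footprints returned by getFootprints()
  ArchRProj = projHeme5,
  normMethod = "Subtract",                # subtract the Tn5 bias profile from the footprint
  plotName = "Footprints-Subtract-Bias",
  addDOC = FALSE,                         # do not append the date of creation to the file name
  smoothWindow = 5                        # smoothing window applied to the footprint profiles
)
```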
```

```
## ArchR logging to : ArchRLogs/ArchR-plotFootprints-10b7a2038428-Date-2020-04-15_Time-11-16-59.log
## If there is an issue, please report to github with logFile!
## 2020-04-15 11:16:59 : Plotting Footprint : GATA1_383 (1 of 6), 0.007 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:02 : Plotting Footprint : CEBPA_155 (2 of 6), 0.065 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:05 : Plotting Footprint : EBF1_67 (3 of 6), 0.109 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:08 : Plotting Footprint : IRF4_632 (4 of 6), 0.155 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:11 : Plotting Footprint : TBX21_780 (5 of 6), 0.199 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2020-04-15 11:17:13 : Plotting Footprint : PAX5_709 (6 of 6), 0.245 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-10b7a2038428-Date-2020-04-15_Time-11-16-59.log
```

By default, these plots are saved in the `outputDirectory` of the `ArchRProject`. If you requested to plot all motifs and returned the result as a `ggplot` object, this `ggplot` object would be extremely large. Examples of motif footprints from bias-subtracted analyses are shown below.

### 14.2.2 Dividing by the Tn5 Bias

A second strategy for normalization divides the footprinting signal by the Tn5 bias signal. This normalization is performed by setting `normMethod = "Divide"` when calling `plotFootprints()`.
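A minimal sketch of the division step, assuming the footprint and bias profiles are already smoothed and flank-normalized so that both sit near 1 in the flanks. The helper `divideBias` is hypothetical, and returning `NA` where the bias is zero is an illustrative guard, not documented ArchR behavior.

```r
# Minimal sketch of normMethod = "Divide" (illustration, not ArchR code).
# Assumes `foot` and `bias` cover the same offsets and are already smoothed
# and flank-normalized. Zero-bias positions return NA instead of Inf; this
# guard is an assumption of the sketch.
divideBias <- function(foot, bias) {
  stopifnot(length(foot) == length(bias))
  out <- foot / bias
  out[bias == 0] <- NA
  out
}

pos <- -5:5
bias <- c(1, 1, 1, 0.9, 0.8, 0.7, 0.8, 0.9, 1, 1, 1)
# Observed profile whose central dip is entirely explained by Tn5 bias.
footBiasOnly <- bias
# Observed profile with extra protection at the motif center.
footBound <- bias * c(1, 1, 1, 1, 0.8, 0.5, 0.8, 1, 1, 1, 1)
divideBias(footBiasOnly, bias)  # flat at 1: no footprint left after correction
divideBias(footBound, bias)     # dips to 0.5 at the center
```

Division expresses the footprint as a fraction of the expected Tn5 signal (1 means the observed signal equals the bias expectation), whereas subtraction expresses it as a difference in flank-normalized units.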

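```
plotFootprints(
  seFoot = seFoot,                        # footprints returned by getFootprints()
  ArchRProj = projHeme5,
  normMethod = "Divide",                  # divide the footprint by the Tn5 bias profile
  plotName = "Footprints-Divide-Bias",
  addDOC = FALSE,                         # do not append the date of creation to the file name
  smoothWindow = 5                        # smoothing window applied to the footprint profiles
)
```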
```

```
## ArchR logging to : ArchRLogs/ArchR-plotFootprints-10b7a703225b0-Date-2020-04-15_Time-11-17-23.log
## If there is an issue, please report to github with logFile!
## 2020-04-15 11:17:23 : Plotting Footprint : GATA1_383 (1 of 6), 0.008 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:27 : Plotting Footprint : CEBPA_155 (2 of 6), 0.073 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:30 : Plotting Footprint : EBF1_67 (3 of 6), 0.13 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:34 : Plotting Footprint : IRF4_632 (4 of 6), 0.189 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:38 : Plotting Footprint : TBX21_780 (5 of 6), 0.248 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2020-04-15 11:17:41 : Plotting Footprint : PAX5_709 (6 of 6), 0.307 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-10b7a703225b0-Date-2020-04-15_Time-11-17-23.log
```

Examples of motif footprints from bias-divided analyses are shown below.

### 14.2.3 Footprinting Without Normalization for Tn5 Bias

While we highly recommend normalizing footprints for Tn5 sequence insertion bias, it is possible to perform footprinting without normalization by setting `normMethod = "None"` in the `plotFootprints()` function.
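Even with `normMethod = "None"`, the logged steps "Applying smoothing window to footprint" and "Normalizing by flanking regions" still run; only the bias correction is skipped. The smoothing step can be pictured as a centered rolling mean over `smoothWindow` positions. The sketch below is a hypothetical helper (`smoothProfile`), not ArchR's internal function, and leaving the edge positions as `NA` is a simplification of this sketch. It calls `stats::filter` explicitly to avoid a clash with `dplyr::filter`.

```r
# Hypothetical helper: centered rolling mean over `smoothWindow` positions,
# in the spirit of smoothWindow = 5 in the calls above. Edge positions that
# lack a full window are returned as NA (a simplification for this sketch).
smoothProfile <- function(profile, smoothWindow = 5) {
  stopifnot(smoothWindow %% 2 == 1)  # odd window so it can be centered
  weights <- rep(1 / smoothWindow, smoothWindow)
  as.numeric(stats::filter(profile, weights, sides = 2))
}

# Sparse per-base insertion counts around a motif center.
counts <- c(0, 2, 0, 1, 0, 0, 0, 1, 0, 3, 0)
smoothProfile(counts, 5)
```

Smoothing spreads sparse single-base counts over neighboring positions, which makes aggregate footprint shapes easier to read at the cost of some positional resolution.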

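```
plotFootprints(
  seFoot = seFoot,                        # footprints returned by getFootprints()
  ArchRProj = projHeme5,
  normMethod = "None",                    # no Tn5 bias correction
  plotName = "Footprints-No-Normalization",
  addDOC = FALSE,                         # do not append the date of creation to the file name
  smoothWindow = 5                        # smoothing window applied to the footprint profiles
)
```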
```

```
## ArchR logging to : ArchRLogs/ArchR-plotFootprints-10b7a22669572-Date-2020-04-15_Time-11-16-30.log
## If there is an issue, please report to github with logFile!
## 2020-04-15 11:16:31 : Plotting Footprint : GATA1_383 (1 of 6), 0.009 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:35 : Plotting Footprint : CEBPA_155 (2 of 6), 0.077 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:38 : Plotting Footprint : EBF1_67 (3 of 6), 0.125 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:41 : Plotting Footprint : IRF4_632 (4 of 6), 0.173 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:44 : Plotting Footprint : TBX21_780 (5 of 6), 0.221 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2020-04-15 11:16:46 : Plotting Footprint : PAX5_709 (6 of 6), 0.27 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-10b7a22669572-Date-2020-04-15_Time-11-16-30.log
```

Examples of motif footprints without normalization are shown below.