3.6 Filtering Doublets from an ArchRProject

After adding doublet predictions with addDoubletScores(), we can remove the predicted doublets with filterDoublets(). The key parameter of this step is filterRatio, which sets the maximum number of predicted doublets to remove relative to the number of pass-filter cells in each sample. For example, if a sample has 5000 cells, at most filterRatio * 5000^2 / 100000 predicted doublets are removed (which simplifies to filterRatio * 5000 * 0.05, or 250 cells when filterRatio is 1). Because this cap scales with the number of cells, a single filterRatio applies a consistent filter across samples that have different percentages of doublets because they were run with different cell loading concentrations. The higher the filterRatio, the greater the number of cells potentially removed as doublets.
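To make the arithmetic concrete, the cap can be computed directly in base R. This is a minimal sketch, not ArchR's own code: maxDoublets() is a hypothetical helper introduced here for illustration, and rounding down with floor() is an assumption, chosen because it reproduces the per-sample counts that filterDoublets() reports in this tutorial.

```r
# Hypothetical helper (not part of ArchR): upper bound on the number of
# predicted doublets removed from one sample with nCells pass-filter cells.
# floor() is an assumption that matches the counts printed in this tutorial.
maxDoublets <- function(nCells, filterRatio = 1) {
  floor(filterRatio * nCells^2 / 100000)
}

# The 5000-cell example from the text: 1 * 5000 * 0.05
maxDoublets(5000)

# Cells per sample in projHeme1, copied from the filterDoublets() output
nCells <- c(scATAC_BMMC_R1 = 4932,
            scATAC_CD34_BMMC_R1 = 3275,
            scATAC_PBMC_R1 = 2454)

# Per-sample caps for two filterRatio settings
maxDoublets(nCells)
maxDoublets(nCells, filterRatio = 1.5)
```

Because the cap grows with the square of the cell count, the maximum fraction of a sample that can be removed is filterRatio * nCells / 100000, so samples with more cells lose a larger percentage.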

First, we filter the doublets. We save this as a new ArchRProject for the purposes of this stepwise tutorial but you can always overwrite your original ArchRProject object.

projHeme2 <- filterDoublets(projHeme1)

## Filtering 410 cells from ArchRProject!
## scATAC_BMMC_R1 : 243 of 4932 (4.9%)
## scATAC_CD34_BMMC_R1 : 107 of 3275 (3.3%)
## scATAC_PBMC_R1 : 60 of 2454 (2.4%)

Previously, we saw that projHeme1 had 10,661 cells. projHeme2 now has 10,251 cells, confirming that 410 cells (3.85%) were removed by doublet filtering, as reported above.
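As a quick sanity check, the totals can be reconciled from the per-sample counts in base R. This sketch only re-does the arithmetic with numbers copied from the output above; the variable names are illustrative. With the project objects loaded, ArchR's nCells() should report the same cell counts.

```r
# Per-sample counts copied from the filterDoublets() output above
removed <- c(scATAC_BMMC_R1 = 243,
             scATAC_CD34_BMMC_R1 = 107,
             scATAC_PBMC_R1 = 60)
cellsBefore <- 10661  # cells in projHeme1

totalRemoved <- sum(removed)                 # total cells removed
cellsAfter <- cellsBefore - totalRemoved     # cells left in projHeme2
pctRemoved <- round(100 * totalRemoved / cellsBefore, 2)

totalRemoved
cellsAfter
pctRemoved
```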

projHeme2

## class: ArchRProject
## outputDirectory: /oak/stanford/groups/howchang/users/jgranja/ArchRTutorial/ArchRBook/BookOutput4/HemeTutorial
## samples(3): scATAC_BMMC_R1 scATAC_CD34_BMMC_R1 scATAC_PBMC_R1
## sampleColData names(1): ArrowFiles
## cellColData names(13): Sample TSSEnrichment … bioNames bioNames2
## numberOfCells(1): 10251
## medianTSS(1): 16.856
## medianFrags(1): 2991

If you want to filter more cells from the ArchRProject, use a higher filterRatio. To see the other arguments that can be adjusted, run ?filterDoublets.

projHemeTmp <- filterDoublets(projHeme1, filterRatio = 1.5)

## Filtering 614 cells from ArchRProject!
## scATAC_BMMC_R1 : 364 of 4932 (7.4%)
## scATAC_CD34_BMMC_R1 : 160 of 3275 (4.9%)
## scATAC_PBMC_R1 : 90 of 2454 (3.7%)

Since projHemeTmp was only created for illustrative purposes, we remove it from our R session.

rm(projHemeTmp)