11.1 Identifying Marker Peaks with ArchR

Often times, we are interested to know which peaks are unique to an individual cluster or a small group of clusters. We can do this in an unsupervised fashion in ArchR using the addMarkerFeatures() function in combination with useMatrix = "PeakMatrix".

First, lets remind ourselves of the cell types that we are working with in projHeme5 and their relative proportions.

#Our scRNA labels
table(projHeme5$Clusters2)

## B CD4.M CD4.N CLP Erythroid GMP Mono
## 439 678 1271 387 879 793 2632
## NK pDC PreB Progenitor
## 851 320 351 1650

Now, we are ready to identify marker peaks by calling the addMarkerFeatures() function with useMatrix = "PeakMatrix". Additionally, we tell ArchR to account for differences in data quality amongst the cell groups by setting the bias parameter to account for TSS enrichment and the number of unique fragments per cell.

markersPeaks <- getMarkerFeatures(
    ArchRProj = projHeme5, 
    useMatrix = "PeakMatrix", 
    groupBy = "Clusters2",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)

## ArchR logging to : ArchRLogs/ArchR-getMarkerFeatures-ff9f78e54080-Date-2020-04-15_Time-10-31-43.log
## If there is an issue, please report to github with logFile!
## MatrixClass = Sparse.Integer.Matrix
## 2020-04-15 10:31:44 : Matching Known Biases, 0.011 mins elapsed.
##
## ###########
## 2020-04-15 10:33:02 : Completed Pairwise Tests, 1.301 mins elapsed.
## ###########
## ArchR logging successful to : ArchRLogs/ArchR-getMarkerFeatures-ff9f78e54080-Date-2020-04-15_Time-10-31-43.log

The object returned by the getMarkerFeatures() function is a SummarizedExperiment that contains a few different assays.

markersPeaks

## class: SummarizedExperiment
## dim: 144009 11
## metadata(2): MatchInfo Params
## assays(6): Log2FC Mean … AUC MeanBGD
## rownames(144009): 1 2 … 144008 144009
## rowData names(4): seqnames idx start end
## colnames(11): B CD4.M … PreB Progenitor
## colData names(0):

We can use the getMarkers() function to retrieve particular slices of this SummarizedExperiment that we are interested in. The default behavior of this function is to return a list of DataFrame objects, one for each cell group.

markerList <- getMarkers(markersPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1")
markerList

## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor

If we are interested in the marker peaks for a specific cell group, we can access this from the list via the $ accessor.

markerList$Erythroid

## DataFrame with 2267 rows and 7 columns
## seqnames idx start end Log2FC
##
## 6915 chr1 6915 110407005 110407505 6.13156087199941
## 89187 chr22 1265 30129827 30130327 4.31793618061602
## 2715 chr1 2715 27869127 27869627 8.81830883930008
## 9273 chr1 9273 164681433 164681933 4.09761463117515
## 46278 chr15 2875 74902688 74903188 5.94807465189627
## … … … … … …
## 124072 chr7 2640 47608032 47608532 1.57044726023651
## 13738 chr1 13738 248018421 248018921 2.18899218987721
## 90192 chr22 2270 39632393 39632893 2.09633755722991
## 30353 chr12 2227 47600919 47601419 1.98865307113263
## 51935 chr16 3926 70729551 70730051 2.33056750082649
## FDR MeanDiff
##
## 6915 8.5785377923846e-15 0.908051252811916
## 89187 1.00297518036045e-13 1.0458579648286
## 2715 3.96321274318867e-12 0.908093680025248
## 9273 3.96321274318867e-12 0.743482307090926
## 46278 3.96321274318867e-12 0.739990672777532
## … … …
## 124072 0.00939699723029238 0.345440715489147
## 13738 0.00940220652828868 0.214851028991864
## 90192 0.00960407719383025 0.270261700407864
## 30353 0.00978687988910092 0.348162927084217
## 51935 0.00982181746221214 0.327156785147329

Instead of a list of DataFrame objects, we can use getMarkers() to return a GRangesList object by setting returnGR = TRUE.

markerList <- getMarkers(markersPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1", returnGR = TRUE)
markerList

## GRangesList object of length 11:
## $B
## GRanges object with 594 ranges and 3 metadata columns:
## seqnames ranges strand | Log2FC
## |
## [1] chr2 232537191-232537691 * | 4.60006288742652
## [2] chr12 92566305-92566805 * | 4.3389825818619
## [3] chr3 13152070-13152570 * | 4.44200650760392
## [4] chr9 37409171-37409671 * | 3.32811859032542
## [5] chr1 160759469-160759969 * | 5.66666023794885
## … … … … . …
## [590] chr8 128222178-128222678 * | 6.1639699924098
## [591] chr9 93643862-93644362 * | 6.57677988412609
## [592] chrX 6656761-6657261 * | 6.4417508983755
## [593] chr14 81425876-81426376 * | 4.62521198459712
## [594] chr7 63765291-63765791 * | 4.10893943379989
## FDR MeanDiff
##
## [1] 1.61183300601365e-12 1.09207677080711
## [2] 9.91817474781581e-10 0.993976831608143
## [3] 4.25560916639498e-09 1.20028559994278
## [4] 1.45797719387195e-08 1.03421998926309
## [5] 1.58046323198917e-08 0.80285226332275
## … … …
## [590] 0.0098441599959607 0.20375624865997
## [591] 0.0098441599959607 0.27221052581318
## [592] 0.0098441599959607 0.247631445395118
## [593] 0.00994427516588697 0.432810696716752
## [594] 0.00998116335641915 0.326535661912967
## ——-
## seqinfo: 23 sequences from an unspecified genome; no seqlengths
##
## …
## <10 more elements>

This GRangesList object can similarly be subset to a GRanges object for a particular cell group using the $ accessor.

markerList$Erythroid

## GRanges object with 2267 ranges and 3 metadata columns:
## seqnames ranges strand | Log2FC
## |
## [1] chr1 110407005-110407505 * | 6.13156087199941
## [2] chr22 30129827-30130327 * | 4.31793618061602
## [3] chr1 27869127-27869627 * | 8.81830883930008
## [4] chr1 164681433-164681933 * | 4.09761463117515
## [5] chr15 74902688-74903188 * | 5.94807465189627
## … … … … . …
## [2263] chr7 47608032-47608532 * | 1.57044726023651
## [2264] chr1 248018421-248018921 * | 2.18899218987721
## [2265] chr22 39632393-39632893 * | 2.09633755722991
## [2266] chr12 47600919-47601419 * | 1.98865307113263
## [2267] chr16 70729551-70730051 * | 2.33056750082649
## FDR MeanDiff
##
## [1] 8.5785377923846e-15 0.908051252811916
## [2] 1.00297518036045e-13 1.0458579648286
## [3] 3.96321274318867e-12 0.908093680025248
## [4] 3.96321274318867e-12 0.743482307090926
## [5] 3.96321274318867e-12 0.739990672777532
## … … …
## [2263] 0.00939699723029238 0.345440715489147
## [2264] 0.00940220652828868 0.214851028991864
## [2265] 0.00960407719383025 0.270261700407864
## [2266] 0.00978687988910092 0.348162927084217
## [2267] 0.00982181746221214 0.327156785147329
## ——-
## seqinfo: 23 sequences from an unspecified genome; no seqlengths