This function will get insertions from coverage files, call peaks, and merge peaks to get a "Union Reproducible Peak Set".

addReproduciblePeakSet(
  ArchRProj = NULL,
  groupBy = "Clusters",
  peakMethod = "Macs2",
  reproducibility = "2",
  peaksPerCell = 500,
  maxPeaks = 150000,
  minCells = 25,
  excludeChr = c("chrM", "chrY"),
  pathToMacs2 = if (tolower(peakMethod) == "macs2") findMacs2() else NULL,
  genomeSize = NULL,
  shift = -75,
  extsize = 150,
  method = if (tolower(peakMethod) == "macs2") "q" else "p",
  cutOff = 0.1,
  additionalParams = "--nomodel --nolambda",
  extendSummits = 250,
  promoterRegion = c(2000, 100),
  genomeAnnotation = getGenomeAnnotation(ArchRProj),
  geneAnnotation = getGeneAnnotation(ArchRProj),
  plot = TRUE,
  threads = getArchRThreads(),
  parallelParam = NULL,
  force = FALSE,
  verbose = TRUE,
  logFile = createLogFile("addReproduciblePeakSet"),
  ...
)

Arguments

ArchRProj

An ArchRProject object.

groupBy

The name of the column in cellColData to use for grouping cells together for peak calling.

peakMethod

The name of peak calling method to be used. Options include "Macs2" for using macs2 callpeak or "Tiles" for using a TileMatrix.

reproducibility

A string that indicates how peak reproducibility should be handled. This string is dynamic and can be a function of n where n is the number of samples being assessed. For example, reproducibility = "2" means at least 2 samples must have a peak call at this locus and reproducibility = "(n+1)/2" means that the majority of samples must have a peak call at this locus.

peaksPerCell

The upper limit of the number of peaks that can be identified per cell-grouping in groupBy. This is useful for controlling how many peaks can be called from cell groups with low cell numbers.

maxPeaks

A numeric threshold for the maximum peaks to retain per group from groupBy in the union reproducible peak set.

minCells

The minimum allowable number of unique cells that was used to create the coverage files on which peaks are called. This is important to allow for exclusion of pseudo-bulk replicates derived from very low cell numbers.

excludeChr

A character vector containing the seqnames of the chromosomes that should be excluded from peak calling.

pathToMacs2

The full path to the MACS2 executable.

genomeSize

The genome size to be used for MACS2 peak calling (see MACS2 documentation).

shift

The number of basepairs to shift each Tn5 insertion. When combined with extsize this allows you to create proper fragments, centered at the Tn5 insertion site, for use with MACS2 (see MACS2 documentation).

extsize

The number of basepairs to extend the MACS2 fragment after shift has been applied. When combined with extsize this allows you to create proper fragments, centered at the Tn5 insertion site, for use with MACS2 (see MACS2 documentation).

method

The method to use for significance testing in MACS2. Options are "p" for p-value and "q" for q-value. When combined with cutOff this gives the method and significance threshold for peak calling (see MACS2 documentation).

cutOff

The numeric significance cutOff for the testing method indicated by method (see MACS2 documentation).

additionalParams

A string of additional parameters to pass to MACS2 (see MACS2 documentation).

extendSummits

The number of basepairs to extend peak summits (in both directions) to obtain final fixed-width peaks. For example, extendSummits = 250 will create 501-bp fixed-width peaks from the 1-bp summits.

promoterRegion

A vector of two integers specifying the distance in basepairs upstream and downstream of a TSS to be included as a promoter region. Peaks called within one of these regions will be annotated as a "promoter" peak. For example, promoterRegion = c(2000, 100) will annotate any peak within the region 2000 bp upstream and 100 bp downstream of a TSS as a "promoter" peak.

genomeAnnotation

The genomeAnnotation (see createGenomeAnnotation()) to be used for generating peak metadata such as nucleotide information (GC content) or chromosome sizes.

geneAnnotation

The geneAnnotation (see createGeneAnnotation()) to be used for labeling peaks as "promoter", "exonic", etc.

plot

A boolean describing whether to plot peak annotation results.

threads

The number of threads to be used for parallel computing.

parallelParam

A list of parameters to be passed for biocparallel/batchtools parallel computing.

force

A boolean value indicating whether to force the reproducible peak set to be overwritten if it already exist in the given ArchRProject peakSet.

verbose

A boolean value that determines whether standard output includes verbose sections.

logFile

The path to a file to be used for logging ArchR output.

...

Additional parameters to be pass to addGroupCoverages() to get sample-guided pseudobulk cell-groupings. Only used for TileMatrix-based peak calling (not for MACS2). See addGroupCoverages() for more info.