For each sample in the ArrowFiles or ArchRProject provided, this function will independently assign inferred doublet information to each cell. This allows for removing strong heterotypic doublet-based clusters downstream. A doublet results from a droplet that contained two cells, causing the ATAC-seq data to be a mixture of the signal from each cell.

addDoubletScores(
  input = NULL,
  useMatrix = "TileMatrix",
  k = 10,
  nTrials = 5,
  dimsToUse = 1:30,
  LSIMethod = 1,
  scaleDims = FALSE,
  corCutOff = 0.75,
  knnMethod = "UMAP",
  UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "euclidean", verbose = FALSE),
  LSIParams = list(outlierQuantiles = NULL, filterBias = FALSE),
  outDir = getOutputDirectory(input),
  threads = getArchRThreads(),
  force = FALSE,
  parallelParam = NULL,
  verbose = TRUE,
  logFile = createLogFile("addDoubletScores")
)
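
A minimal usage sketch, assuming ArchR is installed and Arrow files have already been created. The file names are hypothetical placeholders, and the call is skipped when the package or the files are unavailable. Only arguments from the signature above are set; the rest keep their defaults.

```r
# Hypothetical Arrow file paths; replace with the files created for your samples.
arrowFiles <- c("SampleA.arrow", "SampleB.arrow")

# Arguments taken from the signature above; all others keep their defaults.
doubletArgs <- list(
  input     = arrowFiles,
  useMatrix = "TileMatrix",
  k         = 10,
  nTrials   = 5,
  knnMethod = "UMAP",
  LSIMethod = 1
)

# Only run when ArchR is installed and every Arrow file is present.
canRun <- requireNamespace("ArchR", quietly = TRUE) && all(file.exists(arrowFiles))
if (canRun) {
  doubletResult <- do.call(ArchR::addDoubletScores, doubletArgs)
} else {
  message("Skipping addDoubletScores(): ArchR or the Arrow files are not available.")
}
```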

Arguments

input

An ArchRProject object or a character vector containing the paths to the ArrowFiles to be used.

useMatrix

The name of the matrix to be used for performing doublet identification analyses. Options include "TileMatrix" and "PeakMatrix".

k

The number of cells neighboring a simulated doublet to be considered as putative doublets.

nTrials

The number of rounds of doublet simulation used when calculating doublet scores. Each round simulates nCell doublets, where nCell is the number of cells in the sample.
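
How k and nTrials interact can be illustrated with a simplified base R sketch. This is not ArchR's implementation: the toy count matrix, the summing of two cells to form a doublet, and the use of Euclidean distance on raw counts are all assumptions made for brevity, whereas the function itself finds neighbors in the reduction selected by knnMethod.

```r
set.seed(1)

# Toy data (assumed): 50 features x 20 cells of counts, standing in for a real matrix.
nFeatures <- 50
nCell <- 20
counts <- matrix(rpois(nFeatures * nCell, lambda = 2), nrow = nFeatures, ncol = nCell)

k <- 5        # neighbors considered per simulated doublet
nTrials <- 3  # rounds of simulation; each round simulates nCell doublets

# Number of times each real cell is among the k nearest neighbors of a simulated doublet.
neighborCounts <- integer(nCell)

for (trial in seq_len(nTrials)) {
  for (i in seq_len(nCell)) {
    # Simulate a doublet by summing the counts of two distinct, randomly chosen cells.
    pair <- sample(nCell, 2)
    doublet <- counts[, pair[1]] + counts[, pair[2]]

    # Euclidean distance from the simulated doublet to every real cell.
    dists <- sqrt(colSums((counts - doublet)^2))

    # Record the k closest real cells as putative doublets.
    nearest <- order(dists)[seq_len(k)]
    neighborCounts[nearest] <- neighborCounts[nearest] + 1L
  }
}
```

Cells with high values in neighborCounts sit close to many simulated doublets, which is the intuition behind flagging them as putative doublets.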

dimsToUse

A vector containing the dimensions from the reducedDims object to use in clustering.

LSIMethod

A number or string indicating the order of operations in the TF-IDF normalization. Possible values are: 1 or "tf-logidf", 2 or "log(tf-idf)", and 3 or "logtf-logidf".
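
The three orderings can be sketched in base R as follows. The function name tfidfSketch is hypothetical, and the pseudo-count of 1 and the scale factor of 1e4 are assumptions chosen for illustration; they are not confirmed by this page.

```r
# Sketch of the three TF-IDF orderings; `mat` is a features x cells count matrix.
tfidfSketch <- function(mat, LSIMethod = 1) {
  nCells <- ncol(mat)
  featureSums <- rowSums(mat)

  # Term frequency: each cell's counts divided by that cell's total counts.
  tf <- sweep(mat, 2, colSums(mat), "/")

  method <- as.character(LSIMethod)
  if (method %in% c("1", "tf-logidf")) {
    # TF multiplied by the log of the inverse document frequency.
    idf <- log(1 + nCells / featureSums)
    tf * idf
  } else if (method %in% c("2", "log(tf-idf)")) {
    # Log taken after multiplying TF by IDF (scale factor is an assumption).
    idf <- nCells / featureSums
    log(tf * idf * 1e4 + 1)
  } else if (method %in% c("3", "logtf-logidf")) {
    # Log applied to both TF and IDF before multiplying.
    idf <- log(1 + nCells / featureSums)
    log(tf + 1) * idf
  } else {
    stop("Unknown LSIMethod: ", LSIMethod)
  }
}

# Toy matrix: 3 features x 4 cells, with no empty rows or columns.
mat <- matrix(c(1, 0, 2,  0, 3, 1,  4, 1, 0,  2, 2, 2), nrow = 3)
normalized <- tfidfSketch(mat, LSIMethod = "tf-logidf")
```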

scaleDims

A boolean that indicates whether to z-score the reduced dimensions for each cell during the LSI method performed for doublet determination. This is useful for minimizing the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since it is over-weighting latent PCs.
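
Reading the description literally, the z-scoring is applied within each cell across its reduced dimensions. A small base R sketch of that operation, assuming a toy matrix with cells in rows and dimensions in columns:

```r
set.seed(1)

# Toy reduced dimensions (assumed layout): 6 cells in rows, 4 dimensions in columns.
reducedDims <- matrix(rnorm(24), nrow = 6, ncol = 4)

# Inflate the first dimension so that it dominates, as a strong early component might.
reducedDims[, 1] <- reducedDims[, 1] * 10

# Z-score within each cell (row): subtract the cell's mean across dimensions
# and divide by the cell's standard deviation across dimensions.
scaledDims <- t(scale(t(reducedDims)))
```

After scaling, every cell has mean 0 and standard deviation 1 across its dimensions, so the inflated first dimension no longer dominates distances between cells.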

corCutOff

A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to sequencing depth that is greater than the corCutOff, it will be excluded from analysis.
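
The filtering step can be sketched in base R with simulated data. The toy depth vector and dimensions are made up for illustration, and comparing the absolute value of the correlation against the cutoff is an assumption; the description above only says "greater than".

```r
set.seed(1)
nCell <- 100

# Simulated sequencing depth per cell (assumed values).
depth <- rpois(nCell, lambda = 5000)

# Three toy dimensions: the first tracks depth closely, the other two are noise.
dims <- cbind(
  scale(depth)[, 1] + rnorm(nCell, sd = 0.1),
  rnorm(nCell),
  rnorm(nCell)
)

corCutOff <- 0.75

# Correlation of each dimension with sequencing depth.
corToDepth <- apply(dims, 2, function(d) cor(d, depth))

# Keep only dimensions whose absolute correlation does not exceed the cutoff.
keep <- which(abs(corToDepth) <= corCutOff)
filteredDims <- dims[, keep, drop = FALSE]
```

Here the first dimension is excluded because it is almost perfectly correlated with depth, while the two noise dimensions are retained.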

knnMethod

The name of the dimensionality reduction method to be used for k-nearest neighbors calculation. Possible values are "UMAP" or "LSI".

UMAPParams

The list of parameters to pass to the UMAP function if knnMethod is set to "UMAP". See the function umap in the uwot package.

LSIParams

The list of parameters to pass to the IterativeLSI() function. See IterativeLSI().

outDir

The relative path to the output directory for relevant plots/results from doublet identification.

threads

The number of threads to be used for parallel computing.

force

A boolean that controls what happens when the UMAP projection is not accurate (R < 0.8 for the reprojection of the training data, which occurs when the sample is a very homogeneous population of cells). With force=FALSE, the function returns -1 for all doubletScores and doubletEnrichments. Setting force=TRUE bypasses this check and computes the scores anyway (not recommended).

parallelParam

A list of parameters to be passed for BiocParallel/batchtools parallel computing.

verbose

A boolean value that determines whether standard output is printed.

logFile

The path to a file to be used for logging ArchR output.