For each sample in the ArrowFiles or ArchRProject provided, this function will independently assign inferred doublet information to each cell. This allows for removing strong heterotypic doublet-based clusters downstream. A doublet results from a droplet that contained two cells, causing the ATAC-seq data to be a mixture of the signal from each cell.
addDoubletScores(
  input = NULL,
  useMatrix = "TileMatrix",
  k = 10,
  nTrials = 5,
  dimsToUse = 1:30,
  LSIMethod = 1,
  scaleDims = FALSE,
  corCutOff = 0.75,
  knnMethod = "UMAP",
  UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "euclidean", verbose = FALSE),
  LSIParams = list(outlierQuantiles = NULL, filterBias = FALSE),
  outDir = getOutputDirectory(input),
  threads = getArchRThreads(),
  force = FALSE,
  parallelParam = NULL,
  verbose = TRUE,
  logFile = createLogFile("addDoubletScores")
)
input: An ArchRProject object or a character vector of paths to the ArrowFiles to be used.
useMatrix: The name of the matrix to be used for performing doublet identification analyses. Options include "TileMatrix" and "PeakMatrix".
k: The number of cells neighboring a simulated doublet to be considered as putative doublets.
nTrials: The number of times to simulate nCell (the number of cells in the sample) doublets when calculating doublet scores.
dimsToUse: A vector containing the dimensions from the LSI reduced dimensions to use for doublet identification.
LSIMethod: A number or string indicating the order of operations in the TF-IDF normalization. Possible values are: 1 or "tf-logidf", 2 or "log(tf-idf)", and 3 or "logtf-logidf".
scaleDims: A boolean that indicates whether to z-score the reduced dimensions for each cell during the LSI method performed for doublet determination. This is useful for minimizing the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases, since it over-weights latent PCs.
corCutOff: A numeric cutoff for the correlation of each dimension to the sequencing depth. If a dimension has a correlation to sequencing depth that is greater than corCutOff, it will be excluded from analysis.
knnMethod: The name of the dimensionality reduction method to be used for k-nearest neighbors calculation. Possible values are "UMAP" or "LSI".
UMAPParams: The list of parameters to pass to the UMAP function if "UMAP" is designated to knnMethod.
LSIParams: The list of parameters to pass to the LSI function used for doublet determination.
outDir: The relative path to the output directory for relevant plots/results from doublet identification.
threads: The number of threads to be used for parallel computing.
force: If the UMAP projection is not accurate (when R < 0.8 for the reprojection of the training data, which occurs when you have a very homogeneous population of cells), setting force = FALSE will return -1 for all doublet scores and doublet enrichments. To override this (not recommended), set force = TRUE.
parallelParam: A list of parameters to be passed for BiocParallel/batchtools parallel computing.
verbose: A boolean value that determines whether standard output is printed.
logFile: The path to a file to be used for logging ArchR output.
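
The roles of k and nTrials can be illustrated with a small base-R sketch. This is not ArchR's implementation: the toy count matrix, the depth normalization, and the Euclidean nearest-neighbor search in feature space are all assumptions made for illustration (ArchR performs the search in an LSI or UMAP embedding), and the lower-case names n_trials, counts, and score are hypothetical. The idea shown is only the general one: simulate doublets by summing pairs of real cells, find the k real cells nearest to each simulated doublet, and count how often each real cell is picked.

```r
set.seed(1)

# Toy data (assumption): 200 cells x 50 features drawn from two populations.
n_cells <- 200
n_feat  <- 50
group   <- rep(c(1, 2), each = n_cells / 2)
rates   <- rbind(c(rep(5, 25), rep(1, 25)),
                 c(rep(1, 25), rep(5, 25)))
counts  <- t(sapply(group, function(g) rpois(n_feat, rates[g, ])))

# Spike in 10 heterotypic doublets: each is the sum of one cell per population.
is_doublet <- seq_len(n_cells) > 190
counts[is_doublet, ] <- counts[1:10, ] + counts[101:110, ]

# Depth-normalize so that a doublet's doubled read count does not dominate.
normalize <- function(m) m / rowSums(m)
real <- normalize(counts)

k        <- 10  # neighbors of each simulated doublet that receive a vote
n_trials <- 5   # rounds of simulating n_cells doublets
score    <- numeric(n_cells)

for (trial in seq_len(n_trials)) {
  # Simulate n_cells doublets by summing random pairs of real cells.
  a   <- sample(n_cells, n_cells, replace = TRUE)
  b   <- sample(n_cells, n_cells, replace = TRUE)
  sim <- normalize(counts[a, , drop = FALSE] + counts[b, , drop = FALSE])

  for (i in seq_len(nrow(sim))) {
    # Squared Euclidean distance from simulated doublet i to every real cell.
    d  <- rowSums(sweep(real, 2, sim[i, ])^2)
    nn <- order(d)[seq_len(k)]
    score[nn] <- score[nn] + 1
  }
}

# Real cells that sit close to many simulated doublets get high scores.
print(tapply(score, is_doublet, mean))
```

In this sketch a larger k spreads each simulated doublet's votes over more real cells, while a larger n_trials increases the number of simulated doublets and so smooths the scores.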