This function will identify clusters from a reduced dimensions object in an ArchRProject or from a supplied reduced dimensions matrix.

addClusters(
  input = NULL,
  reducedDims = "IterativeLSI",
  name = "Clusters",
  sampleCells = NULL,
  seed = 1,
  method = "Seurat",
  dimsToUse = NULL,
  scaleDims = NULL,
  corCutOff = 0.75,
  knnAssign = 10,
  nOutlier = 5,
  maxClusters = 25,
  testBias = TRUE,
  filterBias = FALSE,
  biasClusters = 0.01,
  biasCol = "nFrags",
  biasVals = NULL,
  biasQuantiles = c(0.05, 0.95),
  biasEnrich = 10,
  biasProportion = 0.5,
  biasPval = 0.05,
  nPerm = 500,
  prefix = "C",
  ArchRProj = NULL,
  verbose = TRUE,
  tstart = NULL,
  force = FALSE,
  logFile = createLogFile("addClusters"),
  ...
)

Arguments

input

Either (i) an ArchRProject object containing the dimensionality reduction matrix passed by reducedDims or (ii) a dimensionality reduction matrix. This object will be used for cluster identification.

reducedDims

The name of the reducedDims object (e.g. "IterativeLSI") to retrieve from the designated ArchRProject. Not required if input is a matrix.

name

The column name of the cluster label column to be added to cellColData if input is an ArchRProject object.

sampleCells

An integer specifying the number of cells to subsample and perform clustering on. The remaining cells that were not subsampled will be assigned to the cluster of the nearest subsampled cell. This enables a decrease in run time but can sacrifice granularity of clusters.
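
The subsample-then-assign idea described for sampleCells can be sketched in base R. This is only an illustration under stated assumptions, not ArchR's implementation: the toy matrix, the variable names, and the use of kmeans as a stand-in clustering method are all hypothetical.

```r
set.seed(1)

# Toy reduced-dimension matrix: 200 cells x 5 dimensions (hypothetical data)
mat <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
n_sample <- 50

# Cluster only a random subsample of cells (kmeans is a stand-in method)
idx <- sample(seq_len(nrow(mat)), n_sample)
sub_clusters <- kmeans(mat[idx, , drop = FALSE], centers = 3)$cluster

# Give every remaining cell the label of its nearest subsampled cell
rest <- setdiff(seq_len(nrow(mat)), idx)
nearest <- vapply(rest, function(i) {
  # Squared Euclidean distance from cell i to each subsampled cell
  d <- colSums((t(mat[idx, , drop = FALSE]) - mat[i, ])^2)
  which.min(d)
}, integer(1))

clusters <- integer(nrow(mat))
clusters[idx] <- sub_clusters
clusters[rest] <- sub_clusters[nearest]
```

Only the 50 subsampled cells are clustered directly, which is where the run-time saving comes from; any structure that the subsample misses cannot be recovered by the assignment step.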

seed

A number to be used as the seed for random number generation required in cluster determination. It is recommended to keep track of the seed used so that you can reproduce results downstream.

method

A string indicating the clustering method to be used. Supported methods are "Seurat" and "Scran".

dimsToUse

A vector containing the dimensions from the reducedDims object to use in clustering.

scaleDims

A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution of strong biases (dominating early PCs) and low-abundance populations. However, this may lead to stronger sample-specific biases since it over-weights latent PCs. If set to NULL, the dimensions will be scaled based on the value of scaleDims that was used when the reducedDims were originally created during dimensionality reduction. This idea was introduced by Timothy Stuart.
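
Taking the description above literally, z-scoring "for each cell" means standardizing each row of a cells-by-dimensions matrix. A minimal base R sketch, assuming rows are cells and columns are reduced dimensions (the matrix here is toy data, and this is not necessarily how ArchR performs the scaling internally):

```r
set.seed(1)

# Toy matrix: 20 cells (rows) x 4 reduced dimensions (columns)
mat <- matrix(rnorm(20 * 4, mean = 3, sd = 2), nrow = 20, ncol = 4)

# scale() standardizes columns, so transpose to standardize each cell (row)
scaled <- t(scale(t(mat)))

# After scaling, each cell has mean 0 and standard deviation 1 across dimensions
row_means <- rowMeans(scaled)
row_sds <- apply(scaled, 1, sd)
```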

corCutOff

A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to sequencing depth that is greater than the corCutOff, it will be excluded from analysis.
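
The filtering that corCutOff describes can be sketched as follows. The depth vector and matrix are toy data, and comparing the absolute value of the correlation is an assumption of this sketch rather than something stated above.

```r
set.seed(1)
n <- 100

# Hypothetical per-cell sequencing depth (for example, log10 of nFrags)
depth <- rnorm(n)

# Toy reduced dimensions: dimension 1 tracks depth closely, the others do not
mat <- cbind(depth + rnorm(n, sd = 0.1), rnorm(n), rnorm(n))

cor_cutoff <- 0.75

# Correlation of each dimension with depth
cor_to_depth <- as.vector(cor(mat, depth))

# Keep dimensions at or below the cutoff (absolute value is an assumption)
keep <- which(abs(cor_to_depth) <= cor_cutoff)
mat_filtered <- mat[, keep, drop = FALSE]
```

In this toy example dimension 1 is dropped because it is almost perfectly correlated with depth.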

knnAssign

The number of nearest neighbors to be used during clustering for assignment of outliers (clusters with fewer than nOutlier cells).

nOutlier

The minimum number of cells required for a group of cells to be called as a cluster. If a group of cells does not reach this threshold, then the cells will be considered outliers and assigned to nearby clusters.
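
How knnAssign and nOutlier interact can be illustrated with a small base R sketch. The data, the labels, and the majority-vote rule among the nearest neighbors are assumptions made for illustration; the text above only says that outlier cells are assigned to nearby clusters.

```r
set.seed(1)

# Toy data: two large groups plus a tiny group of 3 cells between them
mat <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2),
  matrix(rnorm(6, mean = 2.5), ncol = 2)
)
labels <- c(rep("C1", 30), rep("C2", 30), rep("C3", 3))

n_outlier <- 5    # minimum cells for a group to count as a cluster
knn_assign <- 10  # neighbors consulted when reassigning outliers

# Groups below the size threshold are treated as outliers
sizes <- table(labels)
small <- names(sizes)[sizes < n_outlier]
is_out <- labels %in% small
ref <- which(!is_out)

new_labels <- labels
for (i in which(is_out)) {
  # Squared Euclidean distance from the outlier to every non-outlier cell
  d <- colSums((t(mat[ref, , drop = FALSE]) - mat[i, ])^2)
  nn <- ref[order(d)[seq_len(knn_assign)]]
  # Majority vote among the nearest neighbors (assumed rule)
  votes <- table(labels[nn])
  new_labels[i] <- names(votes)[which.max(votes)]
}
```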

maxClusters

The maximum number of clusters to be called. If the number of clusters exceeds this value, the clusters are merged in an unbiased manner using hclust and cutree. This is useful for constraining the cluster calls to a reasonable number when they are converging on large numbers. It is also useful in iterativeLSI for the initial iteration. Default is set to 25.
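
A minimal sketch of merging clusters with hclust and cutree, assuming the merge operates on per-cluster mean positions in the reduced dimensions. That choice, the toy data, and the relabeling scheme are assumptions of this sketch; the text above names only the two functions.

```r
set.seed(1)

# Toy data: 300 cells x 3 dimensions with 10 hypothetical over-split labels
mat <- matrix(rnorm(300 * 3), ncol = 3)
labels <- paste0("C", sample(1:10, 300, replace = TRUE))
max_clusters <- 4

# Mean position of each cluster (rows are named by cluster label)
sums <- rowsum(mat, labels)
centroids <- sums / as.vector(table(labels)[rownames(sums)])

# Hierarchically cluster the centroids and cut the tree at max_clusters groups
hc <- hclust(dist(centroids))
merged_id <- cutree(hc, k = max_clusters)  # named by original cluster label

# Map every cell from its original cluster to the merged cluster
new_labels <- paste0("C", merged_id[labels])
```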

testBias

A boolean value that indicates whether or not to test clusters for bias.

filterBias

A boolean value that indicates whether or not to filter clusters that are identified as biased.

biasClusters

A numeric value between 0 and 1 indicating that clusters that are smaller than the specified proportion of total cells are to be checked for bias. This should be set close to 0. We recommend a default of 0.01 which specifies clusters below 1 percent of the total cells.

biasCol

The name of a column in cellColData that contains the numeric values used for testing bias enrichment.

biasVals

A set of numeric values used for testing bias enrichment if input is not an ArchRProject.

biasQuantiles

A vector of two numeric values, each between 0 and 1, that describes the lower and upper quantiles of the bias values to use for computing bias enrichment statistics.

biasEnrich

A numeric value that specifies the minimum enrichment of biased cells over the median of the permuted background sets.

biasProportion

A numeric value between 0 and 1 that specifies the minimum proportion of biased cells in a cluster required to determine that the cluster is biased during testing for bias-enriched clusters.

biasPval

A numeric value between 0 and 1 that specifies the p-value to use when testing for bias-enriched clusters.

nPerm

An integer specifying the number of permutations to perform for testing bias-enriched clusters.
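
To show how the bias-testing arguments relate to one another, here is a rough permutation-test sketch in base R. It is an interpretation of the argument descriptions above and should not be read as ArchR's actual procedure: the data are simulated, the way the p-value is computed is assumed, and the guard against a zero median is an addition of this sketch.

```r
set.seed(1)
n <- 5000

# Hypothetical bias values (for example, log10 of nFrags) for all cells
bias_vals <- rnorm(n)

# A small cluster (0.8% of cells) made up of the 40 highest-bias cells
in_cluster <- rep(FALSE, n)
in_cluster[order(bias_vals, decreasing = TRUE)[1:40]] <- TRUE
cluster_size <- sum(in_cluster)

# biasQuantiles: cells outside the 5th to 95th percentile count as biased
q <- quantile(bias_vals, c(0.05, 0.95))
is_biased <- bias_vals < q[1] | bias_vals > q[2]

# Observed number of biased cells in the cluster
obs <- sum(is_biased & in_cluster)

# nPerm: biased-cell counts in random cell sets of the same size
n_perm <- 500
perm <- replicate(n_perm, sum(is_biased[sample(n, cluster_size)]))

# biasEnrich: enrichment over the median of the permuted background sets
enrich <- obs / max(median(perm), 1)  # max(..., 1) avoids division by zero (assumption)
# biasProportion: proportion of biased cells in the cluster
prop <- obs / cluster_size
# biasPval: empirical p-value from the permutations (assumed definition)
pval <- mean(perm >= obs)

flagged <- enrich >= 10 && prop >= 0.5 && pval <= 0.05
```

Only clusters smaller than the biasClusters proportion of total cells would be tested this way, which is why the toy cluster is kept below 1 percent of the cells.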

prefix

A character string to be added before each cluster identity. For example, if prefix is "Cluster", the cluster labels will be "Cluster1", "Cluster2", etc.

ArchRProj

An ArchRProject object containing the dimensionality reduction matrix passed by reducedDims. This argument can also be supplied as input.

verbose

A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output.

tstart

A timestamp that is typically passed internally from another function (for example, "IterativeLSI") to measure how long the clustering analysis has been running relative to the start time when this process was initiated in that function. This argument is rarely specified manually.

force

A boolean value that indicates whether or not to overwrite data in a given column when the value passed to name already exists as a column name in cellColData.

logFile

The path to a file to be used for logging ArchR output.

...

Additional arguments to be provided to Seurat::FindClusters or scran::buildSNNGraph (for example, knn = 50, jaccard = TRUE).
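
As an illustration of how the arguments above fit together, the following sketch wraps a typical call in a helper function. The helper name run_clustering and the proj argument are hypothetical; proj is assumed to be an ArchRProject that already contains a reducedDims object named "IterativeLSI". The resolution argument is not an addClusters parameter; it is forwarded through ... to Seurat::FindClusters. The function is only defined here, not called, since calling it requires ArchR, Seurat, and a real project.

```r
# Hypothetical helper: cluster an existing ArchRProject with the Seurat method
run_clustering <- function(proj) {
  ArchR::addClusters(
    input = proj,
    reducedDims = "IterativeLSI",  # must already exist in the project
    method = "Seurat",
    name = "Clusters",             # column to add to cellColData
    resolution = 0.8,              # forwarded via ... to Seurat::FindClusters
    seed = 1,                      # record the seed for reproducibility
    force = TRUE                   # overwrite an existing "Clusters" column
  )
}
```

With input given as an ArchRProject, the cluster labels are expected to appear in the cellColData column named by name, for example proj$Clusters on the returned project.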