This function will compute an iterative LSI dimensionality reduction on an ArchRProject.
addIterativeLSI(
  ArchRProj = NULL,
  useMatrix = "TileMatrix",
  name = "IterativeLSI",
  iterations = 2,
  clusterParams = list(resolution = c(2), sampleCells = 10000, maxClusters = 6, n.start = 10),
  firstSelection = "top",
  depthCol = "nFrags",
  varFeatures = 25000,
  dimsToUse = 1:30,
  LSIMethod = 2,
  scaleDims = TRUE,
  corCutOff = 0.75,
  binarize = TRUE,
  outlierQuantiles = c(0.02, 0.98),
  filterBias = TRUE,
  sampleCellsPre = 10000,
  projectCellsPre = FALSE,
  sampleCellsFinal = NULL,
  selectionMethod = "var",
  scaleTo = 10000,
  totalFeatures = 5e+05,
  filterQuantile = 0.995,
  excludeChr = c(),
  saveIterations = TRUE,
  UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "cosine", verbose = FALSE, fast_sgd = TRUE),
  nPlot = 10000,
  outDir = getOutputDirectory(ArchRProj),
  threads = getArchRThreads(),
  seed = 1,
  verbose = TRUE,
  force = FALSE,
  logFile = createLogFile("addIterativeLSI")
)
The name of the data matrix to retrieve from the ArrowFiles associated with the
The name to use for storage of the IterativeLSI dimensionality reduction in the
The number of LSI iterations to perform.
A list of Additional parameters to be passed to
First iteration selection method for features to use for LSI. Either "Top" for the top accessible/average or "Var" for the top variable features. "Top" should be used for all scATAC-seq data (binary) while "Var" should be used for all scRNA/other-seq data types (non-binary).
A column in the
The number N of variable features to use for LSI. The top N features will be used based on the
A vector containing the dimensions from the
A number or string indicating the order of operations in the TF-IDF normalization. Possible values are: 1 or "tf-logidf", 2 or "log(tf-idf)", and 3 or "logtf-logidf".
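The three orderings can be illustrated on a toy matrix. The following is a minimal base-R sketch under stated assumptions, not ArchR's internal code: the function name `tfidf_sketch` and the `scale_to` argument are hypothetical, term frequency is taken as counts divided by per-cell depth, inverse document frequency as the number of cells divided by the per-feature total, and the scaling in method 2 is assumed to be applied before the log.

```r
# Toy feature-by-cell count matrix (rows = features, columns = cells).
counts <- matrix(c(1, 0, 2,
                   0, 1, 1,
                   3, 1, 0), nrow = 3, byrow = TRUE)

# Hypothetical helper illustrating the three orders of operations.
tfidf_sketch <- function(mat, method = 2, scale_to = 10000) {
  tf  <- sweep(mat, 2, colSums(mat), "/")  # normalize each cell (column) by its depth
  idf <- ncol(mat) / rowSums(mat)          # one value per feature (row)
  switch(as.character(method),
    "1" = tf * log(1 + idf),               # "tf-logidf": log applied to idf only
    "2" = log(tf * idf * scale_to + 1),    # "log(tf-idf)": log applied to the product
    "3" = log(tf + 1) * log(1 + idf),      # "logtf-logidf": log applied to both terms
    stop("method must be 1, 2, or 3"))
}

normalized <- tfidf_sketch(counts, method = 2)
```

Multiplying the matrix `tf` by the vector `idf` recycles `idf` down each column, so every row is scaled by its own feature weight.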
A boolean that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution
of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since
it is over-weighting latent PCs. If set to
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to
sequencing depth that is greater than the
A boolean value indicating whether the matrix should be binarized before running LSI. This is often desired when working with insertion counts.
Two numerical values (between 0 and 1) that describe the lower and upper quantiles of bias (number of accessible regions per cell, determined
A boolean indicating whether to drop bias clusters when computing clusters during iterativeLSI.
An integer specifying the number of cells to sample in iterations prior to the last in order to perform a sub-sampled LSI and sub-sampled clustering. This greatly reduces memory usage and increases speed for early iterations.
A boolean indicating whether to reproject all cells into the sub-sampled LSI (see
An integer specifying the number of cells to sample in order to perform a sub-sampled LSI in the final iteration.
The selection method to be used for identifying the top variable features. Valid options are "var" for log-variability or "vmr" for variance-to-mean ratio.
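The difference between the two options can be sketched on a small matrix. This is an illustrative base-R example, not ArchR's implementation: it assumes "log-variability" means the per-feature variance of log2(x + 1) values, it skips depth normalization for brevity, and the names `rank_features` and `toy` are hypothetical.

```r
# Toy feature-by-cell matrix with named features (rows).
toy <- rbind(f1 = c(1, 1, 1, 1),
             f2 = c(0, 4, 0, 4),
             f3 = c(2, 3, 2, 3))

# Hypothetical helper: rank features from most to least variable.
rank_features <- function(mat, method = c("var", "vmr")) {
  method <- match.arg(method)
  score <- if (method == "var") {
    apply(log2(mat + 1), 1, var)       # variance of log-transformed values
  } else {
    apply(mat, 1, var) / rowMeans(mat) # variance-to-mean ratio
  }
  rownames(mat)[order(score, decreasing = TRUE)]
}

top_var <- rank_features(toy, "var")[1:2]  # keep the top N = 2 features
top_vmr <- rank_features(toy, "vmr")[1:2]
```

On this toy input both methods rank the features identically; on real data they can disagree, because the variance-to-mean ratio penalizes features whose variance merely tracks a high mean.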
Each column in the matrix designated by
The number of features to consider for use in LSI after ranking the features by the total number of insertions.
These features are the only ones used throughout the variance identification and LSI. These are an equivalent when using a
A number between 0 and 1 that indicates the quantile above which features should be removed based on insertion counts prior
to the first iteration of the iterative LSI paradigm. For example, if
A string of chromosomes to exclude from the iterativeLSI procedure.
A boolean value indicating whether the results of each LSI iteration should be saved as compressed
The list of parameters to pass to the UMAP function if
The output directory for saving LSI iterations if desired. Default is in the
The number of threads to be used for parallel computing.
A number to be used as the seed for random number generation. It is recommended to keep track of the seed used so that you can reproduce results downstream.
A boolean value that determines whether standard output includes verbose sections.
A boolean value that indicates whether or not to overwrite relevant data in the
The path to a file to be used for logging ArchR output.
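A typical call might look like the sketch below. It assumes the ArchR package is installed and that `proj` is an existing ArchRProject whose Arrow files contain a "TileMatrix"; the wrapper name `runIterativeLSI` is hypothetical and exists only to keep the example self-contained. Every argument value shown is the default from the usage above.

```r
# Hypothetical wrapper (not part of ArchR): calls addIterativeLSI with
# a subset of the documented defaults spelled out explicitly.
runIterativeLSI <- function(proj) {
  ArchR::addIterativeLSI(
    ArchRProj = proj,
    useMatrix = "TileMatrix",   # matrix to read from the Arrow files
    name = "IterativeLSI",      # name under which the reduction is stored
    iterations = 2,
    varFeatures = 25000,
    dimsToUse = 1:30,
    seed = 1                    # fixed seed so the result can be reproduced
  )
}

# Usage, assuming `proj` already exists:
# proj <- runIterativeLSI(proj)
```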