3.1 Creating An ArchRProject

First, we must create our ArchRProject by providing a list of Arrow files and a few other parameters. The outputDirectory parameter specifies the directory where all downstream analyses and plots will be saved. ArchR will automatically associate the previously provided geneAnnotation and genomeAnnotation with the new ArchRProject. These were stored when we ran addArchRGenome("hg19") in a previous chapter.

projHeme1 <- ArchRProject(
  ArrowFiles = ArrowFiles, 
  outputDirectory = "HemeTutorial",
  copyArrows = TRUE # Recommended: if you later modify the Arrow files, you still have an original copy.
)

## Using GeneAnnotation set by addArchRGenome(Hg19)!
## Using GeneAnnotation set by addArchRGenome(Hg19)!
## Validating Arrows…
## Getting SampleNames…
##
## Copying ArrowFiles to Ouptut Directory! If you want to save disk space set copyArrows = FALSE
## 1 2 3
## Getting Cell Metadata…
##
## Merging Cell Metadata…
## Initializing ArchRProject…

We call this ArchRProject “projHeme1” because it is the first iteration of our hematopoiesis project. Throughout this walkthrough we will modify and update this ArchRProject, and we will keep track of which version we are using by incrementing the project number (e.g. “projHeme2”).

We can examine the contents of our ArchRProject:

projHeme1

## class: ArchRProject
## outputDirectory: /oak/stanford/groups/howchang/users/jgranja/ArchRTutorial/ArchRBook/BookOutput4/HemeTutorial
## samples(3): scATAC_BMMC_R1 scATAC_CD34_BMMC_R1 scATAC_PBMC_R1
## sampleColData names(1): ArrowFiles
## cellColData names(11): Sample TSSEnrichment … DoubletScore
## DoubletEnrichment
## numberOfCells(1): 10661
## medianTSS(1): 16.832
## medianFrags(1): 3050

We can see from the above that our ArchRProject has been initialized with a few important attributes:

  1. The specified outputDirectory.
  2. The sample names, which were obtained from the Arrow files.
  3. A DataFrame called sampleColData which contains data associated with each sample.
  4. A DataFrame called cellColData which contains data associated with each cell. Because we already computed doublet enrichment scores using addDoubletScores(), which added those values to each cell in the Arrow files, we can see columns corresponding to “DoubletEnrichment” and “DoubletScore” in cellColData.
  5. The total number of cells in our project which represents all samples after doublet identification and removal.
  6. The median TSS enrichment score and the median number of fragments across all cells and all samples.
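
The attributes listed above can also be retrieved programmatically rather than read off the printed summary. The following is a minimal sketch, not part of the original walkthrough: the helper name summarizeArchRProject is hypothetical, and it assumes ArchR is installed and that cellColData contains the “TSSEnrichment” and “nFrags” columns.

```r
# Hypothetical helper (not an ArchR function): gathers the six attributes
# described above into a named list using ArchR's exported accessors.
summarizeArchRProject <- function(proj) {
  # Per-cell metadata, one row per cell
  cellData <- ArchR::getCellColData(proj)
  list(
    outputDirectory = ArchR::getOutputDirectory(proj),  # 1. output directory
    sampleNames     = ArchR::getSampleNames(proj),      # 2. sample names
    sampleColData   = ArchR::getSampleColData(proj),    # 3. per-sample metadata
    cellColNames    = colnames(cellData),               # 4. per-cell metadata columns
    numberOfCells   = ArchR::nCells(proj),              # 5. total number of cells
    medianTSS       = median(cellData$TSSEnrichment),   # 6. median TSS enrichment
    medianFrags     = median(cellData$nFrags)           #    and median fragment count
  )
}

# Usage, assuming projHeme1 was created as shown earlier in this section:
# str(summarizeArchRProject(projHeme1), max.level = 1)
```

Calling the accessors with the ArchR:: prefix means the helper can be defined before the package is attached; the package is only needed when the function is actually called.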

We can check how much memory the ArchRProject occupies within R:

paste0("Memory Size = ", round(object.size(projHeme1) / 10^6, 3), " MB")

## [1] "Memory Size = 37.135 MB"
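
If this check is repeated for several objects, the arithmetic can be wrapped in a small function. The sketch below uses only base R and a stand-in numeric vector instead of the ArchRProject, so it runs without ArchR; the helper name memorySizeMB is hypothetical. It also shows that format() on the result of object.size() can produce a similar string directly.

```r
# Hypothetical helper: in-memory size of any R object in megabytes (10^6 bytes).
memorySizeMB <- function(x, digits = 3) {
  # as.numeric() drops the "object_size" class and leaves a plain byte count
  round(as.numeric(object.size(x)) / 10^6, digits)
}

# Stand-in object: one million doubles at 8 bytes each, so roughly 8 MB
x <- numeric(1e6)
paste0("Memory Size = ", memorySizeMB(x), " MB")

# Base R alternative: let format() do the unit conversion and labelling
format(object.size(x), units = "MB", digits = 3)
```

With the project from this section, the equivalent call would be memorySizeMB(projHeme1).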

We can also ask which data matrices are available within the ArchRProject, which will be useful downstream once we start adding to this project:

getAvailableMatrices(projHeme1)

## [1] "GeneScoreMatrix" "TileMatrix"
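
One way this becomes useful downstream is as a guard before a step that needs a particular matrix. The following is a minimal sketch under stated assumptions: requireMatrices is a hypothetical helper, not an ArchR function, and the available vector is typed in by hand from the output above so that the example runs without ArchR.

```r
# Hypothetical helper: stop with an informative error if any required
# matrix name is absent from the set of available matrix names.
requireMatrices <- function(available, required) {
  absent <- setdiff(required, available)
  if (length(absent) > 0) {
    stop("Missing matrices: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Matrix names as reported for projHeme1 above
available <- c("GeneScoreMatrix", "TileMatrix")

# Passes silently because both matrices are present
requireMatrices(available, c("GeneScoreMatrix", "TileMatrix"))

# "PeakMatrix" is used here only as an example of a name that is not in the list
result <- tryCatch(
  requireMatrices(available, "PeakMatrix"),
  error = function(e) conditionMessage(e)
)
print(result)
```

In a live session, the first argument would be getAvailableMatrices(projHeme1) rather than a hand-typed vector.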