scenic运行记录1
2021-04-16 本文已影响0人
Seurat_Satija
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> # https://mp.weixin.qq.com/s/pN4qWdUszuGqr8nOJstn8w
>
> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
> BiocManager::version()
[1] ‘3.12’
> # If your bioconductor version is previous to 4.0, see the section bellow
>
> ## Required
> if (!requireNamespace(c("AUCell", "RcisTarget"), quietly = TRUE)) BiocManager::install(c("AUCell", "RcisTarget"),ask = F,update = F)
> if (!requireNamespace('GENIE3', quietly = TRUE)) BiocManager::install(c("GENIE3"),ask = F,update = F) # Optional. Can be replaced by GRNBoost
>
> ## Optional (but highly recommended):
> # To score the network on cells (i.e. run AUCell):
> if (!requireNamespace(c("zoo", "mixtools", "rbokeh"), quietly = TRUE)) BiocManager::install(c("zoo", "mixtools", "rbokeh"),ask = F,update = F)
> # For various visualizations and perform t-SNEs:
> if (!requireNamespace(c("DT", "NMF", "ComplexHeatmap", "R2HTML", "Rtsne"), quietly = TRUE)) BiocManager::install(c("DT", "NMF", "ComplexHeatmap", "R2HTML", "Rtsne"),ask = F,update = F)
>
> # To support paralell execution (not available in Windows):
> # To support paralell execution (not available in Windows):
> #if (!requireNamespace(c("doMC", "doRNG"), quietly = TRUE)) BiocManager::install(c("doMC", "doRNG"),ask = F,update = F)
>
> # To export/visualize in http://scope.aertslab.org
> if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
> if (!requireNamespace("SCopeLoomR", quietly = TRUE)) devtools::install_github("aertslab/SCopeLoomR", build_vignettes = TRUE)
>
> if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
> if (!requireNamespace("SCENIC", quietly = TRUE)) devtools::install_github("aertslab/SCENIC")
> packageVersion("SCENIC")
[1] ‘1.2.4’
>
> library(Rtsne)
Warning message:
程辑包‘Rtsne’是用R版本4.0.5 来建造的
> library(doSNOW)
载入需要的程辑包:foreach
载入需要的程辑包:iterators
载入需要的程辑包:snow
Warning messages:
1: 程辑包‘doSNOW’是用R版本4.0.5 来建造的
2: 程辑包‘foreach’是用R版本4.0.3 来建造的
3: 程辑包‘iterators’是用R版本4.0.3 来建造的
4: 程辑包‘snow’是用R版本4.0.3 来建造的
> #install.packages("doParallel")
> library(doParallel)
载入需要的程辑包:parallel
载入程辑包:‘parallel’
The following objects are masked from ‘package:snow’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
clusterMap, clusterSplit, makeCluster, parApply, parCapply, parLapply,
parRapply, parSapply, splitIndices, stopCluster
Warning message:
程辑包‘doParallel’是用R版本4.0.3 来建造的
> library(doMPI)
载入需要的程辑包:Rmpi
Error: package or namespace load failed for ‘Rmpi’:
loadNamespace()里算'Rmpi'时.onLoad失败了,详细内容:
调用: inDL(x, as.logical(local), as.logical(now), ...)
错误: unable to load shared object 'C:/Users/Nano/Documents/R/win-library/4.0/Rmpi/libs/x64/Rmpi.dll':
LoadLibrary failure: 找不到指定的模块。
Error: 无法载入程辑包‘Rmpi’
In addition: Warning messages:
1: 程辑包‘doMPI’是用R版本4.0.5 来建造的
2: 程辑包‘Rmpi’是用R版本4.0.4 来建造的
> # install.packages("doSNOW")
> # install.packages("doMPI")
> rm(list = ls())
> library(Seurat)
Registered S3 method overwritten by 'spatstat':
method from
print.boxx cli
Attaching SeuratObject
Warning message:
程辑包‘Seurat’是用R版本4.0.4 来建造的
> # devtools::install_github('satijalab/seurat-data')
> library(SeuratData)
-- Installed datasets ------------------------------------------------ SeuratData v0.2.1 --
√ ifnb 3.1.0 √ pbmcMultiome 0.1.0
√ panc8 3.0.2 √ stxBrain 0.1.1
√ pbmc3k 3.1.4
------------------------------------------- Key -------------------------------------------
√ Dataset loaded successfully
> Dataset built with a newer version of Seurat than installed
(?) Unknown version of Seurat installed
There were 12 warnings (use warnings() to see them)
> AvailableData()
Dataset Version
bmcite.SeuratData bmcite 0.3.0
cbmc.SeuratData cbmc 3.1.4
celegans.embryo.SeuratData celegans.embryo 0.1.0
hcabm40k.SeuratData hcabm40k 3.0.0
ifnb.SeuratData ifnb 3.1.0
panc8.SeuratData panc8 3.0.2
pbmc3k.SeuratData pbmc3k 3.1.4
pbmcMultiome.SeuratData pbmcMultiome 0.1.0
pbmcsca.SeuratData pbmcsca 3.0.0
ssHippo.SeuratData ssHippo 3.1.4
stxBrain.SeuratData stxBrain 0.1.1
stxKidney.SeuratData stxKidney 0.1.0
thp1.eccite.SeuratData thp1.eccite 3.1.5
Summary
bmcite.SeuratData 30k Bone Marrow Cells
cbmc.SeuratData scRNAseq and 13-antibody sequencing of CBMCs
celegans.embryo.SeuratData 6k C. elegans embryos from Packer and Zhu et al (2019)
hcabm40k.SeuratData 40,000 Cells From the Human Cell Atlas ICA Bone Marrow Dataset
ifnb.SeuratData IFNB-Stimulated and Control PBMCs
panc8.SeuratData Eight Pancreas Datasets Across Five Technologies
pbmc3k.SeuratData 3k PBMCs from 10X Genomics
pbmcMultiome.SeuratData 10X Genomics PBMC Multiome Dataset
pbmcsca.SeuratData Broad Institute PBMC Systematic Comparative Analysis
ssHippo.SeuratData Slide-seq v2 dataset of mouse hippocampus
stxBrain.SeuratData 10X Genomics Visium Mouse Brain Dataset
stxKidney.SeuratData 10X Genomics Visium Mouse Kidney Dataset
thp1.eccite.SeuratData ECCITE-seq THP-1
seurat species system ncells
bmcite.SeuratData 3.2.2 human bone marrow 30672
cbmc.SeuratData 3.1.4 human CBMC (cord blood) 8617
celegans.embryo.SeuratData <NA> C. elegans embryo 6188
hcabm40k.SeuratData <NA> human bone marrow 40000
ifnb.SeuratData <NA> human PBMC 13999
panc8.SeuratData <NA> human Pancreatic Islets 14892
pbmc3k.SeuratData 3.1.4 human PBMC 2700
pbmcMultiome.SeuratData 4.0.0 human pbmc 11909
pbmcsca.SeuratData <NA> human PBMC 31021
ssHippo.SeuratData <NA> mouse hippocampus NA
stxBrain.SeuratData <NA> mouse brain 12167
stxKidney.SeuratData <NA> mouse kidney 1438
thp1.eccite.SeuratData <NA> human <NA> NA
tech
bmcite.SeuratData <NA>
cbmc.SeuratData CITE-seq
celegans.embryo.SeuratData <NA>
hcabm40k.SeuratData 10x v2
ifnb.SeuratData 10x v1
panc8.SeuratData SMARTSeq2, Fluidigm C1, CelSeq, CelSeq2, inDrops
pbmc3k.SeuratData 10x v1
pbmcMultiome.SeuratData <NA>
pbmcsca.SeuratData 10x v2, 10x v3, SMARTSeq2, Seq-Well, inDrops, Drop-seq, CelSeq2
ssHippo.SeuratData slideseq v2
stxBrain.SeuratData visium
stxKidney.SeuratData visium
thp1.eccite.SeuratData <NA>
default.dataset disk.datasets
bmcite.SeuratData <NA> <NA>
cbmc.SeuratData raw processed
celegans.embryo.SeuratData raw <NA>
hcabm40k.SeuratData raw <NA>
ifnb.SeuratData raw <NA>
panc8.SeuratData raw <NA>
pbmc3k.SeuratData raw <NA>
pbmcMultiome.SeuratData NA <NA>
pbmcsca.SeuratData raw <NA>
ssHippo.SeuratData raw <NA>
stxBrain.SeuratData NA <NA>
stxKidney.SeuratData raw <NA>
thp1.eccite.SeuratData <NA> <NA>
other.datasets
bmcite.SeuratData <NA>
cbmc.SeuratData <NA>
celegans.embryo.SeuratData <NA>
hcabm40k.SeuratData <NA>
ifnb.SeuratData processed
panc8.SeuratData <NA>
pbmc3k.SeuratData pbmc3k.final
pbmcMultiome.SeuratData pbmc.rna, pbmc.atac
pbmcsca.SeuratData <NA>
ssHippo.SeuratData <NA>
stxBrain.SeuratData posterior1, posterior2, anterior1, anterior2
stxKidney.SeuratData <NA>
thp1.eccite.SeuratData <NA>
notes
bmcite.SeuratData <NA>
cbmc.SeuratData <NA>
celegans.embryo.SeuratData <NA>
hcabm40k.SeuratData <NA>
ifnb.SeuratData <NA>
panc8.SeuratData <NA>
pbmc3k.SeuratData <NA>
pbmcMultiome.SeuratData One sample with two modalities, RNA and ATAC
pbmcsca.SeuratData HCA benchmark
ssHippo.SeuratData <NA>
stxBrain.SeuratData One sample split across four datasets as paired anterior/posterior slices
stxKidney.SeuratData <NA>
thp1.eccite.SeuratData <NA>
Installed InstalledVersion
bmcite.SeuratData FALSE <NA>
cbmc.SeuratData FALSE <NA>
celegans.embryo.SeuratData FALSE <NA>
hcabm40k.SeuratData FALSE <NA>
ifnb.SeuratData TRUE 3.1.0
panc8.SeuratData TRUE 3.0.2
pbmc3k.SeuratData TRUE 3.1.4
pbmcMultiome.SeuratData TRUE 0.1.0
pbmcsca.SeuratData FALSE <NA>
ssHippo.SeuratData FALSE <NA>
stxBrain.SeuratData TRUE 0.1.1
stxKidney.SeuratData FALSE <NA>
thp1.eccite.SeuratData FALSE <NA>
There were 12 warnings (use warnings() to see them)
> # InstallData("pbmc3k") # (89.4 MB)
> data("pbmc3k")
> exprMat <- as.matrix(pbmc3k@assays$RNA@data)
> dim(exprMat)
[1] 13714 2700
> exprMat[1:4,1:4]
AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG
AL627309.1 0 0 0 0
AP006222.2 0 0 0 0
RP11-206L10.2 0 0 0 0
RP11-206L10.9 0 0 0 0
> cellInfo <- pbmc3k@meta.data[,c(4,2,3)]
> colnames(cellInfo)=c('CellType', 'nGene' ,'nUMI')
> head(cellInfo)
CellType nGene nUMI
AAACATACAACCAC Memory CD4 T 2419 779
AAACATTGAGCTAC B 4903 1352
AAACATTGATCAGC Memory CD4 T 3147 1129
AAACCGTGCTTCCG CD14+ Mono 2639 960
AAACCGTGTATGCG NK 980 521
AAACGCACTGGTAC Memory CD4 T 2163 781
> table(cellInfo$CellType)
Naive CD4 T Memory CD4 T CD14+ Mono B CD8 T FCGR3A+ Mono NK
697 483 480 344 271 162 155
DC Platelet
32 14
> ### Initialize settings
> library(SCENIC)
Warning message:
程辑包‘SCENIC’是用R版本4.0.3 来建造的
> # 保证cisTarget_databases 文件夹下面有下载好2个1G的文件
> scenicOptions <- initializeScenic(org="hgnc",
+ dbDir="cisTarget_databases", nCores=1)
Motif databases selected:
hg19-500bp-upstream-7species.mc9nr.feather
hg19-tss-centered-10kb-7species.mc9nr.feather
Registered S3 methods overwritten by 'arrow':
method from
as.character.Array DelayedArray
as.integer.Array DelayedArray
as.vector.Array DelayedArray
Warning messages:
1: In RcisTarget::importRankings(dbFilePath, columns = randomCol) :
The following columns are missing from the database:
2: In RcisTarget::importRankings(dbFilePath, columns = randomCol) :
The following columns are missing from the database:
3: package ‘RcisTarget’ was built under R version 4.0.3
4: In RcisTarget::importRankings(dbFile, columns = rnktype) :
The following columns are missing from the database: features
5: In RcisTarget::importRankings(dbFile, columns = rnktype) :
The following columns are missing from the database: features
> saveRDS(scenicOptions, file="int/scenicOptions.Rds")
> ### Co-expression network
> genesKept <- geneFiltering(exprMat, scenicOptions)
Maximum value in the expression matrix: 419
Ratio of detected vs non-detected: 0.066
Number of counts (in the dataset units) per gene:
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.0 16.0 57.0 465.7 176.8 161685.0
Number of cells in which each gene is detected:
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.0 15.0 54.0 166.5 157.0 2700.0
Number of genes left after applying the following filters (sequential):
5749 genes with counts per gene > 81
5748 genes detected in more than 27 cells
5318 genes available in RcisTarget database
Gene list saved in int/1.1_genesKept.Rds
> exprMat_filtered <- exprMat[genesKept, ]
> exprMat_filtered[1:4,1:4]
AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG
NOC2L 0 0 0 0
HES4 0 0 0 0
ISG15 0 0 1 9
TNFRSF18 0 2 0 0
> dim(exprMat_filtered)
[1] 5318 2700
> runCorrelation(exprMat_filtered, scenicOptions)
> exprMat_filtered_log <- log2(exprMat_filtered+1)
> runGenie3(exprMat_filtered_log, scenicOptions)
Using 480 TFs as potential regulators...
Running GENIE3 part 1
Running GENIE3 part 10
Running GENIE3 part 2
Running GENIE3 part 3
Running GENIE3 part 4
Running GENIE3 part 5
Running GENIE3 part 6
Running GENIE3 part 7
Running GENIE3 part 8
Running GENIE3 part 9
Finished running GENIE3.
Warning message:
In runGenie3(exprMat_filtered_log, scenicOptions) :
Only 480 (26%) of the 1839 TFs in the database were found in the dataset. Do they use the same gene IDs?
>
> ### Build and score the GRN
> exprMat_log <- log2(exprMat+1)
> scenicOptions@settings$dbs <- scenicOptions@settings$dbs["10kb"] # Toy run settings
> scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions)
00:50 Creating TF modules
Number of links between TFs and targets (weight>=0.001): 1773162
[,1]
nTFs 480
nTargets 5318
nGeneSets 3832
nLinks 2515239
> scenicOptions <- runSCENIC_2_createRegulons(scenicOptions,
+ coexMethod=c("top5perTarget")) # Toy run settings
00:50 Step 2. Identifying regulons
tfModulesSummary:
[,1]
top5perTarget 61
00:50 RcisTarget: Calculating AUC
Scoring database: [Source file: hg19-tss-centered-10kb-7species.mc9nr.feather]
00:53 RcisTarget: Adding motif annotation
Using BiocParallel...
Number of motifs in the initial enrichment: 19044
Number of motifs annotated to the matching TF: 357
00:53 RcisTarget: Pruning targets
00:55 Number of motifs that support the regulons: 357
Preview of motif enrichment saved as: output/Step2_MotifEnrichment_preview.html
Warning messages:
1: package ‘BiocParallel’ was built under R version 4.0.3
2: In MulticoreParam(nCores) :
MulticoreParam() not supported on Windows, use SnowParam()
3: package ‘AUCell’ was built under R version 4.0.3
4: In RcisTarget::importRankings(dbFilePath, columns = randomCol) :
The following columns are missing from the database:
5: In RcisTarget::importRankings(dbFile, columns = rnktype) :
The following columns are missing from the database: features
6: In importRankings(rnkName, columns = allGenes) :
The following columns are missing from the database:
7: In importRankings(dbNames[motifDbName], columns = allGenes) :
The following columns are missing from the database:
8: In .addSignificantGenes(resultsTable = resultsTable, geneSets = geneSets, :
The rakings provided only include a subset of genes/regions included in the whole database.
9: package ‘data.table’ was built under R version 4.0.3
> library(doParallel)
> scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_log )
00:56 Step 3. Analyzing the network activity in each individual cell
Number of regulons to evaluate on cells: 21
Biggest (non-extended) regulons:
SPI1 (539g)
CEBPD (73g)
CEBPB (47g)
SPIB (38g)
TBX21 (28g)
IRF7 (15g)
IRF8 (16g)
STAT1 (14g)
Quantiles for the number of genes detected by cell:
(Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
min 1% 5% 10% 50% 100%
212.00 325.00 434.95 539.90 816.00 3400.00
00:57 Finished running AUCell.
00:57 Plotting heatmap...
00:57 Plotting t-SNEs...
> scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
Binary regulon activity: 12 TF regulons x 2216 cells.
(20 regulons including 'extended' versions)
11 regulons are active in more than 1% (22.16) cells.
> tsneAUC(scenicOptions, aucType="AUC") # choose settings
[1] "int/tSNE_AUC_50pcs_30perpl.Rds"
>