Spatial Transcriptomics

[10X Spatial Transcriptomics Visium] (Part 3) Notes on running the complete Visium workflow

2020-03-25  Geekero

My old account was banned for no apparent reason, so I am reposting this from a secondary account.

More spatial transcriptomics articles:

1. The new 10X Visium
2. The legacy Spatial Transcriptomics platform

Download the dataset

https://support.10xgenomics.com/spatial-gene-expression/datasets
I chose: Mouse Brain Section (Coronal)

$ tar -xvf V1_Adult_Mouse_Brain_fastqs.tar
$ ls
V1_Adult_Mouse_Brain_S5_L001_I1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L001_R2_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_R1_001.fastq.gz
V1_Adult_Mouse_Brain_S5_L001_I2_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_I1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_R2_001.fastq.gz
V1_Adult_Mouse_Brain_S5_L001_R1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_I2_001.fastq.gz

Run spaceranger count

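Here I chose automatic alignment.
Because the server has no internet access, I downloaded the slide file manually (it is passed with --slidefile below); see: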
https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/count

$ spaceranger count --id=V1_Adult_Mouse_Brain \
                      --transcriptome=/share/nas1/Data/luohb/Visium/reference/refdata-cellranger-mm10-3.0.0/  \
                      --fastqs=/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain_fastqs \
                      --sample=V1_Adult_Mouse_Brain \
                      --image=/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain_image.tif \
                      --slide=V19L01-041 \
                      --area=C1 \
                      --slidefile=/share/nas1/Data/luohb/Visium/test2/V19L01-041.gpr \
                      --localcores=32   \
                      --localmem=128
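spaceranger selects FASTQ files by the file-name prefix given to --sample, and the files above follow the Illumina bcl2fastq naming pattern {sample}_S{number}_L{lane}_{read}_001.fastq.gz. As a quick sanity check before launching a long run, the sketch below parses the file names listed earlier and groups them by sample and lane. group_fastqs is a hypothetical helper, not part of spaceranger, and the regex assumes the standard bcl2fastq naming shown here.

```python
import re
from collections import defaultdict

# Illumina bcl2fastq naming: {sample}_S{n}_L{lane}_{read}_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)

def group_fastqs(filenames):
    """Group FASTQ names as {sample: {lane: sorted list of read types}}."""
    groups = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        m = FASTQ_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected FASTQ name: {name}")
        groups[m["sample"]][m["lane"]].append(m["read"])
    return {sample: {lane: sorted(reads) for lane, reads in lanes.items()}
            for sample, lanes in groups.items()}

# The eight files from the tar archive above
files = [
    "V1_Adult_Mouse_Brain_S5_L001_I1_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L001_I2_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L001_R1_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L001_R2_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L002_I1_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L002_I2_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L002_R1_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L002_R2_001.fastq.gz",
]
print(group_fastqs(files))
```

Every file shares the prefix V1_Adult_Mouse_Brain, which is why --sample=V1_Adult_Mouse_Brain picks up both lanes.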

It ran to completion without errors, but because the server was running several other large jobs at the same time, it took nearly 13 hours...



Inspect the output files

$ ls
_cmdline   _finalstate  _jobmode  _mrosource  _perf              _sitecheck              _tags       _uuid                         _vdrkill
_filelist  _invocation  _log      outs        _perf._truncated_  SPATIAL_RNA_COUNTER_CS  _timestamp  V1_Adult_Mouse_Brain.mri.tgz  _versions

$ cd outs/
$ ls
analysis       filtered_feature_bc_matrix     metrics_summary.csv  possorted_genome_bam.bam      raw_feature_bc_matrix     spatial
cloupe.cloupe  filtered_feature_bc_matrix.h5  molecule_info.h5     possorted_genome_bam.bam.bai  raw_feature_bc_matrix.h5  web_summary.html

$cd analysis/
$ls
clustering  diffexp  pca  tsne  umap

1. PCA (dimensionality reduction) results:

$cd pca/10_components
$ls
components.csv  dispersion.csv  features_selected.csv  projection.csv  variance.csv

Projection (coordinates of each spot on PC-1 to PC-10):

$head -3 projection.csv 
Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10
AAACAAGTATCTCCCA-1,-10.281241313083257,-24.67223115562252,-0.19850052930601336,-2.1734929997144388,6.630976878797487,-0.12128746693282366,6.040708059434257,4.657495740394594,16.344239212184327,6.523601903899456
AAACAATCTACTAGCA-1,17.830458684877186,-27.53526668134934,15.877302377060623,9.74572143694312,-0.7208195934715782,-4.339470398396214,2.5444608437485288,-5.084679351848514,2.9247276185469495,-1.0731021612191327

Components matrix (loading of each gene on each PC):

$less -S components.csv
PC,ENSMUSG00000051951,ENSMUSG00000089699,ENSMUSG00000025900,ENSMUSG00000025902,ENSMUSG00000033845,ENSMUSG00000025903,ENSMUSG00000104217,ENSMUSG00000033813,(truncated...)
1,9.807402710059275e-05,-0.0007359419037463138,0.0018506647696503106,0.0019216677830155664,-0.009477278899046813,-0.005003056852125207,0.0,-0.008498306263180
2,-0.0013017257339919546,0.0015759310908915448,0.0013809836795030965,0.0009513422156874659,0.007418499981929492,0.003222355732773671,0.0,0.00887178686827463,
3,-0.001920230193482586,0.003378841598139873,-0.00012165106820253075,-0.00024897415838216264,-0.0031447165300072175,-0.007787586978438225,0.0,-0.003148852394
(truncated...)

Proportion of total variance explained by each PC:

$head -3 variance.csv
PC,Proportion.Variance.Explained
1,0.030645967432188836
2,0.015067575203691749
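Since variance.csv lists the proportion explained by each PC separately, a running sum shows how much the first k PCs capture together. A minimal sketch, using the two rows shown above as inline input; cumulative_variance is a hypothetical helper name, and for the real file you would pass in the text of analysis/pca/10_components/variance.csv.

```python
import csv
import io
from itertools import accumulate

def cumulative_variance(csv_text):
    """Return (pc, proportion, cumulative proportion) tuples from variance.csv text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    props = [float(r["Proportion.Variance.Explained"]) for r in rows]
    return [(int(r["PC"]), p, c) for r, p, c in zip(rows, props, accumulate(props))]

sample = """PC,Proportion.Variance.Explained
1,0.030645967432188836
2,0.015067575203691749
"""
for pc, prop, cum in cumulative_variance(sample):
    print(f"PC{pc}: {prop:.4f} (cumulative {cum:.4f})")
```

With these values, PC1 and PC2 together explain only about 4.6% of the total variance.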

Normalized dispersion of each feature:

$head -3 dispersion.csv
Feature,Normalized.Dispersion
ENSMUSG00000051951,0.261762717719762
ENSMUSG00000089699,-1.5988672040435437

2. t-SNE output:

$cd ../../tsne/2_components/
$ls
projection.csv

$head -5 projection.csv 
Barcode,TSNE-1,TSNE-2
AAACAAGTATCTCCCA-1,-18.47081216664088,7.240054873818881
AAACAATCTACTAGCA-1,-4.219964329936257,-9.182632464702484
AAACACCAATAACTGC-1,14.744060324279337,13.360913482080413
AAACAGAGCGACTCCT-1,-11.72411901642397,-7.924228663324808

3. Clustering results:

$cd ../../clustering/
$ls
graphclust          kmeans_2_clusters  kmeans_4_clusters  kmeans_6_clusters  kmeans_8_clusters
kmeans_10_clusters  kmeans_3_clusters  kmeans_5_clusters  kmeans_7_clusters  kmeans_9_clusters

For each clustering (graph-based, and k-means with k = 2 to 10), spaceranger writes a cluster assignment for every spot.

Let's open the 3-cluster k-means result:

$cd kmeans_3_clusters
$ls
clusters.csv
$head -5 clusters.csv 
Barcode,Cluster
AAACAAGTATCTCCCA-1,1
AAACAATCTACTAGCA-1,3
AAACACCAATAACTGC-1,2
AAACAGAGCGACTCCT-1,1
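Cluster assignments and t-SNE coordinates live in separate files but share the Barcode column, so coloring the t-SNE plot by cluster is just a join on barcode. A minimal sketch using the rows shown above as inline input; read_by_barcode and join_tsne_clusters are hypothetical names. For the real data you would read analysis/tsne/2_components/projection.csv and analysis/clustering/kmeans_3_clusters/clusters.csv instead.

```python
import csv
import io
from collections import Counter

def read_by_barcode(csv_text):
    """Map each Barcode to its CSV row (as a dict)."""
    return {row["Barcode"]: row for row in csv.DictReader(io.StringIO(csv_text))}

def join_tsne_clusters(tsne_text, clusters_text):
    """Return {barcode: (TSNE-1, TSNE-2, cluster)} for barcodes present in both files."""
    tsne = read_by_barcode(tsne_text)
    clusters = read_by_barcode(clusters_text)
    return {
        bc: (float(tsne[bc]["TSNE-1"]), float(tsne[bc]["TSNE-2"]), int(row["Cluster"]))
        for bc, row in clusters.items()
        if bc in tsne
    }

tsne_text = """Barcode,TSNE-1,TSNE-2
AAACAAGTATCTCCCA-1,-18.47081216664088,7.240054873818881
AAACAATCTACTAGCA-1,-4.219964329936257,-9.182632464702484
AAACACCAATAACTGC-1,14.744060324279337,13.360913482080413
AAACAGAGCGACTCCT-1,-11.72411901642397,-7.924228663324808
"""
clusters_text = """Barcode,Cluster
AAACAAGTATCTCCCA-1,1
AAACAATCTACTAGCA-1,3
AAACACCAATAACTGC-1,2
AAACAGAGCGACTCCT-1,1
"""
joined = join_tsne_clusters(tsne_text, clusters_text)
# Number of spots per cluster among the joined barcodes
print(Counter(cluster for _, _, cluster in joined.values()))
```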

4. Differential expression analysis:

$cd ../../diffexp/
$ls
graphclust          kmeans_2_clusters  kmeans_4_clusters  kmeans_6_clusters  kmeans_8_clusters
kmeans_10_clusters  kmeans_3_clusters  kmeans_5_clusters  kmeans_7_clusters  kmeans_9_clusters

This time, let's look at a full table (for the graph-based clustering):

$cd graphclust
$ls
differential_expression.csv
$head -3 differential_expression.csv 
Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value,Cluster 3 Mean Counts,Cluster 3 Log2 fold change,Cluster 3 Adjusted p value,Cluster 4 Mean Counts,Cluster 4 Log2 fold change,Cluster 4 Adjusted p value,Cluster 5 Mean Counts,Cluster 5 Log2 fold change,Cluster 5 Adjusted p value,Cluster 6 Mean Counts,Cluster 6 Log2 fold change,Cluster 6 Adjusted p value,Cluster 7 Mean Counts,Cluster 7 Log2 fold change,Cluster 7 Adjusted p value,Cluster 8 Mean Counts,Cluster 8 Log2 fold change,Cluster 8 Adjusted p value,Cluster 9 Mean Counts,Cluster 9 Log2 fold change,Cluster 9 Adjusted p value
ENSMUSG00000051951,Xkr4,0.09115907843838432,0.15688013442205495,0.9130108472807676,0.08789156406190936,0.094226986457139,1.0,0.059424476860418934,-0.5579910544947899,0.4792687534164091,0.09747791035014447,0.270272692975412,0.7950049780312995,0.08717356987748102,0.14776402072440886,1.0,0.05406634025868632,-0.6310298603360582,0.7980928917515894,0.15030400022885756,0.9570457266970553,0.22931236900985477,0.0606581027791399,-0.4319057525382224,1.0,0.10761817731957228,0.4400508833584902,1.0
ENSMUSG00000089699,Gm1992,0.0016574377897888059,1.3866145310996707,0.8220253607506287,0.0,0.423008752385563,1.0,0.0,0.22991150489664136,1.0,0.0033613072534532575,2.5793194965660433,0.5338242296758853,0.0,2.3542148981918345,1.0,0.003180372956393313,2.490599584065473,0.8676482778053517,0.0,1.5959470345290159,1.0,0.0,1.4568374963600368,1.0,0.0,2.146642828481177,1.0
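differential_expression.csv is a wide table with three columns per cluster (Mean Counts, Log2 fold change, Adjusted p value). Reshaping it into one record per gene and cluster makes filtering for candidate marker genes straightforward. A minimal sketch: diffexp_long and candidate_markers are hypothetical names, and the thresholds (adjusted p < 0.05, log2 fold change > 1) are illustrative choices, not spaceranger defaults. The input is the two rows shown above, cut down to clusters 1 and 2.

```python
import csv
import io
import re

# Wide column names look like "Cluster 3 Log2 fold change"
COL_RE = re.compile(r"^Cluster (\d+) (Mean Counts|Log2 fold change|Adjusted p value)$")

def diffexp_long(csv_text):
    """Reshape the wide table into one record per (gene, cluster)."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        stats = {}
        for col, value in row.items():
            m = COL_RE.match(col)
            if m:
                stats.setdefault(int(m.group(1)), {})[m.group(2)] = float(value)
        for cluster in sorted(stats):
            s = stats[cluster]
            records.append({
                "feature_id": row["Feature ID"],
                "gene": row["Feature Name"],
                "cluster": cluster,
                "mean_counts": s["Mean Counts"],
                "log2fc": s["Log2 fold change"],
                "padj": s["Adjusted p value"],
            })
    return records

def candidate_markers(records, max_padj=0.05, min_log2fc=1.0):
    """Keep up-regulated genes that pass the (illustrative) thresholds."""
    return [r for r in records if r["padj"] < max_padj and r["log2fc"] > min_log2fc]

# Clusters 1 and 2 of the two rows shown above (other clusters omitted)
sample = """Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value
ENSMUSG00000051951,Xkr4,0.09115907843838432,0.15688013442205495,0.9130108472807676,0.08789156406190936,0.094226986457139,1.0
ENSMUSG00000089699,Gm1992,0.0016574377897888059,1.3866145310996707,0.8220253607506287,0.0,0.423008752385563,1.0
"""
records = diffexp_long(sample)
print(len(records), candidate_markers(records))
```

Neither gene passes the default thresholds here; relaxing max_padj to 1.0 would keep only Gm1992 in cluster 1.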

5. Feature-Barcode Matrices
Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column).
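matrix.mtx.gz stores this matrix in the sparse Matrix Market coordinate format: after the %-prefixed comment lines comes one "rows columns nonzeros" line, then one "row column value" line per nonzero entry, with 1-based indices into features.tsv.gz (rows) and barcodes.tsv.gz (columns). The stdlib-only sketch below parses a small made-up matrix in that format and sums the UMIs per barcode; parse_mtx and umis_per_barcode are hypothetical names. For the real file you could pass in gzip.open(path, "rt").read(), or simply use scipy.io.mmread.

```python
def parse_mtx(text):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, {(row, col): count}), 0-based."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, nnz = (int(x) for x in lines[0].split())
    counts = {}
    for ln in lines[1:]:
        i, j, v = ln.split()
        counts[(int(i) - 1, int(j) - 1)] = int(v)  # file indices are 1-based
    if len(counts) != nnz:
        raise ValueError(f"expected {nnz} entries, found {len(counts)}")
    return n_rows, n_cols, counts

def umis_per_barcode(n_cols, counts):
    """Sum UMI counts down each column (one column per spot barcode)."""
    totals = [0] * n_cols
    for (_, col), value in counts.items():
        totals[col] += value
    return totals

# Made-up toy matrix: 3 features x 2 barcodes, 3 nonzero entries
toy = """%%MatrixMarket matrix coordinate integer general
% toy example
3 2 3
1 1 5
3 1 2
2 2 4
"""
n_rows, n_cols, counts = parse_mtx(toy)
print(umis_per_barcode(n_cols, counts))
```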

$cd /share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs
$ls
analysis       filtered_feature_bc_matrix     metrics_summary.csv  possorted_genome_bam.bam      raw_feature_bc_matrix     spatial
cloupe.cloupe  filtered_feature_bc_matrix.h5  molecule_info.h5     possorted_genome_bam.bam.bai  raw_feature_bc_matrix.h5  web_summary.html
$tree filtered_feature_bc_matrix
filtered_feature_bc_matrix
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
0 directories, 3 files

$tree raw_feature_bc_matrix
raw_feature_bc_matrix
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
0 directories, 3 files
$gzip -cd filtered_feature_bc_matrix/features.tsv.gz |head -3
ENSMUSG00000051951  Xkr4    Gene Expression
ENSMUSG00000089699  Gm1992  Gene Expression
ENSMUSG00000102343  Gm37381 Gene Expression

Where the columns are:

Column 1: feature ID
Column 2: gene name
Column 3: feature type

Load the matrix into R

library(Matrix)
matrix_dir = "/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs/filtered_feature_bc_matrix/"
barcode.path <- paste0(matrix_dir, "barcodes.tsv.gz")
features.path <- paste0(matrix_dir, "features.tsv.gz")
matrix.path <- paste0(matrix_dir, "matrix.mtx.gz")
mat <- readMM(file = matrix.path)
feature.names = read.delim(features.path, 
                           header = FALSE,
                           stringsAsFactors = FALSE)
barcode.names = read.delim(barcode.path, 
                           header = FALSE,
                           stringsAsFactors = FALSE)
colnames(mat) = barcode.names$V1
rownames(mat) = feature.names$V1
dim(mat)
[1] 31053  2698

Load the matrix into Python

import csv
import gzip
import os
import scipy.io

matrix_dir = "/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs/filtered_feature_bc_matrix"
# Sparse UMI count matrix (features x barcodes); mmread reads the .gz file directly
mat = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx.gz"))

# features.tsv.gz columns: feature ID, gene name, feature type.
# Open in text mode ("rt"): under Python 3, csv.reader cannot parse the bytes
# gzip.open returns by default. Reading the file once is enough.
features_path = os.path.join(matrix_dir, "features.tsv.gz")
with gzip.open(features_path, "rt") as f:
    features = list(csv.reader(f, delimiter="\t"))
feature_ids = [row[0] for row in features]
gene_names = [row[1] for row in features]
feature_types = [row[2] for row in features]

# barcodes.tsv.gz: one spot barcode per line
barcodes_path = os.path.join(matrix_dir, "barcodes.tsv.gz")
with gzip.open(barcodes_path, "rt") as f:
    barcodes = [row[0] for row in csv.reader(f, delimiter="\t")]

6. Images

$cd spatial/
$ls
aligned_fiducials.jpg  detected_tissue_image.jpg  scalefactors_json.json  tissue_hires_image.png  tissue_lowres_image.png  tissue_positions_list.csv

tissue_hires_image.png: higher-resolution brightfield image



tissue_lowres_image.png: lower-resolution brightfield image


aligned_fiducials.jpg (same size as tissue_hires_image.png): used to verify that fiducial alignment succeeded

The corresponding pixel-coordinate scale factors are in scalefactors_json.json:

$cat scalefactors_json.json
{"spot_diameter_fullres": 89.44476048022638, "tissue_hires_scalef": 0.17011142, "fiducial_diameter_fullres": 144.48769000651953, "tissue_lowres_scalef": 0.05

PS: This step is somewhat like the ST_spot_detector step of the old workflow.
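The *_scalef values in scalefactors_json.json map pixel coordinates in the full-resolution input image (the TIFF passed via --image) onto the downsampled PNGs: multiplying a full-resolution coordinate by tissue_hires_scalef gives its position on tissue_hires_image.png, and likewise for tissue_lowres_scalef and the lowres image. The sketch below uses only the values visible in the (truncated) output above, so tissue_lowres_scalef is left out; to_hires is a hypothetical helper name.

```python
import json

# Values copied from the scalefactors_json.json output above (lowres factor omitted)
scalefactors = json.loads(
    '{"spot_diameter_fullres": 89.44476048022638,'
    ' "tissue_hires_scalef": 0.17011142,'
    ' "fiducial_diameter_fullres": 144.48769000651953}'
)

def to_hires(x_fullres, y_fullres, sf):
    """Map a full-resolution pixel coordinate onto tissue_hires_image.png."""
    s = sf["tissue_hires_scalef"]
    return x_fullres * s, y_fullres * s

# Spot diameter expressed in hires-image pixels
spot_diameter_hires = scalefactors["spot_diameter_fullres"] * scalefactors["tissue_hires_scalef"]
print(f"spot diameter on the hires image: {spot_diameter_hires:.1f} px")
```

With these numbers a spot is roughly 15 px across on the hires image, which is a reasonable marker size when overlaying spots on it.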

Among the other files:

detected_tissue_image.jpg: image for checking the tissue detection result

tissue_positions_list.csv:

$head -2 tissue_positions_list.csv
ACGCCTGACACGCGCT-1,0,0,0,1252,1211
TACCGATCCAACACTT-1,0,1,1,1372,1280

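The columns are: barcode, in_tissue (1 if the spot is covered by tissue, 0 otherwise), array_row, array_col, and the pixel row and pixel column of the spot center in the full-resolution image.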
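The file has no header row, so a parser has to name the six columns itself. A minimal sketch, assuming the column order barcode, in_tissue, array_row, array_col, pixel row, pixel column (the pixel order is an assumption here; check the Space Ranger documentation for your version). Spot and read_positions are hypothetical names, and the input is the two rows shown above, both of which lie outside the tissue (in_tissue = 0).

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Spot:
    barcode: str
    in_tissue: bool
    array_row: int
    array_col: int
    pxl_row_fullres: float  # column 5 (assumed: pixel row in the full-res image)
    pxl_col_fullres: float  # column 6 (assumed: pixel column in the full-res image)

def read_positions(csv_text):
    """Parse header-less tissue_positions_list.csv text into Spot records."""
    spots = []
    for barcode, tissue, arow, acol, prow, pcol in csv.reader(io.StringIO(csv_text)):
        spots.append(Spot(barcode, tissue == "1", int(arow), int(acol), float(prow), float(pcol)))
    return spots

positions_text = """ACGCCTGACACGCGCT-1,0,0,0,1252,1211
TACCGATCCAACACTT-1,0,1,1,1372,1280
"""
spots = read_positions(positions_text)
under_tissue = [s for s in spots if s.in_tissue]

# tissue_hires_scalef from scalefactors_json.json; (x, y) = (column, row) on the hires PNG
hires_scalef = 0.17011142
hires_xy = [(s.pxl_col_fullres * hires_scalef, s.pxl_row_fullres * hires_scalef) for s in spots]
print(len(spots), len(under_tissue))
```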

7. BAM: Barcoded BAM

$cd ..
$samtools view possorted_genome_bam.bam |head -5
A00984:21:HMKLFDMXX:2:2117:10357:1235   16  1   3000100 255 25M199730N72M23S    *   0   0   TTTTTTTTTTTTTTTTTTTTTTTTGCAAGAAAAAAAATCAGATAACCGAGGAAAATTATTCATTATGAAGTACTACTTTCCACTTCATTTCATCCCATGTACTCTGCGTTGATACCACTG    F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF    NH:i:1  HI:i:1  AS:i:83 nM:i:1  RE:A:I  xf:i:0  ts:i:21 li:i:0  BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:GACGACGATCCGCGTT   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:GACGACGATCCGCGTT-1 UR:Z:CCTGTTTGTTGT   UY:Z:FFFFFFFFFFFF   UB:Z:CCTGTTTGTTGT   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2
A00984:21:HMKLFDMXX:1:1306:5041:10034   16  1   3000100 255 25M199611N95M   *   0   0   TTTTTTTTTTTTTTTTTTTTTTTTGAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCA    FFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    NH:i:1  HI:i:1  AS:i:108    nM:i:0  RE:A:I  xf:i:0  li:i:0  BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:TGGTCTGTTGGGCGTA   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TGGTCTGTTGGGCGTA-1 UR:Z:GTTACCCTATGT   UY:Z:FFFFFFFFFFFF   UB:Z:GTTACCCTATGT   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
A00984:21:HMKLFDMXX:2:2345:21206:5087   16  1   3010019 255 98M22S  *   0   0   ATAGTGTCCCAGATTTCCTGGCTGTTTCTTGTTAGGATTTTTTTAGATTTAACATTTCTGTCATAGATTAATCTATTTTGCAGATGTAATCCCATGTACTCTGCGTTGATACCACTGCTT    F:FFFFFFFFFFF::FFF:FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFF    NH:i:1  HI:i:1  AS:i:90 nM:i:3  RE:A:I  xf:i:0  ts:i:30 li:i:0  BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:ACGGTCACCGAGACCCY:Z:FFFFFFFFFFFFF,F:   CB:Z:ACGGTCACCGAGAACA-1 UR:Z:TCGATCTCGTAA   UY:Z:FFFFFFFFFFFF   UB:Z:TCGATCTCGTAA   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2
A00984:21:HMKLFDMXX:1:1164:15980:17738  16  1   3013014 255 17M186702N103M  *   0   0   TTTTTTTTTTTTTTTGTTTAAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCAAGTTT    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    NH:i:1  HI:i:1  AS:i:108    nM:i:0  RE:A:I  xf:i:0  li:i:0  BC:Z:ACCAGACAAC QT:Z:FFF,FFFFFF CR:Z:TCAAGGTTACTACACC   CY:Z:FFFFFFFFFFF:FFFF   CB:Z:TCAAGGTTACTACACC-1 UR:Z:CCGGGCAGTTAT   UY:Z:FFFFFFFFFFFF   UB:Z:CCGGGCAGTTAT   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
A00984:21:HMKLFDMXX:1:1451:3477:33912   16  1   3013014 255 17M186702N103M  *   0   0   TTTTTTTTTTTTTTTGTTTAAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCAAGTTT    FFFFFFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF    NH:i:1  HI:i:1  AS:i:108    nM:i:0  RE:A:I  xf:i:0  li:i:0  BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:TCAAGGTTACTACACC   CY:Z:FFFFFFFFFFF:F,FF   CB:Z:TCAAGGTTACTACACC-1 UR:Z:CCGGGCAGTTAT   UY:Z:FFFFFFFFFFFF   UB:Z:CCGGGCAGTTAT   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1

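At first I did not seem to see the structure described in the official docs, where the spot barcode in the CB tag (e.g. AGAATGGTCTGCAT-1) carries a suffix made of a dash followed by a number... Looking more closely, though, the CB tags above do have it (e.g. CB:Z:GACGACGATCCGCGTT-1); it is the raw barcode in the CR tag (CR:Z:GACGACGATCCGCGTT) that has no suffix.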
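To check the barcode tags programmatically, the sketch below splits the optional TAG:TYPE:VALUE fields of a SAM line (columns 12 onward) into a dict and separates the numeric suffix from the CB barcode. sam_tags and split_barcode are hypothetical helpers; the input is the first record above with SEQ/QUAL shortened and most tags dropped. Note that samtools view separates fields with tabs, whereas the listing above shows spaces.

```python
def sam_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields (columns 12+) of a SAM line as {tag: value}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)  # value may itself contain ':'
        tags[tag] = value
    return tags

def split_barcode(cb):
    """Split a CB value like 'GACGACGATCCGCGTT-1' into ('GACGACGATCCGCGTT', 1)."""
    barcode, sep, suffix = cb.rpartition("-")
    if not sep:
        raise ValueError(f"no '-N' suffix in {cb!r}")
    return barcode, int(suffix)

# First record above, SEQ/QUAL shortened, only a few tags kept
line = "\t".join([
    "A00984:21:HMKLFDMXX:2:2117:10357:1235", "16", "1", "3000100", "255",
    "25M199730N72M23S", "*", "0", "0", "TTTT", "F:FF",
    "CR:Z:GACGACGATCCGCGTT", "CB:Z:GACGACGATCCGCGTT-1", "UB:Z:CCTGTTTGTTGT",
    "RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2",
])
tags = sam_tags(line)
print(tags["CR"], split_barcode(tags["CB"]))
```

For a real BAM you could pipe samtools view output into the script and loop over sys.stdin; using tags.get("CB") is safer there, since reads whose barcode could not be corrected may carry no CB tag.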

Downstream analysis in R

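Since there is no ready-made R package for 10X Visium spatial transcriptomics yet (as of this writing), I followed the R code on the official site instead.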

Official page: https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit

Downstream analysis with Loupe Browser 4.0.0
