技术专题单细胞分析之Scanpy

单细胞空间转录分析之Scanpy

2021-03-06  本文已影响0人  清竹饮酒

将空间位置信息和转录组分析相结合,对于癌症、免疫、肿瘤免疫相互作用,组织微环境,神经和发育等领域,有着令人期待的应用前景。

单细胞的一切分析,加前缀Spatial 都是一个新的分析点,因此Scanpy 扩展后也可用于空间转录组数据分析。 https://scanpy-tutorials.readthedocs.io/en/latest/spatial/integration-scanorama.html


单细胞转录数据分析之Scanpy:https://www.jianshu.com/p/e22a947e6c60
单细胞转录组之Scanpy - 轨迹推断/拟时序分析:https://www.jianshu.com/p/0b2ca0e0b544
单细胞转录组之Scanpy - 样本整合分析:https://www.jianshu.com/p/beef8a8be360
单细胞空间转录分析之Scanpy:https://www.jianshu.com/p/8dc231c06932
单细胞空间转录分析之Scanpy-多样本整合:https://www.jianshu.com/p/caa98aeac191
单细胞空间转录分析之Seurat:https://www.jianshu.com/p/c9a601ced91f
单细胞空间转录分析之Seurat-多样本整合(浅谈空间批次):https://www.jianshu.com/p/609b04096b79

python 包Scanpy很多函数是借鉴了R包Seurat,所以这两种方法分析结果差异不大,大家可以对照Seurat分析,上面网址也提供了Seurat包处理单细胞空间转录分析过程。

和分析单细胞转录组数据一样,单细胞空间转录组主要包括了:质控(QC),标准化(Normalization),降维聚类(Dimensional reduction and clustering),Cluster marker genes, Spatially variable genes

为了和Seurat结果比较,我们使用了相同的一套数据集:https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Mouse_Brain_Sagittal_Anterior, 新鲜的冷冻小鼠脑组织, 前牙矢状切面,可以参考前面讲述的ABA大脑图谱:https://www.jianshu.com/p/5d087fffeb35
导入相关包

import scanpy as sc
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sc.logging.print_versions()
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
import os ##设置输入路径
os.chdir('/home/wucheng/jianshu/scanpy/spatial') ##修改路径
results_file = 'Anterior.h5ad' ##设置结果文件保存路径

读取数据

adata = sc.read_visium("/home/wucheng/data_set/Spatial/Mouse/Brain_Section1_Sagittal_Anterior/Brain_anterior1/outs")
adata.var_names_make_unique()
adata
AnnData object with n_obs × n_vars = 2695 × 32285
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

预处理

adata.var["mt"] = adata.var_names.str.match("^MT-|^mt-") #计算线粒体基因比例 ,这儿没有用作者推荐的str.startswith
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
adata.var["RP"] = adata.var_names.str.match("^RPS|^RPL|^Rps|^Rpl") #同样我们也可以计算核糖体基因比例
sc.pp.calculate_qc_metrics(adata, qc_vars=["RP"], inplace=True)

我们根据总counts和表达的genes对spots进行一些基本的过滤:

fig, axs = plt.subplots(1, 4, figsize=(16, 4))
sns.distplot(adata.obs["total_counts"], kde=False,bins=60, ax=axs[0]) #bins柱子的数目
sns.distplot(adata.obs["n_genes_by_counts"], kde=False, bins=60, ax=axs[1])
sns.distplot(adata.obs["pct_counts_mt"], kde=False, bins=60, ax=axs[2])
sns.distplot(adata.obs["pct_counts_RP"], kde=False, bins=60, ax=axs[3])
plt.savefig("QC_plot.pdf")
QC

这儿和Seurat得到的QC小提琴图一样,只是形式不同。

# 进一步细化,用于过滤spots
fig, axs = plt.subplots(1, 4, figsize=(16, 4))
sns.distplot(adata.obs["total_counts"][adata.obs["total_counts"] < 10000], kde=False, bins=40, ax=axs[0]) 
sns.distplot(adata.obs["total_counts"][adata.obs["total_counts"] > 40000], kde=False, bins=40, ax=axs[1]) 
sns.distplot(adata.obs["n_genes_by_counts"][adata.obs["n_genes_by_counts"] < 4000], kde=False, bins=60, ax=axs[2])
sns.distplot(adata.obs["pct_counts_mt"][adata.obs["pct_counts_mt"] >10], kde=False, bins=30, ax=axs[3])
plt.savefig("QC_plot1.pdf")
QC1
过滤 可以看到更加质量评估我们是可以去除一些低质量的细胞的,但是因为Seurat对此数据集未作过滤,我们这儿也不过滤,实际可以按照下面过滤细胞
sc.pp.filter_cells(adata, min_counts=5000) #filtered out 80 cells that have less than 5000 counts
sc.pp.filter_cells(adata, max_counts=50000) #filtered out 39 cells that have more than 50000 counts
adata = adata[adata.obs["pct_counts_mt"] < 25]
print(f"#cells after MT filter: {adata.n_obs}") #cells after MT filter: 2502
sc.pp.filter_genes(adata, min_cells=10)

标准化,HVG

sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor="seurat", n_top_genes=2000)

PCA,聚类,可视化

sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata, key_added="clusters")
sc.pl.umap(adata, color=["total_counts",  "clusters"], wspace=0.4)
plt.savefig("umap_cluster.pdf")
sc.pl.spatial(adata, img_key="hires", color=["total_counts", "clusters"])
plt.savefig("spatial_cluster.pdf")
Umap_Cluster
spatial_Cluster
显示每簇位置
sc.pl.spatial(adata, img_key="hires", color="clusters", groups=["0"], alpha=0.5, size=1.3)
plt.savefig("spatial_cluster_sub_C0.pdf")
sc.pl.spatial(adata, img_key="hires", color="clusters", groups=["1"], alpha=0.5, size=1.3)
plt.savefig("spatial_cluster_sub_C1.pdf")
spatial_cluster_sub_C0
关键基因的表达可视化
sc.pl.umap(adata, color=["Hpca", "Ttr"])
plt.savefig("umap_gene_exp.pdf")
sc.pl.spatial(adata, img_key="hires", color=["Hpca", "Ttr"])
plt.savefig("spatial_gene_exp.pdf")
umap_gene_exp spatial_gene_exp.pdf

每一簇marker genes

sc.tl.rank_genes_groups(adata, "clusters", method="t-test")
sc.pl.rank_genes_groups_heatmap(adata, groups="0", n_genes=20, groupby="clusters") #热图可视化
plt.savefig("Cluster0_marker_heatmap.pdf")
sc.pl.spatial(adata, img_key="hires", color=["Ppp1r1b", "Gpr88","Penk"], alpha=0.7)
plt.savefig("Cluster0_spatial_marker.pdf")
#输出每一簇marker genes
pd.DataFrame(adata.uns['rank_genes_groups']['names']).head(5)
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
pd.DataFrame(    {group + '_' + key[:1]: result[key][group]    for group in groups for key in ['names', 'pvals']}).head(5)
res = pd.DataFrame(    {group + '_' + key: result[key][group]    for group in groups for key in ['names', 'pvals','logfoldchanges','pvals_adj','scores']})
res.to_csv("dif_markers.csv") #输出保存
res
       0_names        0_pvals  0_logfoldchanges    0_pvals_adj   0_scores  1_names        1_pvals  ...   16_pvals_adj   16_scores  17_names       17_pvals  17_logfoldchanges   17_pvals_adj  17_scores
0      Ppp1r1b   0.000000e+00          3.872787   0.000000e+00  87.917786  Camk2n1   0.000000e+00  ...   8.478769e-43  100.765488   Neurod6   9.957260e-22           4.388139   4.848720e-20  30.808397
1        Gpr88   0.000000e+00          4.403049   0.000000e+00  85.114883     Nrgn   0.000000e+00  ...   8.057953e-47   90.942802    Shisa6   3.514023e-18           4.730407   1.345792e-16  23.194458
2         Penk   0.000000e+00          4.161317   0.000000e+00  81.762627      Cck  5.144804e-319  ...   3.267212e-36   61.300213    Kcnab2   5.501745e-19           1.907810   2.231455e-17  21.838722
3        Pde1b   0.000000e+00          3.090780   0.000000e+00  79.073380    Nptxr  4.326942e-262  ...   3.116630e-38   54.045643     Nell2   1.468953e-18           2.391361   5.804790e-17  21.818993
4       Pde10a   0.000000e+00          3.938498   0.000000e+00  77.363052   Camk2a  1.998454e-264  ...   3.185207e-28   42.650623   Slc17a7   6.004060e-21           2.086003   2.793099e-19  21.763395
...        ...            ...               ...            ...        ...      ...            ...  ...            ...         ...       ...            ...                ...            ...        ...
32280  Slc17a7  5.392274e-138         -2.638275  3.108743e-135 -31.638594   Pcp4l1  4.422776e-130  ...  1.257510e-237  -36.938381     Htr1b  7.360363e-161         -28.121906  4.752586e-157 -28.987629
32281      Cck  2.454884e-151         -2.442926  1.617468e-148 -33.223202      Mal  3.544791e-138  ...  7.927483e-259  -38.930893    Shisa8  5.588661e-167         -28.738472  4.510748e-163 -29.622454
32282    Stmn1  1.575607e-143         -1.639276  9.782401e-141 -34.924587     Plp1  8.282063e-151  ...  5.406098e-260  -39.055321      Dlx6  8.146516e-176         -28.056419  8.767009e-172 -30.527283
32283    Olfm1  1.527035e-160         -2.071354  1.173817e-157 -35.563046     Mobp  1.650645e-178  ...  1.075188e-280  -40.989178    Cdkn1a  6.932729e-189         -28.244112  1.119116e-184 -31.842714
32284     Nrn1  6.550518e-207         -3.461612  9.194933e-204 -37.689751      Mbp  1.564585e-190  ...   5.941411e-33  -46.059914      Rxrg  4.265618e-220         -29.022448  1.377155e-215 -34.893322

[32285 rows x 90 columns]
Cluster0_marker_heatmap
Cluster0_spatial_marker

空间特异性genes

import SpatialDE #导入包pip install spatialde
counts = pd.DataFrame(adata.X.todense(), columns=adata.var_names, index=adata.obs_names)
coord = pd.DataFrame(adata.obsm['spatial'], columns=['x_coord', 'y_coord'], index=adata.obs_names)
results = SpatialDE.run(coord, counts)
results.index = results["g"]
adata.var = pd.concat([adata.var, results.loc[adata.var.index.values, :]], axis=1)
results.sort_values("qval").head(10)
             FSV  M       g           l  max_delta       max_ll  max_mu_hat  max_s2_t_hat model     n    s2_FSV  s2_logdelta      time          BIC  max_ll_null         LLR  pval  qval
g                                                                                                                                                                                       
Dgkb    0.443220  4    Dgkb  456.331505   1.223975 -2400.272142    0.960307      0.261535    SE  2695  0.000034     0.000653  0.006909  4832.140898 -3068.090390  667.818248   0.0   0.0
Dlg4    0.250555  4    Dlg4  456.331505   2.914374 -2323.810810    1.374468      0.131051    SE  2695  0.000026     0.000777  0.007951  4679.218235 -2650.952950  327.142139   0.0   0.0
Eif5a   0.183140  4   Eif5a  456.331505   4.345842 -1760.856838    2.325085      0.125122    SE  2695  0.000025     0.001113  0.008964  3553.310290 -1946.308672  185.451834   0.0   0.0
Nlgn2   0.157334  4   Nlgn2  456.331505   5.218447 -2322.206831    1.041536      0.074196    SE  2695  0.000024     0.001289  0.009007  4676.010275 -2486.621146  164.414315   0.0   0.0
Atp1b2  0.209184  4  Atp1b2  456.331505   3.683460 -2412.570465    1.691318      0.129979    SE  2695  0.000022     0.000824  0.008972  4856.737545 -2670.858217  258.287751   0.0   0.0
Efnb3   0.298024  4   Efnb3  456.331505   2.294977 -2520.276506    0.884368      0.160919    SE  2695  0.000022     0.000557  0.007954  5072.149626 -3041.920377  521.643871   0.0   0.0
Naa38   0.119160  4   Naa38  456.331505   7.202339 -2284.186889    1.567088      0.075790    SE  2695  0.000027     0.002230  0.007951  4599.970392 -2377.750298   93.563409   0.0   0.0
Apcdd1  0.409983  4  Apcdd1  858.638612   1.321026 -1339.982577    0.374860      0.112883    SE  2695  0.000240     0.004741  0.006995  2711.561768 -1548.322641  208.340064   0.0   0.0
Rnf227  0.281065  4  Rnf227  456.331505   2.492253 -2386.449172    1.251820      0.148585    SE  2695  0.000029     0.000776  0.007941  4804.494957 -2690.556304  304.107132   0.0   0.0
Kcnab3  0.361125  4  Kcnab3  456.331505   1.723719 -2047.492804    0.551128      0.142294    SE  2695  0.000023     0.000479  0.007971  4126.582221 -2618.187562  570.694759   0.0   0.0
res = results.sort_values("qval")
res.to_csv("dif_spatial.csv") #输出保存
sc.pl.spatial(adata, img_key="hires", color=["Dgkb", "Dlg4","Eif5a"], alpha=0.7)
plt.savefig("spatial_Var_feature.pdf")
spatial_Var_feature

保存数据

adata.write(results_file)

我们可以发现与Seurat相比,分类结果还是有差异,不过大的区域识别两种方法都没有什么问题。

上一篇下一篇

猜你喜欢

热点阅读