空间细胞邻域网络图
作者,Evil Genius~~~
今天不想看文章了,我们来画一画图吧。
单细胞的我们来画一画下面的图
Transcription factor network in the pancreatic epithelial cell population
空间的部分我们来画一画邻域网络图
Neighborhood graph representing neighborhood enrichment of cell types. Edges
represent average neighborhood enrichment scores (observed‑to‑expected ratio) between the cell types, and only the bidirectional enrichments
were depicted in this graph as edges. Dot sizes are proportional to the estimated abundances (log scale), and the colors represent average cancer
cell abundances in each cell type’s neighborhood
注意空间图的注释
Neighborhood graph representing neighborhood enrichment of cell types. Edges
represent average neighborhood enrichment scores (observed‑to‑expected ratio) between the cell types, and only the bidirectional enrichments were depicted in this graph as edges. Dot sizes are proportional to the estimated abundances (log scale), and the colors represent average cancer cell abundances in each cell type’s neighborhood.
当然了,有AI的部分在里面,我们需要绘制主图
我们先来画单细胞的,比较简单
import sceleto as scl
import sys
import scanpy as sc
import matplotlib.pyplot as plt
sys.path.append('/home/user1/python_package_jp')
import scjp
import scipy
import numpy as np
import pickle as pkl
def pklrd(filename):
with open(filename, "rb") as file:
data = pkl.load(file)
return data
csdict=pklrd('/home/srkim/02_PancCa/16_FIGURE/csdict.pkl')
def size2(a,b):
plt.rcParams['figure.figsize'] = (a,b)
def size(a,b):
plt.gcf().set_size_inches(a,b)
def pklrd(filename):
with open(filename, "rb") as file:
data = pkl.load(file)
return data
def pklwr(data, filename):
with open(filename, "wb") as file:
pkl.dump(data, file)
def save(name):
scjp.save_fig(version,'Fig_'+name,fig_folder='./Fig')
plt.rcParams['figure.dpi'] = 300
# printing version
## from GSE154778, Lin, 2020
nb_name = 'test'
version =02
Name = nb_name
print('Version: %s'%(version))
print('NB_name: %s'%(Name))
plt.rcParams['figure.dpi'] = 300
读取示例数据,anndata
adata = sc.read_h5ad('/home/srkim/02_PancCa/16_FIGURE/data/PCI01.v43.canrawt_final_02.h5ad')
adata
AnnData object with n_obs × n_vars = 11536 × 2536
obs: 'Sample', 'n_counts', 'n_genes', 'mito', 'doublet_scores', 'predicted_doublets', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'name', 'status', 'assignment', 'genotype', 'KRASg', 'pathol', 'leiden', 'leiden_0.50', 'anno', 'anno_fib', 'anno_final_temp', 'anno_can', 'anno_final', 'ID', 'cID', 'pID', 'copykat.pred', 'anno_predict', 'batch', 'leiden_2.00', 'Fb_COL9A1', 'Fb_ANXA8', 'Fb_Base', 'Fb_LRRC15', 'Fb_SFRP1', 'Fb_VIT', 'Stellate', 'confused'
var: 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
uns: 'anno_final_colors', 'genotype_colors', 'neighbors', 'pathol_colors', 'pca', 'umap'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'connectivities', 'distances'
画一个UMAP图
scjp.us(adata, 'anno_final')
数据处理
adata = adata[~adata.obs.anno_final.isin(['Ep_CDK1','Ep_Base'])].copy()
adata = adata.raw.to_adata()
scjp.sc_process(adata, 'fspku')
sc.external.pp.harmony_integrate(adata, 'ID', adjusted_basis='X_pca')
scjp.sc_process(adata, 'ku')
scjp.bbknn_umap(adata, 'ID', 50)
构建网络
N = scjp.network.network(adata, n_neighbor=8)
N.get_grid(select_per_grid=4)
N.impute()
N.idata
##AnnData object with n_obs × n_vars = 9148 × 33538
# obs: 'Sample', 'n_counts', 'n_genes', 'mito', 'doublet_scores', 'predicted_doublets', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'name', 'status', 'assignment', 'genotype', 'KRASg', 'pathol', 'leiden', 'leiden_0.50', 'anno', 'anno_fib', 'anno_final_temp', 'anno_can', 'anno_final', 'ID', 'cID', 'pID', 'copykat.pred', 'anno_predict', 'batch', 'leiden_2.00', 'Fb_COL9A1', 'Fb_ANXA8', 'Fb_Base', 'Fb_LRRC15', 'Fb_SFRP1', 'Fb_VIT', 'Stellate', 'confused'
# uns: 'anno_final_colors', 'genotype_colors', 'neighbors', 'pathol_colors', 'pca', 'umap'
# obsm: 'X_pca', 'X_umap'
tfdata = scjp.network.new_exp_matrix(N.bdata,N.idata,N.select,
tflist=scjp.tf_genes,max_cutoff=0.2,ratio_expressed=0.01,
min_disp=.75, n_min_exp_cell = 200,calc_var=True,
example_gene='CREB3L1')
计算邻域
C = scipy.sparse.csr_matrix(np.corrcoef(tfdata.X.todense()))
tfdata.uns['neighbors'] = {}
tfdata.uns['neighbors']['connectivities'] = C
tfdata.uns['neighbors']['params'] = {'n_neighbors': 15, 'method': 'umap', 'metric': 'cosine'}
scjp.network.generate_gene_network(tfdata)
N.tfdata = tfdata
N.annotation('anno_final')
N.draw_network('anno_final', axis='X_draw_graph_fa')
N.draw_network('anno_final', axis='X_draw_graph_fr')
N.draw_network('anno_final', axis='X_draw_graph_kk')
大家可以把函数封装起来
anno_key = 'anno_final'
anno_uniq = N.anno_dict[anno_key][0]
anno_ratio = N.anno_dict[anno_key][1]
def draw_graph1(tfdata, anno_key, anno_uniq, anno_ratio, factor0=100.0, adjust=False,
text_fontsize=8,z_score_cut=0.4,color_dict=None, axis='X_draw_graph_kk'):
import seaborn as sns
palette1 = ["#023fa5", "#7d87b9", "#bec1d4", "#d6bcc0", "#bb7784", "#8e063b", "#4a6fe3",
"#8595e1", "#b5bbe3", "#e6afb9", "#e07b91", "#d33f6a", "#11c638", "#8dd593",
"#c6dec7", "#ead3c6", "#f0b98d", "#ef9708", "#0fcfc0", "#9cded6", "#d5eae7",
"#f3e1eb", "#f6c4e1", "#f79cd4",
'#7f7f7f', "#c7c7c7", "#1CE6FF", "#336600"]
palette2 = [
'#1f77b4', '#ff7f0e', '#2ca02c', '#d62728',
'#9467bd', '#8c564b', '#e377c2', # '#7f7f7f' removed grey
'#bcbd22', '#17becf',
'#aec7e8', '#ffbb78', '#98df8a', '#ff9896',
'#c5b0d5', '#c49c94', '#f7b6d2', # '#c7c7c7' removed grey
'#dbdb8d', '#9edae5',
'#ad494a', '#8c6d31'] # manual additions
if len(anno_uniq)> 28:
palette = godsnot_64
elif len(anno_uniq)> 20:
palette = palette1
else:
palette = palette2
if color_dict:
palette = [color_dict[x] for x in anno_uniq]
palette = palette
Cr = np.corrcoef(np.vstack([tfdata.X.todense(),anno_ratio]))
Cr_an = Cr[len(tfdata):,:len(tfdata)]
color_anno = np.argmax(Cr_an,axis=0)
C = tfdata.uns['neighbors']['connectivities'].todense()
#C = Cr[:len(tfdata),:len(tfdata)]
pairs = np.where(C>0.3)
from sklearn import preprocessing
Cr_scaled = preprocessing.scale(Cr_an)
Cr_scaled = Cr_an
#dot_size0 = np.choose(color_anno,Cr_scaled)
dot_size0 = np.array([Cr_scaled[j,i] for i,j in enumerate(color_anno)])
colors = np.array([palette[i] if s >z_score_cut else 'gray' for i,s in zip(color_anno,dot_size0)])
from adjustText import adjust_text
show = True
gamma = 2
dot_size = dot_size0**(gamma)
x = tfdata.obsm[axis][:,0]
y = tfdata.obsm[axis][:,1]
n = list(tfdata.obs_names)
plt.figure(figsize=(8,8))
plt.scatter(x,y,c=colors,s=factor0*dot_size) #50*size
# draw pair to pair lines
for p1, p2 in zip(pairs[0],pairs[1]):
plt.plot([x[p1],x[p2]],[y[p1],y[p2]],c='lightgrey',alpha=0.3,zorder=-1,
linewidth=2*C[p1,p2]**gamma)
# draw_label
for i, label in enumerate(anno_uniq):
plt.scatter(0,0,s=0,label=label,
c=palette[i],zorder=1,alpha=1.0,
linewidth=0)
# add_names
if show != False:
texts = []
for i, txt in enumerate(n):
if txt in anno_uniq:
texts.append(plt.text(x[i],y[i],txt,fontsize=text_fontsize*1.2,color=color_dict[txt]))
else:
texts.append(plt.text(x[i],y[i],txt,fontsize=text_fontsize))
lgnd = plt.legend(bbox_to_anchor=(0.5,0), scatterpoints=1, markerscale=5, handletextpad=0.01, fontsize=9, loc='upper center', ncol=1, columnspacing=0.01)
for handle in lgnd.legendHandles:
handle.set_sizes([30.0])
handle.set_alpha(1)
plt.xticks([], [])
plt.yticks([], [])
plt.grid(False)
if adjust == True:
adjust_text(texts,only_move={'text':'y'})
然后调用函数即可
draw_graph1(N.tfdata, anno_key, anno_uniq, anno_ratio, color_dict=csdict,
factor0=200, z_score_cut=0.4, axis='X_draw_graph_kk', text_fontsize=9, adjust=True)
save('network_pysc')
那么第二部分,我们要绘制空间邻域网络图
这个地方大家首先要完成单细胞空间的联合分析
import sceleto as scl
import sys
import scanpy as sc
import matplotlib.pyplot as plt
sys.path.append('/home/user1/python_package_jp')
import scjp
import scipy
import numpy as np
import pickle as pkl
def pklrd(filename):
with open(filename, "rb") as file:
data = pkl.load(file)
return data
csdict=pklrd('/home/srkim/02_PancCa/16_FIGURE/csdict.pkl')
def size2(a,b):
plt.rcParams['figure.figsize'] = (a,b)
def size(a,b):
plt.gcf().set_size_inches(a,b)
def pklrd(filename):
with open(filename, "rb") as file:
data = pkl.load(file)
return data
def pklwr(data, filename):
with open(filename, "wb") as file:
pkl.dump(data, file)
def save(name):
scjp.save_fig(version,'Fig_'+name,fig_folder='./Fig')
plt.rcParams['figure.dpi'] = 300
# printing version
## from GSE154778, Lin, 2020
nb_name = 'test'
version =02
Name = nb_name
print('Version: %s'%(version))
print('NB_name: %s'%(Name))
plt.rcParams['figure.dpi'] = 300
读取示例数据,anndata,包含了单细胞空间的联合分析
adata = sc.read_h5ad('test.spatial.h5ad')
看一下计算网络的方法
Neighborhood enrichment analysis and graph construction
We first identified the “high spots” for each cell type using abundance profiles projected by Cell2location (spots with 5% quantile abundances > 3 were defined as the high spots). For each high spot, the abundance profiles of neighboring spots (the spots up to the third most proximal spots) were summed. Then, we compared the observed abundance profiles of neighboring spots with the expected abundance profiles, which were calculated
by multiplying the number of neighboring spots by the average abundance profiles across all spots in the spatial transcriptomic data. The observed-to-expected abundance profile ratio was defined as the enrichment profile. The cell–cell pairs with mutual enrichment (observed-to expected ratio > 1) alone were counted in the neighborhood graph.
这么看就简单了
首先确定“high spot”,也就是我们想要的目标细胞类型所在的spot
然后计算“high spot”的邻域
关于邻域的R版本之前分享过,文章在10X空间转录组数据分析之细胞的空间依赖性
拿到如下的结果,每种细胞的空间邻域富集:
首先我们要拿到单细胞空间联合分析的结果,这个部分大家用哪个软件自己决定,推荐cell2location 和RCTD,这里RCTD联合分析作为示例,整理结果如下:
这里的first_type,second_type是指对应的单细胞的cluster,celltype1,celltype2是对应的细胞类型,这里我选择了丰度前二的细胞类型,大家根据需求可以选择前三、前四,x,y是坐标,prop1,prop2是细胞类型的比例。
其中celltype1所在的spot,就需要我们指定“high spot”
来吧, python实现一下,读取RCTD的注释信息,构建邻域:
df_processed = pd.read_csv('RCTD.annovar.csv', header=0)
# All categorzied files (index_, matchComb_, neiCombUnique_, prop_ .csv) are saved in the "categorized_data folder" in the root directory.
CellNeighborEX.categorization.generate_input_files(data_type = "NGS", df = df_processed, sample_size=30, min_sample_size=1)