scrna
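Starting with cellranger: rename the files to the naming format it expects.
scrna-filename.txt contains the sample names, one per line.
Command: cat scrna-filename.txt | while read i ;do (mv ${i}_f1*.gz ${i}_S1_L001_R1_001.fastq.gz;mv ${i}_r2*.gz ${i}_S1_L001_R2_001.fastq.gz);done
This fails with: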
mv: cannot stat 'HRR573093'$'\r''_r2*.gz': No such file or directory
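dos2unix is a utility that converts files from Windows format to Unix/Linux format. Windows files end each line with \r\n, while Unix and Linux files end each line with \n; dos2unix simply converts every \r\n in the file to \n. The stray \r left at the end of each name read from scrna-filename.txt is the $'\r' shown in the error above.
First install dos2unix.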
dos2unix -o scrna-filename.txt
(With -o, the converted output overwrites the original file.)
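For reference, the conversion itself is small enough to sketch in Python. This is only an illustration of the \r\n to \n replacement described above, not how dos2unix is implemented; the function names are made up for this note.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every Windows line ending (CR LF) with a Unix one (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file_in_place(path: str) -> None:
    """Rewrite the file at `path` with Unix line endings, overwriting it."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(crlf_to_lf(data))
```

Calling convert_file_in_place("scrna-filename.txt") would have roughly the same effect on the name list as the dos2unix command above.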
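Re-running the cat command above now gives no error, and the files are renamed to <sample>_S1_L001_R1_001.fastq.gz and <sample>_S1_L001_R2_001.fastq.gz.
Original file name format: <sample>_f1*.gz and <sample>_r2*.gz (the patterns matched by the mv command).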
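The same renaming can be sketched in Python in a way that tolerates Windows line endings in the name list. This is a rough sketch, not a replacement for the shell loop: the function names are made up, and it assumes at most one _f1 and one _r2 file per sample (if several match, the later one would overwrite the earlier).

```python
import os


def read_sample_names(list_path):
    """Return the sample names in the list file, one per line.

    str.strip() removes a trailing '\\r' as well as the newline, so a list
    saved with Windows line endings still yields clean names.
    """
    with open(list_path, newline="") as handle:
        return [line.strip() for line in handle if line.strip()]


def plan_renames(sample, filenames):
    """Map existing .gz names for one sample to the R1/R2 names used above."""
    plan = {}
    for name in filenames:
        if not name.endswith(".gz"):
            continue
        if name.startswith(sample + "_f1"):
            plan[name] = sample + "_S1_L001_R1_001.fastq.gz"
        elif name.startswith(sample + "_r2"):
            plan[name] = sample + "_S1_L001_R2_001.fastq.gz"
    return plan


def rename_samples(list_path, directory="."):
    """Apply the planned renames for every sample in the list file."""
    for sample in read_sample_names(list_path):
        plan = plan_renames(sample, os.listdir(directory))
        for old, new in plan.items():
            os.rename(os.path.join(directory, old), os.path.join(directory, new))
```

plan_renames only builds the mapping, so it can be printed and checked before rename_samples touches any files.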
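Next would be quality control. I skipped this step, which should be fine.
Next is cellranger count. This is the most important step: it quantifies genes per cell, and it wraps alignment, QC, and quantification into a single command.
Example for one sample: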
cellranger count --id=HRR572950 --transcriptome=refdata-cellranger-GRCh38-1.2.0 --fastqs=/data1/肝癌单细胞GSA数据-HCC --sample=HRR572950
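The reference (annotation) files have to be downloaded first, because count uses refdata.
If it reports that the file cannot be found, download it manually from the link and upload it to the server.
Good news: it runs.
Bad news: out of disk space again.
Deleted the non-HCC data.
Strange: in the morning cellranger ran without the "./" prefix, but by the afternoon the prefix was required.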
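One possible explanation, which is only a guess and not confirmed anywhere in these notes, is that the cellranger directory was on PATH in the morning session but not in the afternoon one. A small sketch for checking what a bare command name resolves to (the function names are made up):

```python
import os
import shutil


def locate(program, search_path=None):
    """Return the full path found for `program`, or None.

    search_path is a PATH-style string; None means use the current PATH.
    """
    return shutil.which(program, path=search_path)


def explain(program="cellranger"):
    """Describe whether `program` is reachable by name or only as ./program."""
    found = locate(program)
    if found:
        return f"{program} resolves to {found}"
    if os.path.isfile(os.path.join(".", program)):
        return f"{program} is not on PATH, but ./{program} exists here"
    return f"{program} is neither on PATH nor in the current directory"
```

Running print(explain()) in both sessions would show whether the two shells see the same PATH.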
With ./ it seems to start, but then it reports that the reference has not been indexed, even though I had already built it:
[error] Your reference doesn't appear to be indexed. Please run the mkreference tool
2023-06-01 08:11:06 Shutting down.
Saving pipestance info to "HRR572950/HRR572950.mri.tgz"
For assistance, upload this file to 10x Genomics by running:
cellranger upload <your email> "HRR572950/HRR572950.mri.tgz"
Then the command for building the index was, inexplicably, rejected:
error: The subcommand 'mkref --genome=GRCh38' wasn't recognized
Did you mean 'mkref'?
If you believe you received this message in error, try re-running with 'cellranger -- mkref --genome=GRCh38'
My question on GitHub: https://github.com/10XGenomics/cellranger/issues/217
Command used: cellranger mkref --genome=GRCh38 --fasta=Homo_sapiens.GRCh38.dna.primary_assembly.fa --genes=Homo_sapiens.GRCh38.84.filtered.gtf
It printed "Reference successfully created", but count still reported "Your reference doesn't appear to be indexed. Please run the mkreference tool".
Solved! I found someone on Xianyu to help, and it turned out the count command was wrong. The prebuilt reference (refdata-gex-GRCh38-2020-A) can be used instead. The command:
cellranger count --id=HRR572950 --transcriptome=refdata-gex-GRCh38-2020-A --fastqs=/data1/liver-cancer-GSA-HCC --sample=HRR572950
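Since --transcriptome is given here as a relative path, a quick check before launching count may help when chasing a "reference not indexed" message. The sketch below assumes a reference directory contains a reference.json file and a star/ subdirectory; that layout is an assumption on my part, not something recorded in these notes, and check_reference is a made-up name.

```python
import os


def check_reference(transcriptome):
    """Return a list of problems found with a --transcriptome path.

    The expected entries (reference.json and a star/ directory) are an
    assumption about the reference layout.
    """
    # Resolve relative paths against the current working directory.
    path = os.path.abspath(transcriptome)
    if not os.path.isdir(path):
        return [f"not a directory: {path}"]
    problems = []
    if not os.path.isfile(os.path.join(path, "reference.json")):
        problems.append("missing reference.json")
    if not os.path.isdir(os.path.join(path, "star")):
        problems.append("missing star/ index directory")
    return problems
```

An empty list from check_reference("refdata-gex-GRCh38-2020-A") only means the path resolves from the current directory and looks plausible; it does not prove the reference is compatible with the installed cellranger version.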
Batch run:
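import os

# Run cellranger count for a batch of samples
def cellranger():
    # range() excludes its end value, so the end must be the last sample
    # number plus one; range(572951, 572952) runs HRR572951 only.
    for i in range(572951, 572952):
        x = "HRR" + str(i)
        cmd_string = "cellranger count --id=" + x + " --transcriptome=refdata-gex-GRCh38-2020-A --fastqs=/data1/liver-cancer-GSA-HCC --sample=" + x
        print('x:{}'.format(cmd_string))
        # Run the command and print what it wrote to stdout
        print(os.popen(cmd_string).read())

cellranger()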
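The sample numbers used later for aggr are not contiguous (for example, there is no HRR572953), so an explicit list of sample names may be easier to manage than a numeric range. A sketch of that variant follows; the function names and the dry_run switch are made up for illustration, and only the flags already used above appear in the command.

```python
import subprocess


def build_count_command(sample, transcriptome, fastqs):
    """Return the cellranger count command for one sample as an argument list."""
    return [
        "cellranger", "count",
        f"--id={sample}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastqs}",
        f"--sample={sample}",
    ]


def run_batch(samples, transcriptome, fastqs, dry_run=True):
    """Print each command, and run it unless dry_run is set."""
    commands = [build_count_command(s, transcriptome, fastqs) for s in samples]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the batch at the first failing sample.
            subprocess.run(command, check=True)
    return commands
```

With dry_run=True the function only prints the commands, which makes it easy to check the sample list before committing hours of compute.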
aggr: aggregate the samples
50_76_libraries.csv:
sample_id,molecule_h5
HRR572950,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572950/outs/molecule_info.h5
HRR572951,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572951/outs/molecule_info.h5
HRR572952,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572952/outs/molecule_info.h5
HRR572954,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572954/outs/molecule_info.h5
HRR572955,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572955/outs/molecule_info.h5
HRR572956,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572956/outs/molecule_info.h5
HRR572962,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572962/outs/molecule_info.h5
HRR572964,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572964/outs/molecule_info.h5
HRR572965,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572965/outs/molecule_info.h5
HRR572966,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572966/outs/molecule_info.h5
HRR572967,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572967/outs/molecule_info.h5
HRR572968,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572968/outs/molecule_info.h5
HRR572969,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572969/outs/molecule_info.h5
HRR572972,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572972/outs/molecule_info.h5
HRR572973,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572973/outs/molecule_info.h5
HRR572974,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572974/outs/molecule_info.h5
HRR572976,/data1/liver-cancer-GSA-HCC/cellranger-7.1.0/HRR572976/outs/molecule_info.h5
Command:
cellranger aggr --id=5076 --csv=./50_76_libraries.csv --normalize=mapped
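The libraries CSV above follows a fixed pattern, so it could be generated from a list of sample names rather than typed by hand. A minimal sketch, with build_aggr_csv as a made-up helper name:

```python
import csv
import io


def build_aggr_csv(sample_ids, count_dir):
    """Return the text of an aggr libraries CSV for the given samples."""
    buffer = io.StringIO()
    # csv.writer defaults to \r\n line endings; force \n to avoid
    # reintroducing Windows line endings on a Linux server.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample_id", "molecule_h5"])
    for sample in sample_ids:
        writer.writerow([sample, f"{count_dir}/{sample}/outs/molecule_info.h5"])
    return buffer.getvalue()
```

For example, build_aggr_csv(["HRR572950", "HRR572951"], "/data1/liver-cancer-GSA-HCC/cellranger-7.1.0") produces the header plus the first two rows of 50_76_libraries.csv; the returned text still has to be written to the file.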