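Notes on a single-cell analysis workflow
【3' data】
1. Cellranger count
Rename the FASTQ files so that all FASTQ files of one sample share the same prefix and differ only in the lane field (L00X).
Batch script for cellranger count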
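The renaming can be scripted. Below is a minimal sketch, not the procedure actually used here. It assumes every read pair in a directory is named <anything>_R1_001.fastq.gz / <anything>_R2_001.fastq.gz; the function name rename_lanes is hypothetical. The target pattern <sample>_S1_L00X_R1_001.fastq.gz is the bcl2fastq-style naming that cellranger expects.

```shell
#!/bin/bash
# Hypothetical helper: give every FASTQ pair in a directory the same sample
# prefix and a distinct lane field (L001, L002, ...).
# Assumption: pairs are named <anything>_R1_001.fastq.gz / <anything>_R2_001.fastq.gz.
rename_lanes() {
    local sample=$1 dir=$2 lane=0 r1 r2 tag new1 new2
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r2=${r1%_R1_001.fastq.gz}_R2_001.fastq.gz
        lane=$((lane + 1))
        tag=$(printf 'L%03d' "$lane")
        new1=$dir/${sample}_S1_${tag}_R1_001.fastq.gz
        new2=$dir/${sample}_S1_${tag}_R2_001.fastq.gz
        # mv -n never overwrites an existing file
        if [ "$r1" != "$new1" ]; then mv -n "$r1" "$new1"; fi
        if [ -e "$r2" ] && [ "$r2" != "$new2" ]; then mv -n "$r2" "$new2"; fi
    done
    return 0
}

# Example call (try it on a copy of the data first):
# rename_lanes AEKI_1_3 /path/to/Cleandata/AEKI_1_3
```

Lane numbers are assigned in glob (alphabetical) order, so the L00X values only separate the files; they do not have to match the real sequencing lanes.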
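#!/bin/bash
# Mouse single-cell data analysis: run cellranger count on each sample in turn
genomedir=/home/user/myh/ref_data/refdata-gex-mm10-2020-A
# Mouse reference transcriptome directory
datadir=/home/user/myh/raw_data/AEKIscRNAseq/3_X101SC22052037-Z01-J022-B22-1_10X_release_20220717/Cleandata
# Directory containing the FASTQ files
sample="AEKI_1_3 AEKI_2_3 Con_1_3 Con_2_3"
# Sample names (the FASTQ file prefixes)
date
# $sample is left unquoted on purpose so that it splits into one word per sample
for s in $sample
do
date
# cellranger runs in the foreground, so the samples are processed one after another
cellranger count --id="${s}_cellranger_out" --fastqs="$datadir" --sample="$s" --transcriptome="$genomedir" --nosecondary
date
done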
2. Velocyto: convert the BAM file to a loom file (for downstream use with scVelo)
velocyto includes a shortcut to run the counting directly on one or more cellranger output folders (e.g. this is the folder containing the subfolder: outs, outs/analys and outs/filtered_gene_bc_matrices).
velocyto kept failing with: MemoryError: bam file #0 could not be sorted by cells. This is probably related to an old version of samtools, please install samtools >= 1.6. In alternative this could be a memory error, try to set the --samtools_memory option to a value compatible with your system. Otherwise sort manually by samtools sort -l [compression] -m [mb_to_use]M -t [tagname] -O BAM -@ [threads_to_use] -o cellsorted_[bamfile] [bamfile]
Workaround: sort the BAM file by cell barcode (CB tag) with samtools sort first, writing it as cellsorted_possorted_genome_bam.bam next to the original BAM, then run velocyto.
samtools sort -t CB -O BAM -o /home/user/myh/raw_data/AEKIscRNAseq/3_X101SC22052037-Z01-J022-B22-1_10X_release_20220717/Cellranger_out/AEKI_2_3_cellranger_out/outs/cellsorted_possorted_genome_bam.bam /home/user/myh/raw_data/AEKIscRNAseq/3_X101SC22052037-Z01-J022-B22-1_10X_release_20220717/Cellranger_out/AEKI_2_3_cellranger_out/outs/possorted_genome_bam.bam
velocyto run10x -m /home/user/myh/ref_data/mm10_allTracks.gtf /home/user/myh/raw_data/AEKIscRNAseq/3_X101SC22052037-Z01-J022-B22-1_10X_release_20220717/Cellranger_out/AEKI_2_3_cellranger_out /home/user/myh/ref_data/refdata-gex-mm10-2020-A/genes/genes.gtf
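The two commands above cover one sample. A sketch of the same two steps looped over all four 3' samples follows. The sample names and paths are taken from the count script above; the RUN variable is a hypothetical dry-run switch added for illustration: by default the commands are only printed, and running the script with RUN= (empty) executes them.

```shell
#!/bin/bash
# Sketch: sort each BAM by cell barcode, then run velocyto run10x on the same folder.
# RUN is a hypothetical dry-run switch: unset -> echo (print only), RUN= -> execute.
RUN=${RUN-echo}
outdir=/home/user/myh/raw_data/AEKIscRNAseq/3_X101SC22052037-Z01-J022-B22-1_10X_release_20220717/Cellranger_out
mask=/home/user/myh/ref_data/mm10_allTracks.gtf
genes=/home/user/myh/ref_data/refdata-gex-mm10-2020-A/genes/genes.gtf

sort_and_velocyto() {
    local rundir=$1                      # one cellranger count output folder
    local bam=$rundir/outs/possorted_genome_bam.bam
    # Same flags as the single-sample command above; stop if the sort fails
    $RUN samtools sort -t CB -O BAM -o "$rundir/outs/cellsorted_possorted_genome_bam.bam" "$bam" || return 1
    $RUN velocyto run10x -m "$mask" "$rundir" "$genes"
}

for s in AEKI_1_3 AEKI_2_3 Con_1_3 Con_2_3; do
    sort_and_velocyto "$outdir/${s}_cellranger_out"
done
```

Checking the printed commands once before setting RUN= is cheap compared with a failed multi-hour sort.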
-----------------------------------------------------------------------
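【5' + TCR data】
1. Cellranger multi
Rename the FASTQ files so that the 5' gene expression library and the TCR library of each sample have distinct prefixes (e.g. AEKI_1_5 and AEKI_1_TCR).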
nano multi_config.csv
Edit its contents as follows:
[gene-expression]
reference,/home/user/myh/ref_data/refdata-gex-mm10-2020-A
expect-cells,1000
[vdj]
reference,/home/user/myh/ref_data/refdata-cellranger-vdj-GRCm38-alts-ensembl-5.0.0
[libraries]
fastq_id,fastqs,lanes,feature_types,subsample_rate
AEKI_1_5,/home/user/myh/raw_data/AEKIscRNAseq/5_X101SC22052039-Z01-J048-B48-1_10X_release_20220717/Cleandata/AEKI_1_5,1|2,gene expression,
AEKI_1_TCR,/home/user/myh/raw_data/AEKIscRNAseq/5TCR_X101SC22052039-Z01-J049-B49-1_10X_release_20220716/Cleandata/AEKI_1_TCR,1|2,VDJ-T,
Caution: check multi_config.csv carefully for stray line breaks. Pasting into the terminal sometimes mangles the formatting!
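A quick check can catch this kind of damage before cellranger does. The function below is a hypothetical sketch, not part of cellranger: it reports Windows line endings and any [libraries] row whose number of comma-separated fields differs from the header row of that section, which is what a wrapped path usually produces.

```shell
#!/bin/bash
# Hypothetical sanity check for a cellranger multi config file.
# Returns non-zero if the file contains carriage returns or if a row in
# [libraries] has a different field count than the [libraries] header row.
check_multi_config() {
    local csv=$1
    if grep -q $'\r' "$csv"; then
        echo "carriage returns found (Windows line endings?)"
        return 1
    fi
    awk -F, '
        /^\[/          { in_lib = ($0 == "[libraries]"); ncol = 0; next }
        !in_lib || !NF { next }                      # other sections, blank lines
        ncol == 0      { ncol = NF; next }           # header row of [libraries]
        NF != ncol     { printf "line %d: %d fields, expected %d\n", NR, NF, ncol; bad = 1 }
        END            { exit bad }
    ' "$csv"
}

# Example call:
# check_multi_config multi_config.csv && echo "config looks consistent"
```

It only checks the shape of the file; wrong paths or misspelled feature types still pass.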
Start all four samples, each in its own screen session (the crude approach).
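Starting the sessions can also be done in one loop. This is a sketch under assumptions: only AEKI_1 appears in these notes, so the other three sample names, the per-sample config file names (multi_config_<sample>.csv) and the log file names are hypothetical. RUN is a hypothetical dry-run switch: the commands are printed by default and executed when the script is run with RUN= (empty).

```shell
#!/bin/bash
# Sketch: one detached screen session per sample, each running cellranger multi.
# Assumptions: sample names other than AEKI_1, the config file names and the
# log file names are hypothetical.
RUN=${RUN-echo}    # unset -> print the commands only; RUN= -> launch them

launch_multi() {
    local s=$1
    # screen -dmS starts a named session in detached mode
    $RUN screen -dmS "multi_$s" bash -c "cellranger multi --id=$s --csv=multi_config_$s.csv > $s.multi.log 2>&1"
}

for s in AEKI_1 AEKI_2 Con_1 Con_2; do
    launch_multi "$s"
done
```

screen -ls lists the running sessions and screen -r multi_AEKI_1 reattaches to one. Four concurrent runs share the machine's cores and memory, so this only makes sense on a server with enough of both.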
Converting cellranger multi output to loom files
velocyto run -b /home/user/myh/raw_data/AEKIscRNAseq/5_TCR_outs/AEKI_1/outs/per_sample_outs/AEKI_1/count/sample_feature_bc_matrix/barcodes.tsv.gz \
-o /home/user/myh/raw_data/AEKIscRNAseq/5_TCR_outs/AEKI_1/outs/per_sample_outs/AEKI_1/count/loom \
-m /home/user/myh/ref_data/mm10_allTracks.gtf \
/home/user/myh/raw_data/AEKIscRNAseq/5_TCR_outs/AEKI_1/outs/per_sample_outs/AEKI_1/count/sample_alignments.bam \
/home/user/myh/ref_data/refdata-gex-mm10-2020-A/genes/genes.gtf