Immune repertoire analysis with VDJtools

2021-12-22, 不会生信

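MiXCR and VDJtools are Java-based immune repertoire analysis tools that process large volumes of repertoire data, from raw sequencing reads to quantified clonotypes. Make sure a working Java environment is installed before using them.
Download the Java Runtime Environment (JRE), the runtime needed to run Java programs, from the official website.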

java -version  # check that Java is installed and working

Download and install VDJtools (latest release).
VDJtools relies on several R packages for plotting, so install the required R packages.

Install them with the command bundled with VDJtools:

java -jar /path to vdjtools/vdjtools-1.2.1.jar RInstall

Alternatively, install the packages manually in R.

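Convert the upstream results into a format that VDJtools can read. For the upstream analysis, see the earlier post "Building immune repertoires with MiXCR and downstream analysis".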

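Build the metadata (grouping) file
The metadata file should list all sample names together with the location of each sample file.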

metadata.txt
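A minimal Python sketch of writing such a file. It assumes the layout described in the VDJtools documentation as I recall it: tab-separated, one header row, the sample file path in the first column, the sample ID in the second, and any further columns used as factors (here disease_state and Sex, the columns that the -f examples further down rely on). The header names, paths and factor values are hypothetical placeholders, so check them against your VDJtools version.

```python
import csv

# Hypothetical samples: (file path, sample ID, disease_state, Sex).
SAMPLES = [
    ("data/sample1.txt", "sample1", "healthy", "F"),
    ("data/sample2.txt", "sample2", "disease", "M"),
    ("data/sample3.txt", "sample3", "disease", "F"),
]


def write_metadata(path, rows, factors=("disease_state", "Sex")):
    """Write a tab-separated metadata table: file path, sample ID, then factor columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Assumed header layout ('#file.name', 'sample.id', factors...); verify against the docs.
        writer.writerow(["#file.name", "sample.id", *factors])
        writer.writerows(rows)


write_metadata("metadata.txt", SAMPLES)
```

Every factor column added here becomes a possible value for -f in the later commands.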

# convert 
java -jar /path to vdjtools/vdjtools-1.2.1.jar Convert -S mixcr -m metadata.txt output_prefix
#or
java -jar /path to vdjtools/vdjtools-1.2.1.jar Convert -S mixcr sample1.txt sample2.txt ...  output_prefix
# /path to vdjtools/: the VDJtools installation directory
# output_prefix: output path prefix

The converted table:

Conversion result

1. Basic analysis

1.1 CalcBasicStats

This routine computes a set of basic sample statistics, such as read counts, number of clonotypes, etc.

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcBasicStats sample1.txt sample2.txt ... output_prefix
#or
java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcBasicStats -m metadata.txt output_prefix
# /path to vdjtools/: the VDJtools installation directory
# output_prefix: output path prefix

all.basicstats.txt

Tabular output

A table with the .basicstats.txt suffix is generated:

Column	Description
sample_id	Sample unique identifier
…	Metadata columns. See Metadata section
count	Number of reads in a given sample
diversity	Number of clonotypes in a given sample
mean_frequency	Mean clonotype frequency
geomean_frequency	Geometric mean of clonotype frequency
nc_diversity	Number of non-coding clonotypes
nc_frequency	Frequency of reads that belong to non-coding clonotypes
mean_cdr3nt_length	Mean length of CDR3 nucleotide sequence. Weighted by clonotype frequency
mean_insert_size	Mean number of inserted random nucleotides in the CDR3 sequence. For receptor chains without a D segment this is the V-J insert; for chains with a D segment it is the sum of the V-D and D-J insert sizes
mean_ndn_size	Mean number of nucleotides that lie between V and J segment sequences in CDR3
convergence	Mean number of unique CDR3 nucleotide sequences that code for the same CDR3 amino acid sequence
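To make a few of these columns concrete, here is a small Python sketch that computes count, diversity, mean_frequency, geomean_frequency, the frequency-weighted mean_cdr3nt_length and convergence for a made-up three-clonotype sample. It follows the column descriptions above rather than the VDJtools source, so details (for instance whether convergence is weighted) may differ from the real output.

```python
import math
from collections import defaultdict

# Made-up sample: (read count, CDR3 nucleotide sequence, CDR3 amino acid sequence).
CLONOTYPES = [
    (50, "TGTGCCAGCAGT", "CASS"),
    (30, "TGCGCCAGCAGC", "CASS"),  # different nucleotides, same amino acids
    (20, "TGTGCCTGGAGTTTT", "CAWSF"),
]


def basic_stats(table):
    """Compute a subset of the .basicstats.txt columns for one sample."""
    count = sum(reads for reads, _, _ in table)  # total reads
    diversity = len(table)                        # number of clonotypes
    freqs = [reads / count for reads, _, _ in table]
    mean_frequency = sum(freqs) / diversity
    geomean_frequency = math.exp(sum(math.log(f) for f in freqs) / diversity)
    # CDR3 nucleotide length weighted by clonotype frequency.
    mean_cdr3nt_length = sum(f * len(nt) for f, (_, nt, _) in zip(freqs, table))
    # Convergence: unique CDR3nt sequences per CDR3aa sequence (unweighted average).
    nt_by_aa = defaultdict(set)
    for _, nt, aa in table:
        nt_by_aa[aa].add(nt)
    convergence = sum(len(nts) for nts in nt_by_aa.values()) / len(nt_by_aa)
    return {
        "count": count,
        "diversity": diversity,
        "mean_frequency": mean_frequency,
        "geomean_frequency": geomean_frequency,
        "mean_cdr3nt_length": mean_cdr3nt_length,
        "convergence": convergence,
    }


print(basic_stats(CLONOTYPES))
```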

1.2 CalcSegmentUsage

This routine computes Variable (V) and Joining (J) segment usage vectors.

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "disease_state" -m metadata.txt ./results/disease_state
java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "Sex" -m metadata.txt ./results/Sex
# -p: generate plots (requires the R packages)
# -f: grouping factor, i.e. a column of the metadata file
# --plot-type png: output PNG images

output

disease_state.segments.wt.V
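A segment usage vector is simply the share of the repertoire assigned to each V (or J) segment. The Python sketch below computes it both weighted by reads and counted once per clonotype. My assumption is that the .wt in the output file name refers to the read-weighted version; segment names and counts are made up.

```python
from collections import Counter

# Made-up sample: (read count, V segment).
CLONOTYPES = [(50, "TRBV5-1"), (30, "TRBV5-1"), (15, "TRBV20-1"), (5, "TRBV28")]


def v_usage(table, weighted=True):
    """Share of reads (weighted=True) or of clonotypes (weighted=False) per V segment."""
    usage = Counter()
    for reads, v in table:
        usage[v] += reads if weighted else 1
    total = sum(usage.values())
    return {v: n / total for v, n in usage.items()}


print(v_usage(CLONOTYPES))                  # read-weighted
print(v_usage(CLONOTYPES, weighted=False))  # one vote per clonotype
```

The two vectors can differ a lot when a few expanded clonotypes dominate the reads.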

1.3 CalcSpectratype

Calculates spectratype, that is, histogram of read counts by CDR3 nucleotide length.

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcSpectratype -a -m metadata.txt output_prefix
# -a: use CDR3 amino acid sequences for the calculation instead of nucleotide sequences

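Output:
aa: frequency distribution of CDR3 amino acid sequence lengths
insert: frequency distribution of the number of nucleotides inserted at the V-J (or V-D and D-J) junctions of CDR3
ndn: frequency distribution of the length of the nucleotide stretch between the V and J segments in CDR3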
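The spectratype itself is a read-weighted histogram of CDR3 lengths. A minimal Python sketch on made-up data, where use_aa plays the role of the -a switch (an analogy, not the VDJtools implementation):

```python
from collections import Counter

# Made-up sample: (read count, CDR3 nucleotide sequence, CDR3 amino acid sequence).
CLONOTYPES = [
    (50, "TGTGCCAGCAGT", "CASS"),
    (30, "TGCGCCAGCAGC", "CASS"),
    (20, "TGTGCCTGGAGTTTT", "CAWSF"),
]


def spectratype(table, use_aa=False):
    """Read-weighted CDR3 length histogram, normalized to frequencies."""
    hist = Counter()
    for reads, cdr3nt, cdr3aa in table:
        hist[len(cdr3aa) if use_aa else len(cdr3nt)] += reads
    total = sum(hist.values())
    return {length: n / total for length, n in sorted(hist.items())}


print(spectratype(CLONOTYPES))               # nucleotide lengths
print(spectratype(CLONOTYPES, use_aa=True))  # amino acid lengths, like -a
```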

1.4 PlotFancySpectratype

Plots a spectratype that also displays the CDR3 lengths of the top N clonotypes in a given sample. This plot makes it possible to detect highly expanded clonotypes.

java -jar /path to vdjtools/vdjtools-1.2.1.jar PlotFancySpectratype -t 5 sample1.txt output_prefix
# -t: number of top clonotypes to visualize (should not exceed 20; default 10)
# single sample only

fancyspectra
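The idea behind PlotFancySpectratype can be sketched in Python: the top N clonotypes are kept as individual entries, while the remaining reads are pooled into a CDR3 length histogram. The sample, the top_n value and the use of amino acid CDR3s are illustrative assumptions.

```python
from collections import Counter

# Made-up sample: (read count, CDR3 sequence); amino acid CDR3s are used for readability.
CLONOTYPES = [
    (500, "CASSLGQGYEQYF"),
    (40, "CASSPGTEAFF"),
    (30, "CASRDNEQFF"),
    (20, "CASSLAGETQYF"),
]


def fancy_spectratype(table, top_n=1):
    """Keep the top-N clonotypes individually and pool the rest by CDR3 length."""
    ranked = sorted(table, key=lambda row: row[0], reverse=True)
    top = [(cdr3, len(cdr3), reads) for reads, cdr3 in ranked[:top_n]]
    rest = Counter()
    for reads, cdr3 in ranked[top_n:]:
        rest[len(cdr3)] += reads
    return top, dict(sorted(rest.items()))


top, rest = fancy_spectratype(CLONOTYPES)
print(top)   # the expanded clonotype stands out on its own
print(rest)  # background spectratype of the remaining reads
```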

1.5 PlotFancyVJUsage

Plots a circos-style V-J usage plot displaying the frequency of various V-J junctions.

java -jar /path to vdjtools/vdjtools-1.2.1.jar PlotFancyVJUsage sample.txt output_prefix
# -u: Instead of counting read frequency, will count the number of unique clonotypes

fancyvj.wt
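The quantity shown in the circos plot is the frequency of each V-J pair. A Python sketch with made-up clonotypes, where unique=True mimics the -u behaviour described above (counting clonotypes instead of reads):

```python
from collections import Counter

# Made-up sample: (read count, V segment, J segment).
CLONOTYPES = [
    (60, "TRBV5-1", "TRBJ2-7"),
    (25, "TRBV20-1", "TRBJ2-7"),
    (15, "TRBV5-1", "TRBJ1-1"),
]


def vj_usage(table, unique=False):
    """Frequency of each (V, J) pair, by reads or, with unique=True, by clonotypes."""
    pairs = Counter()
    for reads, v, j in table:
        pairs[(v, j)] += 1 if unique else reads
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()}


for (v, j), freq in sorted(vj_usage(CLONOTYPES).items()):
    print(f"{v} -> {j}: {freq:.2f}")
```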

1.6 PlotSpectratypeV

Plots a detailed spectratype that additionally displays the CDR3 length distribution of clonotypes from the top N Variable segment families. This plot is useful for detecting type 1 and type 2 repertoire biases, which can arise under pathological conditions.

java -jar /path to vdjtools/vdjtools-1.2.1.jar PlotSpectratypeV sample.txt output_prefix
# -u: Instead of counting read frequency, will count the number of unique clonotypes
# -t: number of top (by frequency) V segments to visualize (should not exceed 12; default 12)

spectraV.wt
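A rough Python sketch of the data behind this plot: CDR3 length histograms computed separately for the top N V segments by read count. Pooling all remaining V segments into an "other" group is a choice made for this sketch, not necessarily what VDJtools does.

```python
from collections import Counter, defaultdict

# Made-up sample: (read count, V segment, CDR3 nucleotide sequence).
CLONOTYPES = [
    (50, "TRBV5-1", "TGTGCCAGCAGT"),
    (30, "TRBV20-1", "TGCGCCAGCAGC"),
    (10, "TRBV5-1", "TGTGCCTGGAGTTTT"),
    (10, "TRBV28", "TGTGCCAGC"),
]


def spectratype_by_v(table, top_v=2):
    """Read-count CDR3 length histograms for the top-N V segments; others are pooled."""
    v_reads = Counter()
    for reads, v, _ in table:
        v_reads[v] += reads
    top = {v for v, _ in v_reads.most_common(top_v)}
    hist = defaultdict(Counter)
    for reads, v, cdr3 in table:
        hist[v if v in top else "other"][len(cdr3)] += reads
    return {v: dict(lengths) for v, lengths in hist.items()}


print(spectratype_by_v(CLONOTYPES))
```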

2. Diversity estimation

2.1 PlotQuantileStats

Plots a three-layer donut chart to visualize the repertoire clonality.

• The first layer (“set”) shows the frequency of singleton (“1”, seen once), doubleton (“2”, seen twice) and high-order (“3+”, seen three or more times) clonotypes.
• The second layer (“quantile”) displays the abundance of the top 20% (“Q1”), the next 20% (“Q2”), ... (up to “Q5”) of clonotypes from the “3+” set.
• The last layer (“top”) displays the individual abundances of top N clonotypes.

java -jar /path to vdjtools/vdjtools-1.2.1.jar PlotQuantileStats -t 10 sample.txt output_prefix
# -t: number of top clonotypes to visualize (should not exceed 10; default 5)

qstat
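The three layers can be computed from clonotype read counts. In this Python sketch all values are fractions of the total read count, and the quantile boundaries of the "3+" set are found by integer division; both are assumptions, since the exact rounding VDJtools uses is not described here.

```python
# Made-up read counts, one entry per clonotype (100 reads in total).
COUNTS = [1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 10, 20, 44]


def quantile_stats(counts, n_quantiles=5, top_n=3):
    """Fractions of all reads in the set, quantile and top layers of the donut chart."""
    total = sum(counts)
    sets = {
        "1": sum(c for c in counts if c == 1) / total,
        "2": sum(c for c in counts if c == 2) / total,
        "3+": sum(c for c in counts if c >= 3) / total,
    }
    high = sorted((c for c in counts if c >= 3), reverse=True)
    quantiles = {}
    for q in range(n_quantiles):
        # Integer boundaries of the q-th 20% slice of the "3+" clonotypes.
        start = q * len(high) // n_quantiles
        end = (q + 1) * len(high) // n_quantiles
        quantiles[f"Q{q + 1}"] = sum(high[start:end]) / total
    top = [c / total for c in sorted(counts, reverse=True)[:top_n]]
    return sets, quantiles, top


sets, quantiles, top = quantile_stats(COUNTS)
print(sets, quantiles, top, sep="\n")
```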

2.2 RarefactionPlot

Plots rarefaction curves for a specified list of samples, i.e. the dependence of sample diversity on sample size.

java -jar /path to vdjtools/vdjtools-1.2.1.jar RarefactionPlot -m metadata.txt output_prefix
# -f: metadata column (factor) used to label the curves

rarefaction.strict
Solid and dashed lines mark the interpolated and extrapolated regions of the rarefaction curves, respectively; points mark the exact sample size and diversity. Shaded areas mark 95% confidence intervals.
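The interpolated (solid) part of a rarefaction curve is commonly computed with the classic hypergeometric rarefaction formula: for a sample of N reads whose clonotypes have read counts n_i, the expected number of clonotypes in a random subsample of m reads is E[S_m] = sum over i of [1 - C(N - n_i, m) / C(N, m)]. The Python sketch below evaluates it exactly with math.comb, which is fine for toy data but slow for real samples. VDJtools may implement this differently, and the extrapolated (dashed) part needs a separate estimator that is not covered here.

```python
from math import comb


def rarefied_diversity(counts, m):
    """Expected number of clonotypes in a random subsample of m reads (drawn without replacement)."""
    n_total = sum(counts)
    if not 0 <= m <= n_total:
        raise ValueError("subsample size must lie between 0 and the sample size")
    all_draws = comb(n_total, m)
    # Each clonotype contributes the probability that at least one of its reads is drawn.
    return sum(1 - comb(n_total - n, m) / all_draws for n in counts)


COUNTS = [3, 1]  # made-up sample: two clonotypes, four reads
for m in range(sum(COUNTS) + 1):
    print(m, rarefied_diversity(COUNTS, m))
```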

2.3 CalcDiversityStats

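Diversity estimation. Two tables are produced: one with diversity estimates computed on the original samples, and one (resampled) with estimates computed after down-sampling every sample to the same number of reads, by default the size of the smallest sample.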

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcDiversityStats -m metadata.txt output_prefix

all.diversity.strict.resampled
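For intuition about what such a table contains, here is a Python sketch of three widely used indices: the Shannon-Wiener index, the inverse Simpson index and the Chao1 richness estimator (classic form F1^2 / (2 * F2), falling back to the bias-corrected F1 * (F1 - 1) / 2 when there are no doubletons). These are textbook definitions; VDJtools reports its own set of estimators, whose exact formulas may differ.

```python
import math


def diversity_stats(counts):
    """Observed richness, Shannon-Wiener, inverse Simpson and Chao1 from clonotype read counts."""
    total = sum(counts)
    freqs = [c / total for c in counts]
    shannon = -sum(f * math.log(f) for f in freqs)
    inverse_simpson = 1 / sum(f * f for f in freqs)
    observed = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    # Classic Chao1; the bias-corrected term is used when there are no doubletons.
    chao1 = observed + (f1 * f1 / (2 * f2) if f2 > 0 else f1 * (f1 - 1) / 2)
    return {"observed": observed, "shannon": shannon,
            "inverse_simpson": inverse_simpson, "chao1": chao1}


print(diversity_stats([1, 1, 2, 4]))  # made-up sample
```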

3. Repertoire overlap analysis

Clonotype sharing between samples

3.1 OverlapPair

Performs a comprehensive analysis of clonotype sharing for a pair of samples.

java -jar /path to vdjtools/vdjtools-1.2.1.jar OverlapPair -p --plot-area-v2 sample1.txt sample2.txt output_prefix
#-p: plot
# --plot-area-v2: alternative plotting mode; clonotype CDR3 sequences are shown at the plot sides and connected to the corresponding areas with lines.

Overlap type

Shorthand	Rule	Note
strict	CDR3nt (AND) V (AND) J (AND) SHMs	Require full match for receptor nucleotide sequence
nt	CDR3nt
ntV	CDR3nt (AND) V
ntVJ	CDR3nt (AND) V (AND) J
aa	CDR3aa
aaV	CDR3aa (AND) V
aaVJ	CDR3aa (AND) V (AND) J
aa!nt	CDR3aa (AND) ((NOT) CDR3nt)	Removes nearly all contamination bias from overlap results. Should not be used for samples from the same donor or for tracking experiments
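The overlap types differ only in which fields must match. A Python sketch with made-up clonotype records (plain dicts with cdr3nt, cdr3aa, v and j keys, an assumption for illustration): each rule becomes a matching key, and aa!nt is checked pairwise as "same CDR3aa, different CDR3nt". strict is left out because it also compares SHMs, which these toy records do not carry.

```python
# Fields that must match for each overlap type in the table (strict omitted: it also needs SHMs).
RULES = {
    "nt": ("cdr3nt",),
    "ntV": ("cdr3nt", "v"),
    "ntVJ": ("cdr3nt", "v", "j"),
    "aa": ("cdr3aa",),
    "aaV": ("cdr3aa", "v"),
    "aaVJ": ("cdr3aa", "v", "j"),
}


def shared_keys(sample1, sample2, rule):
    """Matching keys present in both samples under one overlap rule."""
    fields = RULES[rule]
    keys1 = {tuple(c[f] for f in fields) for c in sample1}
    keys2 = {tuple(c[f] for f in fields) for c in sample2}
    return keys1 & keys2


def shared_aa_not_nt(sample1, sample2):
    """aa!nt: clonotype pairs with the same CDR3aa but a different CDR3nt."""
    return {(a["cdr3aa"], a["cdr3nt"], b["cdr3nt"])
            for a in sample1 for b in sample2
            if a["cdr3aa"] == b["cdr3aa"] and a["cdr3nt"] != b["cdr3nt"]}


# Made-up clonotype records.
S1 = [{"cdr3nt": "TGTGCCAGCAGT", "cdr3aa": "CASS", "v": "TRBV5-1", "j": "TRBJ2-7"},
      {"cdr3nt": "TGTGCCTGGAGTTTT", "cdr3aa": "CAWSF", "v": "TRBV20-1", "j": "TRBJ1-1"}]
S2 = [{"cdr3nt": "TGCGCCAGCAGC", "cdr3aa": "CASS", "v": "TRBV5-1", "j": "TRBJ2-7"},
      {"cdr3nt": "TGTGCCTGGAGTTTT", "cdr3aa": "CAWSF", "v": "TRBV28", "j": "TRBJ1-1"}]

for rule in RULES:
    print(rule, len(shared_keys(S1, S2, rule)))
print("aa!nt", len(shared_aa_not_nt(S1, S2)))
```

The same pair of samples shares 2 clonotypes under aa but 0 under ntV, which is why the overlap type should be chosen deliberately.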

strict.paired.scatter

paired.strict.table.collapsed

Clonotype scatterplot. Main frame contains a scatterplot of clonotype abundances (overlapping clonotypes only) and a linear regression. Point size is scaled to the geometric mean of clonotype frequency in both samples. Scatterplot axes represent log10 clonotype frequencies in each sample. Two marginal histograms show the overlapping (red) and total clonotype (grey) abundance distributions in corresponding sample. Histograms are weighted by clonotype abundance, i.e. they display read distribution by clonotype size.
Shared clonotype abundance plot. Plot shows details for top 20 clonotypes shared between samples, as well as collapsed (“NotShown”) and non-overlapping (“NonOverlapping”) clonotypes. Clonotype CDR3 amino acid sequence is plotted against the sample where the clonotype reaches maximum abundance.

3.2 CalcPairwiseDistances

Performs an all-versus-all pairwise overlap for a list of samples and computes a set of repertoire similarity measures. At least 3 samples should be provided.

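java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p sample1.txt sample2.txt sample3.txt ... output_prefix
#or
java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p -m metadata.txt output_prefix
# -p: generate plots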

intersect.batch.aa

Pairwise overlap circos plot. Count, frequency and diversity panels correspond to the read count, frequency (both non-symmetric) and the total number of clonotypes that are shared between samples. Pairwise overlaps are stacked, i.e. segment arc length is not equal to sample size.
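CalcPairwiseDistances reports VDJtools' own set of similarity measures. As a generic illustration only, the Python sketch below computes, for every pair of samples, the number of shared clonotypes, the Jaccard index, and the sum over shared clonotypes of the geometric mean of their frequencies. These measures were chosen for the example and are not a reproduction of the VDJtools metrics; the samples are made-up {clonotype: read count} dictionaries.

```python
import math
from itertools import combinations


def pairwise_overlap(samples):
    """samples: {sample_id: {clonotype_key: read_count}} -> similarity measures per pair."""
    results = {}
    for (id1, s1), (id2, s2) in combinations(sorted(samples.items()), 2):
        shared = s1.keys() & s2.keys()
        total1, total2 = sum(s1.values()), sum(s2.values())
        results[(id1, id2)] = {
            "shared_clonotypes": len(shared),
            "jaccard": len(shared) / len(s1.keys() | s2.keys()),
            # Sum over shared clonotypes of the geometric mean of their frequencies.
            "geomean_freq_sum": sum(math.sqrt(s1[k] / total1 * s2[k] / total2) for k in shared),
        }
    return results


SAMPLES = {  # made-up samples keyed by CDR3aa
    "A": {"CASS": 50, "CAWS": 50},
    "B": {"CASS": 25, "CATS": 75},
    "C": {"CAWS": 100},
}
for pair, measures in pairwise_overlap(SAMPLES).items():
    print(pair, measures)
```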

3.3 ClusterSamples

Performs cluster analysis using the output tables of CalcPairwiseDistances as input.

java -jar /path to vdjtools/vdjtools-1.2.1.jar ClusterSamples -p  input_prefix output_prefix
# input_prefix: the output_prefix used for CalcPairwiseDistances (without any file suffix)
# -p: generate plots
# -f: metadata column (factor) used to label samples
# -n: specifies that the plotting factor is continuous

For example:
java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p -m metadata.txt e:/results/all
java -jar /path to vdjtools/vdjtools-1.2.1.jar ClusterSamples -p -f "Sex" e:/results/all e:/results/Sex
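Conceptually, ClusterSamples runs hierarchical clustering on the pairwise distances. A naive average-linkage sketch in Python, assuming a symmetric distance matrix (for example 1 minus a similarity measure) with made-up values; the linkage method and distance transform that VDJtools uses may differ.

```python
def average_linkage(dist, labels):
    """Naive agglomerative clustering with average linkage; returns the merge history."""
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Average linkage: mean distance over all cross-cluster sample pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append(([labels[i] for i in clusters[a]], [labels[i] for i in clusters[b]], d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


LABELS = ["s1", "s2", "s3", "s4"]
DIST = [  # made-up symmetric distances, e.g. 1 - similarity
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
for left, right, height in average_linkage(DIST, LABELS):
    print(left, "+", right, f"at {height:.2f}")
```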

Reference figure from the official documentation:


3.4 TestClusters

This routine tests whether a given factor influences repertoire clustering. For the factor specified in ClusterSamples, it assesses the compactness of samples that share the same factor level and the separation between samples with distinct factor levels.
(It can only be used if -f was specified for ClusterSamples; it checks how that factor affects the clustering.)

java -jar /path to vdjtools/vdjtools-1.2.1.jar TestClusters input_prefix output_prefix
# input_prefix: the output_prefix used for ClusterSamples

Figure from the official documentation
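As a rough intuition for "compactness" and "separation", the Python sketch below compares the mean distance between samples that share a factor level with the mean distance between samples from different levels. This is a simplified illustration with made-up values, not the statistic TestClusters computes.

```python
from itertools import combinations


def within_between(dist, groups):
    """Mean distance between samples of the same factor level vs. of different levels."""
    within, between = [], []
    for i, j in combinations(range(len(groups)), 2):
        (within if groups[i] == groups[j] else between).append(dist[i][j])
    return sum(within) / len(within), sum(between) / len(between)


GROUPS = ["F", "F", "M", "M"]  # made-up factor levels, e.g. the Sex column
DIST = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
within, between = within_between(DIST, GROUPS)
print(f"within {within:.2f}, between {between:.2f}")  # compact groups: within much smaller than between
```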