
Calling SNPs with MUMmer

2021-08-27  胡童远

MUMmer (NUCmer) finds mutations by comparing genomes. NUCmer is suited to comparing the genomes of multiple closely related species.

NUCmer is a Perl script pipeline for the alignment of multiple closely related nucleotide sequences.

Literature (MUMmer is long-established software):

MUMmer 1.0
Alignment of Whole Genomes. Nucleic Acids Research 1999
MUMmer 2.1, NUCmer, and PROmer
Fast Algorithms for Large-scale Genome Alignment and Comparison. Nucleic Acids Research 2002
MUMmer 3.0
Versatile and open software for comparing large genomes. Genome Biology 2004
MUMmer4
A fast and versatile genome alignment system. PLoS Comput Biol. 2018

Software links

sourceforge: http://mummer.sourceforge.net/
Examples: http://mummer.sourceforge.net/examples/
MUMmer3 manual: http://mummer.sourceforge.net/manual/

Mummer

Installation

conda create -n mummer
conda activate mummer
conda install -c bioconda mummer
mummer -help
nucmer -help

1 Alignment:

nucmer -p [out_index] [ref] [query]

nucmer -p align ref.fna input.fna

-p|--prefix: Set the prefix of the output files (default "out")
--mum: Use anchor matches that are unique in both the reference and query

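2 Filtering

Repetitive sequence can obscure true SNPs, so use delta-filter to remove the redundant matches from one-to-many and many-to-many alignments.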

delta-filter -q align.delta > align_q.filter
delta-filter -1 -q -r align.delta > align_qr.filter

-q: best match for the query
Maps each position of each query to its best hit in the reference
-r: best match for the reference
Maps each position of each reference to its best hit in the query
-1: one-to-one alignments, the intersection of the -r and -q alignments
-m: many-to-many alignments, the union of the -r and -q alignments

3 Alignment coordinates before and after filtering

show-coords -rcl align.delta > align.coords
show-coords -rcl align_q.filter > align_q.coords
show-coords -rcl align_qr.filter > align_qr.coords

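-r: sort by reference sequence
Sort output lines by reference IDs and coordinates
-c: show coverage
Include percent coverage information in the output
-l: show sequence lengths
Include the sequence length information in the output

Comparing the three outputs shows that each filtering step reduces the number of alignments.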

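S1 start of the alignment on the reference
E1 end of the alignment on the reference
S2 start of the alignment on the query
E2 end of the alignment on the query
LEN 1 length of the aligned region on the reference
LEN 2 length of the aligned region on the query
% IDY percent identity of the alignment
LEN R length of the reference scaffold/contig containing the alignment
LEN Q length of the query scaffold/contig containing the alignment
COV R aligned length as a percentage of the reference scaffold/contig length
COV Q aligned length as a percentage of the query scaffold/contig length
TAGS IDs of the reference and query scaffolds/contigs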
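
The columns above can be used to post-filter alignments. The following is a minimal sketch, not part of MUMmer itself. It assumes the coordinates were written with the extra show-coords options -T (tab-delimited) and -H (no header), for example `show-coords -rclTH align_qr.filter > align_qr.coords.tsv`, so that column 5 is LEN 1 and column 7 is % IDY. The function name `filter_coords`, the file name, and the default thresholds are illustrative assumptions.

```shell
# Keep alignments with identity >= min_idy (%) and reference-side length >= min_len (bp).
# Assumes tab-delimited, header-less show-coords output (-T -H) produced with -c and -l,
# where column 5 is LEN 1 and column 7 is % IDY.
filter_coords() {
    local coords="$1" min_idy="${2:-95}" min_len="${3:-1000}"
    awk -F'\t' -v idy="$min_idy" -v len="$min_len" \
        '$7 + 0 >= idy && $5 + 0 >= len' "$coords"
}
```

For example, `filter_coords align_qr.coords.tsv 98 2000` would print only the alignments that are at least 98% identical and at least 2000 bp long on the reference.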

4 Spot-checking alignments

show-aligns [align] [ref seq id] [query seq id]

show-aligns align.delta NC_009614.1 AF04-12.Scaf1 > check.txt
show-aligns align_q.filter NC_009614.1 AF04-12.Scaf1 > check_q.txt
show-aligns align_qr.filter NC_009614.1 AF04-12.Scaf1 > check_qr.txt

5 Tiling the query onto the reference:

show-tiling align.delta > align.tiling
show-tiling align_q.filter > align_q.tiling
show-tiling align_qr.filter > align_qr.tiling

6 call SNPs

show-snps -Clr align.delta > align.snps
show-snps -Clr align_q.filter > align_q.snps
show-snps -Clr align_qr.filter > align_qr.snps

-C: report only SNPs from uniquely mapped alignments
Do not report SNPs from alignments with an ambiguous mapping, i.e. only report SNPs where the [R] and [Q] columns equal 0 and do not output these columns
-l: include sequence lengths in the output
Include sequence length information in the output
-r: sort by reference ID and SNP position
Sort output lines by reference IDs and SNP positions
-H Do not print the output header
-I Do not report indels
-T Switch to tab-delimited format
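
Steps 1, 2, 3 and 6 can be chained in a small wrapper. This is a sketch under stated assumptions rather than a recommended pipeline: the names `run_to`, `mummer_snp_pipeline` and the `DRY_RUN` variable are hypothetical, and the commands are the ones used in this post for the `_qr` branch. With `DRY_RUN=1` the function only prints the commands, which makes it possible to check them before running anything.

```shell
# Hypothetical helper: run a command with stdout redirected to a file,
# or only print the command line when DRY_RUN=1.
run_to() {
    local out="$1"; shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$* > $out"
    else
        "$@" > "$out"
    fi
}

# Hypothetical wrapper: align, filter, then report coordinates and SNPs.
# Usage: mummer_snp_pipeline ref.fna input.fna align
mummer_snp_pipeline() {
    local ref="$1" query="$2" prefix="$3"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "nucmer -p $prefix $ref $query"
    else
        nucmer -p "$prefix" "$ref" "$query" || return 1
    fi
    # Same filter and report options as in the steps of this post.
    run_to "${prefix}_qr.filter" delta-filter -1 -q -r "${prefix}.delta" || return 1
    run_to "${prefix}_qr.coords" show-coords -rcl "${prefix}_qr.filter" || return 1
    run_to "${prefix}_qr.snps" show-snps -Clr "${prefix}_qr.filter" || return 1
}
```

Running `DRY_RUN=1 mummer_snp_pipeline ref.fna input.fna align` lists the four commands without executing them; dropping `DRY_RUN=1` runs them, stopping at the first failure.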

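After filtering, the number of reported SNPs increased. A plausible reason is that -C discards SNPs in ambiguously mapped regions, and filtering removes the redundant alignments that cause the ambiguity.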

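P1 position on the reference
SUB substitution (reference base and query base)
P2 position on the query
BUFF distance from this SNP to the nearest mismatch (another SNP, an indel, or the end of the alignment) in the same alignment
DIST distance from this SNP to the nearest sequence end
LEN R length of the reference sequence
LEN Q length of the query sequence
FRM alignment direction (forward or reverse)
TAGS sequence IDs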
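
To separate substitutions from indels, the SUB columns can be tallied with awk. This is a minimal sketch that assumes the SNPs were written tab-delimited and without a header, for example `show-snps -ClrTH align_qr.filter > align_qr.snps.tsv`, so that columns 2 and 3 hold the reference and query base and a "." marks the missing base of an indel. The function name `count_variants` and the file name are illustrative.

```shell
# Count substitutions and indels in tab-delimited, header-less show-snps output.
# Assumes column 2 is the reference base and column 3 the query base,
# with "." standing for a gap.
count_variants() {
    awk -F'\t' '
        $2 == "." { ins++; next }   # base present only in the query
        $3 == "." { del++; next }   # base present only in the reference
        { snp++ }                   # both bases present: a substitution
        END { printf "snps=%d insertions=%d deletions=%d\n", snp, ins, del }
    ' "$1"
}
```

Calling `count_variants align_qr.snps.tsv` prints one summary line. Here insertions and deletions are named relative to the reference.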

Insertions and deletions (indels) are reported with a "." in place of the missing base in the SUB column.

