R全基因组/外显子组测序分析R语言

VCF变异统计及绘制染色体分布图

2018-05-04  本文已影响134人  生物数据分析笔记

VCF文件的变异分布图,除了使用Circos图之外,统计特定材料间的变异分布情况时可能会用到染色体分布图,为了节省时间,实现统计及绘图的自动化,现提供统计数量用的Python脚本和绘图使用的R脚本,仅供参考。如有其它更快捷的方法,或觉得脚本太拙劣,欢迎留言批评交流。


VCF格式。详细格式介绍,请自行搜索了解,这里不赘述了。

PYTHON脚本,使用脚本整理100Kb滑窗中变异位点数量:

# variation_number_in_bin.py

import sys

input_file = sys.argv[1]
step_len = 100000
result = {}

chrom_stats = {
    "Chr1":43270923,
    "Chr2":35937250,
    "Chr3":36413819,
    "Chr4":35502694,
    "Chr5":29958434,
    "Chr6":31248787,
    "Chr7":29697621,
    "Chr8":28443022,
    "Chr9":23012720,
    "Chr10":23207287,
    "Chr11":29021106,
    "Chr12":27531856
    }

for chrom in chrom_stats.keys():
    step= 0
    step_start = 0
    result[chrom]={}
    result[chrom][step]=0
    for line in open(input_file):
        if line.startswith(chrom):
            pos = line.split()[1]
            if step_start < int(pos) < (step_start + step_len):
                result[chrom][step] += 1
            if int(pos) >= (step_start + step_len):
                step += 1
                step_start += step_len
                result[chrom][step] = 0
                result[chrom][step] += 1

    for steps,nums in result[chrom].items():
        print(chrom, steps, nums, sep="\t")
脚本整理后的文件

下面画图,使用R脚本如下:

library(ggplot2)
library(Cairo)

var_density <- read.table("snp.distribution.stats", sep = "\t", header = T)

names(var_density) <- c("chrom", "region", "variation_count")
var_density$chrom <- factor(var_density$chrom, levels=c(
  "Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "Chr6",
  "Chr7", "Chr8", "Chr9", "Chr10", "Chr11", "Chr12"))

CairoPNG(filename = "variation_distribution.png", width = 12000, height = 7500, res = 600)

distribution <- ggplot(var_density, aes(x=region, y=variation_count))

distribution + geom_line(colour = "darkorange4", size = 0.35) +
  guides(fill=FALSE) +
  theme(axis.line = element_line(size=0.5),
        panel.background = element_rect(fill="white",size=rel(20)),
        panel.grid.minor = element_line(colour = "gray"),
        axis.text = element_text(family="Times New Roman",size=16,face="bold"),
        axis.title = element_text(family="Times New Roman",size=18,face="bold"),
        strip.text = element_text(family="Times New Roman",size=12,face="bold")) +
  facet_wrap(~chrom, ncol=2)

dev.off()
个人觉得这个图还凑合。 variation_distribution.png
上一篇下一篇

猜你喜欢

热点阅读