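A Hands-On, Bare-Minimum Introduction to Circos (Windows Edition)
Circos is an excellent way to present data, and a must-have skill that everyone should learn!
If you can't get it working by following this tutorial, I'd suggest holding off on other people's advanced tutorials for now.
Installing and Testing Circos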
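First, download the Circos package from the official website.
The version highlighted in yellow is the latest release of Circos.
Near the bottom of the page there is also a simple tutorials package, written for Linux, with configuration examples for many types of data (ten examples in total). It is worth a look if you are interested.
On Windows, simply move the archive into a folder of your choice and extract it. You can then test from PowerShell whether Circos can produce a plot.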
# Go into the extracted folder
PS D:\Software\circos\circos-0.69-6> cd .\example\
PS D:\Software\circos\circos-0.69-6\example> ls
Directory: D:\Software\circos\circos-0.69-6\example
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 2018/6/20 13:32 data
d----- 2018/6/20 13:32 etc
-a---- 2017/6/17 7:36 3693061 circos.png
-a---- 2017/6/17 7:37 258048 circos.svg
-a---- 2017/6/4 9:05 1476 README
-a---- 2017/6/4 9:05 202 run
-a---- 2017/6/4 9:05 37130 run.out
# Test plotting
PS D:\Software\circos\circos-0.69-6\example> ..\bin\circos -conf .\etc\circos.conf
# A long stream of status messages is printed; only the last line is shown here
debuggroup summary,timer 34.17s image took more than 30 s to generate. Component timings are shown above. To always show them, use -debug_group timer. To adjust the time cutoff, change debug_auto_timer_report in etc/housekeeping.conf.
Example plot
Installing the Perl Modules
Following tutorial [1], I chose Strawberry Perl.
Then open the Perl command line to install the modules.
Perl command line
# Enter the following on the command line
cpan Carp Clone Config::General Cwd Data::Dumper Digest::MD5 File::Basename File::Spec::Functions File::Temp FindBin Font::TTF::Font GD GD::Image GD::Polyline Getopt::Long IO::File List::MoreUtils List::Util Math::Bezier Math::BigFloat Math::Round Math::Trig Math::VecStat Memoize Params::Validate Pod::Usage POSIX Readonly Regexp::Common Set::IntSpan Statistics::Basic Storable SVG Sys::Hostname Text::Balanced Text::Format Time::HiRes
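Wait patiently for the installation to finish. Then use PS (short for PowerShell from here on) to check that all required modules were installed successfully.
PS D:\Software\circos\circos-0.69-6\bin> ./circos -modules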
ok 1.23 Carp
ok 0.31 Clone
ok 2.60 Config::General
ok 3.36 Cwd
ok 2.131 Data::Dumper
ok 2.51 Digest::MD5
ok 2.82 File::Basename
ok 3.33 File::Spec::Functions
ok 0.22 File::Temp
ok 1.50 FindBin
ok 0.39 Font::TTF::Font
ok 2.46 GD
ok 0.2 GD::Polyline
ok 2.38 Getopt::Long
ok 1.15 IO::File
ok 0.33 List::MoreUtils
ok 1.23 List::Util
ok 0.01 Math::Bezier
ok 1.997 Math::BigFloat
ok 0.06 Math::Round
ok 0.08 Math::VecStat
ok 1.02 Memoize
ok 1.24 POSIX
ok 1.05 Params::Validate
ok 1.36 Pod::Usage
ok 1.03 Readonly
ok 2013031301 Regexp::Common
ok 2.50 SVG
ok 1.19 Set::IntSpan
ok 1.6611 Statistics::Basic
ok 2.30 Storable
ok 1.16 Sys::Hostname
ok 2.02 Text::Balanced
ok 0.59 Text::Format
ok 1.9724 Time::HiRes
PS D:\Software\circos\circos-0.69-6\bin>
If this step reports no problems, it's time for the fun part: plotting!
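Scanning a long module report by eye for failures is tedious. Below is a small Python helper, which is not part of Circos, that splits the report into installed and missing modules and builds a cpan command for the missing ones. It assumes that installed modules are printed as `ok <version> <Module>`, as above. It also assumes that absent modules appear on a line starting with `missing`, so check this against your own output.

```python
"""Summarize the text printed by `circos -modules` (sketch).

Assumed line format: "ok <version> <Module>" for an installed module, and a
line starting with "missing" followed by the module name otherwise.
"""


def parse_module_report(text):
    """Return (installed, missing) lists of module names."""
    installed, missing = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # blank line
        if parts[0] == "ok" and len(parts) >= 3:
            installed.append(parts[-1])
        elif parts[0] == "missing" and len(parts) >= 2:
            missing.append(parts[-1])
        # anything else (e.g. the PS prompt line) is ignored
    return installed, missing


def cpan_command(missing):
    """Build a cpan command line for the missing modules (empty if none)."""
    return "cpan " + " ".join(missing) if missing else ""


if __name__ == "__main__":
    report = "ok 1.23 Carp\nok 0.31 Clone\nmissing Math::Bezier\n"
    ok, todo = parse_module_report(report)
    print(f"{len(ok)} installed, missing: {todo}")
    print(cpan_command(todo))
```

Windows PowerShell's `>` writes UTF-16 text. A safer route is to save the report with something like `./circos -modules | Out-File -Encoding utf8 modules.txt`, then read it in Python with `open("modules.txt", encoding="utf-8-sig")`.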
Installing Circos on Linux
As usual, use Anaconda:
conda install -c bioconda circos
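A Bare-Minimum Getting-Started Tutorial
After installing Circos, we find that the author recommends starting with the Quick Start tutorial series. First we need to understand how Circos builds a plot; only once you grasp that logic can you really understand Circos.
As for the workflow, this section first uses a minimal configuration file just to get an image drawn. Afterwards we study the parameters we need in detail.
This minimal image shows the 24 colored human chromosomes. Later we can add extra parameters to make it look nicer.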
MINIMUM CONFIGURATION
# circos.conf
karyotype = data/karyotype/karyotype.human.txt
<ideogram>
<spacing>
default = 0.005r
</spacing>
radius = 0.9r
thickness = 20p
fill = yes
</ideogram>
# Everything below is standard and required: every Circos plot needs at least these settings. They can be modified when needed; see the etc/ folder for details
<image>
# Looked up in the Circos installation folder
<<include etc/image.conf>>
</image>
# Color, font and pattern definitions
<<include etc/colors_fonts_patterns.conf>>
# Housekeeping (debugging and system) parameters
<<include etc/housekeeping.conf>>
KARYOTYPE
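A karyotype file is generally required: it defines the names, sizes and colors of the chromosomes. Circos can display many other kinds of data, though, so the entries in this file do not have to be chromosomes.
The circos/data/karyotype/ folder of the installation already includes karyotype files for several common organisms: human, mouse, rat and fruit fly.
The karyotype parameter accepts an absolute or a relative path. With a relative path, if the file is not found in the working directory, Circos looks for it under the Circos installation folder. That is why data/karyotype/karyotype.human.txt is enough here.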
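To make the file format concrete, here is a minimal Python sketch that reads the chromosome lines of a karyotype file. It assumes the `chr - ID LABEL START END COLOR` layout used by the files in data/karyotype/, and it skips `band` lines. It also assumes coordinates are inclusive, so a chromosome's size is END - START + 1. `Chromosome` and `read_karyotype` are hypothetical names, not anything shipped with Circos.

```python
"""Read the chromosome lines of a Circos karyotype file (sketch).

Assumed format, as in the files under data/karyotype/:
    chr - ID LABEL START END COLOR
"band ..." lines (cytogenetic bands), comments and blank lines are skipped.
"""
from dataclasses import dataclass


@dataclass
class Chromosome:
    chrom_id: str   # ID used in the config files, e.g. hs1
    label: str      # LABEL column, e.g. 1
    start: int
    end: int
    color: str

    @property
    def size(self) -> int:
        # Inclusive coordinates are an assumption of this sketch.
        return self.end - self.start + 1


def read_karyotype(lines):
    """Return a Chromosome for every 'chr' line in an iterable of lines."""
    chroms = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or fields[0] != "chr":
            continue
        _, _, chrom_id, label, start, end, color = fields[:7]
        chroms.append(Chromosome(chrom_id, label, int(start), int(end), color))
    return chroms


if __name__ == "__main__":
    demo = [
        "# toy karyotype",
        "chr - hs1 1 0 999 chr1",
        "band hs1 p36.33 p36.33 0 499 gneg",
        "chr - hs2 2 0 799 chr2",
    ]
    for c in read_karyotype(demo):
        print(c.chrom_id, c.label, c.size, c.color)
```

On a real file you would pass an open file object instead of the toy list, e.g. inside `with open("data/karyotype/karyotype.human.txt") as fh:`.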
IDEOGRAMS
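Once Circos has chromosomes to draw, it needs ideogram settings that tell it where and how to draw them.
The <ideogram> block sets parameters such as radius, thickness and fill. In addition, its <spacing> block sets the gap between adjacent chromosomes.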
Hands-on
Save the configuration above as circos.conf, go to that folder in PowerShell, and call circos to draw the plot:
PS D:\Software\circos\circos-0.69-6\my_circos> ..\bin\circos -conf .\circos.conf
debuggroup summary 0.16s welcome to circos v0.69-6 31 July 2017 on Perl 5.014002
debuggroup summary 0.16s current working directory D:/Software/circos/circos-0.69-6/my_circos
debuggroup summary 0.16s command D:\Software\circos\circos-0.69-6\bin\circos.exe -conf .\circos.conf
debuggroup summary 0.16s loading configuration from file .\circos.conf
debuggroup summary 0.16s found conf file .\circos.conf
debuggroup summary 0.49s debug will appear for these features: output,summary
debuggroup summary 0.49s bitmap output image ./.\circos.png
debuggroup summary 0.49s SVG output image ./.\circos.svg
debuggroup summary 0.49s parsing karyotype and organizing ideograms
debuggroup summary 0.58s karyotype has 24 chromosomes of total size 3,095,677,436
debuggroup summary 0.58s applying global and local scaling
debuggroup summary 0.59s allocating image, colors and brushes
debuggroup summary 6.69s drawing 24 ideograms of total size 3,095,677,436
debuggroup summary 6.69s drawing highlights and ideograms
debuggroup output 6.73s generating output
debuggroup output 7.10s created PNG image ./.\circos.png (84 kb)
debuggroup output 7.12s created SVG image ./.\circos.svg (6 kb)
PS D:\Software\circos\circos-0.69-6\my_circos>
The result:
[Figure: minimal ideogram plot of the 24 human chromosomes (image upload failed)]
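A Simple Step Up: Adding Ticks and Labels
In this section we add labels and tick marks to the existing figure. To do this, we add label parameters to the <ideogram> block and add a new <ticks> block.
IDEOGRAM LABELS
There are many label parameters; here is the minimum set needed: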
<ideogram>
<spacing>
default = 0.005r
</spacing>
# Ideogram position, fill and outline
radius = 0.90r
thickness = 20p
fill = yes
stroke_color = dgrey
stroke_thickness = 2p
# Minimal label settings
show_label = yes
# Supported fonts are listed in etc/fonts.conf
label_font = default
label_radius = 1R + 75p
label_size = 30
label_parallel = yes
</ideogram>
label_radius can be given as a relative or an absolute position:
# Relative to the ideogram radius
label_radius = 1.1r
label_radius = 0.8r
# Absolute position
label_radius = 500p
# Relative plus absolute
label_radius = 1r + 100p
# Relative to the image size
label_radius = dims(image,radius) - 50p
# Relative to the ideogram
label_radius = dims(ideogram,radius_outer) + 50p
label_radius = dims(ideogram,radius_inner) - 50p
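To get a feel for these forms, the Python snippet below turns each example into a pixel radius. The numbers rest on assumptions, not on Circos's source:

- an image radius of 1500 px;
- the ideogram `radius = 0.90r` taken as the outer ideogram radius, relative to the image radius;
- `thickness = 20p`;
- `r` inside `label_radius` measured against the outer ideogram radius.

```python
"""Resolve the label_radius examples to pixels (illustrative sketch).

Assumptions, not taken from Circos itself:
- image radius of 1500 px;
- ideogram `radius = 0.90r` is the OUTER ideogram radius, relative to the
  image radius, with `thickness = 20p`;
- inside label_radius, "r" is relative to the outer ideogram radius.
"""
IMAGE_RADIUS = 1500.0
IDEOGRAM_OUTER = 0.90 * IMAGE_RADIUS     # about 1350 px
IDEOGRAM_INNER = IDEOGRAM_OUTER - 20.0   # outer radius minus thickness


def label_radius(rel=0.0, px=0.0):
    """Return rel * (outer ideogram radius) + px, in pixels."""
    return rel * IDEOGRAM_OUTER + px


FORMS = {
    "1.1r": label_radius(rel=1.1),
    "0.8r": label_radius(rel=0.8),
    "500p": label_radius(px=500),
    "1r + 100p": label_radius(rel=1.0, px=100),
    "dims(image,radius) - 50p": IMAGE_RADIUS - 50,
    "dims(ideogram,radius_outer) + 50p": IDEOGRAM_OUTER + 50,
    "dims(ideogram,radius_inner) - 50p": IDEOGRAM_INNER - 50,
}

if __name__ == "__main__":
    for expr, value in FORMS.items():
        print(f"label_radius = {expr:<36} -> {value:.0f} px")
```

Under these assumptions, `1r + 100p` and `dims(image,radius) - 50p` land at the same place. This shows that very different expressions can describe one position.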
TICK MARKS AND LABELS
Ticks usually come in groups, and their positions can be specified in relative or absolute terms. The minimal settings are:
show_ticks = yes
show_tick_labels = yes
<ticks>
radius = 1r
color = black
thickness = 2p
# Tick labels
multiplier = 1e-6
# %d = integer
# %f = floating-point number
# %.1f = one decimal place
# %.2f = two decimal places
format = %d
<tick>
spacing = 5u
size = 10p
</tick>
<tick>
spacing = 25u
size = 15p
show_label = yes
label_size = 20p
label_offset = 10p
format = %d
</tick>
</ticks>
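Two <tick> blocks are defined here: short, unlabeled ticks every 5u and longer, labeled ticks every 25u. With chromosomes_units = 1000000, set in circos.conf below, these are 5 Mb and 25 Mb. Circos draws both groups together rather than choosing one of them.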
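The arithmetic behind the two tick groups can be sketched in Python. The sketch makes these assumptions:

- `chromosomes_units = 1000000` (set in circos.conf in the next step), so `5u` is 5 Mb and `25u` is 25 Mb;
- ticks start at position 0;
- labels are produced as position times `multiplier`, formatted with the printf-style `format` string.

It does not model how Circos handles overlapping ticks, and the function names are hypothetical.

```python
"""Positions and labels of the two <tick> groups on one chromosome (sketch).

Assumptions: chromosomes_units = 1000000 (so 5u = 5 Mb, 25u = 25 Mb), ticks
start at position 0, and labels are position * multiplier passed through the
printf-style `format`. Overlap handling between tick groups is not modelled.
"""
UNITS = 1_000_000      # chromosomes_units
MULTIPLIER = 1e-6      # multiplier: bp -> Mb


def tick_positions(chrom_size, spacing_u):
    """Base-pair positions of ticks spaced `spacing_u` units apart."""
    step = spacing_u * UNITS
    return list(range(0, chrom_size + 1, step))


def tick_labels(chrom_size, spacing_u, fmt="%d"):
    """Label text for each tick, e.g. '25' for the tick at 25,000,000 bp."""
    return [fmt % (pos * MULTIPLIER) for pos in tick_positions(chrom_size, spacing_u)]


if __name__ == "__main__":
    size = 60_000_000  # a toy 60 Mb chromosome
    print("minor ticks (5u):", len(tick_positions(size, 5)))
    print("major labels (25u):", tick_labels(size, 25))
```

For a 60 Mb chromosome this gives 13 minor ticks (0, 5, ..., 60 Mb) and the labels 0, 25 and 50 on the major ticks. Switching `fmt` to `"%.1f"` would print 0.0, 25.0 and 50.0 instead.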
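Hands-on
Next, save these two snippets as ideogram.conf and ticks.conf respectively, and modify the earlier circos.conf. The ideogram settings now live in their own file, and chromosomes_units defines the unit u used by the other blocks. circos.conf becomes: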
karyotype = data/karyotype/karyotype.human.txt
chromosomes_units = 1000000
<<include ideogram.conf>>
<<include ticks.conf>>
<image>
<<include etc/image.conf>>
</image>
<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
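Conceptually, each `<<include FILE>>` line is replaced by the contents of FILE. The Python sketch below imitates that so you can inspect the combined configuration. It is a simplification under these assumptions:

- each include sits on its own line;
- files are looked up only in the directories you pass, so Circos's own search rules (such as falling back to the installation folder) are not reproduced;
- nesting is capped at 10 levels.

```python
"""Flatten <<include FILE>> lines into one configuration (simplified sketch).

Assumptions: each include sits on its own line, FILE is looked up only in the
given search directories (in order), and nesting deeper than 10 levels is
treated as an error. Circos's own lookup rules are not reproduced.
"""
import re
from pathlib import Path

INCLUDE = re.compile(r"^\s*<<include\s+(\S+)\s*>>\s*$")


def find_file(name, search_dirs):
    """Return the first existing search_dir/name, or raise FileNotFoundError."""
    for d in search_dirs:
        candidate = Path(d) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(name)


def expand(path, search_dirs, depth=0):
    """Return the lines of `path` with every include replaced by its contents."""
    if depth > 10:
        raise RecursionError("includes nested too deeply")
    lines = []
    for line in Path(path).read_text().splitlines():
        match = INCLUDE.match(line)
        if match:
            included = find_file(match.group(1), search_dirs)
            lines.extend(expand(included, search_dirs, depth + 1))
        else:
            lines.append(line)
    return lines
```

For example, calling `expand("circos.conf", [".", r"D:\Software\circos\circos-0.69-6"])` from the my_circos folder would inline ideogram.conf, ticks.conf and the etc/ files into one list of lines.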
OK, once everything is in place, call circos from PowerShell again:
PS D:\Software\circos\circos-0.69-6\my_circos> ..\bin\circos -conf .\circos.conf
debuggroup summary 0.15s welcome to circos v0.69-6 31 July 2017 on Perl 5.014002
debuggroup summary 0.15s current working directory D:/Software/circos/circos-0.69-6/my_circos
debuggroup summary 0.15s command D:\Software\circos\circos-0.69-6\bin\circos.exe -conf .\circos.conf
debuggroup summary 0.15s loading configuration from file .\circos.conf
debuggroup summary 0.16s found conf file .\circos.conf
debuggroup summary 0.32s debug will appear for these features: output,summary
debuggroup summary 0.32s bitmap output image ./.\circos.png
debuggroup summary 0.32s SVG output image ./.\circos.svg
debuggroup summary 0.32s parsing karyotype and organizing ideograms
debuggroup summary 0.40s karyotype has 24 chromosomes of total size 3,095,677,436
debuggroup summary 0.41s applying global and local scaling
debuggroup summary 0.42s allocating image, colors and brushes
debuggroup summary 2.07s drawing 24 ideograms of total size 3,095,677,436
debuggroup summary 2.07s drawing highlights and ideograms
debuggroup output 2.53s generating output
debuggroup output 2.94s created PNG image ./.\circos.png (271 kb)
debuggroup output 2.94s created SVG image ./.\circos.svg (123 kb)
PS D:\Software\circos\circos-0.69-6\my_circos>
Now we have a labeled figure:
[Figure: ideograms with labels and ticks]
Chromosome Selection, Scale, Color and Orientation
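This section covers how to adjust the elements already on display.
IDEOGRAM SELECTION
By default all chromosomes are shown, in the order they appear in the karyotype file. To show only some of them, set chromosomes_display_default = no and list the ones you want in the chromosomes parameter: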
chromosomes_display_default = no
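chromosomes = hs1;hs2;hs3;hs4
# Regular expressions are supported too
chromosomes = /hs[1-4]$/
# The two can also be combined
chromosomes = /hs[1-4]$/;hs10;hs11
# A leading "-" hides a chromosome
chromosomes = /hs[1-4]$/;-hs3
# Note that entries are separated by semicolons (;)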
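The selection rules can be imitated in a few lines of Python to check what a given `chromosomes` string picks. This is an illustrative model, not Circos's implementation. It assumes:

- entries are split on `;`;
- `/.../` entries are regular expressions searched anywhere in the ID (like Perl's `=~`);
- a leading `-` hides matches;
- the result keeps karyotype order.

```python
"""Imitate the `chromosomes = ...` selection rules (illustrative sketch).

Assumptions: entries are separated by ";", an entry written as /.../ is a
regular expression searched anywhere in the ID (like Perl's =~), a leading
"-" hides whatever the rest of the entry matches, and the result keeps the
karyotype order. This is not Circos's own implementation.
"""
import re


def matches(entry, chrom_id):
    """True if a plain ID or a /regex/ entry matches chrom_id."""
    if len(entry) > 1 and entry.startswith("/") and entry.endswith("/"):
        return re.search(entry[1:-1], chrom_id) is not None
    return entry == chrom_id


def select(spec, karyotype_ids):
    """Return the IDs shown by a `chromosomes` string, in karyotype order."""
    shown, hidden = set(), set()
    for raw in spec.split(";"):
        entry = raw.strip()
        if not entry:
            continue
        target = hidden if entry.startswith("-") else shown
        pattern = entry[1:] if entry.startswith("-") else entry
        target.update(c for c in karyotype_ids if matches(pattern, c))
    return [c for c in karyotype_ids if c in shown and c not in hidden]


if __name__ == "__main__":
    human = [f"hs{i}" for i in range(1, 23)] + ["hsX", "hsY"]
    print(select("/hs[1-4]$/;-hs3", human))
```

In this model a typo such as `h4` simply matches nothing, so `hs1;hs2;hs3;h4` would yield only three chromosomes.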
IDEOGRAM SCALE
Chromosome size can be set in absolute or relative mode.
First, absolute mode, which applies a zoom factor:
# hs1 0.25x zoom
# hs2 2.00x zoom
chromosomes_scale = hs1=0.25,hs2=2.0
Relative mode instead expresses each chromosome's size as a fraction of the whole figure:
# hs1 25% of figure
# hs2 50% of figure
chromosomes_scale = hs1=0.25r,hs2=0.50r
You can also have several chromosomes share a given fraction of the figure:
# hs1,hs2 distributed evenly within 50% of figure (each is 25%)
chromosomes_scale = /hs[12]/=0.5rn
or have all of them share the figure evenly:
# All ideograms share the whole figure evenly
chromosomes_scale = /./=1rn
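Relative scaling is easiest to check with a little arithmetic. The Python sketch below computes each ideogram's share of the circle for rules written in the `r`/`rn` forms above. It assumes:

- `Xr` gives each matched ideogram the fraction X of the circle;
- `Xrn` splits X evenly among the matches;
- a later rule overrides an earlier one for the same ideogram.

Spacing between ideograms and absolute zoom factors are not modelled.

```python
"""Share of the circle per ideogram for relative chromosomes_scale rules (sketch).

Assumptions: "key=Xr" gives every matched ideogram X of the circle, "key=Xrn"
splits X evenly among the matched ideograms, and a later rule overrides an
earlier one. Spacing and absolute zoom factors are ignored.
"""
import re


def targets_of(key, ids):
    """IDs matched by a plain ID or a /regex/ key, in display order."""
    if len(key) > 1 and key.startswith("/") and key.endswith("/"):
        return [c for c in ids if re.search(key[1:-1], c)]
    return [c for c in ids if c == key]


def relative_shares(spec, ids):
    """Fraction of the circle per ideogram for 'key=Xr' / 'key=Xrn' rules."""
    shares = {}
    for rule in spec.split(","):
        key, value = (part.strip() for part in rule.split("=", 1))
        targets = targets_of(key, ids)
        if value.endswith("rn"):
            each = float(value[:-2]) / len(targets) if targets else 0.0
        elif value.endswith("r"):
            each = float(value[:-1])
        else:
            raise ValueError(f"not a relative scale: {value!r}")
        for c in targets:
            shares[c] = each  # a later rule overrides an earlier one
    return shares


if __name__ == "__main__":
    shown = ["hs1", "hs2", "hs3", "hs4"]
    print(relative_shares("hs1=0.5r,/hs[234]/=0.5rn", shown))
```

Take `hs1=0.5r,/hs[234]/=0.5rn`, used in the hands-on section below, with only hs1-hs4 displayed. hs1 gets half the circle, and hs2, hs3 and hs4 get one sixth each. The helper also shows why anchors matter. Searched against all 24 human IDs, the unanchored `/hs[12]/` matches 15 chromosomes: hs1, hs2, hs10-hs19 and hs20-hs22.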
SCALE PROGRESSION
By default chromosomes are laid out clockwise, but this can be changed with the angle_orientation parameter in the <image> block:
<image>
angle_orientation* = counterclockwise
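# The * suffix makes this value override other definitions of the same parameter, such as the one in etc/image.conf
<<include etc/image.conf>>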
</image>
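You can also reverse the scale direction of specific chromosomes with the chromosomes_reverse parameter, so that their positions increase the other way around the circle:
chromosomes_reverse = /hs[234]/
Leaving out the $ metacharacter does no harm here because only chromosomes 1-4 are drawn. For rigor, though, it is a good habit to add it.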
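What chromosomes_reverse changes is easiest to see with a toy mapping from position to angle. The Python function below is a hypothetical, simplified model:

- each ideogram covers `span` degrees starting at `start_angle`;
- reversing flips the direction in which positions run;
- a negative `span` stands in for counterclockwise orientation.

Real Circos layout also involves spacing, scaling and `angle_offset`, which are ignored here.

```python
"""Map a base-pair position to an angle on one ideogram (toy model).

Hypothetical model: the ideogram covers `span` degrees starting at
`start_angle`; positions normally run from the start of that span to its
end, chromosomes_reverse makes them run the other way, and a negative span
stands in for angle_orientation = counterclockwise.
"""


def position_to_angle(pos, size, start_angle, span, reverse=False):
    """Angle (degrees) of position `pos` on an ideogram of length `size`."""
    if not 0 <= pos <= size:
        raise ValueError("position lies outside the ideogram")
    frac = pos / size
    if reverse:
        frac = 1.0 - frac  # reversed ideogram: scale runs backwards
    return start_angle + frac * span


if __name__ == "__main__":
    # A 100 Mb toy ideogram covering 90 degrees from angle 0.
    for p in (0, 25_000_000, 100_000_000):
        print(p, position_to_angle(p, 100_000_000, 0, 90),
              position_to_angle(p, 100_000_000, 0, 90, reverse=True))
```

In this model, reversing does not move the ideogram or change its size. It only swaps which end of the arc holds the start of the chromosome.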
IDEOGRAM COLOR
The default colors come from the karyotype file and can be overridden with the chromosomes_color parameter:
chromosomes_color = hs1=red,hs2=orange,hs3=green,hs4=blue
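IDEOGRAM RADIAL POSITION
By default all chromosomes sit at the same radial position. The radius parameter in the <ideogram> block moves all of them, while chromosomes_radius changes the radial position of one or more individual chromosomes: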
chromosomes_radius = hs4:0.9r
Hands-on
As usual, save all the code as conf files and call circos from PowerShell:
# circos.conf
karyotype = data/karyotype/karyotype.human.txt
chromosomes_units = 1000000
# Choose which chromosomes to draw
chromosomes_display_default = no
chromosomes = /hs[1-4]$/
# Set the scale
chromosomes_scale = hs1=0.5r,/hs[234]/=0.5rn
# Reverse the scale direction of hs2-hs4
chromosomes_reverse = /hs[234]/
# Set the fill colors
chromosomes_color = hs1=red,hs2=orange,hs3=green,hs4=blue
# Set the radial position
chromosomes_radius = hs4:0.9r
# Pull in the remaining parameters from the other files
<<include ideogram.conf>>
<<include ticks.conf>>
<image>
<<include etc/image.conf>>
</image>
<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
# ideogram.conf
<ideogram>
<spacing>
default = 0.005r
</spacing>
radius = 0.90r
thickness = 20p
fill = yes
stroke_color = dgrey
stroke_thickness = 2p
show_label = yes
# see etc/fonts.conf for list of font names
label_font = default
label_radius = 1.075r
# To place labels as close to the image edge as possible, use the next line instead
# label_radius = dims(image,radius) - 60p
label_size = 30
label_parallel = yes
</ideogram>
# ticks.conf
show_ticks = yes
show_tick_labels = yes
<ticks>
radius = 1r
color = black
thickness = 2p
multiplier = 1e-6
format = %d
<tick>
spacing = 5u
size = 10p
</tick>
<tick>
spacing = 25u
size = 15p
show_label = yes
label_size = 20p
label_offset = 10p
format = %d
</tick>
</ticks>
PS D:\Software\circos\circos-0.69-6\my_circos> ..\bin\circos -conf .\circos.conf
debuggroup summary 0.31s welcome to circos v0.69-6 31 July 2017 on Perl 5.014002
debuggroup summary 0.31s current working directory D:/Software/circos/circos-0.69-6/my_circos
debuggroup summary 0.32s command D:\Software\circos\circos-0.69-6\bin\circos.exe -conf .\circos.conf
debuggroup summary 0.32s loading configuration from file .\circos.conf
debuggroup summary 0.32s found conf file .\circos.conf
debuggroup summary 0.50s debug will appear for these features: output,summary
debuggroup summary 0.50s bitmap output image ./.\circos.png
debuggroup summary 0.50s SVG output image ./.\circos.svg
debuggroup summary 0.50s parsing karyotype and organizing ideograms
debuggroup summary 0.60s karyotype has 24 chromosomes of total size 3,095,677,436
debuggroup summary 0.60s applying global and local scaling
debuggroup summary 0.63s allocating image, colors and brushes
debuggroup summary 2.68s drawing 4 ideograms of total size 881,626,704
debuggroup summary 2.68s drawing highlights and ideograms
debuggroup output 2.83s generating output
debuggroup output 3.19s created PNG image ./.\circos.png (132 kb)
debuggroup output 3.20s created SVG image ./.\circos.svg (34 kb)
PS D:\Software\circos\circos-0.69-6\my_circos>
[Figure: final plot of chromosomes hs1-hs4]
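The log above says 4 ideograms with a total size of 881,626,704 were drawn. As a cross-check, the Python sketch below counts the chromosomes whose ID matches a regex and sums their lengths. It assumes:

- the `chr - ID LABEL START END COLOR` karyotype layout;
- sizes that count both endpoints (END - START + 1);
- a single regex selection like `/hs[1-4]$/`.

`displayed_total` is a hypothetical helper.

```python
"""Count displayed ideograms and their total size (sketch).

Assumptions: karyotype lines look like "chr - ID LABEL START END COLOR",
sizes count both endpoints (END - START + 1), and the displayed set is
picked by one regex searched in the ID, as with chromosomes = /hs[1-4]$/.
"""
import re


def displayed_total(karyotype_lines, pattern):
    """Return (number of matching chromosomes, sum of their sizes)."""
    sizes = []
    for line in karyotype_lines:
        fields = line.split()
        if len(fields) >= 7 and fields[0] == "chr" and re.search(pattern, fields[2]):
            sizes.append(int(fields[5]) - int(fields[4]) + 1)
    return len(sizes), sum(sizes)


if __name__ == "__main__":
    toy = ["chr - hs1 1 0 999 chr1", "chr - hs11 11 0 499 chr11",
           "chr - hs4 4 0 299 chr4"]
    count, total = displayed_total(toy, r"hs[1-4]$")
    print(f"drawing {count} ideograms of total size {total:,}")
```

To check against the real file, open data/karyotype/karyotype.human.txt and pass the file object with the pattern `hs[1-4]$`. If the file and the inclusive-size assumption match Circos 0.69-6, it should report the same two numbers as the log.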