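A handy Venn diagram tool
I recently came across a nice tool for drawing Venn diagrams. It mainly relies on bedtools, R, and a few Python packages. In my opinion it is good-looking, convenient, and versatile all at once.
The tool has quite a few related pages.
Here is a brief introduction.
Installation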
conda create -n intervene_module python=2
conda activate intervene_module
conda install -c bioconda intervene
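One small catch: the default version under conda appears to be intervene version 0.5.8, but only versions from 0.6.0 onward let you set extra bedtools options (the venn module presumably calls bedtools intersect). If you don't need extra bedtools options, 0.5.8 is fine; I suspect that need is not very common. Otherwise, you should pin the version when installing: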
conda install -c bioconda intervene=0.6.4
But in my case the install never seemed to finish:
$ conda install -c bioconda intervene=0.6.4
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: |
The GitHub issue "Install intervene latest version from conda" mentions the same problem, and there the author recommends installing with pip.
So I installed it with pip inside the conda environment:
$ pip install intervene
$ intervene --version
intervene version 0.6.4
(intervene_module)
Basic introduction
It has three modules: Venn, UpSet, and Pairwise. See the figures below, or the figures in the paper, for details.
$ intervene --help
usage: intervene <subcommand> [options]
Intervene: a tool for intersection and visualization of multiple genomic region and gene sets.
For more details check documentation: http://intervene.readthedocs.io
positional arguments:
{venn,upset,pairwise}
List of subcommands
venn Venn diagram of intersection of genomic regions or list sets (upto 6-way).
upset UpSet diagram of intersection of genomic regions or list sets.
pairwise Pairwise intersection and heatmap of N genomic region sets in <BED/GTF/GFF> format.
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
(intervene_module)
# These runs use the built-in test option; if it doesn't work, you can download their test files
$ intervene venn --test
Running Intervene with test data.
Generating a 3-way "venn" diagram. Please wait...
You are done! Please check your results @ /home/sgd/test/intervene/Intervene_results.
Thank you for using Intervene!

$ intervene upset --test
Running Intervene with test data.
Running UpSet module. Please wait...
You are done! Please check your results @ /home/sgd/test/intervene/Intervene_results.
Thank you for using Intervene!
(intervene_module)

$ intervene pairwise --test
Running Intervene with test data.
Performing a pairwise intersection analysis. Please wait...
/home/sgd/miniconda3/envs/intervene_module/lib/python2.7/site-packages/intervene/modules/pairwise/pairwise.py:454: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
matrix = pd.read_table(matrix_file,index_col=0, delim_whitespace=True)
You are done! Please check your results @ /home/sgd/test/intervene/Intervene_results.
Thank you for using Intervene!
(intervene_module)

Files produced by the three commands:
$ ll
total 68K
-rw-r--r--. 1 sgd bioinfo 8.4K Mar 13 13:07 Intervene_pairwise_frac_matrix.txt
-rw-r--r--. 1 sgd bioinfo 27K Mar 13 13:07 Intervene_pairwise_frac.pdf
-rw-r--r--. 1 sgd bioinfo 295 Mar 13 13:05 Intervene_upset_combinations.txt
-rw-r--r--. 1 sgd bioinfo 6.0K Mar 13 13:05 Intervene_upset.pdf
-rwxr-xr-x. 1 sgd bioinfo 732 Mar 13 13:05 Intervene_upset.R
-rw-r--r--. 1 sgd bioinfo 9.3K Mar 13 13:04 Intervene_venn.pdf
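These are all default results. Some of the files can also be loaded into the Shiny app to tweak the details interactively. Each module also lets you adjust the same things on the command line, with the same effect.
You can also download their example_data files directly:
# use -i instead of --test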
./intervene/intervene venn -i intervene/example_data/ENCODE_hESC/*.bed
./intervene/intervene upset -i intervene/example_data/ENCODE_hESC/*.bed
./intervene/intervene pairwise -i intervene/example_data/dbSUPER_mm9/*.bed
You can also specify the output location:
intervene <module_name> --test --output ~/path/to/your/results/folder
Venn
The Venn module is run like this:
intervene venn -i path/to/BED/files/*.bed --output ~/results/path
Option | Description |
---|---|
`-h, --help` | Show the help message and exit. |
`-i, --input` | Input genomic regions in (BED/GTF/GFF) format or lists of genes/SNPs IDs. For files in a directory use `*.<extension>`, e.g. `*.bed`. |
`--type` | {genomic,list}. Type of input sets: genomic regions or lists of genes/SNPs. Default is `genomic`. |
`--names` | Comma-separated list of names as labels for input files. If it is not set, file names will be used as labels. For example: `--names=A,B,C,D,E,F` |
`--filenames` | Use file names as labels instead. Default is `False`. |
`--bedtools-options` | List any of the arguments available for the bedtools intersect command. Type `bedtools intersect --help` to view all the options. For example: `--bedtools-options f=0.8,r,etc` |
`--colors` | Comma-separated list of matplotlib-valid colors for fill, e.g. `--colors=r,b,k` |
`--bordercolors` | Comma-separated list of matplotlib-valid colors for borders, e.g. `--bordercolors=r,b,k` |
`-o, --output` | Output folder path where results will be stored. Default is current working directory. |
`--save-overlaps` | Save overlapping regions/names for all the combinations as bed/txt files. Default is `False`. |
`--overlap-thresh` | Minimum threshold to save the overlapping regions/names as bed/txt. Default is `1`. |
`--figtype` | {pdf,svg,ps,tiff,png}. Figure type for the plot, e.g. `--figtype svg`. Default is `pdf`. |
`--figsize` | Figure size as width and height, e.g. `--figsize 12 12`. |
`--fontsize` | Font size for the plot labels. Default is `14`. |
`--dpi` | Dots-per-inch (DPI) for the output. Default is `300`. |
`--fill` | {number,percentage}. Report number or percentage of overlaps (only if `--type=list`). Default is `number`. |
`--test` | This will run the program on test data. |
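None of these options needs much explanation; the only one worth discussing is the bedtools intersect option. By default, two intervals count as overlapping as soon as they overlap at all. The -f option lets us set the minimum degree of overlap required.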
$ intervene venn --test --bedtools-options f=0.8
Running Intervene with test data.
Generating a 3-way "venn" diagram. Please wait...
Done! Please check your results @ /home/sgd/test/intervene/Intervene_results/Intervene_results.
Thank you for using Intervene!
(intervene_module)
Compared with the first run, where the option was not set, the overlapping parts shrink and the set-specific parts grow.
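To make the effect of f=0.8 concrete, here is a minimal Python sketch of the idea behind the threshold. It assumes the bedtools intersect meaning of -f (minimum overlap as a fraction of the first interval) and half-open (start, end) coordinates on a single chromosome. The function names `overlap_fraction` and `counts_as_overlap` are made up for illustration and are not part of Intervene or bedtools.

```python
def overlap_fraction(a, b):
    """Fraction of interval a that is covered by interval b.

    Intervals are (start, end) tuples, half-open, on the same chromosome.
    """
    a_start, a_end = a
    b_start, b_end = b
    overlap_bp = max(0, min(a_end, b_end) - max(a_start, b_start))
    return overlap_bp / float(a_end - a_start)


def counts_as_overlap(a, b, min_fraction=None):
    """Decide whether a overlaps b.

    With no threshold, any shared base counts (the default behaviour
    described above). With a threshold, at least that fraction of a
    must be covered, mimicking -f.
    """
    frac = overlap_fraction(a, b)
    if min_fraction is None:
        return frac > 0
    return frac >= min_fraction


a = (100, 200)
b = (150, 400)
print(overlap_fraction(a, b))        # 0.5
print(counts_as_overlap(a, b))       # True
print(counts_as_overlap(a, b, 0.8))  # False
```

A pair like this one is counted in the shared region by default but falls back into the set-specific regions once the 0.8 threshold is applied, which matches the shift seen in the diagram.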
Upset
intervene upset -i path/to/BED/files/*.bed --output ~/results/path
Option | Description |
---|---|
`-h, --help` | Show this help message and exit. |
`-i, --input` | Input genomic regions in <BED/GTF/GFF/VCF> format or list files. For files in a directory use `*.<ext>`, e.g. `*.bed`. |
`--type` | {genomic,list}. Type of input sets: genomic regions or lists of gene sets. Default is `genomic`. |
`--names` | Comma-separated list of names as labels for input files. If it is not set, file names will be used as labels. For example: `--names=A,B,C,D,E,F` |
`--filenames` | Use file names as labels instead. Default is `True`. |
`--bedtools-options` | List any of the arguments available for the bedtools intersect command. Type `bedtools intersect --help` to view all the options. For example: `--bedtools-options f=0.8,r,etc` |
`-o, --output` | Output folder path where plots will be stored. Default is current working directory. |
`--save-overlaps` | Save overlapping regions/names for all the combinations as bed/txt files. Default is `False`. |
`--overlap-thresh` | Minimum threshold to save the overlapping regions/names as bed/txt. Default is `1`. |
`--order` | {freq,degree}. The order of intersections of sets, e.g. `--order degree`. Default is `freq`. |
`--ninter` | Number of top intersections to plot. Default is `30`. |
`--showzero` | Show empty overlap combinations. Default is `False`. |
`--showsize` | Show intersection sizes above bars. Default is `True`. |
`--mbcolor` | Color of the main bar plot. Default is `gray23`. |
`--sbcolor` | Color of set size bar plot. Default is `#56B4E9`. |
`--mblabel` | The y-axis label of the intersection size bars. Default is `No of Intersections`. |
`--sxlabel` | The x-axis label of the set size bars. Default is `Set size`. |
`--figtype` | {pdf,svg,ps,tiff,png}. Figure type for the plot, e.g. `--figtype svg`. Default is `pdf`. |
`--figsize` | Figure size for the output plot (width,height). |
`--dpi` | Dots-per-inch (DPI) for the output. Default is `300`. |
`--scriptonly` | Set to generate Rscript only, if R/UpSetR package is not installed. Default is `False`. |
`--showshiny` | Print the combinations of intersections to input to Shiny App. Default is `False`. |
One thing worth noting: you can turn on --showshiny, so that you can take the results over to the Shiny app and modify them there.
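For readers unfamiliar with what an UpSet plot counts, the sketch below shows one common convention for list-type input: each item is assigned to the exact combination of sets that contain it, and the combinations are then ranked by size, similar in spirit to `--order freq` and `--ninter`. Whether Intervene computes its combinations in exactly this way is an assumption on my part; the function names and the toy gene lists are invented for illustration.

```python
from collections import Counter


def exclusive_intersections(sets_by_name):
    """Count items per exact membership combination.

    An item found in A and B but not C is counted only under {A, B}.
    """
    membership = {}
    for name, items in sets_by_name.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


def top_intersections(sets_by_name, ninter=30):
    """Return the ninter largest combinations, largest first.

    Ties are broken alphabetically by set names so the order is stable.
    """
    counts = exclusive_intersections(sets_by_name)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return [(sorted(combo), n) for combo, n in ranked[:ninter]]


# Toy gene lists (hypothetical data)
gene_sets = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g3", "g4", "g5"},
    "C": {"g4", "g6", "g7"},
}
for combo, size in top_intersections(gene_sets):
    print("&".join(combo), size)
```

Each item lands in exactly one combination, so the bar heights add up to the number of distinct items across all sets.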
Pairwise intersection module
This module has more parameters, and is more fun to play with.
intervene pairwise -i path/to/BED/files/*.bed --type genomic --compute jaccard --htype tribar
intervene pairwise -i path/to/BED/files/*.bed --type genomic --compute jaccard --htype tribar --output ~/results/path
Option | Description |
---|---|
`-h, --help` | Show this help message and exit. |
`-i, --input` | Input genomic regions in (BED/GTF/GFF) format. For files in a directory use `*.<extension>`, e.g. `*.bed`. |
`--type` | {genomic,list}. Type of input sets: genomic regions or lists of genes/SNPs. Default is `genomic`. |
`--compute` | {count,frac,jaccard,fisher,reldist}. Compute count/fraction of overlaps or statistical relationships. `--compute=count` calculates the number of overlaps; `--compute=frac` calculates the fraction of overlap; `--compute=jaccard` calculates the Jaccard statistic; `--compute=reldist` calculates the distribution of relative distances; `--compute=fisher` calculates Fisher's statistic. Note: for jaccard and reldist, regions should be pre-sorted, or set `--sort`. |
`--bedtools-options` | List any of the arguments available for the bedtools subcommands intersect, jaccard, fisher. Type `bedtools <subcommand> --help` to view all the options. For example: `--bedtools-options f=0.8,r,etc`. Note: the `--compute` options count and frac use the bedtools intersect command. |
`--corr` | Compute the correlation. Default is `False`. |
`--corrtype` | {pearson,kendall,spearman}. Type of correlation: `pearson` computes the Pearson correlation (default), `kendall` the Kendall correlation, `spearman` the Spearman correlation. Note: this only works if `--corr` is set. |
`--htype` | {tribar,color,pie,circle,square,ellipse,number,shade}. Heatmap plot type. Default is `tribar`. Read the note below for the tribar option. |
`--triangle` | Show lower/upper triangle of the matrix as heatmap. Default is `lower`. |
`--diagonal` | Show the diagonal values in the heatmap. Default is `False`. |
`--names` | Comma-separated list of names as labels for input files. If it is not set, file names will be used as labels. For example: `--names=A,B,C,D,E,F` |
`--filenames` | Use file names as labels instead. Default is `False`. |
`--sort` | Set this only if your files are not sorted. Default is `False`. |
`--genome` | Required argument if `--compute=fisher`. Needs to be a string assembly name such as `mm10` or `hg38`. |
`-o, --output` | Output folder path where results will be stored. Default is current working directory. |
`--barlabel` | x-axis label of boxplot if `--htype=tribar`. Default is `Set size`. |
`--barcolor` | Boxplot color (hex value or name, e.g. blue). Default is `#53cfff`. |
`--fontsize` | Label font size. Default is `8`. |
`--title` | Heatmap main title. Default is `Pairwise intersection`. |
`--space` | White space between barplot and heatmap, if `--htype=tribar`. Default is `1.3`. |
`--figtype` | {pdf,svg,ps,tiff,png}. Figure type for the plot, e.g. `--figtype svg`. Default is `pdf`. |
`--figsize` | Figure size for the output plot (width,height), e.g. `--figsize 8 8`. |
`--dpi` | Dots-per-inch (DPI) for the output. Default is `300`. |
`--scriptonly` | Set to generate Rscript only, if R/Corrplot package is not installed. Default is `False`. |
`--test` | This will run the program on test data. |
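There are many interesting parameters here; feel free to explore them yourself.
The ones I find most interesting are the --compute options: methods such as jaccard, reldist, and fisher gave me a new understanding of how overlap can be measured.
I'm also attaching the notes from the documentation: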
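As an illustration of why the Jaccard statistic is a different view of overlap than simply counting overlapping intervals, here is a small Python sketch that computes it at the base-pair level as intersection over union. It assumes half-open (start, end) intervals that all sit on one chromosome, and it is my own simplified version, not the code that bedtools jaccard or Intervene actually runs; all function names are illustrative.

```python
def merge(intervals):
    """Sort intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_bp(intervals):
    """Number of bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def intersection_bp(a, b):
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(a, b):
    """Shared bases divided by bases covered by either set."""
    inter = intersection_bp(a, b)
    union = total_bp(a) + total_bp(b) - inter
    return inter / float(union) if union else 0.0


peaks_a = [(0, 100), (200, 300)]
peaks_b = [(50, 150), (250, 400)]
print(intersection_bp(peaks_a, peaks_b))  # 100
print(jaccard(peaks_a, peaks_b))
```

In this toy case both intervals of each set have an overlapping partner, so a count-based measure looks perfect, while the Jaccard value (100 shared bases out of 350 covered) shows that the sets agree on less than a third of their bases.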
The option `--htype=tribar` will generate a horizontal bar plot with an adjacent heatmap rotated 45 degrees to show the lower triangle of the matrix comparing all sets of bars. If you want to view the upper triangle, use `--triangle upper`. It's only recommended to use `tribar` if `compute` is set to `jaccard` or `fisher`.
Please note that `tribar` will only show the lower triangle of the matrix as a heatmap, and the diagonals are set to zero. It is recommended to use this if `--compute` is set to `jaccard`, `fisher` or `reldist`.
Finally, here are a few other pairwise plots:
intervene pairwise -i ~/dbSUPER/mm9/*.bed --filenames --compute frac --htype pie

This presumably calls the corrplot package.
intervene pairwise -i ~/dbSUPER/mm9/*.bed --filenames --compute frac --htype color
