
A handy tool for Venn diagrams

2020-03-13  城管大队哈队长

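I recently came across a nice tool for drawing Venn diagrams: Intervene. It is built mainly on bedtools, R, and a few Python packages. In my opinion it manages to be good-looking, convenient, and versatile all at once.

The tool has quite a few related pages.

Here is a brief introduction.

Installation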

conda create -n intervene_module python=2
conda activate intervene_module
conda install -c bioconda intervene

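One small issue: the default version on conda appears to be intervene version 0.5.8, but only version 0.6.0 and later let you pass extra bedtools options (the venn module presumably calls bedtools intersect under the hood). If you don't need the extra bedtools options, 0.5.8 works fine too; I suspect most people won't need them.

So when installing, you really should pin the version:

conda install -c bioconda intervene=0.6.4

For me, though, this never got past the environment solve: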

$ conda install -c bioconda intervene=0.6.4
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: | 

The GitHub issue "Install intervene latest version from conda" describes the same problem, and there the author recommends installing with pip.

So I installed it with pip inside the conda environment:

pip install intervene

$ intervene --version
intervene version 0.6.4
(intervene_module) 
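
If you want a script to check whether the installed version is new enough for the bedtools options, something along these lines might do. It is only a sketch: the function names are made up for illustration, and I am assuming that 0.6.0 itself already counts as "new enough".

```python
import re

def parse_version(text):
    """Extract a (major, minor, patch) tuple from text like 'intervene version 0.6.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.groups())

def supports_bedtools_options(version_text, minimum=(0, 6, 0)):
    """True if the reported version is at least `minimum` (assumed cutoff: 0.6.0)."""
    return parse_version(version_text) >= minimum

# The strings below mimic the output of `intervene --version`.
print(supports_bedtools_options("intervene version 0.5.8"))  # False
print(supports_bedtools_options("intervene version 0.6.4"))  # True
```

Comparing tuples of integers avoids the classic string-comparison trap where "0.10.0" sorts before "0.6.0".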

Basic overview

It has three modules: Venn, UpSet, and Pairwise. For details, see the figures below or the figures in the paper.

$ intervene --help
usage: intervene <subcommand> [options]

    Intervene: a tool for intersection and visualization of multiple genomic region and gene sets.
    For more details check documentation: http://intervene.readthedocs.io
    

positional arguments:
  {venn,upset,pairwise}
                        List of subcommands
    venn                Venn diagram of intersection of genomic regions or list sets (upto 6-way).
    upset               UpSet diagram of intersection of genomic regions or list sets.
    pairwise            Pairwise intersection and heatmap of N genomic region sets in <BED/GTF/GFF> format.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
(intervene_module) 

# This uses their built-in test data; if it doesn't work, you can download their test files instead
$ intervene venn --test

Running Intervene with test data.


Generating a 3-way "venn" diagram. Please wait...


You are done! Please check your results @ /home/sgdd/test/intervene/Intervene_results. 
Thank you for using Intervene!

image
$ intervene upset --test

Running Intervene with test data.


Running UpSet module. Please wait...


You are done! Please check your results @ /home/sgd/test/intervene/Intervene_results. 
Thank you for using Intervene!

(intervene_module) 

image
$ intervene pairwise --test

Running Intervene with test data.


Performing a pairwise intersection analysis. Please wait...

/home/sgd/miniconda3/envs/intervene_module/lib/python2.7/site-packages/intervene/modules/pairwise/pairwise.py:454: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  matrix = pd.read_table(matrix_file,index_col=0, delim_whitespace=True)

You are done! Please check your results @ /home/sgd/test/intervene/Intervene_results. 
Thank you for using Intervene!

(intervene_module) 

image

Files produced after running the three commands:

$ ll
total 68K
-rw-r--r--. 1 sgd bioinfo 8.4K Mar 13 13:07 Intervene_pairwise_frac_matrix.txt
-rw-r--r--. 1 sgd bioinfo  27K Mar 13 13:07 Intervene_pairwise_frac.pdf
-rw-r--r--. 1 sgd bioinfo  295 Mar 13 13:05 Intervene_upset_combinations.txt
-rw-r--r--. 1 sgd bioinfo 6.0K Mar 13 13:05 Intervene_upset.pdf
-rwxr-xr-x. 1 sgd bioinfo  732 Mar 13 13:05 Intervene_upset.R
-rw-r--r--. 1 sgd bioinfo 9.3K Mar 13 13:04 Intervene_venn.pdf

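These are all default results. Some of these files can also be loaded into the Shiny app to tweak the details interactively. Of course, each module also lets you make the same adjustments directly on the command line.

You can also download their example_data directly:

# use -i instead of --test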
./intervene/intervene venn -i intervene/example_data/ENCODE_hESC/*.bed
./intervene/intervene upset -i intervene/example_data/ENCODE_hESC/*.bed
./intervene/intervene pairwise -i intervene/example_data/dbSUPER_mm9/*.bed

You can also specify where the output goes:

intervene <module_name> --test --output ~/path/to/your/results/folder
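
If you end up running the same kind of command over many folders, it may be handy to assemble the argument list in a script. Below is a minimal sketch that only uses flags shown in this post (-i, --names, --output); the helper name and the file names are hypothetical, and nothing is actually executed.

```python
def build_intervene_command(module, bed_files, output_dir=None, names=None):
    """Assemble an argument list for one of the three intervene modules."""
    if module not in ("venn", "upset", "pairwise"):
        raise ValueError("unknown module: %r" % module)
    if not bed_files:
        raise ValueError("at least one input file is required")
    cmd = ["intervene", module, "-i"] + list(bed_files)
    if names:
        # Labels are passed as one comma-separated value, e.g. --names=A,B,C
        cmd.append("--names=" + ",".join(names))
    if output_dir:
        cmd += ["--output", output_dir]
    return cmd

# Hypothetical inputs; in practice these could come from glob.glob("peaks/*.bed").
cmd = build_intervene_command(
    "venn", ["H3K27ac.bed", "H3K4me3.bed"], output_dir="results", names=["K27ac", "K4me3"]
)
print(" ".join(cmd))
```

The resulting list could then be handed to subprocess.run(cmd, check=True) from an environment where intervene is on the PATH.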

Venn

The Venn module is run like this:

intervene venn -i path/to/BED/files/*.bed --output ~/results/path
Option              Description
-h, --help          To show the help message and exit
-i, --input         Input genomic regions in (BED/GTF/GFF) format or lists of genes/SNPs IDs. For files in a directory use *.<extension>. e.g. *.bed
--type              {genomic,list}. Type of input sets. Genomic regions or lists of genes/SNPs. Default is genomic
--names             Comma-separated list of names as labels for input files. If it is not set file names will be used as labels. For example: --names=A,B,C,D,E,F
--filenames         Use file names as labels instead. Default is False
--bedtools-options  List any of the arguments available for bedtools' intersect command. Type bedtools intersect --help to view all the options. For example: --bedtools-options f=0.8,r,etc
--colors            Comma-separated list of matplotlib-valid colors for fill. E.g., --colors=r,b,k
--bordercolors      Comma-separated list of matplotlib-valid colors for borders. E.g., --bordercolors=r,b,k
-o, --output        Output folder path where results will be stored. Default is current working directory.
--save-overlaps     Save overlapping regions/names for all the combinations as bed/txt files. Default is False
--overlap-thresh    Minimum threshold to save the overlapping regions/names as bed/txt. Default is 1
--figtype           {pdf,svg,ps,tiff,png}. Figure type for the plot. e.g. --figtype svg. Default is pdf
--figsize           Figure size as width and height, e.g. --figsize 12 12.
--fontsize          Font size for the plot labels. Default is 14
--dpi               Dots-per-inch (DPI) for the output. Default is: 300
--fill              {number,percentage}. Report number or percentage of overlaps (Only if --type=list). Default is number
--test              This will run the program on test data.
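
The --bedtools-options value looks like a compact spelling of ordinary bedtools flags. As a rough illustration, here is how I would read f=0.8,r as -f 0.8 -r. This is my own guess at the correspondence, not Intervene's actual parsing code, and the helper name is hypothetical.

```python
def bedtools_options_to_flags(option_string):
    """Turn a string like 'f=0.8,r' into ['-f', '0.8', '-r'] (illustrative mapping only)."""
    flags = []
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            # key=value pairs become a flag followed by its value
            key, value = item.split("=", 1)
            flags += ["-" + key, value]
        else:
            # bare names become boolean switches
            flags.append("-" + item)
    return flags

print(bedtools_options_to_flags("f=0.8,r"))  # ['-f', '0.8', '-r']
```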

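Most of these options are self-explanatory; the only one worth discussing is probably the bedtools intersect one. By default, two intervals count as overlapping as soon as they overlap at all. The -f option of bedtools intersect lets us require a minimum overlap, given as a fraction of the first interval: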

$ intervene venn --test --bedtools-options f=0.8 

Running Intervene with test data.


Generating a 3-way "venn" diagram. Please wait...


Done! Please check your results @ /home/sgd/test/intervene/Intervene_results/Intervene_results. 
Thank you for using Intervene!

(intervene_module) 

Compared with the run without this option, the overlapping parts got smaller and the set-specific parts got larger.

image
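
To make the effect of -f more concrete, here is a toy sketch of the rule for a single pair of intervals. It assumes BED-style half-open coordinates on the same chromosome, and it is my own simplification, not bedtools code.

```python
def overlap_bp(a, b):
    """Number of shared bases between two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def overlaps(a, b, min_frac_a=None):
    """Default: any shared base counts. With min_frac_a, the shared part must
    cover at least that fraction of interval a (the idea behind -f)."""
    shared = overlap_bp(a, b)
    if shared < 1:
        return False
    if min_frac_a is None:
        return True
    return shared / (a[1] - a[0]) >= min_frac_a

peak_a = (100, 200)   # 100 bp long
peak_b = (150, 300)   # shares 50 bp with peak_a
peak_c = (110, 400)   # shares 90 bp with peak_a

print(overlaps(peak_a, peak_b))                  # True
print(overlaps(peak_a, peak_b, min_frac_a=0.8))  # False: only 50% of peak_a is covered
print(overlaps(peak_a, peak_c, min_frac_a=0.8))  # True: 90% of peak_a is covered
```

A pair like peak_a and peak_b is exactly what moves from the shared region of the Venn diagram to the set-specific regions once f=0.8 is set.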

Upset

intervene upset -i path/to/BED/files/*.bed --output ~/results/path
Option              Description
-h, --help          show this help message and exit
-i, --input         Input genomic regions in <BED/GTF/GFF/VCF> format or list files. For files in a directory use *.<ext>. e.g. *.bed
--type              Type of input sets. Genomic regions or lists of genes sets {genomic,list}. Default is genomic
--names             Comma-separated list of names as labels for input files. If it is not set file names will be used as labels. For example: --names=A,B,C,D,E,F
--filenames         Use file names as labels instead. Default is True
--bedtools-options  List any of the arguments available for bedtools' intersect command. Type bedtools intersect --help to view all the options. For example: --bedtools-options f=0.8,r,etc
-o, --output        Output folder path where plots will be stored. Default is current working directory.
--save-overlaps     Save overlapping regions/names for all the combinations as bed/txt files. Default is False
--overlap-thresh    Minimum threshold to save the overlapping regions/names as bed/txt. Default is 1
--order             The order of intersections of sets {freq,degree}. e.g. --order degree. Default is freq
--ninter            Number of top intersections to plot. Default is 30
--showzero          Show empty overlap combinations. Default is False
--showsize          Show intersection sizes above bars. Default is True
--mbcolor           Color of the main bar plot. Default is gray23
--sbcolor           Color of set size bar plot. Default is #56B4E9
--mblabel           The y-axis label of the intersection size bars. Default is No of Intersections
--sxlabel           The x-axis label of the set size bars. Default is Set size
--figtype           Figure type for the plot. e.g. --figtype svg {pdf,svg,ps,tiff,png} Default is pdf
--figsize           Figure size for the output plot (width,height).
--dpi               Dots-per-inch (DPI) for the output. Default is 300
--scriptonly        Set to generate Rscript only, if R/UpSetR package is not installed. Default is False
--showshiny         Print the combinations of intersections to input to Shiny App. Default is False

One thing worth noting: you can turn on --showshiny, so that you can take the results over to the Shiny app and adjust them there.
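
For anyone unsure what the bars of an UpSet plot count, here is a small sketch for list-type input (gene names). It counts each item once, under the exact combination of sets it belongs to, which is how I understand the plot. The gene names and the function name are invented for illustration; this is not Intervene's own code.

```python
from collections import Counter

def exclusive_intersections(named_sets):
    """Count items by the exact combination of sets they appear in."""
    membership = {}
    for name, items in named_sets.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())

gene_lists = {
    "A": {"g1", "g2", "g3", "g6"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
counts = exclusive_intersections(gene_lists)

# Print smaller combinations first, then alphabetically.
for combo, n in sorted(counts.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("&".join(sorted(combo)), n)
# A 2
# B 1
# C 1
# A&B 1
# A&B&C 1
```

Note that g3 is counted only under A&B&C, not again under A&B, which is why the bars of an UpSet plot add up to the number of distinct items.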

Pairwise intersection module

This module has more parameters and is also more fun to play with.

intervene pairwise -i path/to/BED/files/*.bed --type genomic --compute jaccard --htype tribar

intervene pairwise -i path/to/BED/files/*.bed --type genomic --compute jaccard --htype tribar --output ~/results/path
Option              Description
-h, --help          show this help message and exit
-i, --input         Input genomic regions in (BED/GTF/GFF) format. For files in a directory use *.<extension>. e.g. *.bed
--type              {genomic,list}. Type of input sets. Genomic regions or lists of genes/SNPs. Default is genomic
--compute           Compute count/fraction of overlaps or statistical relationships. {count, frac, jaccard, fisher, reldist}
                    --compute=count - calculates the number of overlaps.
                    --compute=frac - calculates the fraction of overlap.
                    --compute=jaccard - calculates the Jaccard statistic.
                    --compute=reldist - calculates the distribution of relative distances.
                    --compute=fisher - calculates Fisher's statistic.
                    Note: For jaccard and reldist, regions should be pre-sorted or set --sort
--bedtools-options  List any of the arguments available for bedtools' subcommands: intersect, jaccard, fisher. Type bedtools <subcommand> --help to view all the options. For example: --bedtools-options f=0.8,r,etc.
                    Note: --compute options count and frac use bedtools' intersect command.
--corr              Compute the correlation. By default set to False
--corrtype          Select the type of correlation from pearson, kendall or spearman.
                    --corrtype=pearson: computes the Pearson correlation. (Default)
                    --corrtype=kendall: computes the Kendall correlation.
                    --corrtype=spearman: computes the Spearman correlation.
                    Note: This only works if --corr is set.
--htype             {tribar,color,pie,circle,square,ellipse,number,shade}. Heatmap plot type. Default is tribar.
                    Read the note below for the tribar option.
--triangle          Show lower/upper triangle of the matrix as heatmap. Default is lower
--diagonal          Show the diagonal values in the heatmap. Default is False.
--names             Comma-separated list of names as labels for input files. If it is not set file names will be used as labels. For example: --names=A,B,C,D,E,F
--filenames         Use file names as labels instead. Default is False.
--sort              Set this only if your files are not sorted. Default is False.
--genome            Required argument if --compute=fisher. Needs to be a string assembly name such as mm10 or hg38
-o, --output        Output folder path where results will be stored. Default is current working directory.
--barlabel          x-axis label of boxplot if --htype=tribar. Default is Set size
--barcolor          Boxplot color (hex value or name, e.g. blue). Default is #53cfff.
--fontsize          Label font size. Default is 8.
--title             Heatmap main title. Default is Pairwise intersection
--space             White space between barplot and heatmap, if --htype=tribar. Default is 1.3.
--figtype           {pdf,svg,ps,tiff,png}. Figure type for the plot. e.g. --figtype svg. Default is pdf
--figsize           Figure size for the output plot (width,height). e.g. --figsize 8 8
--dpi               Dots-per-inch (DPI) for the output. Default is: 300.
--scriptonly        Set to generate Rscript only, if R/Corrplot package is not installed. Default is False
--test              This will run the program on test data.

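There are many interesting parameters here; feel free to explore them yourself.

The ones I find most interesting are the --compute options: methods such as jaccard, reldist, and fisher gave me a new way of thinking about how overlap can be measured.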
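
As an example of what such a statistic measures, here is a toy version of the Jaccard index for two interval sets: shared base pairs divided by the base pairs covered by either set. It uses the textbook definition on one chromosome with half-open coordinates; it is a sketch of the idea and may differ in details from what bedtools jaccard reports.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def total_bp(intervals):
    """Total length of a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)

def intersection_bp(a, b):
    """Base pairs covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        shared += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared

def jaccard(a, b):
    """Shared bp divided by bp covered by either set; 0.0 if both are empty."""
    shared = intersection_bp(a, b)
    union = total_bp(merge(a)) + total_bp(merge(b)) - shared
    return shared / union if union else 0.0

set_a = [(0, 100), (200, 300)]
set_b = [(50, 150), (250, 350)]
print(round(jaccard(set_a, set_b), 3))  # 0.333: 100 shared bp out of 300 covered bp
```

Unlike a simple count of overlapping peaks, this value depends on how many bases are shared, so two sets with many tiny overlaps score lower than two sets with a few large ones.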

Here are also the notes from the documentation, quoted in the original English:

The option --htype=tribar will generate a horizontal bar plot with an adjacent heatmap rotated 45 degrees to show the lower triangle of the matrix comparing all sets of bars. If you want to view the upper triangle, please use --triangle upper. It's only recommended to use tribar if compute is set to jaccard or fisher.

Please note that tribar will only show the lower triangle of the matrix as a heatmap, and the diagonals are set to zero. It is recommended to use this if --compute is set to jaccard, fisher or reldist.

Finally, here are some of the other plots the pairwise module can produce:

intervene pairwise -i ~/dbSUPER/mm9/*.bed --filenames --compute frac --htype pie
image

This presumably calls the corrplot package.

intervene pairwise -i ~/dbSUPER/mm9/*.bed --filenames --compute frac --htype color
image