实用小公举

bedtools makewindows 生成滑窗区域文本

2019-07-09  本文已影响0人  Amy_Cui

学习目的,就为了得到这个文本,命令:bedtools makewindows -g genome.chr.ln -w 200000 >200K.genome.3col;当然你还可以写一个单行命令来处理。我这里用的是bedtools软件。

genome.chr.ln示例.png 结果文件:相隔200k的bed文件

安装

官网各种安装方法

conda 安装

$ conda install bedtools
$ bedtools --help
bedtools is a powerful toolset for genome arithmetic.

Version:   v2.28.0
About:     developed in the quinlanlab.org and by many contributors worldwide.
Docs:      http://bedtools.readthedocs.io/
Code:      https://github.com/arq5x/bedtools2
Mail:      https://groups.google.com/forum/#!forum/bedtools-discuss

Usage:     bedtools <subcommand> [options]
...

源码安装

$ wget https://github.com/arq5x/bedtools2/releases/download/v2.28.0/bedtools-2.28.0.tar.gz
$ tar -zxvf bedtools-2.28.0.tar.gz
$ cd bedtools2
$ make

系统安装

需要管理员权限

Fedora / Centos。Adam Huffman为bedtools创建了一个Red Hat软件包,以便可以使用Fedora软件包管理器“yum”轻松安装最新版本。它应该适用于Fedora 13,14和EPEL5 / 6(适用于Centos,Scientific Linux等)。

yum install BEDTools

于Debian / Ubuntu。Charles Plessy还维护着一个Debian软件包,可以在Ubuntu等衍生产品中找到。非常感谢Charles这样做。

apt-get install bedtools

自制。Carlos Borroto已经在OSX的bedtools包管理器上提供了BEDTools。

brew tap homebrew/science
brew install bedtools

MacPorts。或者,MacPorts端口系统可用于在OSX上安装BEDTools。

port install bedtools

makewindows的使用

查看bedtools makewindows 的帮助文档,最下面是示例,非常实用,看完就会了

准备-g文件

samtools dict /public/reference/genome/hg38/hg38.fa >hg38.fa.dict
# One can use the UCSC Genome Browser's MySQL database to extract: chromosome sizes. For example, H. sapiens:
#  mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
#        "select chrom, size from hg19.chromInfo" > hg19.genome
cat hg38.fa.dict|grep -v '^@HD'|sed 's/:/\t/g'|cut -f 3,5 >genome.chr.ln
bedtools makewindows -g genome.chr.ln -w 200000 >200K.genome.3col

结束我的需求!


帮助文档示例写的很明白


*****
*****ERROR: Need -g (genome file) or -b (BED file) for interval source. 
*****

*****
*****ERROR: Need -w (window size) or -n (number of windows). 
*****

Tool: bedtools makewindows
Version: v2.28.0
Summary: Makes adjacent or sliding windows across a genome or BED file.

Usage: bedtools makewindows [OPTIONS] [-g <genome> OR -b <bed>]
 [ -w <window_size> OR -n <number of windows> ]

Input Options: 
    -g <genome>
        Genome file size (see notes below).
        Windows will be created for each chromosome in the file.

    -b <bed>
        BED file (with chrom,start,end fields).
        Windows will be created for each interval in the file.

Windows Output Options: 
    -w <window_size>
        Divide each input interval (either a chromosome or a BED interval)
        to fixed-sized windows (i.e. same number of nucleotide in each window).
        Can be combined with -s <step_size>

    -s <step_size>
        Step size: i.e., how many base pairs to step before
        creating a new window. Used to create "sliding" windows.
        - Defaults to window size (non-sliding windows).

    -n <number_of_windows>
        Divide each input interval (either a chromosome or a BED interval)
        to fixed number of windows (i.e. same number of windows, with
        varying window sizes).

    -reverse
         Reverse numbering of windows in the output, i.e. report 
         windows in decreasing order

ID Naming Options: 
    -i src|winnum|srcwinnum
        The default output is 3 columns: chrom, start, end .
        With this option, a name column will be added.
         "-i src" - use the source interval's name.
         "-i winnum" - use the window number as the ID (e.g. 1,2,3,4...).
         "-i srcwinnum" - use the source interval's name with the window number.
        See below for usage examples.

Notes: 
    (1) The genome file should tab delimited and structured as follows:
     <chromName><TAB><chromSize>

    For example, Human (hg19):
    chr1    249250621
    chr2    243199373
    ...
    chr18_gl000207_random   4262

Tips: 
    One can use the UCSC Genome Browser's MySQL database to extract
    chromosome sizes. For example, H. sapiens:

    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
    "select chrom, size from hg19.chromInfo" > hg19.genome

Examples: 
 # Divide the human genome into windows of 1MB:
 $ bedtools makewindows -g hg19.txt -w 1000000
 chr1 0 1000000
 chr1 1000000 2000000
 chr1 2000000 3000000
 chr1 3000000 4000000
 chr1 4000000 5000000
 ...

 # Divide the human genome into sliding (=overlapping) windows of 1MB, with 500KB overlap:
 $ bedtools makewindows -g hg19.txt -w 1000000 -s 500000
 chr1 0 1000000
 chr1 500000 1500000
 chr1 1000000 2000000
 chr1 1500000 2500000
 chr1 2000000 3000000
 ...

 # Divide each chromosome in human genome to 1000 windows of equal size:
 $ bedtools makewindows -g hg19.txt -n 1000
 chr1 0 249251
 chr1 249251 498502
 chr1 498502 747753
 chr1 747753 997004
 chr1 997004 1246255
 ...

 # Divide each interval in the given BED file into 10 equal-sized windows:
 $ cat input.bed
 chr5 60000 70000
 chr5 73000 90000
 chr5 100000 101000
 $ bedtools makewindows -b input.bed -n 10
 chr5 60000 61000
 chr5 61000 62000
 chr5 62000 63000
 chr5 63000 64000
 chr5 64000 65000
 ...

 # Add a name column, based on the window number: 
 $ cat input.bed
 chr5  60000  70000 AAA
 chr5  73000  90000 BBB
 chr5 100000 101000 CCC
 $ bedtools makewindows -b input.bed -n 3 -i winnum
 chr5        60000   63334   1
 chr5        63334   66668   2
 chr5        66668   70000   3
 chr5        73000   78667   1
 chr5        78667   84334   2
 chr5        84334   90000   3
 chr5        100000  100334  1
 chr5        100334  100668  2
 chr5        100668  101000  3
 ...

 # Reverse window numbers: 
 $ cat input.bed
 chr5  60000  70000 AAA
 chr5  73000  90000 BBB
 chr5 100000 101000 CCC
 $ bedtools makewindows -b input.bed -n 3 -i winnum -reverse
 chr5        60000   63334   3
 chr5        63334   66668   2
 chr5        66668   70000   1
 chr5        73000   78667   3
 chr5        78667   84334   2
 chr5        84334   90000   1
 chr5        100000  100334  3
 chr5        100334  100668  2
 chr5        100668  101000  1
 ...

 # Add a name column, based on the source ID + window number: 
 $ cat input.bed
 chr5  60000  70000 AAA
 chr5  73000  90000 BBB
 chr5 100000 101000 CCC
 $ bedtools makewindows -b input.bed -n 3 -i srcwinnum
 chr5        60000   63334   AAA_1
 chr5        63334   66668   AAA_2
 chr5        66668   70000   AAA_3
 chr5        73000   78667   BBB_1
 chr5        78667   84334   BBB_2
 chr5        84334   90000   BBB_3
 chr5        100000  100334  CCC_1
 chr5        100334  100668  CCC_2
 chr5        100668  101000  CCC_3
 ...
上一篇 下一篇

猜你喜欢

热点阅读