Extracting sequences from specified genomic regions with bedtools

2021-10-27 qujingtao

bedtools can quickly extract the sequences of specified genomic regions in batch.

1. Example:

bedtools getfasta -fi example_genome.fasta -bed example.bed -fo example.fa -name

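File	Description
example_genome.fasta	The genome sequence.
example.bed	The target regions. The first four columns of the BED file are chromosome, start, end, and name, separated by tabs (\t). To extract several regions, put one region per line.
example.fa	The output file that receives the extracted sequences.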
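As a minimal sketch of the BED layout described above, the following writes a two-region example.bed with printf so that the separators are real tabs. The chromosome names, coordinates, and region names are made-up placeholders, not values from a real genome.

```shell
# Write a two-region BED file; \t in the printf format becomes a literal tab.
# Chr1/Chr2, the coordinates, and the names are hypothetical placeholders.
printf 'Chr1\t99\t200\tregion_A\n'  > example.bed
printf 'Chr2\t0\t50\tregion_B\n'   >> example.bed

# Sanity check: every line should have exactly four tab-separated fields.
awk -F'\t' 'NF != 4 { bad = 1 } END { exit bad }' example.bed \
    && echo "example.bed looks OK"
```

Writing the file with printf avoids the common problem of an editor silently turning tabs into spaces, which bedtools would not accept as column separators.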

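To extract 100 bp on each side of a site, for example Chr1 12345 (taken as a 1-based position), the BED line should be Chr1 12345-101 12345+100, that is, Chr1 12244 12445. The start is shifted by 101 rather than 100 because BED start coordinates are 0-based while end coordinates are exclusive, so the interval covers 201 bp centered on the site.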
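The flank arithmetic can be sketched with awk. This assumes a hypothetical input file sites.txt holding a chromosome and a 1-based position per line, tab-separated; the file names, the flank variable, and the generated region names are illustrative only.

```shell
# sites.txt (hypothetical): chromosome and 1-based position, tab-separated.
printf 'Chr1\t12345\n' > sites.txt

flank=100
awk -v f="$flank" 'BEGIN { FS = OFS = "\t" }
{
    start = $2 - 1 - f          # convert to 0-based, then step back f bases
    if (start < 0) start = 0    # clamp at the start of the chromosome
    end = $2 + f                # exclusive end; NOT clamped to chromosome length
    print $1, start, end, $1 "_" $2
}' sites.txt > flank.bed

cat flank.bed
```

For the site above this writes the line Chr1, 12244, 12445, Chr1_12345 (tab-separated). The sketch does not know chromosome lengths, so a window near the end of a chromosome may run past it and would need separate handling.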

2. Installation

If the machine has network access, installing bedtools with conda is recommended because it is quick and convenient:

conda install -c bioconda bedtools

On Ubuntu, bedtools can also be installed with apt:

sudo apt-get install bedtools

3. Help text

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.27.1
Summary: Extract DNA sequences from a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

Options:
        -fi     Input FASTA file
        -fo     Output file (opt., default is STDOUT
        -bed    BED/GFF/VCF file of ranges to extract from -fi
        -name   Use the name field for the FASTA header
        -name+  Use the name field and coordinates for the FASTA header
        -split  given BED12 fmt., extract and concatenate the sequences
                from the BED "blocks" (e.g., exons)
        -tab    Write output in TAB delimited format.
                - Default is FASTA format.

        -s      Force strandedness. If the feature occupies the antisense,
                strand, the sequence will be reverse complemented.
                - By default, strand information is ignored.

        -fullHeader     Use full fasta header.
                - By default, only the word before the first space or tab
                is used.
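To illustrate the -s and -tab options listed above, here is a small sketch on a toy input. The files tiny.fasta and tiny.bed and their contents are invented for the example; the BED file carries six columns because the strand is read from column 6. The bedtools call is guarded so that the snippet still runs on a machine without bedtools.

```shell
# Toy genome: one 10 bp sequence (hypothetical data).
printf '>Chr1\nAAACCCGGGT\n' > tiny.fasta

# Same interval twice, once per strand; columns: chrom, start, end, name, score, strand.
printf 'Chr1\t0\t4\tfwd\t0\t+\n'  > tiny.bed
printf 'Chr1\t0\t4\trev\t0\t-\n' >> tiny.bed

if command -v bedtools >/dev/null 2>&1; then
    # -s reverse-complements features on the minus strand;
    # -tab prints "header<TAB>sequence" instead of FASTA records.
    bedtools getfasta -fi tiny.fasta -bed tiny.bed -s -name -tab
else
    echo "bedtools not found on PATH; skipping the getfasta call" >&2
fi
```

The interval 0-4 covers AAAC, so the plus-strand record should carry AAAC and the minus-strand record its reverse complement, GTTT. The exact header text printed with -name and -s may differ between bedtools versions, so it is worth checking against the installed one.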
