
findOverlaps

2021-12-07  日月其除

findOverlaps
Finds the regions where two GRanges objects overlap.
https://kasperdanielhansen.github.io/genbioconductor/html/GenomicRanges_GRanges_Usage.html

## S4 method for signature 'GInteractions,Vector'
findOverlaps(query, subject, maxgap=0L, minoverlap=1L,
    type=c("any", "start", "end", "within", "equal"),
    select=c("all", "first", "last", "arbitrary"),
    ignore.strand=FALSE, use.region="both")

Arguments
query, subject  
A Vector, GInteractions or InteractionSet object, depending on the specified method. At least one of these must be a GInteractions or InteractionSet object. Also, subject can be missing if query is a GInteractions or InteractionSet object.

maxgap, minoverlap, type    
See ?findOverlaps in the GenomicRanges package.

select, ignore.strand   
See ?findOverlaps in the GenomicRanges package.

use.region  
A string specifying the regions to be used to identify overlaps.
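
To make the minoverlap argument listed above concrete, here is a minimal sketch on plain GRanges objects. The ranges `query` and `subject` are made-up toy data, not from the original post; they share exactly three bases (18 to 20).

```r
library(GenomicRanges)

# Hypothetical toy ranges: 10-20 and 18-30 share 3 bases (18, 19, 20)
query   <- GRanges("chr1", IRanges(start = 10, end = 20))
subject <- GRanges("chr1", IRanges(start = 18, end = 30))

# Default settings: any shared base counts as a hit
n_default <- length(findOverlaps(query, subject))

# Requiring at least 3 shared bases still keeps the hit
n_min3 <- length(findOverlaps(query, subject, minoverlap = 3L))

# Requiring at least 5 shared bases drops it
n_min5 <- length(findOverlaps(query, subject, minoverlap = 5L))
```

Raising minoverlap is a simple way to ignore ranges that only touch at their edges.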

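The default matching mode is type = "any": two ranges count as a hit as soon as they overlap at all. To find query ranges that lie entirely inside a subject range, use type = "within".
The select argument:
It decides which hit to report when one query overlaps several subject ranges.
all (default): report every hit
first: report the first overlapping subject
last: report the last overlapping subject
arbitrary: report one arbitrarily chosen hit
With any select other than "all", a query that has no overlap gets NA.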
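
The select behaviour can be checked with a small sketch. The objects `query` and `subject` below are hypothetical toy data, not from the original post: the first query range overlaps all three subject ranges, the second overlaps none.

```r
library(GenomicRanges)

# Hypothetical toy data: query 1 overlaps subjects 1, 2 and 3; query 2 overlaps nothing
query   <- GRanges("chr1", IRanges(start = c(10, 100), end = c(20, 110)))
subject <- GRanges("chr1", IRanges(start = c(5, 15, 18), end = c(12, 25, 30)))

# select = "all" (the default) returns a Hits object with one row per overlap
all_hits <- findOverlaps(query, subject, select = "all")

# Other select values return one subject index per query, NA where nothing overlaps
first_hit <- findOverlaps(query, subject, select = "first")
last_hit  <- findOverlaps(query, subject, select = "last")
```

Here `first_hit` is 1 and NA, and `last_hit` is 3 and NA, so the result always has the same length as the query.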

For findOverlaps, a Hits object is returned if select="all", and an integer vector of subject indices otherwise.
Reference:
https://www.imsbio.co.jp/RGM/R_rdfile?f=InteractionSet/man/overlaps.Rd&d=R_BC

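# GRanges(), IRanges(), Rle() and strand() are available after loading GenomicRanges
library(GenomicRanges)

# Ten ranges on three chromosomes, with score and GC as metadata columns
gr <- GRanges(
       seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
       ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
       strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
       score = 1:10,
       GC = seq(1, 0, length.out = 10))

# A single range on the minus strand of chr1
gr1 <- GRanges(
       seqnames = "chr1",
       ranges = IRanges(100, end = 110),
       strand = Rle(strand(c("-"))),
       score = 1)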
> gr
GRanges object with 10 ranges and 2 metadata columns:
    seqnames    ranges strand |     score        GC
       <Rle> <IRanges>  <Rle> | <integer> <numeric>
  a     chr1   101-111      - |         1  1.000000
  b     chr2   102-112      + |         2  0.888889
  c     chr2   103-113      + |         3  0.777778
  d     chr2   104-114      * |         4  0.666667
  e     chr1   105-115      * |         5  0.555556
  f     chr1   106-116      + |         6  0.444444
  g     chr3   107-117      + |         7  0.333333
  h     chr3   108-118      + |         8  0.222222
  i     chr3   109-119      - |         9  0.111111
  j     chr3   110-120      - |        10  0.000000
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> gr1
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr1   100-110      - |         1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> findOverlaps(gr, gr1, ignore.strand = TRUE)
Hits object with 3 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         5           1
  [3]         6           1
  -------
  queryLength: 10 / subjectLength: 1
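
For comparison, a sketch of the same query with strand taken into account (the default, ignore.strand = FALSE). It rebuilds gr and gr1 without their metadata columns, which do not affect overlaps. Ranges on "*" are compatible with either strand, so only the "+" range f should drop out.

```r
library(GenomicRanges)

# Same ranges as in the post, metadata columns omitted
gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges   = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand   = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)))
gr1 <- GRanges(seqnames = "chr1", ranges = IRanges(100, end = 110), strand = "-")

# Strand ignored: a (-), e (*) and f (+) on chr1 all hit gr1
hits_ignore <- findOverlaps(gr, gr1, ignore.strand = TRUE)

# Strand respected: f is on "+", gr1 is on "-", so f is no longer a hit
hits_strand <- findOverlaps(gr, gr1)
queryHits(hits_strand)
```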

Trying the type argument

> findOverlaps(gr, gr1, ignore.strand = TRUE, type = "any")
Hits object with 3 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         5           1
  [3]         6           1
  -------
  queryLength: 10 / subjectLength: 1

> findOverlaps(gr, gr1, ignore.strand = TRUE, type = "start")
Hits object with 0 hits and 0 metadata columns:
   queryHits subjectHits
   <integer>   <integer>
  -------
  queryLength: 10 / subjectLength: 1

> findOverlaps(gr, gr1, ignore.strand = TRUE, type = "end")
Hits object with 0 hits and 0 metadata columns:
   queryHits subjectHits
   <integer>   <integer>
  -------
  queryLength: 10 / subjectLength: 1
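
Both calls above return nothing because no range in gr starts at 100 or ends at 110. A sketch with hypothetical probe ranges (`q_start` and `q_end` are illustrative names, not from the original post) shows what a positive match looks like for each type:

```r
library(GenomicRanges)

# Same ranges as in the post, metadata columns omitted
gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges   = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand   = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)))

# Hypothetical probe sharing its start (101) with range a (chr1:101-111)
q_start <- GRanges("chr1", IRanges(101, end = 105))
hits_start <- findOverlaps(q_start, gr, ignore.strand = TRUE, type = "start")

# Hypothetical probe sharing its end (116) with range f (chr1:106-116)
q_end <- GRanges("chr1", IRanges(112, end = 116))
hits_end <- findOverlaps(q_end, gr, ignore.strand = TRUE, type = "end")

subjectHits(hits_start)  # index of a
subjectHits(hits_end)    # index of f
```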

> gr2 <- GRanges(
+        seqnames = "chr1",
+        ranges = IRanges(c(100,103), end = c(102,110)),
+        strand = Rle(strand(c("-","+"))),
+        score = c(1,2))
> gr2
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr1   100-102      - |         1
  [2]     chr1   103-110      + |         2
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> findOverlaps(gr2, gr, ignore.strand = TRUE, type = "within")
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         2           1
  -------
  queryLength: 2 / subjectLength: 10

Use queryHits() and subjectHits() to extract the query and subject indices from the Hits object.
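
A short sketch of that extraction step, rebuilding gr and gr1 from the post without their metadata columns. The variable names `q_idx`, `s_idx` and `overlapping` are illustrative only.

```r
library(GenomicRanges)

# Same ranges as in the post, metadata columns omitted
gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges   = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand   = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)))
gr1 <- GRanges(seqnames = "chr1", ranges = IRanges(100, end = 110), strand = "-")

hits <- findOverlaps(gr, gr1, ignore.strand = TRUE)

# Integer indices into the query (gr) and the subject (gr1)
q_idx <- queryHits(hits)
s_idx <- subjectHits(hits)

# Use the indices to pull out the ranges that actually overlap
overlapping <- gr[q_idx]
names(overlapping)
```

The indices refer to positions in the original objects, so they can be used to subset any data kept alongside gr or gr1.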
