二分法的简单使用问题

2018-07-31  本文已影响43人  JokerRun

使用场景

根据访问日志的ip地址计算出访问者的归属地.
归属地规则文件:

1.0.1.0|1.0.3.255|16777472|16778239|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.0.8.0|1.0.15.255|16779264|16781311|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.0.32.0|1.0.63.255|16785408|16793599|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.1.0.0|1.1.0.255|16842752|16843007|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
....

访问日志:

20090121000132095572000|125.213.100.123|show.51.com|/shoplist.php?phpfile=shoplist2.php&style=1&sex=137|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0(Compatible Mozilla/4.0(Compatible-EmbeddedWB 14.59 http://bsalsa.com/ EmbeddedWB- 14.59  from: http://bsalsa.com/ )|http://show.51.com/main.php|
20090121000132124542000|117.101.215.133|www.jiayuan.com|/19245971|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TencentTraveler 4.0)|http://photo.jiayuan.com/index.php?uidhash=d1c3b69e9b8355a5204474c749fb76ef|__tkist=0; myloc=50%7C5008; myage=2009; PROFILE=14469674%3A%E8%8B%A6%E6%B6%A9%E5%92%96%E5%95%A1%3Am%3Aphotos2.love21cn.com%2F45%2F1b%2F388111afac8195cc5d91ea286cdd%3A1%3A%3Ahttp%3A%2F%2Fimages.love21cn.com%2Fw4%2Fglobal%2Fi%2Fhykj_m.jpg; last_login_time=1232454068; SESSION_HASH=8176b100a84c9a095315f916d7fcbcf10021e3af; RAW_HASH=008a1bc48ff9ebafa3d5b4815edd04e9e7978050; COMMON_HASH=45388111afac8195cc5d91ea286cdd1b; pop_1232093956=1232468896968; pop_time=1232466715734; pop_1232245908=1232469069390; pop_1219903726=1232477601937; LOVESESSID=98b54794575bf547ea4b55e07efa2e9e; main_search:14469674=%7C%7C%7C00; registeruid=14469674; REG_URL_COOKIE=http%3A%2F%2Fphoto.jiayuan.com%2Fshowphoto.php%3Fuid_hash%3D0319bc5e33ba35755c30a9d88aaf46dc%26total%3D6%26p%3D5; click_count=0%2C3363619
20090121000132406516000|117.101.222.68|gg.xiaonei.com|/view.jsp?p=389|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; CIBA)|http://home.xiaonei.com/Home.do?id=229670724|_r01_=1; __utma=204579609.31669176.1231940225.1232462740.1232467011.145; __utmz=204579609.1231940225.1.1.utmccn=(direct)
20090121000132581311000|115.120.36.118|tj.tt98.com|/tj.htm|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TheWorld)|http://www.tt98.com/|

开发思路

  1. 数据清洗+整理, 将规则文件写入List[(Long,Long,String)](注: 需要是排序过得list);
  2. 利用二分法, 对日志文件逐条匹配, 获取对应的规则文件中的地址信息(需要将IP地址转化为数值);
  3. 输出结果;

代码实现

规则获取

  def readRules(path: String) = {
    val source = Source.fromFile(path).getLines()
    val rang2addr = source.map(line => {
      val strings = line.split("\\|");
      val floor = strings(2).toLong;
      val top = strings(3).toLong;
      (floor, top, strings(6))
    })
    rang2addr.toArray
  }

地址转换

  def getLongIP(ip: String) = {
    val strings = ip.split("\\.")
    var longIP: Long = strings(0).toLong * 256 * 256 * 256 + strings(1).toLong * 256 * 256 + strings(2).toLong * 256 + strings(3).toLong
    longIP
  }

二分法查找

设计错例


  /**
    * 二分法思路:
    * ip >mid => mid ->min ; mid =(min+max )/2 ;
    * ip <mid => mid ->max ; mid = ..
    * ...
    * ip = mid
    * @param rules
    * @param ip
    * @return index
    */
  def binarySerch(rules: Array[(Long, Long, String)], ip: Long): Int = {
    var low:Int = 0
    var high:Int = rules.length - 1
    var mid:Int =Math.round( (low+high)/2)
    while (low <= high) {
      if (rules(mid)._2 < ip) {
        low = mid 
        mid = Math.round( (low+high)/2)
      } else if (rules(mid)._1 > ip) { 
        high = mid
        mid =Math.round( (low+high)/2)
      }
      else {
        return mid
      }
    }
    -1
  }

问题点

  1. 代码 mid:Int =Math.round( (low+high)/2) 中不需要调用round方法, 如: (1+2)/2=1, 原因:

当二元运算的两个操作数的数据类型不同时,运算结果的数据类型和参与运算的操作数的数据类型中精度较高(或位数较长)一致。
Java的算数运算符、关系运算符、逻辑运算符、位运算符

  1. 代码逻辑问题:
    while (low <= high) { // l =1 , h=2, m=1
      if (rules(mid)._2 < ip) {//假设成立(实际就是存在成立的情况)
        low = mid //l=m=1
        mid = Math.round( (low+high)/2) //m=(1+2)/2 =1
      }
      ...

此处从注释可见, 此时代码进入死循环,
一开始没走debug, 以为是因为数据量太大导致的. 此处有个疑点:

如果在测试环境遇到这种问题, 如何通过指令去查找bug ? 可以通过成神之路中什么JVM指令 查错 ?

代码调整

  def binarySerch(rules: Array[(Long, Long, String)], ip: Long): Int = {
    var low = 0
    var high = rules.length - 1
    var mid = (low+high)/2
    while (low <= high) {

      if (rules(mid)._2 < ip) { //最小范围最大值 < ip =>舍弃最小范围
        low = mid +1
        mid = (low+high)/2  // m=2, i=2,x=3
      } else if (rules(mid)._1 > ip) { //最小范围最大值 < ip =>舍弃最大范围
        high = mid -1
        mid =(low+high)/2 //i=2,m=2,x=1
      }
      else {
        return mid
      }
    }
    -1
  }

调整说明: 调整过程还是有点迷迷糊糊, 自己理解, 调整要点抓住一个点:

循环走到最后如果匹配不到结果, 则需要让代码能够跳出while (low <= high)这个循环条件

所以在循环体重, 对最小/大值的赋值思路做出调整:

low = mid ==> low = mid +1
即 :最小范围最大值 < ip =>舍弃最小范围

import scala.io.Source

object IpAddrFinderUtil {
  def main(args: Array[String]): Unit = {
    val ip = getLongIP("1.2.2.255")
    val rules = readRules("R:/ip.txt")
    val index = binarySerch(rules, ip)
    val addr = rules(index)
    println(addr)

  }
  def readRules(path: String) = {
    val source = Source.fromFile(path).getLines()
    val rang2addr = source.map(line => {
      val strings = line.split("\\|");
      val floor = strings(2).toLong;
      val top = strings(3).toLong;
      (floor, top, strings(6))
    })
    rang2addr.toArray
  }

  def getLongIP(ip: String) = {
    val strings = ip.split("\\.")
    var longIP: Long = strings(0).toLong * 256 * 256 * 256 + strings(1).toLong * 256 * 256 + strings(2).toLong * 256 + strings(3).toLong
    longIP
  }


  /**
    * 二分法匹配查找:
    * @param rules
    * @param ip
    * @return index
    */
  def binarySerch(rules: Array[(Long, Long, String)], ip: Long): Int = {
    var low = 0
    var high = rules.length - 1
    var mid = (low+high)/2
    while (low <= high) {

      if (rules(mid)._2 < ip) { //最小范围最大值 < ip =>舍弃最小范围
        low = mid +1
        mid = (low+high)/2 // m=2, i=2,x=3
      } else if (rules(mid)._1 > ip) { //最小范围最大值 < ip =>舍弃最大范围
        high = mid -1
        mid =(low+high)/2 //i=2,m=2,x=1
      }
      else {
        return mid
      }
    }
    -1
  }
}

上一篇 下一篇

猜你喜欢

热点阅读