二分法的简单使用问题
2018-07-31 本文已影响43人
JokerRun
使用场景
根据访问日志的ip地址计算出访问者的归属地.
归属地规则文件:
1.0.1.0|1.0.3.255|16777472|16778239|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.0.8.0|1.0.15.255|16779264|16781311|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.0.32.0|1.0.63.255|16785408|16793599|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.1.0.0|1.1.0.255|16842752|16843007|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
....
访问日志:
20090121000132095572000|125.213.100.123|show.51.com|/shoplist.php?phpfile=shoplist2.php&style=1&sex=137|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0(Compatible Mozilla/4.0(Compatible-EmbeddedWB 14.59 http://bsalsa.com/ EmbeddedWB- 14.59 from: http://bsalsa.com/ )|http://show.51.com/main.php|
20090121000132124542000|117.101.215.133|www.jiayuan.com|/19245971|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TencentTraveler 4.0)|http://photo.jiayuan.com/index.php?uidhash=d1c3b69e9b8355a5204474c749fb76ef|__tkist=0; myloc=50%7C5008; myage=2009; PROFILE=14469674%3A%E8%8B%A6%E6%B6%A9%E5%92%96%E5%95%A1%3Am%3Aphotos2.love21cn.com%2F45%2F1b%2F388111afac8195cc5d91ea286cdd%3A1%3A%3Ahttp%3A%2F%2Fimages.love21cn.com%2Fw4%2Fglobal%2Fi%2Fhykj_m.jpg; last_login_time=1232454068; SESSION_HASH=8176b100a84c9a095315f916d7fcbcf10021e3af; RAW_HASH=008a1bc48ff9ebafa3d5b4815edd04e9e7978050; COMMON_HASH=45388111afac8195cc5d91ea286cdd1b; pop_1232093956=1232468896968; pop_time=1232466715734; pop_1232245908=1232469069390; pop_1219903726=1232477601937; LOVESESSID=98b54794575bf547ea4b55e07efa2e9e; main_search:14469674=%7C%7C%7C00; registeruid=14469674; REG_URL_COOKIE=http%3A%2F%2Fphoto.jiayuan.com%2Fshowphoto.php%3Fuid_hash%3D0319bc5e33ba35755c30a9d88aaf46dc%26total%3D6%26p%3D5; click_count=0%2C3363619
20090121000132406516000|117.101.222.68|gg.xiaonei.com|/view.jsp?p=389|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; CIBA)|http://home.xiaonei.com/Home.do?id=229670724|_r01_=1; __utma=204579609.31669176.1231940225.1232462740.1232467011.145; __utmz=204579609.1231940225.1.1.utmccn=(direct)
20090121000132581311000|115.120.36.118|tj.tt98.com|/tj.htm|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TheWorld)|http://www.tt98.com/|
开发思路
- 数据清洗+整理, 将规则文件写入List[(Long,Long,String)](注: 需要是排序过得list);
- 利用二分法, 对日志文件逐条匹配, 获取对应的规则文件中的地址信息(需要将IP地址转化为数值);
- 输出结果;
代码实现
规则获取
def readRules(path: String) = {
val source = Source.fromFile(path).getLines()
val rang2addr = source.map(line => {
val strings = line.split("\\|");
val floor = strings(2).toLong;
val top = strings(3).toLong;
(floor, top, strings(6))
})
rang2addr.toArray
}
地址转换
def getLongIP(ip: String) = {
val strings = ip.split("\\.")
var longIP: Long = strings(0).toLong * 256 * 256 * 256 + strings(1).toLong * 256 * 256 + strings(2).toLong * 256 + strings(3).toLong
longIP
}
二分法查找
设计错例
/**
* 二分法思路:
* ip >mid => mid ->min ; mid =(min+max )/2 ;
* ip <mid => mid ->max ; mid = ..
* ...
* ip = mid
* @param rules
* @param ip
* @return index
*/
def binarySerch(rules: Array[(Long, Long, String)], ip: Long): Int = {
var low:Int = 0
var high:Int = rules.length - 1
var mid:Int =Math.round( (low+high)/2)
while (low <= high) {
if (rules(mid)._2 < ip) {
low = mid
mid = Math.round( (low+high)/2)
} else if (rules(mid)._1 > ip) {
high = mid
mid =Math.round( (low+high)/2)
}
else {
return mid
}
}
-1
}
问题点
- 代码
mid:Int =Math.round( (low+high)/2)
中不需要调用round方法, 如:(1+2)/2=1
, 原因:
当二元运算的两个操作数的数据类型不同时,运算结果的数据类型和参与运算的操作数的数据类型中精度较高(或位数较长)一致。
Java的算数运算符、关系运算符、逻辑运算符、位运算符
- 代码逻辑问题:
while (low <= high) { // l =1 , h=2, m=1
if (rules(mid)._2 < ip) {//假设成立(实际就是存在成立的情况)
low = mid //l=m=1
mid = Math.round( (low+high)/2) //m=(1+2)/2 =1
}
...
此处从注释可见, 此时代码进入死循环,
一开始没走debug, 以为是因为数据量太大导致的. 此处有个疑点:
如果在测试环境遇到这种问题, 如何通过指令去查找bug ? 可以通过成神之路中什么JVM指令 查错 ?
代码调整
def binarySerch(rules: Array[(Long, Long, String)], ip: Long): Int = {
var low = 0
var high = rules.length - 1
var mid = (low+high)/2
while (low <= high) {
if (rules(mid)._2 < ip) { //最小范围最大值 < ip =>舍弃最小范围
low = mid +1
mid = (low+high)/2 // m=2, i=2,x=3
} else if (rules(mid)._1 > ip) { //最小范围最大值 < ip =>舍弃最大范围
high = mid -1
mid =(low+high)/2 //i=2,m=2,x=1
}
else {
return mid
}
}
-1
}
调整说明: 调整过程还是有点迷迷糊糊, 自己理解, 调整要点抓住一个点:
循环走到最后如果匹配不到结果, 则需要让代码能够跳出
while (low <= high)
这个循环条件
所以在循环体重, 对最小/大值的赋值思路做出调整:
low = mid
==>low = mid +1
即 :最小范围最大值 < ip =>舍弃最小范围
附
import scala.io.Source
object IpAddrFinderUtil {
def main(args: Array[String]): Unit = {
val ip = getLongIP("1.2.2.255")
val rules = readRules("R:/ip.txt")
val index = binarySerch(rules, ip)
val addr = rules(index)
println(addr)
}
def readRules(path: String) = {
val source = Source.fromFile(path).getLines()
val rang2addr = source.map(line => {
val strings = line.split("\\|");
val floor = strings(2).toLong;
val top = strings(3).toLong;
(floor, top, strings(6))
})
rang2addr.toArray
}
def getLongIP(ip: String) = {
val strings = ip.split("\\.")
var longIP: Long = strings(0).toLong * 256 * 256 * 256 + strings(1).toLong * 256 * 256 + strings(2).toLong * 256 + strings(3).toLong
longIP
}
/**
* 二分法匹配查找:
* @param rules
* @param ip
* @return index
*/
def binarySerch(rules: Array[(Long, Long, String)], ip: Long): Int = {
var low = 0
var high = rules.length - 1
var mid = (low+high)/2
while (low <= high) {
if (rules(mid)._2 < ip) { //最小范围最大值 < ip =>舍弃最小范围
low = mid +1
mid = (low+high)/2 // m=2, i=2,x=3
} else if (rules(mid)._1 > ip) { //最小范围最大值 < ip =>舍弃最大范围
high = mid -1
mid =(low+high)/2 //i=2,m=2,x=1
}
else {
return mid
}
}
-1
}
}