Some notes on proxy IPs
2019-01-22
silencefun
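Crawlers still need an IP pool behind them.
A quick search turns up plenty of free proxies, but they have to be filtered down to the ones that actually work.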
1. Getting free proxy IPs
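http://www.66ip.cn/nmtq.php?getnum=1
The first source (the xicidaili list page, see 3.1) has to be parsed.
The second, the 66ip endpoint above, lets you choose how many IPs to return through getnum.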
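The original code later splits each line of the 66ip response on ':', so that endpoint evidently returns ip:port entries. Below is a minimal sketch for pulling those pairs out of the raw response text. It assumes only that each entry is written as ip:port and skips any markup between entries. ProxyTextExtractor and extract are hypothetical names, not part of the original project.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProxyTextExtractor {
    // IPv4-looking address, a colon, then a 1-5 digit port; anything else is skipped
    private static final Pattern IP_PORT =
            Pattern.compile("\\b(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d{1,5})\\b");

    /** Returns every distinct "ip:port" pair in the text, in order of first appearance. */
    static List<String> extract(String raw) {
        Set<String> seen = new LinkedHashSet<>(); // drops duplicates, keeps order
        Matcher m = IP_PORT.matcher(raw);
        while (m.find()) {
            seen.add(m.group(1) + ":" + m.group(2));
        }
        return new ArrayList<>(seen);
    }
}
```

Deduplicating here means the same proxy is never validated twice when a free list repeats entries.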
2. Validation
Check whether a request sent through the proxy gets a normal page back.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.MalformedURLException;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/**
 * Tests whether a proxy IP is usable by fetching Baidu through it.
 *
 * @param ip   proxy address
 * @param port proxy port
 */
public static void createIPAddress(String ip, int port) {
    URL url;
    try {
        url = new URL("http://www.baidu.com");
    } catch (MalformedURLException e) {
        System.out.println("url invalid");
        return;
    }
    InputStream in;
    try {
        InetSocketAddress addr = new InetSocketAddress(ip, port); // throws on an out-of-range port
        Proxy proxy = new Proxy(Proxy.Type.HTTP, addr); // HTTP proxy
        URLConnection conn = url.openConnection(proxy);
        conn.setConnectTimeout(1000);
        conn.setReadTimeout(3000); // without a read timeout a stalled proxy can block forever
        in = conn.getInputStream();
    } catch (Exception e) {
        System.err.println("ip " + ip + " is not available"); // unusable IP
        return;
    }
    String s = convertStreamToString(in);
    if (s.contains("baidu")) { // usable IP: the expected page came back
        System.err.println(ip + ":" + port + " is ok");
        CrawlerUtis.appendLog("C:\\Users\\21555\\Desktop\\ip_enable.txt",
                ip + " " + port + "\r\n");
    }
}

public static String convertStreamToString(InputStream is) {
    if (is == null)
        return "";
    StringBuilder sb = new StringBuilder();
    // decodes as UTF-8 (assumed page encoding; IPs and ports are ASCII either way);
    // try-with-resources closes the reader and the underlying stream
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(is, StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append("\r\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return sb.toString();
}
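The check above only prints the outcome and appends working proxies to a file. If the pool should also prefer fast proxies, a variant can return the measured response time instead. This is a minimal sketch: ProxyProbe and probe are hypothetical names, and the caller passes in the test URL and marker string (the original hard-codes http://www.baidu.com and "baidu").

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ProxyProbe {
    /**
     * Fetches testUrl through the HTTP proxy ip:port. Returns the elapsed time in
     * milliseconds if the body contains marker, or -1 if the proxy failed in any way.
     */
    static long probe(String ip, int port, String testUrl, String marker, int timeoutMs) {
        long start = System.nanoTime();
        HttpURLConnection conn = null;
        try {
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ip, port));
            conn = (HttpURLConnection) new URL(testUrl).openConnection(proxy);
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs); // bound the read as well as the connect
            String body;
            try (InputStream in = conn.getInputStream()) { // throws IOException on HTTP status >= 400
                body = new String(readAll(in), StandardCharsets.UTF_8);
            }
            if (!body.contains(marker)) {
                return -1; // the proxy answered, but not with the expected page
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException | IllegalArgumentException e) {
            return -1; // refused, timed out, invalid port, error status, ...
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    // Reads the whole stream into memory
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

For example, probe(ip, port, "http://www.baidu.com", "baidu", 3000) mirrors the original check. Keeping only non-negative results and sorting by them puts the fastest proxies first.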
3. Parsing
3.1 Parsing the xicidaili site
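import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static List<String> AnalyIppool() {
    List<String> rows = new ArrayList<>();
    try {
        URL url = new URL("https://www.xicidaili.com/nn/");
        URLConnection connection = url.openConnection();
        // a User-Agent header must be set
        connection.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible, MSIE 7.0, Windows NT 5.1, TencentTraveler 4.0)");
        connection.setRequestProperty("Accept-Charset", "UTF-8"); // preferred response charset
        connection.connect();
        Document document = Jsoup.parse(convertStreamToString(connection.getInputStream()));
        // each table row with class "odd" is one proxy entry
        Elements ss = document.getElementsByClass("odd");
        for (Element element : ss) {
            rows.add(element.text());
            AnalyIpAndcheck(element.text());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return rows; // raw row texts, so the return value is no longer always null
}

private static void AnalyIpAndcheck(String iporign) {
    // the row text starts with the IP followed by the port
    String[] ipp = iporign.trim().split("\\s+");
    if (ipp.length < 2) {
        return;
    }
    try {
        createIPAddress(ipp[0], Integer.parseInt(ipp[1]));
    } catch (NumberFormatException e) {
        System.err.println("unparsable row: " + iporign);
    }
}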
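AnalyIpAndcheck trusts that the first two tokens of a row are an IP and a port. Below is a cheap sanity check to run before spending a network request on a row. RowParser is a hypothetical helper. It assumes nothing about the row text beyond what the original code already assumes (IP first, port second, separated by whitespace).

```java
class RowParser {
    /**
     * Turns row text such as "1.2.3.4 8080 ..." into "1.2.3.4:8080", or returns null
     * when the first two tokens are not a valid IPv4 address and port.
     */
    static String parseRow(String rowText) {
        String[] parts = rowText.trim().split("\\s+");
        if (parts.length < 2 || !isIPv4(parts[0])) {
            return null;
        }
        int port;
        try {
            port = Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return null;
        }
        return (port >= 1 && port <= 65535) ? parts[0] + ":" + port : null;
    }

    // Four dot-separated groups of 1-3 ASCII digits, each at most 255
    static boolean isIPv4(String s) {
        String[] octets = s.split("\\.", -1);
        if (octets.length != 4) {
            return false;
        }
        for (String o : octets) {
            if (o.isEmpty() || o.length() > 3 || !o.chars().allMatch(c -> c >= '0' && c <= '9')) {
                return false;
            }
            if (Integer.parseInt(o) > 255) {
                return false;
            }
        }
        return true;
    }
}
```

Header rows, blank rows and mangled entries come back as null and can be skipped instead of raising an exception.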
3.2 The second source is a plain API endpoint
http://www.66ip.cn/nmtq.php?getnum=2000
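Request many IPs in one call, read the response one line (one IP) at a time, and validate the IPs concurrently with a thread pool.
Key code: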
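// 5 core threads, up to 30, 300 ms keep-alive, queue of 3; when full, the submitting thread runs the task
private static final ThreadPoolExecutor executor = new ThreadPoolExecutor(5, 30, 300, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(3), new ThreadPoolExecutor.CallerRunsPolicy());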
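The executor above has a queue of 3 and at most 30 threads, so a burst of 2000 slow network checks can overflow it. CallerRunsPolicy then runs the overflowing task on the submitting thread, which slows the submitting loop down. The default AbortPolicy would throw RejectedExecutionException instead. Below is a small, deterministic demonstration with a deliberately tiny pool (1 thread, queue of 1). CallerRunsDemo is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallerRunsDemo {
    /** Returns true if the task that overflowed the pool ran on the submitting thread. */
    static boolean overflowRunsOnCaller() {
        // 1 core thread, 1 max thread, queue capacity 1, same rejection policy as above
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 300, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1), new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Thread[] ranOn = new Thread[1];
        try {
            // task 1 becomes the new worker's first task and blocks until released
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool.execute(() -> { });                               // task 2 fills the queue
            pool.execute(() -> ranOn[0] = Thread.currentThread()); // task 3 is rejected, so it runs here
        } finally {
            release.countDown(); // let the worker finish tasks 1 and 2
            pool.shutdown();
        }
        return ranOn[0] == Thread.currentThread();
    }
}
```

The same thing happens in AnalyIppool2 once 30 checks are running and 3 more are queued: the loop itself performs the next check, so it cannot race ahead.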
Feed the parsed list to the pool:
public static List<String> AnalyIppool2(String path) {
    // converts the file into a list of "ip:port" lines; here it reads a local file,
    // but the list could also come straight from the HTTP request
    List<String> list = CrawlerUtis.filter(path);
    for (String string : list) {
        executor.execute(() -> {
            String[] ip = string.trim().split(":");
            createIPAddress(ip[0], Integer.parseInt(ip[1]));
        });
    }
    return list;
}
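The loop above leaves two things open. First, the pool is never shut down, and its core threads are non-daemon threads, so they keep the JVM alive. Second, every worker appends to the same file through CrawlerUtis.appendLog, whose thread safety is not shown. The sketch below uses the same pool settings but collects passing entries into a thread-safe queue and waits for every check to finish. ParallelValidator and validateAll are hypothetical names. The caller passes in the check, for example a wrapper around createIPAddress that returns a boolean.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

class ParallelValidator {
    /** Runs check on every entry concurrently and returns the entries that passed, sorted. */
    static List<String> validateAll(List<String> entries, Predicate<String> check) {
        // same settings as the executor above: 5 core, 30 max, queue of 3, caller runs overflow
        ThreadPoolExecutor pool = new ThreadPoolExecutor(5, 30, 300, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(3), new ThreadPoolExecutor.CallerRunsPolicy());
        ConcurrentLinkedQueue<String> passed = new ConcurrentLinkedQueue<>(); // safe for concurrent adds
        for (String entry : entries) {
            pool.execute(() -> {
                try {
                    if (check.test(entry)) {
                        passed.add(entry);
                    }
                } catch (RuntimeException e) {
                    // a malformed entry counts as a failure; with CallerRunsPolicy an uncaught
                    // exception here could otherwise abort the submitting loop
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already queued ones still run
        try {
            pool.awaitTermination(10, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<String> result = new ArrayList<>(passed);
        Collections.sort(result); // completion order is nondeterministic, so sort for stable output
        return result;
    }
}
```

Returning the survivors as a list also lets the caller decide where to store them, rather than having each worker write to disk.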