
Fetching Web Page Content

2019-07-03  kanaSki
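
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class FetchPage {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://www.jd.com");
            // try-with-resources closes the reader and the underlying stream, even if reading fails
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }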

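However, some sites reject plain requests like this, so we can imitate a browser by sending a browser-style User-Agent header.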

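    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class FetchPageAsBrowser {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://www.dianping.com");
            HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
            urlConnection.setRequestMethod("GET");
            // Present the request as coming from a desktop Chrome browser
            urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36");
            // Assumes the page is UTF-8 encoded
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }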
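
Even with a browser User-Agent, a site may still answer with an error status such as 403, in which case `getInputStream()` throws. The two snippets above can be folded into one small helper that checks the status code before reading. This is only a sketch: the class name `PageFetcher`, the method names `readAll` and `fetch`, the 5-second timeouts, and the UTF-8 assumption are all illustrative choices, not part of the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and layout are illustrative only.
class PageFetcher {
    // Same User-Agent string as in the post; any current browser string should work.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) "
            + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36";

    // Reads the whole stream as text, ending every line with '\n'.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Sends a GET request with a browser User-Agent and returns the body as text.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(5000); // assumed timeouts, in milliseconds
        conn.setReadTimeout(5000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + address);
            }
            // Assumes the page is UTF-8 encoded.
            return readAll(conn.getInputStream(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.print(fetch(args[0]));
        } else {
            System.out.println("usage: java PageFetcher.java <url>");
        }
    }
}
```

Run it as `java PageFetcher.java https://www.dianping.com`. Keeping `readAll` separate from the network call means the reading logic can be exercised on any `InputStream` without touching the network.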