Java Web Crawler Techniques

2017-09-08  葡小萄家的猫

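Summary: the two core technologies of crawler development

* HttpClient: makes it easier to send HTTP requests
* Jsoup: makes it easier to parse HTML. These are the two jar packages worth understanding thoroughly.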

Introduction to HttpClient (reposted)
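Here is a minimal sketch of a GET request with HttpClient 4.5.x, the version declared in the pom below. The class name `HttpClientFetch`, the browser-like User-Agent string, and treating any non-200 status as an error are illustrative assumptions. They are not taken from the original post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientFetch {

    /** Sends a GET request and returns the response body as a string. */
    public static String fetch(String url) throws IOException {
        // try-with-resources closes the client when we are done
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet get = new HttpGet(url);
            // Assumption: some sites reject requests without a browser-like User-Agent
            get.setHeader("User-Agent", "Mozilla/5.0");
            // Closing the response releases the underlying connection
            try (CloseableHttpResponse response = client.execute(get)) {
                int status = response.getStatusLine().getStatusCode();
                if (status != 200) {
                    throw new IOException("Unexpected HTTP status: " + status);
                }
                // Falls back to UTF-8 only if the response declares no charset
                return EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String html = fetch("https://example.com/");
        System.out.println(html.length());
    }
}
```

For brevity, this sketch builds a new client on every call. A real crawler would more likely create one `CloseableHttpClient` and reuse it across requests so that pooled connections can be shared.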
Introduction to Jsoup
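Here is a minimal sketch of parsing with Jsoup 1.10.x. It takes an HTML string, such as a body fetched with HttpClient, and collects every link using a CSS selector. The class name `JsoupParse`, the sample HTML, and the base URI are hypothetical. Jsoup can also fetch a page on its own through `Jsoup.connect(url).get()`.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupParse {

    /** Returns the absolute URL of every <a href> in the page, resolved against baseUri. */
    public static List<String> extractLinks(String html, String baseUri) {
        // baseUri is what relative links like "/page1" get resolved against
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        // CSS selector: every <a> element that has an href attribute
        for (Element a : doc.select("a[href]")) {
            links.add(a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head><body>"
                + "<a href=\"/page1\">One</a> <a href=\"https://example.org/x\">Two</a>"
                + "</body></html>";
        Document doc = Jsoup.parse(html, "https://example.com/");
        System.out.println(doc.title());
        System.out.println(extractLinks(html, "https://example.com/"));
    }
}
```

Running `main` prints `Demo`, then `[https://example.com/page1, https://example.org/x]`. `absUrl` resolves the relative `/page1` against the base URI and leaves the already-absolute link unchanged.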

Create a Java project with Maven and add the following dependencies to pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.3</version>
    </dependency>
    <!-- jsoup HTML parser library @ https://jsoup.org/ -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.springframework/spring-jdbc -->
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-jdbc</artifactId>
        <version>4.2.6.RELEASE</version>
    </dependency>

    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.41</version>
    </dependency>
    <dependency>
        <groupId>c3p0</groupId>
        <artifactId>c3p0</artifactId>
        <version>0.9.1.2</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.31</version>
    </dependency>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.8.1</version>
    </dependency>
    
    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>2.8.0</version>
    </dependency>
</dependencies>
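Beyond the two core libraries, the pom declares several others: spring-jdbc, the MySQL driver, and c3p0 (database access and connection pooling), fastjson and gson (JSON), and jedis (a Redis client). The post does not show how any of these are used. As one assumed use, the sketch below turns a crawled record into JSON with Gson 2.8.x and then reads it back, which would be a natural step before storing it in a database or Redis. The `ArticleJson` and `Article` classes and their fields are hypothetical.

```java
import com.google.gson.Gson;

public class ArticleJson {

    /** Hypothetical record for one crawled page; the fields are illustrative only. */
    public static class Article {
        public String title;
        public String url;

        // No-arg constructor that Gson uses when deserializing
        public Article() { }

        public Article(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    // A Gson instance is thread-safe, so one shared instance is enough
    private static final Gson GSON = new Gson();

    /** Serializes an Article to a compact JSON object with "title" and "url" keys. */
    public static String toJson(Article article) {
        return GSON.toJson(article);
    }

    /** Parses a JSON string back into an Article. */
    public static Article fromJson(String json) {
        return GSON.fromJson(json, Article.class);
    }

    public static void main(String[] args) {
        Article article = new Article("Example Domain", "https://example.com/");
        String json = toJson(article);
        System.out.println(json);
        Article back = fromJson(json);
        System.out.println(back.title + " | " + back.url);
    }
}
```

The same round trip could be written with fastjson, since the pom includes both. Which one the original author actually relied on is not stated.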