Web Crawler (1): What is Web-Cra

2020-11-03 Yang_silin

What is a Web Crawler?

You have probably heard of web crawlers, also called web spiders. They are very popular now, especially in the Internet industry. Essentially, a web crawler is a program that automatically collects specific information from websites. The targets can be resources such as pictures and videos, or sets of information such as links and names. In theory, almost anything displayed on a web page can be scraped; what you actually collect depends on your goal.
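
To make the idea concrete, here is a minimal sketch of the extraction step, using only Python's standard library. The names `LinkAndImageParser`, `extract_resources`, and the sample HTML are made up for illustration; a real crawler would first download the page (for example with `urllib.request.urlopen`) instead of using a hard-coded string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageParser(HTMLParser):
    """Collects absolute URLs of links (<a href>) and pictures (<img src>)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative paths against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_resources(html, base_url):
    """Return (links, images) found in an HTML string."""
    parser = LinkAndImageParser(base_url)
    parser.feed(html)
    return parser.links, parser.images


# Hypothetical page content, standing in for a downloaded page.
SAMPLE = '<a href="/about">About</a><img src="logo.png">'
links, images = extract_resources(SAMPLE, "https://example.com/index.html")
print(links)
print(images)
```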

Generally, there are four kinds of crawlers.

Scalable web crawler
Focused web crawler
Incremental web crawler
Deep web crawler

Scalable web crawler

This kind of crawler searches broadly across a large number of pages. A typical example is a search engine such as Google: given the user's query, Google finds the pages relevant to the keywords and returns their links.
However, no search engine can work through every page on the Internet, because there are simply too many. A crawler of this kind also cannot pinpoint specific information very accurately.
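
A broad crawl like this can be sketched as a breadth-first traversal with a page limit, since visiting everything is not feasible. This is only an illustration and not how any real search engine works: `crawl_breadth_first`, `get_links`, and the in-memory `FAKE_WEB` dictionary are assumptions that replace real downloading and parsing.

```python
from collections import deque


def crawl_breadth_first(seed, get_links, max_pages=100):
    """Visit pages level by level, starting from seed, up to max_pages."""
    visited = []          # pages in the order they were crawled
    seen = {seed}         # every URL ever queued, to avoid repeats
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A tiny fake web: each page maps to the pages it links to.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["a", "d"],
    "d": [],
}

order = crawl_breadth_first("a", lambda url: FAKE_WEB.get(url, []), max_pages=3)
print(order)  # ['a', 'b', 'c']
```

The `max_pages` limit reflects the point above: the crawler has to stop somewhere, so page "d" is discovered but never visited.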

Focused web crawler/Topical crawler

If users have more specific demands for the scraped information, a focused web crawler is more suitable. This kind of crawler scrapes only the pages that are considered more important for the topic.
That also means we have to analyze specifically what kind of information we need, which pages to look at, and how to locate the information on them. Sometimes we also need to design a search strategy and filter the data.
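
One possible search strategy, sketched under simple assumptions: score each page by how often the topic keywords appear, drop pages below a threshold, and follow links from the most relevant pages first. The scoring function, the threshold, and the `PAGES` data are all invented for this example; real focused crawlers usually use more careful relevance measures.

```python
import heapq


def relevance(text, keywords):
    """Fraction of words in text that are topic keywords."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(words.count(keyword) for keyword in keywords)
    return hits / len(words)


def focused_crawl(seed, pages, keywords, threshold=0.1, max_pages=10):
    """pages maps url -> (text, links). Returns the on-topic URLs visited."""
    frontier = [(-1.0, seed)]   # min-heap, so scores are stored negated
    seen = {seed}
    kept = []
    while frontier and len(kept) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = pages.get(url, ("", []))
        score = relevance(text, keywords)
        if score < threshold:
            continue            # off-topic: filter it out, do not follow its links
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Links inherit the score of the page they were found on.
                heapq.heappush(frontier, (-score, link))
    return kept


# Hypothetical pages standing in for downloaded content.
PAGES = {
    "seed": ("python crawler tutorial", ["news", "recipes"]),
    "news": ("crawler news about python crawler tools", ["deep"]),
    "recipes": ("how to bake bread at home", ["deep2"]),
    "deep": ("python tips", []),
}

kept = focused_crawl("seed", PAGES, ["python", "crawler"])
print(kept)  # ['seed', 'news', 'deep']
```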

Incremental web crawler

An incremental web crawler only updates the database on top of the existing data. It lets users obtain the newest information without rerunning the whole program, which avoids collecting a lot of duplicate data.
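
A common way to detect what is new, shown here as a rough sketch, is to store a hash of each page's content and only process pages whose hash differs from the stored one. The `store` dictionary stands in for a real database, and the page contents are made up.

```python
import hashlib


def content_hash(content):
    """Stable fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(store, fetched):
    """Update store (url -> hash) and return the URLs that are new or changed."""
    changed = []
    for url, content in fetched.items():
        digest = content_hash(content)
        if store.get(url) != digest:
            store[url] = digest
            changed.append(url)
    return changed


store = {}  # stands in for the existing database

# First run: everything is new.
first = incremental_update(store, {"/a": "hello", "/b": "world"})
# Second run: only /b has changed, so only /b is reported.
second = incremental_update(store, {"/a": "hello", "/b": "world, updated"})
print(first)   # ['/a', '/b']
print(second)  # ['/b']
```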

Deep web crawler

This kind of crawler is used to obtain information that is displayed only after you submit a required form. For example, some pages are shown only after you register and log in.
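
As a hedged sketch of the form-posting step: the code below builds a login POST request and an opener that remembers cookies, using Python's standard library. The URL `https://example.com/login` and the field names `username` and `password` are placeholders; every site defines its own form fields, and nothing is actually sent here.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def build_login_request(login_url, username, password):
    """Build (but do not send) a POST request carrying the login form."""
    # Field names are assumptions; inspect the real form to find the right ones.
    form = urlencode({"username": username, "password": password})
    return Request(login_url, data=form.encode("utf-8"))


def make_session():
    """Opener that stores cookies, so later requests stay logged in."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


request = build_login_request("https://example.com/login", "alice", "secret")
session = make_session()
print(request.get_method())  # POST
print(request.data)          # b'username=alice&password=secret'

# To actually log in and then fetch a protected page, one would call:
#   session.open(request)
#   session.open("https://example.com/members")
```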