How the Juejin extension gets its trending GitHub open-source project data
2018-08-08
CoderMiner
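GitHub is a gold mine with plenty of treasure waiting to be dug up. I have been following Juejin (掘金) for a while, and its Chrome extension lists the trending GitHub open-source projects for today, this week, and this month. I had long wondered where that data comes from: GitHub's API is public, but I could not find an endpoint like that. Then I came across GitHub Trending and noticed that its data matches what the Juejin extension shows. GitHub Trending has no API of its own, though, so the approach is to crawl the relevant pages and build an API from them. There is a Node.js implementation at https://github.com/huchenme/github-trending-api; here I port it to Go.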
Go crawler
This needs goquery, a Go library similar to jQuery that lets you select DOM elements the same way jQuery does:
go get github.com/PuerkitoBio/goquery

- Get all programming languages
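// GetLanguages scrapes https://github.com/trending and responds with the
// languages in its filter menu as JSON: [{"alias": ..., "name": ...}, ...].
// Imports: net/http, regexp, github.com/PuerkitoBio/goquery, and the author's own helper package.
func GetLanguages(w http.ResponseWriter, r *http.Request) {
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://github.com/trending", nil)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// Send a browser-like User-Agent with the request.
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36")
	resp, err := client.Do(req)
	if err != nil {
		// Report upstream failures to the caller instead of panicking.
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	// NewDocumentFromReader replaces the deprecated NewDocumentFromResponse.
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	// Compile the pattern once rather than once per menu item.
	reg := regexp.MustCompile(`/trending/([^?/]+)`)
	results := make([]map[string]string, 0)
	doc.Find(".select-menu-item.js-navigation-item").Each(func(i int, contentSelection *goquery.Selection) {
		// Skip the first three matched items; only later entries are treated as languages.
		if i <= 2 {
			return
		}
		href, _ := contentSelection.Attr("href")
		m := reg.FindStringSubmatch(href)
		if m == nil {
			return // not a /trending/<language> link; indexing [1] would panic
		}
		results = append(results, map[string]string{
			"alias": m[1],
			"name":  contentSelection.Find("span").Text(),
		})
	})
	helper.ResponseWithJson(w, http.StatusOK, results)
}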
- Get the trending repositories for a given language
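// Repository scrapes https://github.com/trending/<lang>?since=<since> and
// responds with one map per repository (url, name, des, stars, forks) as JSON.
// Imports: fmt, net/http, net/url, strings, goquery, the author's helper package,
// and a router that provides mux.Vars (for example github.com/gorilla/mux).
func Repository(w http.ResponseWriter, r *http.Request) {
	lang := mux.Vars(r)["lang"]
	since := r.FormValue("since")
	if since == "" {
		since = "daily" // default period
	}
	// Escape both values so that a language such as "c#" still yields a valid URL.
	target := fmt.Sprintf("https://github.com/trending/%s?since=%s", url.PathEscape(lang), url.QueryEscape(since))
	req, err := http.NewRequest("GET", target, nil)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36")
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		// Report upstream failures to the caller instead of panicking.
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	// NewDocumentFromReader replaces the deprecated NewDocumentFromResponse.
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	results := make([]map[string]string, 0)
	// Each match of this selector is one repository entry on the page.
	doc.Find(".col-12.d-block.width-full.py-4.border-bottom").Each(func(i int, contentSelection *goquery.Selection) {
		a := contentSelection.Find(".d-inline-block.col-9.mb-1").Find("a")
		href, _ := a.Attr("href")
		// The link text has the form "owner / repo"; name keeps the part before the slash.
		name := strings.TrimSpace(strings.Split(a.Text(), "/")[0])
		params := map[string]string{
			"url":  "https://github.com" + href,
			"name": name,
			"des":  contentSelection.Find(".py-1 p").Text(),
		}
		// The first two muted links hold the star count and the fork count.
		contentSelection.Find(".f6.text-gray.mt-2").Find(".muted-link.d-inline-block.mr-3").Each(func(j int, cs *goquery.Selection) {
			fields := strings.Fields(cs.Text())
			if len(fields) == 0 {
				return // empty link text; indexing [0] would panic
			}
			switch j {
			case 0:
				params["stars"] = fields[0]
			case 1:
				params["forks"] = fields[0]
			}
		})
		results = append(results, params)
	})
	helper.ResponseWithJson(w, http.StatusOK, results)
}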
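The language alias in GetLanguages comes from matching each menu link against the pattern /trending/([^?/]+). The standalone sketch below shows what that pattern captures. The helper name languageAlias and the sample links are made up for illustration; they are not taken from the original code or copied from the real page.

```go
package main

import (
	"fmt"
	"regexp"
)

// trendingPath is the same pattern the crawler uses for menu links.
var trendingPath = regexp.MustCompile(`/trending/([^?/]+)`)

// languageAlias (hypothetical helper) returns the path segment after
// /trending/, or "" and false when the link has no such segment.
func languageAlias(href string) (string, bool) {
	m := trendingPath.FindStringSubmatch(href)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Sample links for illustration only.
	samples := []string{
		"https://github.com/trending/go?since=daily",
		"https://github.com/trending/c%23",
		"https://github.com/trending",
	}
	for _, href := range samples {
		alias, ok := languageAlias(href)
		fmt.Printf("%q -> alias=%q matched=%v\n", href, alias, ok)
	}
}
```

The last sample has no language segment, so FindStringSubmatch returns nil; checking for that case is what keeps a handler from panicking on an unexpected link.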
Example requests for Java:
Today's trending: https://www.coderminer.com/github/trending/java?since=daily
This week's trending: https://www.coderminer.com/github/trending/java?since=weekly
This month's trending: https://www.coderminer.com/github/trending/java?since=monthly
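As a rough sketch of how a client might consume these URLs: the code below assumes the endpoint is still reachable and that helper.ResponseWithJson writes the result as a JSON array of string maps with the keys url, name, des, stars and forks. The function names trendingURL and fetchTrending are illustrative and not part of the original project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// trendingURL (hypothetical helper) builds <base>/<lang>?since=<since>,
// escaping both values so that languages such as "c#" stay valid.
func trendingURL(base, lang, since string) string {
	return base + "/" + url.PathEscape(lang) + "?since=" + url.QueryEscape(since)
}

// fetchTrending requests the list and decodes it, assuming the response
// body is a JSON array of string-to-string objects.
func fetchTrending(client *http.Client, base, lang, since string) ([]map[string]string, error) {
	resp, err := client.Get(trendingURL(base, lang, since))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var repos []map[string]string
	if err := json.NewDecoder(resp.Body).Decode(&repos); err != nil {
		return nil, err
	}
	return repos, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	repos, err := fetchTrending(client, "https://www.coderminer.com/github/trending", "java", "weekly")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	for _, r := range repos {
		fmt.Println(r["name"], r["stars"], r["url"])
	}
}
```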
For building a RESTful API in Go, see "Building a RESTful API with Golang and MongoDB" (使用Golang和MongoDB构建 RESTful API).
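Do not request the pages too frequently, or you risk being banned. Cache the results in memory or in a database and serve responses from that cache.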
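One way to follow that advice is a small in-memory cache with an expiry time, sketched below using only the standard library. The type ttlCache, the "lang:since" key format, and the ten-minute lifetime are all assumptions made for this example; the original post does not say how its cache is built.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultTTL is an assumed freshness window; the article does not give one.
const defaultTTL = 10 * time.Minute

// repoList has the same shape as the slices the handlers return.
type repoList = []map[string]string

type entry struct {
	data    repoList
	expires time.Time
}

// ttlCache is a hypothetical in-memory cache keyed by strings such as "java:daily".
type ttlCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]entry
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, entries: make(map[string]entry)}
}

// get returns the cached value for key while it is still fresh. Otherwise it
// calls fetch, stores the result, and returns it. Errors are never cached.
// Two concurrent misses for the same key may both call fetch.
func (c *ttlCache) get(key string, fetch func() (repoList, error)) (repoList, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	c.mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		return e.data, nil
	}
	data, err := fetch()
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.entries[key] = entry{data: data, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return data, nil
}

func main() {
	cache := newTTLCache(defaultTTL)
	calls := 0
	// scrape stands in for the real request to github.com/trending.
	scrape := func() (repoList, error) {
		calls++
		return repoList{{"name": "example", "stars": "1"}}, nil
	}
	for i := 0; i < 3; i++ {
		if _, err := cache.get("java:daily", scrape); err != nil {
			fmt.Println("fetch failed:", err)
		}
	}
	fmt.Println("upstream requests:", calls) // prints "upstream requests: 1"
}
```

In a handler, the scraping code would go inside the fetch function, so GitHub is contacted at most once per key within each lifetime window. A database-backed cache would follow the same read-then-refresh pattern.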