# scrape

A simple, higher-level interface for Go web scraping.

When scraping with Go, I find myself redefining tree traversal and other utility functions. This package is a place to put some simple tools which build on top of the [Go HTML parsing library](https://godoc.org/golang.org/x/net/html).

For the full interface, check out the godoc: [![GoDoc](https://godoc.org/github.com/yhat/scrape?status.svg)](https://godoc.org/github.com/yhat/scrape)

## Sample

Scrape defines traversal functions like `Find` and `FindAll` while attempting to be generic. It also defines convenience functions such as `Attr` and `Text`.

```go
// Parse the page
root, err := html.Parse(resp.Body)
if err != nil {
	// handle error
}
// Search for the title
title, ok := scrape.Find(root, scrape.ByTag(atom.Title))
if ok {
	// Print the title
	fmt.Println(scrape.Text(title))
}
```

## A full example: Scraping Hacker News

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/yhat/scrape"
	"golang.org/x/net/html"
	"golang.org/x/net/html/atom"
)

func main() {
	// request and parse the front page
	resp, err := http.Get("https://news.ycombinator.com/")
	if err != nil {
		panic(err)
	}
	root, err := html.Parse(resp.Body)
	if err != nil {
		panic(err)
	}

	// define a matcher
	matcher := func(n *html.Node) bool {
		// must check for nil values
		if n.DataAtom == atom.A && n.Parent != nil && n.Parent.Parent != nil {
			return scrape.Attr(n.Parent.Parent, "class") == "athing"
		}
		return false
	}

	// grab all articles and print them
	articles := scrape.FindAll(root, matcher)
	for i, article := range articles {
		fmt.Printf("%2d %s (%s)\n", i, scrape.Text(article), scrape.Attr(article, "href"))
	}
}
```