gopl-zh.github.com/ch5/ch5-02.md

## 5.2. 递归

函数可以是递归的，这意味着函数可以直接或间接的调用自身。对许多问题而言，递归是一种强有力的技术，例如处理递归的数据结构。在4.4节，我们通过遍历二叉树来实现简单的插入排序，在本章节，我们再次使用它来处理HTML文件。

下文的示例代码使用了非标准包 golang.org/x/net/html ，解析HTML。golang.org/x/... 目录下存储了一些由Go团队设计、维护，对网络编程、国际化文件处理、移动平台、图像处理、加密解密、开发者工具提供支持的扩展包。未将这些扩展包加入到标准库原因有二，一是部分包仍在开发中，二是对大多数Go语言的开发者而言，扩展包提供的功能很少被使用。

例子中调用golang.org/x/net/html的部分api如下所示。html.Parse函数读入一组bytes解析后，返回html.Node类型的HTML页面树状结构根节点。HTML拥有很多类型的结点如text（文本）、comments（注释）类型，在下面的例子中，我们 只关注< name key='value' >形式的结点。

<u><i>golang.org/x/net/html</i></u>
```Go
package html

type Node struct {
	Type                    NodeType
	Data                    string
	Attr                    []Attribute
	FirstChild, NextSibling *Node
}

type NodeType int32

const (
	ErrorNode NodeType = iota
	TextNode
	DocumentNode
	ElementNode
	CommentNode
	DoctypeNode
)

type Attribute struct {
	Key, Val string
}

func Parse(r io.Reader) (*Node, error)
```

main函数解析HTML标准输入，通过递归函数visit获得links（链接），并打印出这些links：

<u><i>gopl.io/ch5/findlinks1</i></u>
```Go
// Findlinks1 prints the links in an HTML document read from standard input.
package main

import (
	"fmt"
	"os"

	"golang.org/x/net/html"
)

func main() {
	doc, err := html.Parse(os.Stdin)
	if err != nil {
		fmt.Fprintf(os.Stderr, "findlinks1: %v\n", err)
		os.Exit(1)
	}
	for _, link := range visit(nil, doc) {
		fmt.Println(link)
	}
}
```

visit函数遍历HTML的节点树，从每一个anchor元素的href属性获得link,将这些links存入字符串数组中，并返回这个字符串数组。

```Go
// visit appends to links each link found in n and returns the result.
func visit(links []string, n *html.Node) []string {
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				links = append(links, a.Val)
			}
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		links = visit(links, c)
	}
	return links
}
```

为了遍历结点n的所有后代结点，每次遇到n的孩子结点时，visit递归的调用自身。这些孩子结点存放在FirstChild链表中。

让我们以Go的主页（golang.org）作为目标，运行findlinks。我们以fetch（1.5章）的输出作为findlinks的输入。下面的输出做了简化处理。

```
$ go build gopl.io/ch1/fetch
$ go build gopl.io/ch5/findlinks1
$ ./fetch https://golang.org | ./findlinks1
#
/doc/
/pkg/
/help/
/blog/
http://play.golang.org/
//tour.golang.org/
https://golang.org/dl/
//blog.golang.org/
/LICENSE
/doc/tos.html
http://www.google.com/intl/en/policies/privacy/
```

注意在页面中出现的链接格式，在之后我们会介绍如何将这些链接，根据根路径（ https://golang.org ）生成可以直接访问的url。

在函数outline中，我们通过递归的方式遍历整个HTML结点树，并输出树的结构。在outline内部，每遇到一个HTML元素标签，就将其入栈，并输出。

<u><i>gopl.io/ch5/outline</i></u>
```Go
func main() {
	doc, err := html.Parse(os.Stdin)
	if err != nil {
		fmt.Fprintf(os.Stderr, "outline: %v\n", err)
		os.Exit(1)
	}
	outline(nil, doc)
}
func outline(stack []string, n *html.Node) {
	if n.Type == html.ElementNode {
		stack = append(stack, n.Data) // push tag
		fmt.Println(stack)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		outline(stack, c)
	}
}
```

有一点值得注意：outline有入栈操作，但没有相对应的出栈操作。当outline调用自身时，被调用者接收的是stack的拷贝。被调用者对stack的元素追加操作，修改的是stack的拷贝，其可能会修改slice底层的数组甚至是申请一块新的内存空间进行扩容；但这个过程并不会修改调用方的stack。因此当函数返回时，调用方的stack与其调用自身之前完全一致。

下面是 https://golang.org 页面的简要结构:

```
$ go build gopl.io/ch5/outline
$ ./fetch https://golang.org | ./outline
[html]
[html head]
[html head meta]
[html head title]
[html head link]
[html body]
[html body div]
[html body div]
[html body div div]
[html body div div form]
[html body div div form div]
[html body div div form div a]
...
```

正如你在上面实验中所见，大部分HTML页面只需几层递归就能被处理，但仍然有些页面需要深层次的递归。

大部分编程语言使用固定大小的函数调用栈，常见的大小从64KB到2MB不等。固定大小栈会限制递归的深度，当你用递归处理大量数据时，需要避免栈溢出；除此之外，还会导致安全性问题。与此相反，Go语言使用可变栈，栈的大小按需增加（初始时很小）。这使得我们使用递归时不必考虑溢出和安全问题。

**练习 5.1：** 修改findlinks代码中遍历n.FirstChild链表的部分，将循环调用visit，改成递归调用。

**练习 5.2：** 编写函数，记录在HTML树中出现的同名元素的次数。

**练习 5.3：** 编写函数输出所有text结点的内容。注意不要访问`<script>`和`<style>`元素，因为这些元素对浏览者是不可见的。

**练习 5.4：** 扩展visit函数，使其能够处理其他类型的结点，如images、scripts和style sheets。
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								## 5.2. 递归
-												good good study, day day up!

											
										
										
											2015-12-09 07:45:11 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								函数可以是递归的，这意味着函数可以直接或间接的调用自身。对许多问题而言，递归是一种强有力的技术，例如处理递归的数据结构。在4.4节，我们通过遍历二叉树来实现简单的插入排序，在本章节，我们再次使用它来处理HTML文件。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								下文的示例代码使用了非标准包 golang.org/x/net/html ，解析HTML。golang.org/x/... 目录下存储了一些由Go团队设计、维护，对网络编程、国际化文件处理、移动平台、图像处理、加密解密、开发者工具提供支持的扩展包。未将这些扩展包加入到标准库原因有二，一是部分包仍在开发中，二是对大多数Go语言的开发者而言，扩展包提供的功能很少被使用。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												fix: typo

Update ch5-02.md
											
										
										
											2023-05-07 14:46:29 +00:00
+								例子中调用golang.org/x/net/html的部分api如下所示。html.Parse函数读入一组bytes解析后，返回html.Node类型的HTML页面树状结构根节点。HTML拥有很多类型的结点如text（文本）、comments（注释）类型，在下面的例子中，我们 只关注< name key='value' >形式的结点。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												ch5: fix code path

											
										
										
											2016-01-21 01:58:28 +00:00
+								<u><i>golang.org/x/net/html</i></u>
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
+								```Go
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								package html
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								type Node struct {
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									Type                    NodeType
 									Data                    string
 									Attr                    []Attribute
 									FirstChild, NextSibling *Node
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								}
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								type NodeType int32
 								const (
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									ErrorNode NodeType = iota
 									TextNode
 									DocumentNode
 									ElementNode
 									CommentNode
 									DoctypeNode
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								)
 								type Attribute struct {
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									Key, Val string
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								}
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								func Parse(r io.Reader) (*Node, error)
 								```
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								main函数解析HTML标准输入，通过递归函数visit获得links（链接），并打印出这些links：
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												fix typo

											
										
										
											2018-03-30 16:47:24 +00:00
+								<u><i>gopl.io/ch5/findlinks1</i></u>
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
+								```Go
 								// Findlinks1 prints the links in an HTML document read from standard input.
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								package main
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								import (
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									"fmt"
 									"os"
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									"golang.org/x/net/html"
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								)
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								func main() {
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									doc, err := html.Parse(os.Stdin)
 									if err != nil {
 										fmt.Fprintf(os.Stderr, "findlinks1: %v\n", err)
 										os.Exit(1)
 									}
 									for _, link := range visit(nil, doc) {
 										fmt.Println(link)
 									}
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								}
 								```
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								visit函数遍历HTML的节点树，从每一个anchor元素的href属性获得link,将这些links存入字符串数组中，并返回这个字符串数组。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
+								```Go
 								// visit appends to links each link found in n and returns the result.
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								func visit(links []string, n *html.Node) []string {
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									if n.Type == html.ElementNode && n.Data == "a" {
 										for _, a := range n.Attr {
 											if a.Key == "href" {
 												links = append(links, a.Val)
 											}
 										}
 									}
 									for c := n.FirstChild; c != nil; c = c.NextSibling {
 										links = visit(links, c)
 									}
 									return links
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								}
 								```
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								为了遍历结点n的所有后代结点，每次遇到n的孩子结点时，visit递归的调用自身。这些孩子结点存放在FirstChild链表中。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								让我们以Go的主页（golang.org）作为目标，运行findlinks。我们以fetch（1.5章）的输出作为findlinks的输入。下面的输出做了简化处理。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
 								```
 								$ go build gopl.io/ch1/fetch
 								$ go build gopl.io/ch5/findlinks1
 								$ ./fetch https://golang.org | ./findlinks1
 								#
 								/doc/
 								/pkg/
 								/help/
 								/blog/
 								http://play.golang.org/
 								//tour.golang.org/
 								https://golang.org/dl/
 								//blog.golang.org/
 								/LICENSE
 								/doc/tos.html
 								http://www.google.com/intl/en/policies/privacy/
 								```
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								注意在页面中出现的链接格式，在之后我们会介绍如何将这些链接，根据根路径（ https://golang.org ）生成可以直接访问的url。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								在函数outline中，我们通过递归的方式遍历整个HTML结点树，并输出树的结构。在outline内部，每遇到一个HTML元素标签，就将其入栈，并输出。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												ch5: fix code path

											
										
										
											2016-01-21 01:58:28 +00:00
+								<u><i>gopl.io/ch5/outline</i></u>
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
+								```Go
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								func main() {
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									doc, err := html.Parse(os.Stdin)
 									if err != nil {
 										fmt.Fprintf(os.Stderr, "outline: %v\n", err)
 										os.Exit(1)
 									}
 									outline(nil, doc)
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								}
 								func outline(stack []string, n *html.Node) {
-												fmt

											
										
										
											2016-01-18 03:14:19 +00:00
+									if n.Type == html.ElementNode {
 										stack = append(stack, n.Data) // push tag
 										fmt.Println(stack)
 									}
 									for c := n.FirstChild; c != nil; c = c.NextSibling {
 										outline(stack, c)
 									}
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
+								}
 								```
-												润色。

											
										
										
											2016-12-12 08:17:05 +00:00
+								有一点值得注意：outline有入栈操作，但没有相对应的出栈操作。当outline调用自身时，被调用者接收的是stack的拷贝。被调用者对stack的元素追加操作，修改的是stack的拷贝，其可能会修改slice底层的数组甚至是申请一块新的内存空间进行扩容；但这个过程并不会修改调用方的stack。因此当函数返回时，调用方的stack与其调用自身之前完全一致。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								下面是 https://golang.org 页面的简要结构:
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
 								```
 								$ go build gopl.io/ch5/outline
 								$ ./fetch https://golang.org | ./outline
 								[html]
 								[html head]
 								[html head meta]
 								[html head title]
 								[html head link]
 								[html body]
 								[html body div]
 								[html body div]
 								[html body div div]
 								[html body div div form]
 								[html body div div form div]
 								[html body div div form div a]
 								...
 								```
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								正如你在上面实验中所见，大部分HTML页面只需几层递归就能被处理，但仍然有些页面需要深层次的递归。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												修正半角标点符号

											
										
										
											2018-06-09 16:27:25 +00:00
+								大部分编程语言使用固定大小的函数调用栈，常见的大小从64KB到2MB不等。固定大小栈会限制递归的深度，当你用递归处理大量数据时，需要避免栈溢出；除此之外，还会导致安全性问题。与此相反，Go语言使用可变栈，栈的大小按需增加（初始时很小）。这使得我们使用递归时不必考虑溢出和安全问题。
--02 done

											
										
										
											2016-01-02 13:07:14 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								**练习 5.1：** 修改findlinks代码中遍历n.FirstChild链表的部分，将循环调用visit，改成递归调用。
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												回到简体

											
										
										
											2016-02-15 03:06:34 +00:00
+								**练习 5.2：** 编写函数，记录在HTML树中出现的同名元素的次数。
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												修正半角标点符号

											
										
										
											2018-05-27 20:51:15 +00:00
+								**练习 5.3：** 编写函数输出所有text结点的内容。注意不要访问`<script>`和`<style>`元素，因为这些元素对浏览者是不可见的。
-												ch5-2: fmt code

											
										
										
											2016-01-02 13:34:50 +00:00
-												fix typo

											
										
										
											2017-02-21 09:48:04 +00:00
+								**练习 5.4：** 扩展visit函数，使其能够处理其他类型的结点，如images、scripts和style sheets。