This commit is contained in:
Xargin 2015-12-11 15:48:07 +08:00
parent 5b14a3b8de
commit 24a99ea720

@ -1,6 +1,6 @@
## 8.8. Example: Concurrent Directory Traversal
In this section, we'll build a program that reports the disk usage of one or more directories specified on the command line, like the Unix du command. Most of its work is done by the walkDir function below, which enumerates the entries of the directory dir using the dirents helper function.
In this section we will create a program that reports the disk usage of the specified directories, much like the Unix du tool. Most of the work is done by the walkDir function below, which uses the dirents function to enumerate all entries in a directory.
```go
gopl.io/ch8/du1
@ -28,9 +28,9 @@ func dirents(dir string) []os.FileInfo {
}
```
The ioutil.ReadDir function returns a slice of os.FileInfo, the same information that a call to os.Stat returns for a single file. For each subdirectory, walkDir recursively calls itself, and for each file, walkDir sends a message on the fileSizes channel. The message is the size of the file in bytes.
The ioutil.ReadDir function returns a slice of os.FileInfo values; os.FileInfo is also the type that os.Stat returns. For each subdirectory, walkDir calls itself recursively; for each file, walkDir sends a message on the fileSizes channel. The message contains the size of the file in bytes.
The main function, shown below, uses two goroutines. The background goroutine calls walkDir for each directory specified on the command line and finally closes the fileSizes channel. The main goroutine computes the sum of the file sizes it receives from the channel and finally prints the total.
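The main function below uses two goroutines. The background goroutine calls walkDir to traverse each path given on the command line and finally closes the fileSizes channel. The main goroutine adds up the file sizes it receives from the channel and prints the sum.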
```go
@ -75,15 +75,16 @@ func printDiskUsage(nfiles, nbytes int64) {
}
```
This program pauses for a long while before printing its result:
This program stalls for a long time before printing its result:
```
$ go build gopl.io/ch8/du1
$ ./du1 $HOME /usr /bin /etc
213201 files 62.7 GB
```
The program would be nicer if it kept us informed of its progress. However, simply moving the printDiskUsage call into the loop would cause it to print thousands of lines of output.
The variant of du below prints the totals periodically, but only if the -v flag is specified since not all users will want to see progress messages. The background goroutine that loops over roots remains unchanged. The main goroutine now uses a ticker to generate events every 500ms, and a select statement to wait for either a file size message, in which case it updates the totals, or a tick event, in which case it prints the current totals. If the -v flag is not specified, the tick channel remains nil, and its case in the select is effectively disabled.
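It would be better if the program let us know how far it had progressed while running. However, simply moving the printDiskUsage call into the loop would make it print thousands of lines of output.
The variant of du below prints totals periodically, but shows progress only when the -v flag is given. The background goroutine that loops over roots is unchanged. The main goroutine now uses a ticker to generate an event every 500ms, and a select statement that waits either for a file size message, which updates the totals, or for a tick event, which prints the current totals. If the -v flag is not passed at run time, the tick channel stays nil, so its case in the select is effectively disabled.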
```go
gopl.io/ch8/du2
@ -114,11 +115,11 @@ loop:
printDiskUsage(nfiles, nbytes) // final totals
}
```
Since the program no longer uses a range loop, the first select case must explicitly test whether the fileSizes channel has been closed, using the two-result form of receive operation. If the channel has been closed, the program breaks out of the loop. The labeled break statement breaks out of both the select and the for loop; an unlabeled break would break out of only the select, causing the loop to begin the next iteration.
The program now gives us a leisurely stream of updates:
Because the program no longer uses a range loop, the first select case must explicitly check whether the fileSizes channel has been closed, using the two-result form of the receive operation. If the channel has been closed, the program exits the loop. The break statement here is labeled, so it terminates both the select and the for loop; an unlabeled break would exit only the inner select, and the outer for loop would then start its next iteration.
The program now prints a leisurely stream of updates:
```
$ go build gopl.io/ch8/du2
$ ./du2 -v $HOME /usr /bin /etc
28608 files 8.3 GB
@ -127,9 +128,9 @@ $ ./du2 -v $HOME /usr /bin /etc
127169 files 52.9 GB
175931 files 62.2 GB
213201 files 62.7 GB
```
However, it still takes too long to finish. There's no reason why all the calls to walkDir can't be done concurrently, thereby exploiting parallelism in the disk system. The third version of du, below, creates a new goroutine for each call to walkDir. It uses a sync.WaitGroup (§8.5) to count the number of calls to walkDir that are still active, and a closer goroutine to close the fileSizes channel when the counter drops to zero.
However, the program still takes a long time to finish. There is no reason why the calls to walkDir cannot all run concurrently, which would take advantage of the parallelism in the disk system. The third version of du below creates a new goroutine for each call to walkDir. It uses a sync.WaitGroup (§8.5) to count the walkDir calls that are still active, and another goroutine closes the fileSizes channel when the counter drops to zero.
```go
gopl.io/ch8/du3
@ -163,7 +164,7 @@ func walkDir(dir string, n *sync.WaitGroup, fileSizes chan<- int64) {
}
```
Since this program creates many thousands of goroutines at its peak, we have to change dirents to use a counting semaphore to prevent it from opening too many files at once, just as we did for the web crawler in Section 8.6:
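Because this program creates many thousands of goroutines at its peak, we need to modify the dirents function to use a counting semaphore so that it does not open too many files at once, just as we did for the concurrent web crawler in Section 8.6: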
```go
@ -178,8 +179,8 @@ func dirents(dir string) []os.FileInfo {
```
This version runs several times faster than the previous one, though there is a lot of variability from system to system.
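This version is several times faster than the previous one, although the actual speedup depends on your runtime environment and machine configuration.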
Exercise 8.9: Write a version of du that computes and periodically displays separate totals for each of the root directories.
Exercise 8.9: Write a du tool that computes and periodically displays a separate total for each of the root directories.