<!DOCTYPE HTML>
< html lang = "zh-tw" >
< head >
< meta charset = "UTF-8" >
< meta http-equiv = "X-UA-Compatible" content = "IE=edge" / >
< title > Example: A Concurrent Web Crawler | The Go Programming Language< / title >
< meta content = "text/html; charset=utf-8" http-equiv = "Content-Type" >
< meta name = "description" content = "" >
< meta name = "generator" content = "GitBook 2.5.2" >
< meta name = "HandheldFriendly" content = "true" / >
< meta name = "viewport" content = "width=device-width, initial-scale=1, user-scalable=no" >
< meta name = "apple-mobile-web-app-capable" content = "yes" >
< meta name = "apple-mobile-web-app-status-bar-style" content = "black" >
< link rel = "apple-touch-icon-precomposed" sizes = "152x152" href = "../gitbook/images/apple-touch-icon-precomposed-152.png" >
< link rel = "shortcut icon" href = "../gitbook/images/favicon.ico" type = "image/x-icon" >
< link rel = "stylesheet" href = "../gitbook/style.css" >
< link rel = "stylesheet" href = "../gitbook/plugins/gitbook-plugin-highlight/website.css" >
< link rel = "stylesheet" href = "../gitbook/plugins/gitbook-plugin-search/search.css" >
< link rel = "stylesheet" href = "../gitbook/plugins/gitbook-plugin-fontsettings/website.css" >
< link rel = "next" href = "../ch8/ch8-07.html" / >
< link rel = "prev" href = "../ch8/ch8-05.html" / >
< / head >
< body >
< div class = "book" data-level = "8.6" data-chapter-title = "示例: 併髮的Web爬蟲" data-filepath = "ch8/ch8-06.md" data-basepath = ".." data-revision = "Wed Dec 16 2015 10:54:29 GMT+0800 (中国标准时间)" >
< div class = "book-body" >
< div class = "body-inner" >
< div class = "book-header" role = "navigation" >
< h1 >
< a href = "../" > The Go Programming Language< / a >
< / h1 >
< / div >
< div class = "page-wrapper" tabindex = "-1" role = "main" >
< div class = "page-inner" >
< section class = "normal" id = "section-" >
< h2 id = "86-示例-併髮的web爬蟲" > 8.6. Example: A Concurrent Web Crawler < / h2 >
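< p > In Section 5.6 we built a simple web crawler that explored the link graph of the web in breadth-first (BFS) order. In this section we make it concurrent, so that independent calls to crawl can perform their I/O in parallel and make fuller use of the available network resources. The crawl function itself is unchanged from gopl.io/ch5/findlinks3: < / p >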
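< pre > < code class = "lang-go" > < span class = "hljs-comment" > // gopl.io/ch8/crawl1< / span >
< span class = "hljs-comment" > // crawl prints url and returns the links found on that page.< / span >
< span class = "hljs-comment" > // Errors are logged rather than treated as fatal.< / span >
< span class = "hljs-keyword" > func< / span > crawl(url < span class = "hljs-typename" > string< / span > ) []< span class = "hljs-typename" > string< / span > {
	fmt.Println(url)
	list, err := links.Extract(url)
	< span class = "hljs-keyword" > if< / span > err != < span class = "hljs-constant" > nil< / span > {
		log.Print(err)
	}
	< span class = "hljs-keyword" > return< / span > list
}
< / code > < / pre >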
< p > The main function resembles breadthFirst (breadth-first search) from Section 5.6. As before, a worklist records a queue of items that need processing, each item being a list of URLs to crawl, but this time we use a channel instead of a slice to represent the queue. Each call to crawl runs in its own goroutine and sends the links it discovers back to the worklist. < / p >
< pre > < code class = "lang-go" > < span class = "hljs-keyword" > func< / span > main() {
	worklist := < span class = "hljs-built_in" > make< / span > (< span class = "hljs-keyword" > chan< / span > []< span class = "hljs-typename" > string< / span > )
	< span class = "hljs-comment" > // Start with the command-line arguments.< / span >
	< span class = "hljs-keyword" > go< / span > < span class = "hljs-keyword" > func< / span > () { worklist <- os.Args[< span class = "hljs-number" > 1< / span > :] }()
	< span class = "hljs-comment" > // Crawl the web concurrently.< / span >
	seen := < span class = "hljs-built_in" > make< / span > (< span class = "hljs-keyword" > map< / span > [< span class = "hljs-typename" > string< / span > ]< span class = "hljs-typename" > bool< / span > )
	< span class = "hljs-keyword" > for< / span > list := < span class = "hljs-keyword" > range< / span > worklist {
		< span class = "hljs-keyword" > for< / span > _, link := < span class = "hljs-keyword" > range< / span > list {
			< span class = "hljs-keyword" > if< / span > !seen[link] {
				seen[link] = < span class = "hljs-constant" > true< / span >
				< span class = "hljs-keyword" > go< / span > < span class = "hljs-keyword" > func< / span > (link < span class = "hljs-typename" > string< / span > ) {
					worklist <- crawl(link)
				}(link)
			}
		}
	}
}
< / code > < / pre >
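< p > For comparison, here is a sketch of the sequential breadthFirst loop from Section 5.6, which uses a slice as its queue and processes one item at a time. It is modified for illustration: it returns the order in which items were first visited, and it runs against a hypothetical in-memory link graph (graph) rather than fetching pages, so it needs no network access. < / p >
< pre > < code class = "lang-go" >
package main

import "fmt"

// graph is a hypothetical in-memory link graph standing in for the web.
var graph = map[string][]string{
	"a": {"b", "c"},
	"b": {"c", "d"},
	"c": {"a"},
	"d": {},
}

// breadthFirst calls f for each item in the worklist. Any items returned by
// f are added to the worklist. f is called at most once for each item.
// Unlike the Section 5.6 version, it returns the items in visit order.
func breadthFirst(f func(item string) []string, worklist []string) []string {
	seen := make(map[string]bool)
	var order []string
	for len(worklist) > 0 {
		items := worklist
		worklist = nil
		for _, item := range items {
			if !seen[item] {
				seen[item] = true
				order = append(order, item)
				worklist = append(worklist, f(item)...)
			}
		}
	}
	return order
}

func main() {
	visit := func(url string) []string { return graph[url] }
	fmt.Println(breadthFirst(visit, []string{"a"})) // [a b c d]
}
< / code > < / pre >
< p > The concurrent main keeps the same seen map but replaces the slice with a channel, so several calls to crawl can be in flight at once; the price is that the order in which links are discovered is no longer deterministic. < / p >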
< p > Notice that the crawl goroutine receives link as an explicit parameter, to avoid the loop variable capture problem explained in Section 5.6.1. Notice also that the initial send of the command-line arguments to the worklist runs in its own goroutine. This avoids a deadlock: worklist is unbuffered and the main goroutine is its only receiver, so if main performed that send itself, it would block forever before ever reaching the loop that receives from the channel. A buffered channel would be an alternative solution, which we won't pursue here. < / p >
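< p > The following sketch, with hypothetical helper names, contrasts the two ways of seeding an unbuffered worklist mentioned above: sending from a separate goroutine, as crawl1 does, or giving the channel a buffer so that the send completes before any receiver is ready. < / p >
< pre > < code class = "lang-go" >
package main

import "fmt"

// seedUnbufferedInGoroutine mirrors crawl1: the initial send runs in its own
// goroutine, so the caller reaches the receive without blocking on the send.
func seedUnbufferedInGoroutine(args []string) []string {
	worklist := make(chan []string)
	go func() { worklist <- args }()
	return <-worklist
}

// seedBuffered shows the buffered-channel alternative: with one free slot,
// the send completes immediately even though no receiver is ready yet.
func seedBuffered(args []string) []string {
	worklist := make(chan []string, 1)
	worklist <- args
	return <-worklist
}

func main() {
	args := []string{"http://gopl.io/"}
	fmt.Println(seedUnbufferedInGoroutine(args), seedBuffered(args))
	// By contrast, this would block forever, because the goroutine doing the
	// send is also the only one that could receive from ch:
	//   ch := make(chan []string)
	//   ch <- args
}
< / code > < / pre >
< p > A buffer of size 1 only covers the initial send; the crawler goroutines still depend on main receiving from worklist. < / p >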
< p > Now the crawler is highly concurrent and can produce a torrent of URLs, but it has two problems. The first shows up as error messages in the log after it has been running for a while: < / p >
< pre > < code > $ go build gopl.io/ch8/crawl1
$ ./crawl1 http://gopl.io/
http://gopl.io/
https://golang.org/help/
https://golang.org/doc/
https://golang.org/blog/
...
2015/07/15 18:22:12 Get ...: dial tcp: lookup blog.golang.org: no such host
2015/07/15 18:22:12 Get ...: dial tcp 23.21.222.120:443: socket:
too many open files
...
< / code > < / pre > < p > The first error message is a surprising report of a DNS lookup failure for a perfectly reliable domain. The next one reveals the cause: the program created so many network connections at once that it exceeded the per-process limit on open files, which made operations such as DNS lookups and calls to net.Dial start to fail. < / p >
< p > The program is simply too parallel. Unbounded parallelism is rarely a good idea, because there is always some limiting factor in the system: the number of CPU cores for compute-bound workloads, the number of spindles and heads for local disk I/O, the network bandwidth for downloads, or the serving capacity of a web service. The solution is to limit the number of parallel uses of the resource to match the level of parallelism the environment can actually sustain. In our example, a simple way to do that is to ensure that no more than n calls to links.Extract are active at the same time, where n is comfortably below the file descriptor limit (20, say). This is like the doorman at a crowded nightclub who lets a new guest in only when another guest leaves. < / p >
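< p > On Unix-like systems you can check the per-process limit that crawl1 ran into before choosing n. The sketch below reads the soft RLIMIT_NOFILE limit with syscall.Getrlimit; it assumes Linux or macOS (it will not compile on Windows), and the helper name openFileLimit is hypothetical. In a shell, ulimit -n reports the same soft limit. < / p >
< pre > < code class = "lang-go" >
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit reports the soft limit on open file descriptors for this
// process. Assumes a Unix-like OS (Linux or macOS) where syscall.Getrlimit
// and syscall.RLIMIT_NOFILE are available.
func openFileLimit() (uint64, error) {
	var rlim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
		return 0, err
	}
	return rlim.Cur, nil
}

func main() {
	limit, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	const n = 20 // concurrency cap used by crawl2
	fmt.Printf("open-file limit %d, crawler cap %d\n", limit, n)
}
< / code > < / pre >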
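< p > We can limit concurrency using a buffered channel of capacity n to model a concurrency primitive from operating systems called a counting semaphore. Conceptually, each of the n vacant slots in the channel buffer represents a token (a pass) that entitles its holder to proceed. Sending a value into the channel acquires a token, and receiving a value from the channel releases a token, creating a new vacant slot. This ensures that at most n sends can occur without an intervening receive. (It might seem more intuitive to treat the filled slots as tokens, but using the vacant slots means we don't have to fill the buffer after creating the channel.) Since the element type of the channel is unimportant, we use struct{}, which has size zero. < / p >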
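< p > A minimal sketch of the counting semaphore on its own, independent of the crawler: runLimited (a hypothetical helper) starts one goroutine per job, each of which must acquire a token before its simulated I/O and release it afterwards, and it records the peak number of jobs running at once. The time.Sleep call is only a stand-in for a network request. < / p >
< pre > < code class = "lang-go" >
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited starts one goroutine per job but uses a buffered channel of
// capacity n as a counting semaphore, so at most n jobs run at once.
// It returns the highest number of jobs observed running simultaneously.
func runLimited(jobs, n int) int {
	tokens := make(chan struct{}, n) // n vacant slots = n tokens
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		active  int
		maxSeen int
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			tokens <- struct{}{} // acquire: blocks while all n slots are full
			mu.Lock()
			active++
			if active > maxSeen {
				maxSeen = active
			}
			mu.Unlock()

			time.Sleep(time.Millisecond) // stand-in for an I/O operation

			mu.Lock()
			active--
			mu.Unlock()
			<-tokens // release: frees a slot for a waiting goroutine
		}()
	}
	wg.Wait()
	return maxSeen
}

func main() {
	fmt.Println("max concurrent:", runLimited(100, 20))
}
< / code > < / pre >
< p > However many jobs are started, the peak never exceeds n, because a send on tokens blocks whenever all n slots are already filled, until some other goroutine receives. < / p >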
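< p > Let's rewrite the crawl function so that the call to links.Extract is bracketed by operations that acquire and release a token, ensuring that at most 20 calls to it are active at any one time. It's good practice to keep the semaphore operations as close as possible to the I/O operation they regulate. < / p >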
< pre > < code class = "lang-go" > < span class = "hljs-comment" > // gopl.io/ch8/crawl2< / span >
< span class = "hljs-comment" > // tokens is a counting semaphore used to< / span >
< span class = "hljs-comment" > // enforce a limit of 20 concurrent requests.< / span >
< span class = "hljs-keyword" > var< / span > tokens = < span class = "hljs-built_in" > make< / span > (< span class = "hljs-keyword" > chan< / span > < span class = "hljs-keyword" > struct< / span > {}, < span class = "hljs-number" > 20< / span > )
< span class = "hljs-keyword" > func< / span > crawl(url < span class = "hljs-typename" > string< / span > ) []< span class = "hljs-typename" > string< / span > {
	fmt.Println(url)
	tokens <- < span class = "hljs-keyword" > struct< / span > {}{} < span class = "hljs-comment" > // acquire a token< / span >
	list, err := links.Extract(url)
	<-tokens < span class = "hljs-comment" > // release the token< / span >
	< span class = "hljs-keyword" > if< / span > err != < span class = "hljs-constant" > nil< / span > {
		log.Print(err)
	}
	< span class = "hljs-keyword" > return< / span > list
}
< / code > < / pre >
< p > The second problem is that the program never terminates, even after it has discovered all the links reachable from the initial URLs. (Unless you choose the initial URLs carefully or implement the depth limiting of Exercise 8.6, you are unlikely to have noticed this problem yet.) For the program to terminate, we need to break out of the main loop when the worklist is empty and no crawl goroutines are active. < / p >
< pre > < code class = "lang-go" > < span class = "hljs-keyword" > func< / span > main() {
	worklist := < span class = "hljs-built_in" > make< / span > (< span class = "hljs-keyword" > chan< / span > []< span class = "hljs-typename" > string< / span > )
	< span class = "hljs-keyword" > var< / span > n < span class = "hljs-typename" > int< / span > < span class = "hljs-comment" > // number of pending sends to worklist< / span >
	< span class = "hljs-comment" > // Start with the command-line arguments.< / span >
	n++
	< span class = "hljs-keyword" > go< / span > < span class = "hljs-keyword" > func< / span > () { worklist <- os.Args[< span class = "hljs-number" > 1< / span > :] }()
	< span class = "hljs-comment" > // Crawl the web concurrently.< / span >
	seen := < span class = "hljs-built_in" > make< / span > (< span class = "hljs-keyword" > map< / span > [< span class = "hljs-typename" > string< / span > ]< span class = "hljs-typename" > bool< / span > )
	< span class = "hljs-keyword" > for< / span > ; n > < span class = "hljs-number" > 0< / span > ; n-- {
		list := <-worklist
		< span class = "hljs-keyword" > for< / span > _, link := < span class = "hljs-keyword" > range< / span > list {
			< span class = "hljs-keyword" > if< / span > !seen[link] {
				seen[link] = < span class = "hljs-constant" > true< / span >
				n++
				< span class = "hljs-keyword" > go< / span > < span class = "hljs-keyword" > func< / span > (link < span class = "hljs-typename" > string< / span > ) {
					worklist <- crawl(link)
				}(link)
			}
		}
	}
}
< / code > < / pre >
< p > In this version, the counter n keeps track of the number of sends to the worklist that are yet to occur. Each time we know that an item needs to be sent to the worklist, we increment n: once before sending the initial command-line arguments, and again each time we start a crawler goroutine. The main loop terminates when n falls to zero, because then there is no more work to do. < / p >
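< p > To watch the counter n do its job without touching the network, here is a self-contained sketch that combines the semaphore-guarded crawl with the terminating main loop. The fakeWeb map, the crawlAll function, and the limit of 2 tokens are hypothetical choices for illustration; crawl looks URLs up in the map instead of calling links.Extract. < / p >
< pre > < code class = "lang-go" > package main

import (
	"fmt"
	"sort"
	"time"
)

// fakeWeb is a hypothetical in-memory link graph standing in for real
// HTTP fetches, so the sketch runs without network access.
var fakeWeb = map[string][]string{
	"a": {"b", "c"},
	"b": {"a", "d"},
	"c": {"d"},
	"d": {},
	"x": {"a"}, // not reachable from "a"
}

// tokens bounds concurrent "fetches" to 2 (the text uses 20).
var tokens = make(chan struct{}, 2)

// crawl simulates links.Extract with a short sleep and a map lookup.
func crawl(url string) []string {
	tokens <- struct{}{} // acquire a token
	time.Sleep(time.Millisecond)
	list := fakeWeb[url]
	<-tokens // release the token
	return list
}

// crawlAll returns the sorted set of URLs reachable from seeds, using
// the pending-send counter n to decide when to stop.
func crawlAll(seeds []string) []string {
	worklist := make(chan []string)
	n := 1 // one pending send: the seeds
	go func() { worklist <- seeds }()
	seen := make(map[string]bool)
	for ; n > 0; n-- {
		list := <-worklist
		for _, link := range list {
			if !seen[link] {
				seen[link] = true
				n++ // the goroutine below owes one send to worklist
				go func(link string) {
					worklist <- crawl(link)
				}(link)
			}
		}
	}
	var urls []string
	for u := range seen {
		urls = append(urls, u)
	}
	sort.Strings(urls)
	return urls
}

func main() {
	fmt.Println(crawlAll([]string{"a"})) // prints [a b c d]
}
< / code > < / pre >
< p > Because n counts the sends still owed to worklist, the loop exits only after every goroutine it started has delivered its result, so none is left blocked on worklist. < / p >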
< p > Now the concurrent crawler runs about 20 times faster than the breadth-first crawler from Section 5.6, without errors, and terminates correctly once it has completed its task. < / p >
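< p > The program below shows another way to avoid excessive concurrency. This version uses the original crawl function, without a counting semaphore, but calls it from one of 20 long-lived crawler goroutines, which ensures that at most 20 HTTP requests are active concurrently. < / p >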
< pre > < code class = "lang-go" > < span class = "hljs-keyword" > func< / span > main() {
	worklist := < span class = "hljs-built_in" > make< / span > (< span class = "hljs-keyword" > chan< / span > []< span class = "hljs-typename" > string< / span > ) < span class = "hljs-comment" > // lists of URLs, may have duplicates< / span >
	unseenLinks := < span class = "hljs-built_in" > make< / span > (< span class = "hljs-keyword" > chan< / span > < span class = "hljs-typename" > string< / span > ) < span class = "hljs-comment" > // de-duplicated URLs< / span >
	< span class = "hljs-comment" > // Add command-line arguments to worklist.< / span >
	< span class = "hljs-keyword" > go< / span > < span class = "hljs-keyword" > func< / span > () { worklist <- os.Args[< span class = "hljs-number" > 1< / span > :] }()
	< span class = "hljs-comment" > // Create 20 crawler goroutines to fetch each unseen link.< / span >
	< span class = "hljs-keyword" > for< / span > i := < span class = "hljs-number" > 0< / span > ; i < < span class = "hljs-number" > 20< / span > ; i++ {
		< span class = "hljs-keyword" > go< / span > < span class = "hljs-keyword" > func< / span > () {
			< span class = "hljs-keyword" > for< / span > link := < span class = "hljs-keyword" > range< / span > unseenLinks {
				foundLinks := crawl(link)
				< span class = "hljs-keyword" > go< / span > < span class = "hljs-keyword" > func< / span > () { worklist <- foundLinks }()
			}
		}()
	}
	< span class = "hljs-comment" > // The main goroutine de-duplicates worklist items< / span >
	< span class = "hljs-comment" > // and sends the unseen ones to the crawlers.< / span >
	seen := < span class = "hljs-built_in" > make< / span > (< span class = "hljs-keyword" > map< / span > [< span class = "hljs-typename" > string< / span > ]< span class = "hljs-typename" > bool< / span > )
	< span class = "hljs-keyword" > for< / span > list := < span class = "hljs-keyword" > range< / span > worklist {
		< span class = "hljs-keyword" > for< / span > _, link := < span class = "hljs-keyword" > range< / span > list {
			< span class = "hljs-keyword" > if< / span > !seen[link] {
				seen[link] = < span class = "hljs-constant" > true< / span >
				unseenLinks <- link
			}
		}
	}
}
< / code > < / pre >
< p > All the crawler goroutines are now fed by the same channel, unseenLinks. The main goroutine is responsible for de-duplicating the items it receives from the worklist and then sending each unseen one over the unseenLinks channel to a crawler goroutine. < / p >
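< p > The seen map is confined within the main goroutine; that is, it can be accessed only by that goroutine. Like other forms of information hiding, confinement helps us reason about the correctness of a program. For example, a local variable cannot be mentioned by name outside the function in which it is declared; a variable that does not escape from a function (§2.3.4) cannot be accessed from outside that function; and the encapsulated fields of an object cannot be accessed except by that object's methods. In all these cases, information hiding helps to limit unintended interactions between parts of the program. < / p >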
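< p > Confinement still works when other goroutines need the data: they can send requests to the goroutine that owns it instead of touching it directly. The sketch below is a hypothetical illustration (the names query, dedup, and countNew are not from the text) in which a map of seen URLs lives inside a single goroutine, and several goroutines ask over channels whether a URL is new. < / p >
< pre > < code class = "lang-go" > package main

import "fmt"

// query asks the owner goroutine whether url has been seen before.
// (query, dedup, and countNew are hypothetical names for this sketch.)
type query struct {
	url   string
	reply chan bool // receives true if url had not been seen before
}

// dedup starts a goroutine that owns the seen map and returns the only
// way to reach it: a channel of queries. The map is confined to that
// goroutine, so no mutex is needed.
func dedup() chan<- query {
	queries := make(chan query)
	go func() {
		seen := make(map[string]bool) // touched only by this goroutine
		for q := range queries {      // loop ends when queries is closed
			isNew := !seen[q.url]
			seen[q.url] = true
			q.reply <- isNew
		}
	}()
	return queries
}

// countNew queries the owner concurrently, one goroutine per URL, and
// counts how many URLs were reported as new.
func countNew(urls []string) int {
	queries := dedup()
	results := make(chan bool)
	for _, u := range urls {
		go func(u string) {
			reply := make(chan bool)
			queries <- query{u, reply}
			results <- <-reply
		}(u)
	}
	n := 0
	for range urls {
		if <-results {
			n++
		}
	}
	close(queries) // every query has been answered; stop the owner
	return n
}

func main() {
	fmt.Println(countNew([]string{"a", "b", "a", "c", "b"})) // prints 3
}
< / code > < / pre >
< p > Since only the owner goroutine reads or writes seen, the channel operations provide all the synchronization that is needed. < / p >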
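< p > The links found by crawl are sent to the worklist from a dedicated goroutine to avoid deadlock: if a crawler goroutine sent to worklist directly, all 20 crawlers could end up blocked sending to worklist while the main goroutine is blocked sending to unseenLinks, and neither side could proceed. To save space, we do not address the problem of termination in this example. < / p >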
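< p > The text leaves termination of this version open. One possible approach, shown here as an assumption-laden sketch rather than the book's solution, is for the main goroutine to count the worklist sends it still expects, much as the counter n did in the previous version, and to close unseenLinks when that count reaches zero so that the crawler goroutines' range loops end. As before, fakeWeb, crawlPool, and the worker count are hypothetical, and crawl is a map lookup standing in for the real fetch. < / p >
< pre > < code class = "lang-go" > package main

import (
	"fmt"
	"sort"
)

// fakeWeb is a hypothetical in-memory link graph used instead of HTTP.
var fakeWeb = map[string][]string{
	"a": {"b", "c"},
	"b": {"a", "d"},
	"c": {"d"},
	"d": nil,
}

// crawl stands in for the original, semaphore-free crawl function.
func crawl(url string) []string { return fakeWeb[url] }

// crawlPool crawls from seeds with a fixed number of long-lived worker
// goroutines and returns the sorted list of URLs it visited. The
// pending counter is an assumed termination scheme, not the text's.
func crawlPool(seeds []string, workers int) []string {
	worklist := make(chan []string)  // lists of URLs, may have duplicates
	unseenLinks := make(chan string) // de-duplicated URLs

	pending := 1 // sends to worklist still expected: the seeds
	go func() { worklist <- seeds }()

	for i := 0; i < workers; i++ {
		go func() {
			for link := range unseenLinks { // exits once unseenLinks is closed
				foundLinks := crawl(link)
				// Send from a separate goroutine so this worker never blocks
				// on worklist while main blocks on unseenLinks.
				go func() { worklist <- foundLinks }()
			}
		}()
	}

	seen := make(map[string]bool) // confined to this goroutine
	for pending > 0 {
		list := <-worklist
		pending--
		for _, link := range list {
			if !seen[link] {
				seen[link] = true
				pending++ // each unseen link yields exactly one worklist send
				unseenLinks <- link
			}
		}
	}
	close(unseenLinks) // no more work: let the crawler goroutines exit

	var urls []string
	for u := range seen {
		urls = append(urls, u)
	}
	sort.Strings(urls)
	return urls
}

func main() {
	fmt.Println(crawlPool([]string{"a"}, 3)) // prints [a b c d]
}
< / code > < / pre >
< p > Closing unseenLinks is safe here because the main goroutine is its only sender, and by the time pending reaches zero every found-links goroutine has already delivered its result. < / p >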
< p > Exercise 8.6: Add depth limiting to the concurrent crawler. That is, if the user sets depth=3, only pages reachable from the starting page by following at most three links are fetched. < / p >
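< p > Exercise 8.7: Write a concurrent program that creates a local mirror of a website, fetching every reachable page of the site and saving it to the local disk. To keep things simple, fetch only pages within the site's own domain (for example, those under golang.org; translator's note: external links do not count). You will also need to rewrite the links in the fetched pages so that they point to the corresponding pages in your mirror rather than to the original site. < / p >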
< p > Translator's note:
Further reading:
< a href = "http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang/" target = "_blank" > http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang/< / a > < / p >
< / section >
< / div >
< / div >
< / div >
< a href = "../ch8/ch8-05.html" class = "navigation navigation-prev " aria-label = "Previous page: Looping in Parallel" > < i class = "fa fa-angle-left" > < / i > < / a >
< a href = "../ch8/ch8-07.html" class = "navigation navigation-next " aria-label = "Next page: Multiplexing with select" > < i class = "fa fa-angle-right" > < / i > < / a >
< / div >
< / div >
< script src = "../gitbook/app.js" > < / script >
< script src = "../gitbook/plugins/gitbook-plugin-search/lunr.min.js" > < / script >
< script src = "../gitbook/plugins/gitbook-plugin-search/search.js" > < / script >
< script src = "../gitbook/plugins/gitbook-plugin-sharing/buttons.js" > < / script >
< script src = "../gitbook/plugins/gitbook-plugin-fontsettings/buttons.js" > < / script >
< script >
require(["gitbook"], function(gitbook) {
var config = {"highlight":{},"search":{},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"fontsettings":{"theme":"white","family":"sans","size":2}};
gitbook.start(config);
});
< / script >
< / body >
< / html >