Data Race Detector for Golang
Posted 百里分麾
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Data Race Detector for Golang相关的知识,希望对你有一定的参考价值。
Data Race Detector for Golang
主要介绍 race detector 的用法,golang 中常见的 data race 案例以及消除 race 的方法。
1 race detector 简介
golang 代码发生 data race 的条件:多个(大于或等于2 个) goroutine 同时对同一个变量进行读/写,并且至少有一个 goroutine 进行写操作。如果所有 goroutine 都只是进行读操作,那将不会构成数据争用。
为协助定位 data race 的 bug, Golang 提供了内置的 data race 检测工具。用法很简单,只需在 go 命令加上 -race 参数,官网文档提示支持以下 4 种方法,选择一种适合自己的就行。
go run -race mysrc.go // to run the source file
go build -race mycmd // to build the command
go test -race mypkg // to test the package
go install -race mypkg // to install the package
但其实
go build -race
最终编译出的二进制包,也可用于检测是否存在数据竞态race detector 只能找出运行时的数据竞态,所以未执行代码的数据竞态无法被检测出。测试用例很难覆盖全所有逻辑代码,在现实工作负载情况下,运行用 -race 参数编译出的二进制文件,可以找出更多的数据竞态问题
2 常见的 data race 样例
2.1 循环计数变量
测试源码 loop_counter.go
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
n := 5
wg.Add(n)
for i := 0; i < n; i++ {
go func() {
fmt.Println("i", i) // it is not the 'i' you are looking for.
wg.Done()
}()
}
wg.Wait()
}
竞态检测
[whoareyou@data-race]$ go run -race loop_counter.go
==================
WARNING: DATA RACE
Read at 0x00c000018080 by goroutine 8:
main.main.func1()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/loop_counter.go:14 +0x3c
Previous write at 0x00c000018080 by main goroutine:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/loop_counter.go:12 +0x104
Goroutine 8 (running) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/loop_counter.go:13 +0xdc
==================
i 2
==================
WARNING: DATA RACE
Read at 0x00c000018080 by goroutine 7:
main.main.func1()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/loop_counter.go:14 +0x3c
Previous write at 0x00c000018080 by main goroutine:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/loop_counter.go:12 +0x104
Goroutine 7 (running) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/loop_counter.go:13 +0xdc
==================
i 2
i 5
i 5
i 5
Found 2 data race(s)
exit status 66
原因分析
用于控制退出循环的变量 i, 对 i 执行自增操作, 有写操作;循环内部,起 5 个 goroutine, 每个 goroutine 均输出 i, 存在读操作;满足 data race 发生的条件
执行
go run loop_counter.go
大概率会输出 55555,而不是 01234在开启数据竞态检测的情况下,每次输出的结果几乎不同,这是因为存在数据并发读写,程序行为不可预测
消除竞态
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
n := 5
wg.Add(n)
for i := 0; i < n; i++ {
go func(j int) {
fmt.Println("j", j) // read local copy of the loop counter
wg.Done()
}(i)
}
wg.Wait()
}
消除分析
循环内部各 goroutine 所读的 j, 均是循环控制变量 i 的一份拷贝,不存在竞态问题
上述程序会输出期望得到的 01234
2.2 意外共享变量
测试源码
package main
import (
"fmt"
"os"
"time"
)
func main() {
data := []byte("hello golang")
res := make(chan error, 2)
f1, err := os.Create("file1")
fmt.Printf("pointer of err from file1: %p\n", &err)
if err != nil {
res <- err
} else {
go func() {
// This err is shared with the main goroutine, so the write races with the write below.
_, err = f1.Write(data)
res <- err
f1.Close()
}()
}
f2, err := os.Create("file2")
fmt.Printf("pointer of err from file2: %p\n", &err)
if err != nil {
res <- err
} else {
go func() {
_, err = f2.Write(data)
res <- err
f2.Close()
}()
}
// receive the error message
go func() {
for err := range res {
fmt.Println("err", err)
}
}()
time.Sleep(1 * time.Second) // wait the goroutine to receive the err
}
竞态检测
[whoareyou@data-race]$ go run -race accident_shared_var.go
pointer of err from file1: 0xc0000a01e0
pointer of err from file2: 0xc0000a01e0
err <nil>
==================
WARNING: DATA RACE
Write at 0x00c0000a01e0 by goroutine 7:
main.main.func1()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/accident_shared_var.go:19 +0x94
Previous write at 0x00c0000a01e0 by main goroutine:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/accident_shared_var.go:25 +0x2f2
Goroutine 7 (running) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/accident_shared_var.go:17 +0x528
==================
err <nil>
Found 1 data race(s)
exit status 66
原因分析
内层 goroutine 与 外层 goroutine 使用同一个 err 变量,两个 goroutine 并发对 err 执行写操作
消除竞态
_, err := f1.Write(data)
_, err := f2.Write(data)
消除分析
只需改两行代码,注意 := 的使用,此时内层 goroutine 的 err 是新变量
2.3 未受保护的全局变量
测试源码
package main
import (
"fmt"
"sync"
)
var (
num2Name = make(map[int]string) // key: number, value: name
)
func SetName(num int, name string) {
num2Name[num] = name
}
func GetName(num int) string {
return num2Name[num]
}
func main() {
wg := &sync.WaitGroup{}
wg.Add(2)
go func() {
defer wg.Done()
SetName(1, "Liu Bei")
}()
go func() {
defer wg.Done()
fmt.Println(GetName(1))
}()
wg.Wait()
}
竞态检测
[whoareyou@data-race]$ go run -race unprotected_global_var.go
==================
WARNING: DATA RACE
Read at 0x00c00011c180 by goroutine 8:
runtime.mapaccess1_fast64()
/usr/local/go1.15.4/src/runtime/map_fast64.go:12 +0x0
main.GetName()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:17 +0xb3
main.main.func2()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:29 +0xd9
Previous write at 0x00c00011c180 by goroutine 7:
runtime.mapassign_fast64()
/usr/local/go1.15.4/src/runtime/map_fast64.go:92 +0x0
main.SetName()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:13 +0xa4
main.main.func1()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:25 +0x6d
Goroutine 8 (running) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:27 +0xc8
Goroutine 7 (finished) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:23 +0xa6
==================
==================
WARNING: DATA RACE
Read at 0x00c00008e048 by goroutine 8:
main.GetName()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:17 +0xc9
main.main.func2()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:29 +0xd9
Previous write at 0x00c00008e048 by goroutine 7:
main.SetName()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:13 +0xb9
main.main.func1()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:25 +0x6d
Goroutine 8 (running) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:27 +0xc8
Goroutine 7 (finished) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_global_var.go:23 +0xa6
==================
Liu Bei
Found 2 data race(s)
exit status 66
原因分析
上述代码发生竞态的条件:SetName 和 GetName 访问的是同一个 map num2Name
并发读写同一个 map
消除竞态
package main
import (
"fmt"
"sync"
)
var (
num2Name = make(map[int]string) // key: number, value: name
lock = new(sync.RWMutex) // read and write lock
)
func SetName(num int, name string) {
lock.Lock()
defer lock.Unlock()
num2Name[num] = name
}
func GetName(num int) string {
lock.RLock()
defer lock.RUnlock()
return num2Name[num]
}
func main() {
wg := &sync.WaitGroup{}
wg.Add(2)
go func() {
defer wg.Done()
SetName(1, "Liu Bei")
}()
go func() {
defer wg.Done()
fmt.Println(GetName(1))
}()
wg.Wait()
}
消除分析
解决思路:使用读写锁控制并发,访问 map 前加锁
2.4 未受保护的私有变量
测试源码
package main
import (
"fmt"
"os"
"time"
)
type Watchdog struct{ last int64 }
func (w *Watchdog) KeepAlive() {
w.last = time.Now().UnixNano() // First conflicting access.
}
func (w *Watchdog) Start() {
go func() {
for {
time.Sleep(time.Second)
// Second conflicting access.
if w.last < time.Now().Add(-10*time.Second).UnixNano() {
fmt.Println("No keepalives for 10 seconds. Dying.")
os.Exit(1)
}
}
}()
}
func main() {
watchDog := Watchdog{}
go watchDog.KeepAlive()
watchDog.Start()
time.Sleep(10 * time.Second) // wait a moment
}
竞态检测
[whoareyou@data-race]$ go run -race unprotected_primitive_var.go
==================
WARNING: DATA RACE
Read at 0x00c00013e008 by goroutine 8:
main.(*Watchdog).Start.func1()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_primitive_var.go:20 +0xce
Previous write at 0x00c00013e008 by goroutine 7:
main.(*Watchdog).KeepAlive()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_primitive_var.go:12 +0xa4
Goroutine 8 (running) created at:
main.(*Watchdog).Start()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_primitive_var.go:16 +0x4c
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_primitive_var.go:31 +0x88
Goroutine 7 (finished) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unprotected_primitive_var.go:30 +0x7a
==================
Found 1 data race(s)
exit status 66
原因分析
KeepAlive 和 Start 函数均访问了私有变量 last
当多个 goroutine 并发调用 KeepAlive 和 Start 函数时,就会发生竞态
消除竞态
常规方法:使用 channel 或者 锁。
当然,也可以使用 sync/automic 包来实现无锁行为
type Watchdog struct{ last int64 }
func (w *Watchdog) KeepAlive() {
atomic.StoreInt64(&w.last, time.Now().UnixNano())
}
func (w *Watchdog) Start() {
go func() {
for {
time.Sleep(time.Second)
if atomic.LoadInt64(&w.last) < time.Now().Add(-10*time.Second).UnixNano() {
fmt.Println("No keepalives for 10 seconds. Dying.")
os.Exit(1)
}
}
}()
}
2.5 异步的 channel 写和关闭操作
测试源码
package main
func main() {
c := make(chan struct{}) // or buffered channel
// The race detector cannot derive the happens before relation
// for the following send and close operations. These two operations
// are unsynchronized and happen concurrently.
go func() { c <- struct{}{} }()
close(c)
}
竞态检测
[whoareyou@data-race]$ go run -race unsynchronized_channel.go
==================
WARNING: DATA RACE
Read at 0x00c00005e070 by goroutine 6:
runtime.chansend()
/usr/local/go1.15.4/src/runtime/chan.go:158 +0x0
main.main.func1()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unsynchronized_channel.go:9 +0x44
Previous write at 0x00c00005e070 by main goroutine:
runtime.closechan()
/usr/local/go1.15.4/src/runtime/chan.go:357 +0x0
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unsynchronized_channel.go:10 +0x74
Goroutine 6 (running) created at:
main.main()
/Users/whoareyou/code/go/src/github.com/wxdyhj/awesome-golang/data-race/unsynchronized_channel.go:9 +0x66
==================
Found 1 data race(s)
exit status 66
原因分析
存在对同一个 channel 同时写和关闭的操作
According to the Go memory model, a send on a channel happens before the corresponding receive from that channel completes.
竞态消除
package main
func main() {
c := make(chan struct{}) // or buffered channel
// The race detector cannot derive the happens before relation
// for the following send and close operations. These two operations
// are unsynchronized and happen concurrently.
go func() { c <- struct{}{} }()
<-c // receive operation can guarantee the send is done
close(c)
}
消除分析
To synchronize send and close operations, use a receive operation that guarantees the send is done before the close
读 channel 的操作能够保证 写 channel 的动作已完成
3 race detector 的高级玩法
可不掌握,有需求时,查阅 Options 即可
环境变量
GORACE
支持设置 race detector 的选项,设置格式如下:
GORACE="option1=val1 option2=val2"
支持的选项
log_path: 检测报告输出文件,默认 stderr,若指定 log_path,检测结果将会输出到名为
log_path.pid
的文件中exitcode: 当检测到 data race 时程序的退出状态码,默认 66
strip_path_prefix: 检测结果报告 data race 位置时,跳过的文件路径前缀,默认 ""
history_size: 每个 goroutine 可访问的内存历史大小为
32K * 2**history_size
,默认 1,增加该值的大小,可解决 "failed to restore the stack" 的问题halt_on_error: 当报告第一个 data race 时,程序是否退出,默认 0,即默认不退出,设置成 非0,程序检测到 data race 就会退出
atexit_sleep_ms: 在 main 函数退出前的睡眠时间,单位毫秒,默认 1000(这么做的目的是,保证 race detector 有足够的时间输出监测报告。仅是个人理解,未在官网找到证据)
举例
GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go run -race main.go
其实就是设置了 GORACE 的环境变量,若要清除 race detector 的设置,执行 unset GORACE
即可清除 GORACE 环境变量
4 小结
养成对 golang 项目做静态监测的习惯
并发读写 bool, int, int64 等类型变量,也会产生 data race,一定要消除
用于线上环境运行的二进制文件,编译时一定不要使用 -race 参数
竞态检测开销
内存增加 5-10 倍
运行时间增加 2-20 倍
非原子操作内存引起的 data race,会带来难以排查的 bug
race detector 支持的操作系统: linux/amd64, linux/ppc64le, linux/arm64, freebsd/amd64, netbsd/amd64, darwin/amd64, darwin/arm64, and windows/amd64
race detector 会为每个 defer 和 recover 语句申请额外的 8 字节内存空间,所在 goroutine 退出时才会被回收。所以,如果程序中存在不退出的 goroutine,并且循环调用 defer 或 recover,那程序使用的内存将会无限增长
在 go 内存模型中,存在 race 的程序的行为是未定义的,理论上可能出现任何情况
5 参考资料
Data Race Detector(https://golang.org/doc/articles/race_detector.html)
Introducing the Go Race Detector(https://blog.golang.org/race-detector)
The Go Memory Model(https://golang.org/ref/mem)
以上是关于Data Race Detector for Golang的主要内容,如果未能解决你的问题,请参考以下文章