A look at TCP keepalive

Posted 2021-05-01 董泽润的技术笔记

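When we talk about keeping long-lived connections alive, there are two layers to consider: the application and TCP. The former is a ping implemented in business logic, while the latter is TCP keepalive^[1]. They serve different purposes, and in production it is advisable to configure both.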

Symptoms

Engineers who have not yet been burned in production rarely pay attention to TCP keepalive, but for IM, access layers, and gaming it is required knowledge.

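Take the TCP connection states in the diagram above: after the three-way handshake the connection becomes ESTABLISHED and data starts to flow. If the client then suddenly loses power, a firewall is misconfigured, the public network flaps, or something similar happens, the client side of the connection is gone but no FIN or RST ever gets sent. The server-side connection then stays open and is not released until the process restarts or keepalive probing times out.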

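Some will ask: my service runs on an internal network, can that really be unstable too? Of course it can. I have seen a switch failure isolate an entire rack, and these days most companies run in the cloud anyway. One of our services hit request timeouts because of an AWS network problem, which unfortunately also triggered a cascading bug: closing a channel that had already been closed ...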

Orphaned connections

Let's reproduce this with redis: the client is 172.24.213.40 and the server is 172.24.213.39. Open two sessions on the client, one connecting to the server and one running tcpdump.

root@worker1:~# redis-cli -h 172.24.213.39 -p 6380
172.24.213.39:6380>

root@worker1:~# tcpdump -i eth0 -n host 172.24.213.39
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:00:42.604669 IP 172.24.213.40.38470 > 172.24.213.39.6380: Flags [S], seq 189110270, win 29200, options [mss 1460,sackOK,TS val 3222067084 ecr 0,nop,wscale 6], length 0
14:00:42.604890 IP 172.24.213.39.6380 > 172.24.213.40.38470: Flags [S.], seq 3111402640, ack 189110271, win 28960, options [mss 1460,sackOK,TS val 1210274267 ecr 3222067084,nop,wscale 7], length 0
14:00:42.604906 IP 172.24.213.40.38470 > 172.24.213.39.6380: Flags [.], ack 1, win 457, options [nop,nop,TS val 3222067084 ecr 1210274267], length 0

14:03:13.731484 IP 172.24.213.40.38470 > 172.24.213.39.6380: Flags [.], ack 11469, win 559, options [nop,nop,TS val 3222218206 ecr 1210410284], length 0
14:03:13.731628 IP 172.24.213.39.6380 > 172.24.213.40.38470: Flags [.], ack 18, win 227, options [nop,nop,TS val 1210425387 ecr 3222067087], length 0
14:03:28.835480 IP 172.24.213.40.38470 > 172.24.213.39.6380: Flags [.], ack 11469, win 559, options [nop,nop,TS val 3222233310 ecr 1210425387], length 0
14:03:28.835615 IP 172.24.213.39.6380 > 172.24.213.40.38470: Flags [.], ack 18, win 227, options [nop,nop,TS val 1210440491 ecr 3222067087], length 0

The client sends an empty ACK to the server every 15s and receives an ACK back. This is the client-side TCP keepalive at work. Next, we add iptables rules on the server to create an artificial network partition.

root@myali:~# iptables -I INPUT -s 172.24.213.40 -j DROP;iptables -I OUTPUT -d 172.24.213.40 -j DROP;iptables -nvL

After a while, check the tcpdump output on the client

14:05:14.563481 IP 172.24.213.40.38470 > 172.24.213.39.6380: Flags [.], ack 11469, win 559, options [nop,nop,TS val 3222339035 ecr 1210531111], length 0
14:05:19.683482 IP 172.24.213.40.38470 > 172.24.213.39.6380: Flags [.], ack 11469, win 559, options [nop,nop,TS val 3222344155 ecr 1210531111], length 0
14:05:24.803489 IP 172.24.213.40.38470 > 172.24.213.39.6380: Flags [.], ack 11469, win 559, options [nop,nop,TS val 3222349275 ecr 1210531111], length 0
14:05:29.923486 IP 172.24.213.40.38470 > 172.24.213.39.6380: Flags [R.], seq 18, ack 11469, win 559, options [nop,nop,TS val 3222354394 ecr 1210531111], length 0

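The client 172.24.213.40 sends three ACK probes 5s apart and finally sends a RST to tear down the connection. Naturally, redis-server never receives this RST either. A little later, remove the firewall rules on the server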

root@myali:~# iptables -D INPUT -s 172.24.213.40 -j DROP;iptables -D OUTPUT -d 172.24.213.40 -j DROP;iptables -nvL

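Now check the connections on both sides with ss -a | grep 6380: the connection has disappeared on the client, but it is still present on the server, still in the ESTAB state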

root@myali:~# ss -a | grep 6380
tcp   ESTAB    0      0   172.24.213.39:6380   172.24.213.40:38470

This is an orphaned connection.

Request timeouts

Back to the same case: what happens if the client sends a request before keepalive probing fails? The answer is a timeout

root@worker1:~# redis-cli -h 172.24.213.39 -p 6380
172.24.213.39:6380> get a

Here get a hangs with no output at all. In business code, the call would simply hit its timeout and return.

root@worker1:~# tcpdump -i eth0 -n host 172.24.213.39
14:41:25.770785 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224510177 ecr 1212709066], length 20
14:41:25.975487 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224510382 ecr 1212709066], length 20
14:41:26.183509 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224510590 ecr 1212709066], length 20
14:41:26.595484 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224511002 ecr 1212709066], length 20
14:41:27.427486 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224511834 ecr 1212709066], length 20
14:41:29.091484 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224513498 ecr 1212709066], length 20
14:41:32.611489 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224517018 ecr 1212709066], length 20
14:41:39.267484 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224523674 ecr 1212709066], length 20
14:41:52.579488 IP 172.24.213.40.38486 > 172.24.213.39.6380: Flags [P.], seq 18:38, ack 11469, win 520, options [nop,nop,TS val 3224536985 ecr 1212709066], length 20

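The capture shows that TCP stays in timeout retransmission, resending the P (PSH) segment with exponential backoff. Note that TCP keepalive plays no role here and does not actively RST the connection: keepalive probes are only sent while the connection is idle, and with unacknowledged data outstanding the retransmission timer is in charge instead.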

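How long does retransmission go on? On my test kernel, 4.15.0-66, the interval eventually caps at 2min, the segment is retransmitted 15 times, and the connection is destroyed after roughly 15min.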
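The roughly 15min figure can be sanity-checked with some arithmetic. The sketch below assumes an initial retransmission timeout of 200ms that doubles after every timeout and is capped at 120s, with 15 retransmissions. These numbers are assumptions based on common Linux defaults (the 2min cap and the 15 retries match the observation above); the real initial RTO depends on the measured RTT, and the function names are hypothetical.

```go
package main

import "fmt"

// retransmitSchedule returns the wait, in milliseconds, before each timeout:
// the original send plus `retries` retransmissions gives retries+1 waits.
// The wait doubles each time and is capped at maxMs (assumed backoff model).
func retransmitSchedule(initialMs, maxMs, retries int) []int {
	waits := make([]int, 0, retries+1)
	rto := initialMs
	for i := 0; i <= retries; i++ {
		waits = append(waits, rto)
		rto *= 2
		if rto > maxMs {
			rto = maxMs
		}
	}
	return waits
}

// totalMs sums the waits to get the time until the connection is given up on.
func totalMs(waits []int) int {
	sum := 0
	for _, w := range waits {
		sum += w
	}
	return sum
}

func main() {
	// Assumed values: 200ms initial RTO, 120s cap, 15 retransmissions.
	waits := retransmitSchedule(200, 120000, 15)
	fmt.Printf("timeouts: %d, total: %.1fs\n", len(waits), float64(totalMs(waits))/1000)
}
```

Under these assumptions the total comes to 924.6s, a little over 15min, which is consistent with the behaviour seen on the test kernel.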

keepalive parameters

Let's look at the keepalive parameters and how to configure them.

root@myali:~# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
root@myali:~# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
root@myali:~# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75

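tcp_keepalive_time: how long a connection may stay idle before the first keepalive probe is sent. The default is 2 hours, which is far too long and is usually shortened.
tcp_keepalive_probes: how many consecutive probes are sent when the peer does not return an ACK. The default is 9.
tcp_keepalive_intvl: the interval between consecutive probes after the first one goes unanswered. The default is 75s.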
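To make the three parameters concrete, here is a small sketch that estimates how long a dead peer goes unnoticed on an idle connection: the idle time plus one interval per unanswered probe. The helper names are hypothetical, and the formula is a simplification that matches the redis capture above (15s idle, three probes 5s apart, RST 5s after the last probe).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// detectionSeconds estimates the worst-case time, in seconds, between the last
// traffic on an idle connection and the kernel giving up on the peer.
func detectionSeconds(idle, intvl, probes int) int {
	return idle + intvl*probes
}

// readSysctl reads an integer from /proc/sys/net/ipv4/<name> (Linux only).
func readSysctl(name string) (int, error) {
	b, err := os.ReadFile("/proc/sys/net/ipv4/" + name)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(b)))
}

func main() {
	fmt.Println("kernel defaults:", detectionSeconds(7200, 75, 9), "s")
	fmt.Println("redis-cli:", detectionSeconds(15, 5, 3), "s")

	// Use the values of the current machine when /proc is available.
	idle, err1 := readSysctl("tcp_keepalive_time")
	intvl, err2 := readSysctl("tcp_keepalive_intvl")
	probes, err3 := readSysctl("tcp_keepalive_probes")
	if err1 != nil || err2 != nil || err3 != nil {
		fmt.Println("could not read /proc/sys/net/ipv4, skipping local values")
		return
	}
	fmt.Println("this host:", detectionSeconds(idle, intvl, probes), "s")
}
```

With the defaults shown above that is 7200 + 9 × 75 = 7875s, more than two hours, which is why the idle time is normally lowered.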

Let's see how redis-cli sets them

/* Set TCP keep alive option to detect dead peers. The interval option
 * is only used for Linux as we are using Linux-specific APIs to set
 * the probe send time, interval, and count. */
int anetKeepAlive(char *err, int fd, int interval)
{
    int val = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val)) == -1)
    {
        anetSetError(err, "setsockopt SO_KEEPALIVE: %s", strerror(errno));
        return ANET_ERR;
    }

    /* Default settings are more or less garbage, with the keepalive time
     * set to 7200 by default on Linux. Modify settings to make the feature
     * actually useful. */

    /* Send first probe after interval. */
    val = interval;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val)) < 0) {
        anetSetError(err, "setsockopt TCP_KEEPIDLE: %s\n", strerror(errno));
        return ANET_ERR;
    }

    /* Send next probes after the specified interval. Note that we set the
     * delay as interval / 3, as we send three probes before detecting
     * an error (see the next setsockopt call). */
    val = interval/3;
    if (val == 0) val = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val)) < 0) {
        anetSetError(err, "setsockopt TCP_KEEPINTVL: %s\n", strerror(errno));
        return ANET_ERR;
    }

    /* Consider the socket in error state after three we send three ACK
     * probes without getting a reply. */
    val = 3;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val)) < 0) {
        anetSetError(err, "setsockopt TCP_KEEPCNT: %s\n", strerror(errno));
        return ANET_ERR;
    }
    return ANET_OK;
}

SO_KEEPALIVE is enabled first, then the three parameters above are set one by one. redis-cli is fairly aggressive: 15s idle, 3 consecutive probes, 5s apart.

go keepalive

Now let's look at Go. A server is typically created like this

ln, err := net.Listen("tcp", ":8080")
if err != nil {
 // handle error
}
for {
 conn, err := ln.Accept()
 if err != nil {
  // handle error
  continue
 }
 go handleConnection(conn)
}

After Listen, the loop calls Accept to take new connections, and each one is handled asynchronously in a goroutine

conn, err := net.Dial("tcp", "golang.org:80")
if err != nil {
 // handle error
}
fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
status, err := bufio.NewReader(conn).ReadString('\n')

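The client side is simple too: call net.Dial and start working, without the ceremony C requires. But what keepalive settings apply by default? That depends on the version.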

After net: enable TCP keepalives by default^[2] and net: add KeepAlive field to ListenConfig^[3], keepalive is enabled by default on both the client and server side starting with go1.13, with a default of 15s

func (ln *TCPListener) accept() (*TCPConn, error) {
 fd, err := ln.fd.accept()
 if err != nil {
  return nil, err
 }
 tc := newTCPConn(fd)
 if ln.lc.KeepAlive >= 0 {
  setKeepAlive(fd, true)
  ka := ln.lc.KeepAlive
  if ln.lc.KeepAlive == 0 {
   ka = defaultTCPKeepAlive
  }
  setKeepAlivePeriod(fd, ka)
 }
 return tc, nil
}

func setKeepAlivePeriod(fd *netFD, d time.Duration) error {
 // The kernel expects seconds so round to next highest second.
 secs := int(roundDurationUp(d, time.Second))
 if err := fd.pfd.SetsockoptInt(syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL, secs); err != nil {
  return wrapSyscallError("setsockopt", err)
 }
 err := fd.pfd.SetsockoptInt(syscall.IPPROTO_TCP, syscall.TCP_KEEPIDLE, secs)
 runtime.KeepAlive(fd)
 return wrapSyscallError("setsockopt", err)
}
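The KeepAlive value checked in accept() can be overridden from application code. As a minimal sketch, the helpers below (hypothetical names) set a custom period through net.ListenConfig and net.Dialer, then do one echo round trip on loopback to show the connection works as usual. Per the accept() code above, 0 selects the default period and a negative value disables keepalive.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"net"
	"time"
)

// listenWithKeepAlive returns a listener whose accepted connections use the
// given keepalive period (0 means the default, negative disables keepalive).
func listenWithKeepAlive(addr string, period time.Duration) (net.Listener, error) {
	lc := net.ListenConfig{KeepAlive: period}
	return lc.Listen(context.Background(), "tcp", addr)
}

// dialWithKeepAlive dials with a dial timeout and the given keepalive period.
func dialWithKeepAlive(addr string, period time.Duration) (net.Conn, error) {
	d := net.Dialer{Timeout: 3 * time.Second, KeepAlive: period}
	return d.Dial("tcp", addr)
}

// roundTrip starts a loopback echo server, sends one line and returns the echo.
func roundTrip(periodSeconds int) (string, error) {
	period := time.Duration(periodSeconds) * time.Second
	ln, err := listenWithKeepAlive("127.0.0.1:0", period)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		io.Copy(c, c) // echo everything back until the client closes
	}()

	conn, err := dialWithKeepAlive(ln.Addr().String(), period)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	// Guard the demo against hanging forever.
	conn.SetDeadline(time.Now().Add(3 * time.Second))

	if _, err := fmt.Fprintf(conn, "ping\n"); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	line, err := roundTrip(30)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("echoed %q with a 30s keepalive period\n", line)
}
```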

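Go is a bit clumsy here, though: the probe count cannot be set, and intvl and idle time always share the same value. For versions older than go1.13, you have to call SetKeepAlive manually to enable it and then SetKeepAlivePeriod to set the idle time.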
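One way around this limitation is to set the socket options directly, much like anetKeepAlive does in C. The following is a Linux-only sketch using the standard syscall package; the type and function names are hypothetical, and the 15s/5s/3 values simply copy the redis-cli settings. It sets the options on a loopback connection and reads them back.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// keepaliveOpts holds the three knobs: idle and interval in seconds, plus count.
type keepaliveOpts struct {
	Idle, Intvl, Count int
}

// setKeepalive enables SO_KEEPALIVE and sets idle, interval and probe count.
// Linux only: the TCP_KEEP* constants are not portable.
func setKeepalive(c *net.TCPConn, o keepaliveOpts) error {
	raw, err := c.SyscallConn()
	if err != nil {
		return err
	}
	var serr error
	err = raw.Control(func(fd uintptr) {
		s := int(fd)
		if serr = syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, 1); serr != nil {
			return
		}
		if serr = syscall.SetsockoptInt(s, syscall.IPPROTO_TCP, syscall.TCP_KEEPIDLE, o.Idle); serr != nil {
			return
		}
		if serr = syscall.SetsockoptInt(s, syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL, o.Intvl); serr != nil {
			return
		}
		serr = syscall.SetsockoptInt(s, syscall.IPPROTO_TCP, syscall.TCP_KEEPCNT, o.Count)
	})
	if err != nil {
		return err
	}
	return serr
}

// getKeepalive reads the three values back from the socket.
func getKeepalive(c *net.TCPConn) (keepaliveOpts, error) {
	var o keepaliveOpts
	raw, err := c.SyscallConn()
	if err != nil {
		return o, err
	}
	var serr error
	err = raw.Control(func(fd uintptr) {
		s := int(fd)
		if o.Idle, serr = syscall.GetsockoptInt(s, syscall.IPPROTO_TCP, syscall.TCP_KEEPIDLE); serr != nil {
			return
		}
		if o.Intvl, serr = syscall.GetsockoptInt(s, syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL); serr != nil {
			return
		}
		o.Count, serr = syscall.GetsockoptInt(s, syscall.IPPROTO_TCP, syscall.TCP_KEEPCNT)
	})
	if err != nil {
		return o, err
	}
	return o, serr
}

// loopbackDemo applies the options to a loopback connection and reads them back.
func loopbackDemo(o keepaliveOpts) (keepaliveOpts, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return keepaliveOpts{}, err
	}
	defer ln.Close()
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return keepaliveOpts{}, err
	}
	defer conn.Close()
	tc := conn.(*net.TCPConn)
	if err := setKeepalive(tc, o); err != nil {
		return keepaliveOpts{}, err
	}
	return getKeepalive(tc)
}

func main() {
	got, err := loopbackDemo(keepaliveOpts{Idle: 15, Intvl: 5, Count: 3})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("idle=%ds intvl=%ds count=%d\n", got.Idle, got.Intvl, got.Count)
}
```

Newer Go releases (1.23 and later, to my knowledge) also provide net.KeepAliveConfig with separate Idle, Interval and Count fields, which avoids the raw syscalls where that version is available.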

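From this angle, upgrading to go1.13 is also recommended. The more the language does for you, the fewer mistakes and detours developers make; our time mostly goes into piling up crufty code anyway.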

Long-lived vs short-lived connections

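Long-lived and short-lived connections each have pros and cons. The network partition discussed here is easy to handle with short-lived connections: the dial timeout fails and you return an error. A long-lived connection skips the dial timeout stage and will most likely run straight into a read/write timeout, at which point a P99 spike in the service is unavoidable.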
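On a long-lived connection, the read/write timeout has to be set by the application. A minimal sketch, assuming a read deadline is acceptable for the protocol in use: the loopback listener below accepts the TCP handshake in the kernel but never replies, standing in for a peer behind a partition. The function names and the 100ms value are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readWithDeadline waits at most timeout for data on conn.
func readWithDeadline(conn net.Conn, timeout time.Duration) (int, error) {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return 0, err
	}
	buf := make([]byte, 512)
	return conn.Read(buf)
}

// silentPeerTimesOut sends a request to a peer that never answers and reports
// whether the read failed because the deadline was exceeded.
func silentPeerTimesOut(timeoutMs int) (bool, error) {
	// The listener never calls Accept or writes, so the client gets no reply.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	conn, err := net.DialTimeout("tcp", ln.Addr().String(), time.Second)
	if err != nil {
		return false, err
	}
	defer conn.Close()

	if _, err := conn.Write([]byte("get a\r\n")); err != nil {
		return false, err
	}
	_, err = readWithDeadline(conn, time.Duration(timeoutMs)*time.Millisecond)
	return errors.Is(err, os.ErrDeadlineExceeded), nil
}

func main() {
	timedOut, err := silentPeerTimesOut(100)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("read timed out:", timedOut)
}
```

A deadline bounds how long a single request can hang; it does not replace keepalive, which is what cleans up connections that sit idle.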

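Long-lived connections reduce the number of TCP TIME_WAIT sockets and save the three-way handshake, which is a worthwhile gain. In an http2/grpc environment, a handful of long-lived TCP connections can serve high-concurrency traffic in production.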

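My previous company used short-lived connections heavily for its thrift services and the experience was fine; short connections also made it easy to capture traffic and redirect it. In the end, the choice depends on the architecture.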

Summary

That's it for this post. I will share more TCP material later; if you are interested, feel free to follow and share (:

References

[1] rfc1122 keepalive: https://tools.ietf.org/html/rfc1122#page-101

[2] net: enable TCP keepalives by default: https://github.com/golang/go/commit/5bd7e9c54f946eec95d32762e7e9e1222504bfc1

[3] net: add KeepAlive field to ListenConfig: https://github.com/golang/go/commit/1abf3aa55bb8b346bb1575ac8db5022f215df65a
