postgresql tcp 连接超时问题
错误现象
应用中遇到一个错误
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
问题分析
Connection timed out
数据库侧记录 的日志之一
could not receive data from client: Connection timed out
数据库日志中没有同样的记录,应该是连接断开,客户端直接返回这个错误了。于是想办法重现这样的错误。
猜测和连接断开有关,这个好模拟在服务端将对应的连接kill -15 杀掉,客户端再继续操作会出现类似的错误。
[postgres@work ~]$ psql -h 192.168.149.131 -U antdb
Password for user antdb:
psql (11.5, server 11.6)
Type "help" for help.
192.168.149.131:5432 antdb@test=# begin;
BEGIN
192.168.149.131:5432 antdb@test=*# commit;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
真实应用中不会有主动断开连接的操作,所以继续分析。联想到数据库会对空闲的连接做处理的机制。由几个参数控制
tcp_keepalives_count | Maximum number of TCP keepalive retransmits.
tcp_keepalives_idle | Time between issuing TCP keepalives.
tcp_keepalives_interval | Time between TCP keepalive retransmits.
和操作系统对tcp socket 控制类似。
在server端查看数据库连接网络情况,用ss或者netstat 命令
[antdb@node1 ~]$ ss -apte|grep 105408
ESTAB 0 0 192.168.149.131:postgres 192.168.149.100:44338
users:(("postgres",pid=105408,fd=11)) timer:(keepalive,19sec,0) uid:1004 ino:118045263
sk:ffff92e9416a4d80 <->
[antdb@node1 ~]$ netstat -antpo|grep 105408
tcp 0 0 192.168.149.131:5432 192.168.149.100:44338 ESTABLISHED 105408/postgres: an keepalive (10.05/0/0)
可以看到keepalive 的情况。
查看数据库中tcp相关参数
[local]:5432 postgres@test=# select name,setting,reset_val from pg_settings where name ~ 'tcp';
name | setting | reset_val
-------------------------+---------+-----------
tcp_keepalives_count | 0 | 3
tcp_keepalives_idle | 0 | 60
tcp_keepalives_interval | 0 | 10
setting 列没有改变,感觉生效的是reset_val 列的数值,此处没有细研究。这三个参数的组合意思就是数据库对每个链接都会有计时器一样的操作,轮询计时,间隔tcp_keepalives_idle 这么长时间没有网络交互的话,数据库会发送心跳包给客户端,用tcpdump 可以抓取到
16:57:08.366531 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294551424 ecr 123129537], length 0
16:57:08.366555 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123189623 ecr 294491338], length 0
16:58:08.398387 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294611456 ecr 123189623], length 0
16:58:08.398409 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123249654 ecr 294491338], length 0
16:59:08.558872 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294671616 ecr 123249654], length 0
16:59:08.558907 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123309815 ecr 294491338], length 0
客户端对心跳包做返回时,数据库就认为这不是一个废弃连接,也就不会进一步杀掉这个连接。
什么情况下数据库会主动杀掉连接呢,从心跳机制看数据库主动发给客户端的心跳超过设置的次数并且客户端一直没有反馈,该连接就会被杀掉。但怎么模拟客户端没有反馈呢,曾尝试各个级别的kill命令,和在代码里调用sleep函数,gdb close掉tcp socket 都没有模拟出来能力有限。决定把客户端的服务器网网卡关闭(用的虚拟机模拟),这样就可以模拟网络中断导致的通信失败了。
从命令观测到,在尝试了3次后,数据库把这个连接给杀掉了。
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (20.44/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (19.29/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (18.80/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (2.86/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (1.10/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (0.29/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (9.45/0/1)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (6.29/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (4.47/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (0.30/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (8.72/0/3)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (3.19/0/3)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
数据库日志中能看到断开连接。
2020-07-30 17:55:32.955 CST,,,63841,"192.168.149.100:47752",5f229914.f961,1,"",2020-07-30 17:55:32 CST,,0,LOG,00000,"connection received: host=192.168.149.100 port=47752",,,,,,,,,""
2020-07-30 17:55:32.957 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,2,"authentication",2020-07-30 17:55:32 CST,3/2434,0,LOG,00000,"connection authorized: user=antdb database=test",,,,,,,,,""
2020-07-30 17:57:03.087 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,3,"idle",2020-07-30 17:55:32 CST,3/0,0,LOG,XX000,"could not receive data from client: Connection timed out",,,,,,,,,"psql"
2020-07-30 17:57:03.087 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,4,"idle",2020-07-30 17:55:32 CST,,0,LOG,00000,"disconnection: session time: 0:01:30.131 user=antdb database=test host=192.168.149.100 port=47752",,,,,,,,,"psql"
根据模拟出来的日志在生产系统中过滤确实有相同的日志,基本可以断定该错误是因为心跳超时数据库主动杀掉连接导致的,但生产系统为什么会出现心跳超时错误呢,生产系统应用是通过F5负载均衡连接数据库的,初步分析是F5对数据库发出的心跳包不再回应导致断联。
Connection reset by peer
could not receive data from client: Connection reset by peer
这样的日志有别于time out错误,虽然客户端看到的都是server closed the connection unexpectedly,但数据库日志记录中这个是另外的错误。Connection reset by peer是一个典型的tcp 网络连接的问题,当tcp正产建立连接后,如果一端主动端开连接(发送rst 网络包等),另一侧就会收到报错,这个错误不是数据库主动断开的,所以在应用侧针对连接抓包,用wareshark分析一下,可以看到F5主动发了rst信号给连接,将其断掉了。
No. Time Source Destination Protocol Length Info
1 0.000000 10.159.101.45 10.154.52.156 TCP 77 58721 → 15432 [PSH, ACK] Seq=1 Ack=1 Win=19561 Len=11 TSval=899685638 TSecr=3959865574
2 0.001125 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=1 Ack=12 Win=65535 Len=0 TSval=3960160103 TSecr=899685638
3 0.004199 10.154.52.156 10.159.101.45 TCP 83 15432 → 58721 [PSH, ACK] Seq=1 Ack=12 Win=65535 Len=17 TSval=3960160106 TSecr=899685638
4 0.004215 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=12 Ack=18 Win=19561 Len=0 TSval=899685642 TSecr=3960160106
5 0.004390 10.159.101.45 10.154.52.156 TCP 1514 58721 → 15432 [ACK] Seq=12 Ack=18 Win=19561 Len=1448 TSval=899685642 TSecr=3960160106
6 0.004398 10.159.101.45 10.154.52.156 TCP 209 58721 → 15432 [PSH, ACK] Seq=1460 Ack=18 Win=19561 Len=143 TSval=899685642 TSecr=3960160106
7 0.005488 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=18 Ack=1603 Win=65535 Len=0 TSval=3960160108 TSecr=899685642
8 0.078162 10.154.52.156 10.159.101.45 TCP 314 15432 → 58721 [PSH, ACK] Seq=18 Ack=1603 Win=65535 Len=248 TSval=3960160180 TSecr=899685642
9 0.078262 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=1603 Ack=266 Win=20823 Len=7240 TSval=899685716 TSecr=3960160180
10 0.078278 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=8843 Ack=266 Win=20823 Len=7240 TSval=899685716 TSecr=3960160180
11 0.079582 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=4499 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
12 0.079590 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=7395 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
13 0.079595 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=16083 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
14 0.079616 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=10291 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
15 0.079620 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=13187 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
16 0.079624 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=23323 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
17 0.079630 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=16083 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
18 0.079633 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=30563 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
19 0.081022 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=18979 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
20 0.081077 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=21875 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
21 0.081089 10.159.101.45 10.154.52.156 TCP 1514 58721 → 15432 [ACK] Seq=37803 Ack=266 Win=20823 Len=1448 TSval=899685719 TSecr=3960160183
22 0.081126 10.159.101.45 10.154.52.156 TCP 988 58721 → 15432 [PSH, ACK] Seq=39251 Ack=266 Win=20823 Len=922 TSval=899685719 TSecr=3960160183
23 0.081132 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=24771 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
24 0.081136 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=27667 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
25 0.081140 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=30563 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
26 0.081144 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=33459 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
27 0.081147 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=36355 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
28 0.082225 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=39251 Win=65535 Len=0 TSval=3960160184 TSecr=899685717
29 0.082251 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=40173 Win=65535 Len=0 TSval=3960160184 TSecr=899685719
30 0.104067 10.154.52.156 10.159.101.45 TCP 85 15432 → 58721 [PSH, ACK] Seq=266 Ack=40173 Win=65535 Len=19 TSval=3960160206 TSecr=899685719
31 0.104154 10.159.101.45 10.154.52.156 TCP 78 58721 → 15432 [PSH, ACK] Seq=40173 Ack=285 Win=20823 Len=12 TSval=899685742 TSecr=3960160206
32 0.105258 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=285 Ack=40185 Win=65535 Len=0 TSval=3960160207 TSecr=899685742
33 0.313198 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=285 Ack=40185 Win=65535 Len=18 TSval=3960160415 TSecr=899685742
34 0.324214 10.159.101.45 10.154.52.156 TCP 77 58721 → 15432 [PSH, ACK] Seq=40185 Ack=303 Win=20823 Len=11 TSval=899685962 TSecr=3960160415
35 0.325344 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=303 Ack=40196 Win=65535 Len=0 TSval=3960160427 TSecr=899685962
36 0.337175 10.154.52.156 10.159.101.45 TCP 83 15432 → 58721 [PSH, ACK] Seq=303 Ack=40196 Win=65535 Len=17 TSval=3960160439 TSecr=899685962
37 0.337250 10.159.101.45 10.154.52.156 TCP 1514 58721 → 15432 [ACK] Seq=40196 Ack=320 Win=20823 Len=1448 TSval=899685975 TSecr=3960160439
38 0.337258 10.159.101.45 10.154.52.156 TCP 209 58721 → 15432 [PSH, ACK] Seq=41644 Ack=320 Win=20823 Len=143 TSval=899685975 TSecr=3960160439
39 0.338360 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=320 Ack=41787 Win=65535 Len=0 TSval=3960160440 TSecr=899685975
40 0.390031 10.154.52.156 10.159.101.45 TCP 314 15432 → 58721 [PSH, ACK] Seq=320 Ack=41787 Win=65535 Len=248 TSval=3960160492 TSecr=899685975
41 0.390156 10.159.101.45 10.154.52.156 TCP 4410 58721 → 15432 [ACK] Seq=41787 Ack=568 Win=22085 Len=4344 TSval=899686028 TSecr=3960160492
42 0.390169 10.159.101.45 10.154.52.156 TCP 90 58721 → 15432 [PSH, ACK] Seq=46131 Ack=568 Win=22085 Len=24 TSval=899686028 TSecr=3960160492
43 0.391292 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=568 Ack=44683 Win=65535 Len=0 TSval=3960160493 TSecr=899686028
44 0.391301 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=568 Ack=46155 Win=65535 Len=0 TSval=3960160493 TSecr=899686028
45 0.416738 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=568 Ack=46155 Win=65535 Len=18 TSval=3960160519 TSecr=899686028
46 0.416830 10.159.101.45 10.154.52.156 TCP 78 58721 → 15432 [PSH, ACK] Seq=46155 Ack=586 Win=22085 Len=12 TSval=899686055 TSecr=3960160519
47 0.417929 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=586 Ack=46167 Win=65535 Len=0 TSval=3960160520 TSecr=899686055
48 0.622674 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=586 Ack=46167 Win=65535 Len=18 TSval=3960160725 TSecr=899686055
49 0.662720 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=46167 Ack=604 Win=22085 Len=0 TSval=899686301 TSecr=3960160725
50 5.646245 10.159.101.45 10.154.52.156 TCP 77 58721 → 15432 [PSH, ACK] Seq=46167 Ack=604 Win=22085 Len=11 TSval=899691284 TSecr=3960160725
51 5.647397 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=604 Ack=46178 Win=65535 Len=0 TSval=3960165749 TSecr=899691284
52 5.650477 10.154.52.156 10.159.101.45 TCP 83 15432 → 58721 [PSH, ACK] Seq=604 Ack=46178 Win=65535 Len=17 TSval=3960165753 TSecr=899691284
53 5.650490 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=46178 Ack=621 Win=22085 Len=0 TSval=899691288 TSecr=3960165753
54 5.650699 10.159.101.45 10.154.52.156 TCP 1514 58721 → 15432 [ACK] Seq=46178 Ack=621 Win=22085 Len=1448 TSval=899691289 TSecr=3960165753
55 5.650725 10.159.101.45 10.154.52.156 TCP 209 58721 → 15432 [PSH, ACK] Seq=47626 Ack=621 Win=22085 Len=143 TSval=899691289 TSecr=3960165753
56 5.651861 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=621 Ack=47769 Win=65535 Len=0 TSval=3960165754 TSecr=899691289
57 5.655522 10.154.52.156 10.159.101.45 TCP 314 15432 → 58721 [PSH, ACK] Seq=621 Ack=47769 Win=65535 Len=248 TSval=3960165758 TSecr=899691289
58 5.655759 10.159.101.45 10.154.52.156 TCP 1290 58721 → 15432 [PSH, ACK] Seq=47769 Ack=869 Win=23347 Len=1224 TSval=899691294 TSecr=3960165758
59 5.656885 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=869 Ack=48993 Win=65535 Len=0 TSval=3960165759 TSecr=899691294
60 5.663736 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=869 Ack=48993 Win=65535 Len=18 TSval=3960165766 TSecr=899691294
61 5.663830 10.159.101.45 10.154.52.156 TCP 78 58721 → 15432 [PSH, ACK] Seq=48993 Ack=887 Win=23347 Len=12 TSval=899691302 TSecr=3960165766
62 5.664924 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=887 Ack=49005 Win=65535 Len=0 TSval=3960165767 TSecr=899691302
63 5.799961 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=887 Ack=49005 Win=65535 Len=18 TSval=3960165902 TSecr=899691302
64 5.839732 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=49005 Ack=905 Win=23347 Len=0 TSval=899691478 TSecr=3960165902
65 306.902943 10.154.52.156 10.159.101.45 TCP 60 15432 → 58721 [RST, ACK] Seq=905 Ack=49005 Win=65535 Len=0
解决方案是修改F5的reset 连接参数,或者在应用端加入重试机制。
操作系统心跳参数和数据库心跳参数关系
将操作系统心跳改小
[root@node1 ~]# sysctl -a|grep keepalive
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
数据库心跳改大
[local]:5432 antdb@test=# select setting,reset_val from pg_settings where name ~ 'tcp';
setting | reset_val
---------+-----------
0 | 3
0 | 300
0 | 30
(3 rows)
观察数据库连接心跳管理,可以观察到是以数据库设置为准的。
[antdb@node1 pg_log]$ netstat -antpo|grep 122214
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47904 ESTABLISHED 122214/postgres: an keepalive (281.39/0/0)
[antdb@node1 pg_log]$ netstat -antpo|grep 122214
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47904 ESTABLISHED 122214/postgres: an keepalive (271.24/0/0)
同时在客户端抓包,发现确实是以数据库的心跳参数为准。
root@work ~]# tcpdump -i ens33 tcp port 47904
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
18:20:24.590429 IP 192.168.149.131.postgres > work.47904: Flags [.], ack 2175538745, win 227, options [nop,nop,TS val 299547648 ecr 127884958], length 0
18:20:24.590492 IP work.47904 > 192.168.149.131.postgres: Flags [.], ack 1, win 239, options [nop,nop,TS val 128185847 ecr 299246720], length 0
数据库日志查看连接开始时间,和首次心跳发送时间间隔正好是数据库心跳设置数值。
2020-07-30 18:15:23.659 CST,,,122214,"192.168.149.100:47904",5f229dbb.1dd66,1,"",2020-07-30 18:15:23 CST,,0,LOG,00000,"connection received: host=192.168.149.100 port=47904",,,,,,,,,""
2020-07-30 18:15:23.661 CST,"antdb","test",122214,"192.168.149.100:47904",5f229dbb.1dd66,2,"authentication",2020-07-30 18:15:23 CST,3/2455,0,LOG,00000,"connection authorized: user=antdb database=test",,,,,,,,,""
所以结论是数据库连接由数据库心跳参数管理。