tcp_keepalive_time and rst Flag in NAT Environment

Here, I'm not going to explain the details of what is TCP keepalive, what are the 3 related parameters tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes mean.
You need to know, the default value of these 3 parameters with net.ipv4.tcp_keepalive_time = 7200, net.ipv4.tcp_keepalive_intvl = 75, net.ipv4.tcp_keepalive_probes = 9, especially the first one, tcp_keepalive_time may and usually will cause some nasty behavior to you in some environment like NAT.

Let's say, your client now wants to connect to server, via one or more nat in the middle way, whatever SNAT or DNAT. The nat server, no matter its' a router or a Linux based device with ip_forward on, need to maintain a TCP connections mapping table to track each incoming and outcoming connection, since the device's resource is limited, the table can't grow as large as it wants, so it needs to drop some connections which are idle for a period of time with no data change between client and server.

This phenomenon is quite common, not only in consumer oriented/low end/poor implementation router, but also in data center NAT servers. For most of these NAT server, their tcp_keepalive_time is usually set to 90s or around, so, if your client set the parameter larger than 90, and have no data communications for more than 90s, the NAT server will send a TCP package with RST flags to disconnect the both end.

In this situation, one workaround is to lower the keepalive interval than that, like 30s, if you can't control the NAT server.

Recently, when we sync data from MongoDB Master/Slave, note, the architecture is a little old, not the standard replica. We often observe a large amount of errors indicating the failure of synchronisation during creating index process after finishing synchronizing the MongoDB data. Below log is get from our slave server:

Fri Nov 14 12:52:00.356 [replslave] Socket recv() errno:110 Connection timed out 100.80.1.26:27018
Fri Nov 14 12:52:00.356 [replslave] SocketException: remote: 100.80.1.26:27018 error: 9001 socket exception [RECV_ERROR] server [100.80.1.26:27018]
Fri Nov 14 12:52:00.356 [replslave] DBClientCursor::init call() failedFri Nov 14 12:52:00.356 [replslave] repl: AssertionException socket error for mapping query
Fri Nov 14 12:52:00.356 [replslave] Socket flush send() errno:104 Connection reset by peer 100.80.1.26:27018
Fri Nov 14 12:52:00.356 [replslave]   caught exception (socket exception [SEND_ERROR] for 100.80.1.26:27018) in destructor (~PiggyBackData)
Fri Nov 14 12:52:00.356 [replslave] repl: sleep 2 sec before next pass
Fri Nov 14 12:52:02.356 [replslave] repl: syncing from host:100.80.1.26:27018
Fri Nov 14 12:52:03.180 [replslave] An earlier initial clone of 'sso_production' did not complete, now resyncing.
Fri Nov 14 12:52:03.180 [replslave] resync: dropping database sso_production
Fri Nov 14 12:52:04.272 [replslave] resync: cloning database sso_production to get an initial copy

100.80.1.26 is DNAT IP address for MongoDB master. As you can see, the slave receives reset from 100.80.1.26.

After some package capture and analysis, we get the conclusion with the root cause of tcp_keepalive_time related. If you are sensitive, you may consider TCP stack related issues when see "reset by peer" keyword at first glance.

Be careful, the same issue also occurs in some IMAP proxy, IPVS envionment, etc.  After many years hands-on online operations experience, I have to say, DNS and NAT are two potential threats of various issues, even in some so-called 100% uptime environment.

  • oliverhuang

    对于mongo replicaset,net.ipv4.tcp_keepalive_time , net.ipv4.tcp_keepalive_intvl , net.ipv4.tcp_keepalive_probes等值要设置为多少,楼主有建议吗?谢谢!