Trips to HCMC, Nha Trang, Phnom Penh and Siem Reap

Last month, Kiki and I went on vacation. Here are some experiences and tips.

HCMC is quite modern. As a former French colony, it retains a lot of French culture, like iced coffee and French baguettes, both delicious and cheap. HCMC is also famous for its French cuisine, so we went to one of the best French restaurants in the city, but we weren't that impressed; the only lasting impression was the number of ants on the table. Maybe we just don't know how to enjoy French cuisine.

Later, we flew to Nha Trang. The beach is fancy. We spent a few nights at the Sheraton; its location is really good, within walking distance of most of the restaurants, and Lanterns is one of the best restaurants we've ever found, serving authentic Vietnamese cuisine. We went there many times, from breakfast and lunch to dinner. We were too busy eating and drinking to enjoy the free beach chairs, umbrellas, and access to the best public beach sections in Nha Trang that the hotel provided, which we regretted later.

After that, we went to Mia, the so-called best resort in Nha Trang. Generally speaking, it's still better than the others, but it didn't give us as much as we expected. It's far from the city center, with nothing around once you step out of the resort, so you either take the shuttle bus into the city or just stay in the resort and enjoy its private beach. I have to say the beach there is even better than those in the city: since the whole resort holds only about two dozen guests, the beach is quiet and clean at any time, and you can canoe and surf. The food, though, is just average, at top prices. For drinks they have a happy hour, buy one get one free, so Kiki and I ordered four mojitos and watched the night sea until midnight, nobody there but the two of us, totally drunk.

As mentioned before, the coffee here is wonderful and reasonably priced, so just pick one up whenever you need a rest. The mango shake is also worth trying; we drank dozens of them during our stay in Vietnam, and even later in Cambodia we kept ordering the same thing. Most cost between $1 and $2. Enjoy. There are so many Russians that the city effectively runs in three languages: Vietnamese, English, and Russian.

We had a terrible experience with a Mai Linh taxi. On our first day, we arrived at the HCMC airport around midnight due to a flight delay, hailed a Mai Linh, and asked the driver to go by the meter. But when he got us to the hotel, he demanded 100,000 VND, about 30 RMB, for just a five-minute ride. We tried to argue, but the only words he would say were "one hundred", so we surrendered. After that, we only hailed Vinasun taxis, which were quite fair and go by the meter by default.


How Many Non-Persistent Connections Can Nginx/Tengine Support Concurrently

Recently, I took over a product line with horrible performance issues; our customers complained a lot. The architecture is quite simple: clients, which are SDKs installed in our customers' handsets, send POST requests to a cluster of Tengine servers via a cluster of IPVS load balancers. Tengine is actually a highly customized Nginx server that comes with tons of handy features. The Tengine proxy then forwards the requests to the upstream servers, and after some computation, the app servers write the results to MongoDB.

When using curl to send a POST request like this (the endpoint is a placeholder here):
$ curl -X POST -H "Content-Type: application/json" -d '{"name": "jaseywang","sex": "1","birthday": "19990101"}' -v http://<server>/<uri>

Out of every 10 tries, you'll probably get 8 or more "connection timeout" failures back.

After some basic debugging, I found that Nginx was behaving quite abnormally, with the TCP accept queue completely full, which explains the unwanted responses the clients got. CPU and memory utilization were acceptable, but the networking had gone wild: the NICs were receiving a tremendous number of packets, about 300Kpps on average and 500Kpps at peak. With that many packets, the interrupt rate was correspondingly high. Fortunately, these Nginx servers all ship with 10G network cards in mode 4 (802.3ad) bonding, and some link-layer and lower-level settings were already optimized: TSO/GSO/GRO/LRO, ring buffer size, etc. I haven't seen any packet drops or overruns with ifconfig or similar tools.
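On Linux, a full accept queue shows up in `ss -lnt` (on a LISTEN socket, Recv-Q is the current accept-queue depth and Send-Q is the configured backlog) and in the `ListenOverflows`/`ListenDrops` counters from `netstat -s`. Here's a minimal sketch of the check, parsing a sample `ss` line; the sample numbers are made up, not from our servers:

```shell
# On a LISTEN socket, `ss -lnt` reports Recv-Q as the current accept
# queue depth and Send-Q as the configured backlog.
# Sample line (values are illustrative only):
sample='LISTEN 129 128 *:80 *:*'

recvq=$(echo "$sample" | awk '{print $2}')
backlog=$(echo "$sample" | awk '{print $3}')

# When Recv-Q reaches the backlog, new connections get dropped or
# time out, which is exactly what the clients were seeing.
if [ "$recvq" -ge "$backlog" ]; then
    echo "accept queue full: $recvq/$backlog"
fi
```

In practice you'd run `ss -lnt` against the live listening socket instead of a sample line, and watch `netstat -s | grep -i listen` for the overflow counters growing.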

After some packet capturing, I found more terrifying facts: almost all of the incoming packets were less than 75 bytes, and most of them belonged to non-persistent connections. The clients complete the 3-way handshake, send one HTTP POST request (or a few more, usually fewer than 2 if TCP segmentation is needed), and exit with the 4-way handshake. Besides that, these clients usually resend the same request every 10 minutes or longer; the interval is set by the app's developers and is beyond our control. That means:

1. More than 80% of the traffic consists purely of small TCP packets, which have a significant impact on the network card and CPU. You can get an overall idea of the proportion from the image below; the actual figure is about 88%.

2. Since the clients can't keep connections persistent (just a TCP 3-way handshake, one or more POSTs, then a TCP 4-way handshake), there is no way to reuse a connection. That's OK for the network card and CPU, but it's a nightmare for Nginx: even after I enlarged the backlog, the TCP accept queue filled up again quickly after reloading Nginx. The two images below show a client packet's lifetime: the first is a capture between the IPVS load balancer and Nginx, the second the communication between Nginx and the upstream server.

3. Setting up a TCP connection is quite expensive, especially when Nginx is running out of resources. I could see that during that period the network traffic was quite large, but the number of new connections accepted per second was quite small compared to normal. HTTP/1.0 requires the client to send Connection: Keep-Alive in the request header to enable persistent connections, while HTTP/1.1 enables them by default. Turning keep-alive off on our side made little difference.
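For reference, this is roughly what enabling keep-alive on both the client side and the upstream side of Nginx looks like; the directive values and addresses below are illustrative, not our production settings (and in our case the clients simply never reuse connections, so the client side of this can't help):

```nginx
http {
    # Client side: keep idle connections open so clients *could* reuse them.
    keepalive_timeout  65s;
    keepalive_requests 100;

    upstream app_servers {
        server 10.0.0.1:8080;   # illustrative addresses
        server 10.0.0.2:8080;
        keepalive 32;           # idle upstream connections cached per worker
    }

    server {
        location / {
            proxy_pass http://app_servers;
            proxy_http_version 1.1;        # required for upstream keepalive
            proxy_set_header Connection "";
        }
    }
}
```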

There was now plenty of evidence that our 3 Nginx servers (yes, just 3; we never thought Nginx would become the bottleneck one day) were half broken in such a harsh environment. How many connections can one server keep, and how many new connections can it accept? I needed to run some benchmarks against real traffic to get accurate numbers.

Since recovering the product was the top priority, I added another 6 Nginx servers with the same configuration to IPVS. With 9 servers serving the online traffic, it behaves normally now: each Nginx gets about 10K qps, with 300ms response time, zero TCP queue, and 10% CPU utilization.

The benchmark process is not complex: remove one Nginx server (real server) from IPVS at a time, and monitor the metrics of the servers still in rotation, like qps, response time, TCP queue, and CPU/memory/network/disk utilization. When qps or a similar metric stops going up and begins to turn around, that's usually the maximum load the server can support.
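One way to take a real server out of IPVS is to zero its weight first so existing connections drain, rather than deleting it outright. A sketch with ipvsadm; the VIP and real-server addresses are placeholders:

```shell
# Placeholders: 10.0.0.100 is the VIP, 10.0.0.11 the real server to drain.
VIP=10.0.0.100:80
RS=10.0.0.11:80

# Set weight to 0: no new connections are scheduled, existing ones drain.
ipvsadm -e -t "$VIP" -r "$RS" -w 0

# Once its connection count drops to zero, remove it for the benchmark:
ipvsadm -d -t "$VIP" -r "$RS"

# Put it back afterwards (masquerading mode here as an example):
ipvsadm -a -t "$VIP" -r "$RS" -m -w 1
```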

Before kicking off, make sure some key parameters and directives are set correctly, including Nginx worker_processes/worker_connections, CPU affinity for distributing interrupts, and kernel parameters (tcp_max_syn_backlog, file-max, netdev_max_backlog, somaxconn, etc.). Keep an eye on rmem_max/wmem_max; during my benchmark, I noticed quite different results with different values.
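A starting point for the kernel side, in /etc/sysctl.conf form. The values below are illustrative defaults to begin from, not tuned recommendations; as noted above, benchmark before and after changing them:

```
# accept queue and SYN backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# per-device input queue, for the high packet rates described above
net.core.netdev_max_backlog = 262144

# system-wide file descriptor limit
fs.file-max = 1048576

# socket buffer ceilings; I saw noticeably different results
# with different values during the benchmark
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
```

Note that Nginx's own `listen ... backlog=` directive is capped by somaxconn, so the two need to be raised together.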

Here are the results:
The best performance for a single server is 25K qps, but at that level it's not very stable: I observed an almost full TCP accept queue and some connection failures when requesting the URI, although everything else looked normal. A conservative figure is about 23K qps. That comes with 300K established TCP connections, 200Kpps, 500K interrupts, and 40% CPU utilization.
During that time, the total consumption from the IPVS perspective was 900K concurrent connections, 600Kpps, 800Mbps, and 100K qps.
The benchmark was run between 10:30PM and 11:30PM; peak time usually falls between 10:00PM and 10:30PM.
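A quick sanity check on those numbers: by Little's law, concurrent connections ≈ new connections per second × average connection lifetime. Assuming roughly one request per connection, which matches the non-persistent traffic described above, 300K established connections at 23K qps implies each connection stays established for roughly 13 seconds, far longer than the response time alone:

```shell
qps=23000          # new connections per second (~1 request per connection)
established=300000 # concurrent established connections observed

# Little's law: L = lambda * W  =>  W = L / lambda
lifetime=$(awk -v l="$established" -v q="$qps" 'BEGIN { printf "%.1f", l / q }')
echo "average connection lifetime ~ ${lifetime}s"
```

If the gap between that figure and the actual request/response time is large, it suggests connections are lingering (slow clients, teardown states, and so on) rather than doing useful work.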

Turning off the access log to cut down on IO, and disabling timestamps in the TCP stack, may achieve better performance; I haven't tested either.
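For reference, those two knobs look like this; untested on this workload, and both trade observability or protocol features for less per-request work:

```shell
# 1. In nginx.conf, disable the access log entirely:
#        access_log off;
#    or keep it but buffer writes to cut down on IO:
#        access_log /var/log/nginx/access.log main buffer=64k flush=5s;

# 2. Disable TCP timestamps in the kernel. Note this also disables PAWS
#    and breaks tcp_tw_reuse, which depends on timestamps:
sysctl -w net.ipv4.tcp_timestamps=0
```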

Don't confuse TCP keepalive with HTTP keep-alive; they are totally different concepts. One thing to keep in mind: the client-LB-Nginx-upstream setup usually has an LB TCP session timeout, 90s by default. That means when a client sends a request to Nginx and Nginx doesn't respond to the client within 90s, the LB will tear down the TCP connection on both ends by sending RST packets, to save the LB's resources and sometimes for security reasons. In this case, you can decrease the TCP keepalive parameters as a workaround.
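The Linux-side TCP keepalive knobs, in /etc/sysctl.conf form. The values below are examples that keep the probes well under a 90s idle timeout, not recommendations, and they only take effect on sockets that enable SO_KEEPALIVE:

```
# Start probing after 60s of idle time instead of the default 7200s,
# then send up to 3 probes 10s apart before declaring the peer dead.
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
```

With settings like these, an otherwise idle but healthy connection sees traffic before the LB's 90s session timer fires, so the LB doesn't reset it.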