Router Matters

We are transfering PBs of our HDFS data from one data center to another via a router, we never thought the performance of a router will becomes the bottleneck until we find the below statistic:

#show interfaces Gi0/1
GigabitEthernet0/1 is up, line protocol is up
  Hardware is iGbE, address is 7c0e.cece.dc01 (bia 7c0e.cece.dc01)
  Description: Connect-Shanghai-MSTP
  Internet address is 10.143.67.18/30
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 250/255, rxload 3/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full Duplex, 1Gbps, media type is ZX
  output flow-control is unsupported, input flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:06, output 00:00:00, output hang never
  Last clearing of "show interface" counters 1d22h
  Input queue: 0/75/0/6 (size/max/drops/flushes); Total output drops: 22559915
  Queueing strategy: fifo
  Output queue: 39/40 (size/max)

The output queue is full, hence the txload is obviously high.

How awful it is. At the beginning, we found there were many failures or retransmissions during the transfer between two data centers. After adding some metrics, everything is clear, the latency between two data centers is quite unstable, sometimes around 30ms, and sometimes reaches 100ms or even more which is unacceptable for some latency sensitive application. we then ssh into the router and found the above result.

After that, we drop it and replace it with a more advanced one, now, everything returns to normal, latency is around 30ms, packet drop is below 1%.

Trips to HCMC, Nha Trang, Phnom Penh and Siem Reap

Last month, Kiki and I went for a vacation, here are some experience and tips.

For HCMC, it’s quite modern, as invaded by France before, it still exits a lot culture from France, like iced coffee and French baguette, both are delicious and cheap. HCMC is also famous for its French cuisine, we went to one of the best French restaurants in HCMC, but not so impressed, the only impression is so many ants on the desk, maybe we don’t know how to enjoy French cuisine.

Later, we flew to Nha Trang, the beach is fancy, we spent a few nights in Sheraton, it’s location is really good, walking distance to most of the restaurants, and Lanterns restaurant is one of the best we ever met, authentic Vietnamese cuisine, we went there many time, from breakfast, launch to dinner. We were too busy eating and drinking to enjoy the free beach chair, umbrella and the best public beach sections in Nha Trang provided by hotel, later we regretted. After that, we went to mia, the so called best resort in Nha Trang, generally speaking, its still better than others, but didn’t give us as much as we expected. It’s location is far aways from city center, and nothing when step out of the resort, so you need to take the shuttle bus to go into the city or just stay in the resort, enjoying its private beach. I had to say, the beach here is even better than those in the city, since the whole resort only holds about two dozens of customers, the beach is quiet and clean at any time, you can play canoe and surfing, but the food here is just the average level with top price. For drinks, they have happy hours, buy one get one free, so Kiki and I ordered 4 cups of mojito, watching the night sea view, till midnight, nobody but we two, totally drunk.

As saied before, the coffee here is wonderful and price is reasonable, so just pick up one when you need a rest. Also the mongo shake is worth trying, we tried dozens cups of shake during the staying in Vietnam, even later in Cambodia, we still ordered the same, most of them are between $1 to $2, enjoy. So many Russians, there are probably three languages in the city, Vietnamese, English and Russian.

We had a terrible experience while taking the Mai linh taxi. The first day when we arrived at HCMC airport in the midnight due to flight delay, we halled a Mai linh, and told him by meter, but when he took us to the hotel, he forced us to pay 100, 000VND, which was 30RMB, just a 5 minutes tour, we tried to explained to him, the only word he said is one hundred, well, we surrendered. Later, we only hall vinasun taxi, which was quite fair, by meter by default.

Continue reading

How Many Non-Persistent Connections Can Nginx/Tengine Support Concurrently

Recently, I took over a product line which has horrible performance issues and our customers complain a lot. The architecture is qute simple, clients, which are SDKs installed in our customers' handsets send POST requests to a cluster of Tengine servers, via a cluster of IPVS load balancers, actually the Tengine is a highly customized Nginx server, it comes with tons of handy features. then, the Tengine proxy redirects the requests to the upstream servers, after some computations, the app servers will send the results to MongoDB.

When using curl to send a POST request like this:
$ curl -X POST -H "Content-Type: application/json" -d '{"name": "jaseywang","sex": "1","birthday": "19990101"}' http://api.jaseywang.me/person -v

Every 10 tries, you'll probably get 8 or even more failure responses with "connection timeout" back.

After some basic debugging, I find that Nginx is quite abnormal with TCP accept queue totally full, therefore, it's explainable that the client get unwanted response. The CPU and memory util is acceptable, but the networking goes wild cuz packets nics received is extremely tremendous, about 300kpps average, 500kpps at peak times, since the package number is so large, the interrupts is correspondingly quite high. Fortunately, these Nginx servers all ship with 10G network cards, with mode 4 bonding, some link layer and lower level are also pre-optimizated like TSO/GSO/GRO/LRO, ring buffer size, etc. I havn't seen any package drop/overrun using ifconfig or similar tools.

After some package capturing, I found more teffifying facts, almost all of the incoming packagea are less than 75Byte. Most of these package are non-persistent. They start the 3-way handshake, send one or a few more usually less than 2 if need TCP sengment HTTP POST request, and exist with 4-way handshake. Besides that, these clients usually resend the same requests every 10 minutes or longer, the interval time is set by the app's developer and is beyond our control. That means:

1. More than 80% of the traffic are purely small TCP packages, which will have significant impact on network card and CPU. You can get the overall ideas about the percent with the below image. Actually, the percent is about 88%.

2. Since they can't keep the connections persistent, just TCP 3-wayhandshak, one or more POST, TCP 4-way handshake, quite simple. You have no way to reuse the connection. That's OK for network card and CPU, but it's a nightmare for Nginx, even I enlarge the backlog, the TCP accept queue quickly becomes full after reloading Nginx. The below two images show the a client packet's lifetime. The first one is the packet capture between load balance IPVS and Nginx, the second one is the communications between Nginx and upstream server.

3. It's quite expensive to set up a TCP connection, especially when Nginx runs out of resources. I can see during that period, the network traffic it quite large, but the new request connected per second is qute small compare to the normal. HTTP 1.0 needs client to specify Connection: Keep-Alive in the request header to enable persistent connection, and HTTP 1.1 enables by default. After turning off, not so much effect.

It now have many evidences shows that our 3 Nginx servers, yes, its' 3, we never thought Nginx will becomes bottleneck one day, are half broken in such a harsh environment. How mnay connections can it keep and how many new connections can it accept? I need to run some real traffic benchmarks to get the accurate result.

Since recovering the product is the top priority, I add another 6 Nginx servers with same configurations to IPVS. With 9 servers serving the online service, it behaves normal now. Each Nginx gets about 10K qps, with 300ms response time, zero TCP queue, and 10% CPU util.

The benchmark process is not complex, remove one Nginx server(real server) from IPVS at a time, and monitor its metrics like qps, response time, TCP queue, and CPU/memory/networking/disk utils. When qps or similar metric don't goes up anymore and begins to turn round, that's usually the maximum power the server can support.

Before kicking off, make sure some key parameters or directives are setting correct, including Nginx worker_processes/worker_connections, CPU affinity to distrubute interrupts, and kernel parameter(tcp_max_syn_backlog/file-max/netdev_max_backlog/somaxconn, etc.). Keep an eye on the rmem_max/wmem_max, during my benchmark, I notice quite different results with different values.

Here are the results:
The best performance for a single server is 25K qps, however during that period, it's not so stable, I observe a almost full queue size in TCP and some connection failures during requesting the URI, except that, everything seems normal. A conservative value is about 23K qps. That comes with 300K TCP established connections, 200Kpps, 500K interrupts and 40% CPU util.
During that time, the total resource consumed from the IPVS perspective is 900K current connection, 600Kpps, 800Mbps, and 100K qps.
The above benchmark was tested during 10:30PM ~ 11:30PM, the peak time usually falls between 10:00PM to 10:30PM.

Turning off your access log to cut down the IO and timestamps in TCP stack may achieve better performance, I haven't tested.

Don't confused TCP keepalievd with HTTP keeplive, they are totally diffent concept. One thing to keep in mind is that, the client-LB-Nginx-upstream mode usually has a LB TCP sesstion timeout value with 90s by default. that means, when client sends a request to Nginx, if Nginx doesn't response within 90s to client, LB will disconnect both end TCP connection by sending rst package in order to save LB's resource and sometimes for security reasons. In this case, you can decrease TCP keepalive parameter to workround.

Setup a AQI(Air Quality Index) Monitoring System with Dylos and Raspberry Pi 2

I have been using air purifier for years in Beijing, China. So far so good except the only problem troubles me, is that effective? is my PM2.5 or PM10 reduced? the answer is probably obvious, but how effectively is it? Nobody knows.

At the morment, Dylos is the only manufacturer that provides consumer level product with accurate air quality index, so, not many choice. I got a Dylos air quality counter from Amazon, there are many products, if you want to export the index data from the black box into your desktop for later process , you'll need at least a DC 1100 pro with PC interface or higher version. Strongly not recommend buying from Taobao or the similar online stores, as far as I saw, None of them can provide the correct version, most of them exaggerate for a higher sale.

Now, half done. You need to Raspberry Pi, at the time of writing, Raspberry Pi 2 is coming to the market. I got a complete starter kit with a Pi 2 Model B, Clear Case, power supply, WiFi Dongle and a 8GB Micro SD.

In order to make the Raspberry Pi up, it's better to find a screen monitoring, or it will take huge pain to you.

After turning Dylos and Raspberry Pi on, the left process is quite simple. You need to connect the Dylos and Raspberry Pi with a serial to USB interface, the serial to USB cable is uncommon these days, if you are a network engineer, you should be quite familir with the cable, else, you can get one from online or somewhere else.

Now, write a tiny program to read the data from Dylos. You can make use of Python's pyserial module to read the data with a few lines. Here is mine. Besides that, you can use other language to implement, such as PHP. The interval depends on your Dylos collecting data interval, the minimum is 60s, ususally 3600s is enough.

Once got the data, you can feed them into different metric systems like Munin, or you can feed them into highcharts to get a more pretty look.

Installing Debian jessie on Loongson notebook(8089_B) with Ubuntu keyboard suits

I got a white yeeloong notebook last year, and it costs me 300RMB which ships with a Loongson 2F CPU, 1GB DDR2 memory bar, and a 160GB HHD.

The netbook has a pre-installed Liunx based operating system, I can't tell its distribution, and it looks quite like a Windows XP. Since then, I put it in my closet and never use it again.

Yesterday the untouched toy crossed my mind, so I took it out and spent a whole night to get Debian Jessie working on my yeeloong notebook. Here are some procedures you may need to note if you want to have your own box running.


At the beginning, I download the vmlinux and initrd.gz file from Debian mirror. I set up a TFTP server on my mac, ensure it's working from local. later I power on the notebook,  enter into the GMON screen, using ifaddr to assign a IP address manually, and it can ping to my mac, this means the networking works now. Problem comes since then, I execute load directive to load the vmlinux file, everytime after several minutes waiting, it shows connection timeout, after some time debuging, nonthing exceptions can be found since the tftp server and the connectivity between my mac and yeeloong both normal.

I give it up and find a USB stick, this time I'm going to put vmlinix and initird file into USB and let the notebook boot from USB. You have to check that the filesystem is formatted as ext2/ext3. Most importantly, the single partition must not be larger than 4GB, say, you have a 8G USB,  you need to format at least 3 partiitons, 3-3-2 is a good choice. If the partition is larger than 4GB, you can't even enter into PMON, and it just stalls here after you power on with USB attached to your notebook with "filesys_type() usb0 don't find" showing on the screen.


After entering into PMON, you should find your USB using devls directive, then type:
> bl -d ide (usb0, 0)/vmlinux
> bl -d ide (usb0, 0)/initird.gz

Don't copy directly, you need to figure out partitions location in your USB, maybe yours is (usb0, 1), and your vmlinux file is called vmlinux-3.14-2-loongson-2f.

Be patient, I waited about 10 minutes before both files loaded into memory successfully, press "g" to let it run. Haaaaam, there're fews steps to get a brand new toy.

The installation process now begins, press Yes/No, answer questions as your normal. at the end, I choose to install LXDE desktop environment. For me, I waited about 2 hours to finish the installation.

Now, reboot, and finally, it enters into LEVEL 3 without X because of some buggy tools.

Download the xserver-xorg-video-siliconmotion package from here, don't upset with the tool's version, you need some modifications to make it work on jessie.

Using dpkg to unpack the deb file, remove xorg-video-abi-12 from its control file in "Depends" section, repack the file, before dpkg -i, using apt-get to install the xserver-xorg-core file.

startx, you will get the final screen. The default network manager for LXDE is Wicd Network Manager which is fine, what if you want to use command line, you need to modify the interface file and some other steps before connected to internet. 

Wow, with Ubuntu keyborad suits, it's definitely a good combo.

How about the performance? No matter what I run, it always ends with CPU bound, and the average load with LXDE is around 1. That does not hurt, since it does not bother me too much, but It takes more than 3 minutes to boot up, probably slower than the majority of Windows users ;-)

Since the dummy box is terribly slooow, Why you buy the box? well, just for fun.