/tcpcopy

an online request replication tool, fit for online testing, stress testing, performance evaluation,etc

Primary LanguageShellOtherNOASSERTION

Name:
    tcpcopy
    It is an online TCP duplication tool and can be used for testing (using netlink and raw sockets).


Description:
    It can help you find bugs without deploying your server software on your production servers. 
    It can also be used to do smoke testing against your products.

    For example, when you want to migrate your web server from Apache to Nginx, tcpcopy can help you test
    it. Apache is running online, while tcpcopy can copy the TCP flows from Apache to Nginx. To Nginx, the
    TCP flows are just forwarding to it. This will not affect Apache at all except cost a little network 
    bandwidth and CPU load.


Scenarios:
    1) Distributed stress testing
       Use tcpcopy to copy real-world data to stress test your server software. Bugs that only can be
       produced in high-stress situations can be found.

    2) Hot backup
       It is very suitable for backup tasks if connections are short-lived. The request loss ratio is
       quite low (e.g. 0.000001).

    3) Online testing
       Prove the new system is stable and find bugs that only occur in the real world.

    4) Benchmark
       Do performance benchmark. For instance, you can use tcpcopy to compare the performance of Apache 
       and Nginx.


Usage:
    1) Install
       a) download the source code from github:
          git clone https://github.com/wangbin579/tcpcopy
       b) ./configure
       c) make
       d) make install

    2) Run:
       a) on the source host (root privilege is required):
          ./tcpcopy local_ip1[:local_ip2:...] local_port remote_ip remote_port 
 
       b) on the target host (root privilege is required):
          modprobe ip_queue # if not running
          iptables -I OUTPUT -p tcp --sport port -j QUEUE # if not set
          ./interception 

Example:
    Suppose there are two online hosts, 1.2.3.13 and 1.2.3.14. And 1.2.3.148 is the target host (similar to 
    the online hosts). Port 12321 is used both as local port and remote port. We use tcpcopy to test if
    1.2.3.148 can process 2X requests than a host can serve.
    
    Here we use tcpcopy to perform the above test task.
    
    1) on the target host (1.2.3.148)
       # modprobe ip_queue # if not run up
       # iptables -I OUTPUT -p tcp --sport 12321 -j QUEUE # if not set
       # ./interception

    2) online host (1.2.3.13)
       # ./tcpcopy 1.2.3.13 12321 1.2.3.148 12321

    3) online host(1.2.3.14)
       # ./tcpcopy 1.2.3.14 12321 1.2.3.148 12321

    CPU load and memory usage is as follows:
       1.2.3.13:
           11124 adrun 15 0 193m 146m 744 S 18.6 7.3 495:31.56 asyn_server
           11281 root 15 0 65144 40m 1076 S 12.3 2.0 0:47.89 tcpcopy
       1.2.3.14:
           16855 adrun 15 0 98.7m 55m 744 S 21.6 2.7 487:49.51 asyn_server
           16429 root 15 0 41156 17m 1076 S 14.0 0.9 0:33.63 tcpcopy
       1.2.3.148:
           25609 root 15 0 76892 59m 764 S 49.6 2.9 63:03.14 asyn_server
           20184 root 15 0 5624 4232 292 S 17.0 0.2 0:52.82 interception

    Access log analysis:
       1.2.3.13:
           $ grep 'Tue 11:08' access_0913_11.log | wc -l
             89316,  1489 reqs/sec
       1.2.3.14:
           $ grep 'Tue 11:08' access_0913_11.log | wc -l
             89309,  1488 reqs/sec
       1.2.3.148:
           $ grep 'Tue 11:08' access_0913_11.log | wc -l
             178175, 2969 reqs/sec
       request loss ratio:
           (89316 + 89309 - 178175) / (89316 + 89309) = 0.25%

    Clearly, the target host can process 2X of requests a source host can serve.

    How is the CPU load? Well, tcpcopy on online host 1.2.3.13 used 12.3%, host 1.2.3.14 
    used 14%, while interception on the target host consumed about 17%. We can see that
    the CPU load is low here, and so is the memory usage.


Note:
    1) It is tested on Linux only (kernal 2.6 or above).
    2) Tcpcopy may lose packets hence lose requests.
    3) If you use tcpcopy to duplicate local requests, please keep lo MTU not more than 1500.
    4) Interception (tcpcopy server) is single-threaded now.
    5) Root privilege is required.
    6) Long-lived requests (such as uploading a large file) does not work well (no retransmission
       when sending packets to the target test server)
    7) Check error.log if you encounter some problem and feel free to report it to us
       (163.beijing@gmail.com or wangbin579@gmail.com).