netselect: A C repository from ml10

netselect 0.3
=============

This is netselect, an ultrafast intelligent parallelizing binary-search
implementation of "ping."

Now stop laughing and pay attention.

netselect determines several facts about all of the hosts given on the
command line, much faster you would if you manually tried to use ping and
traceroute.  For example, if I type:

	netselect -vv ftp.fceia.unr.edu.ar ftp.kulnet.kuleuven.ac.be \
			ftp.cdrom.com ftp.debian.org ftp.de.debian.org
			
It tells me this:

ftp.fceia.unr.edu.ar                  2792 ms  23 hops  100% ok ( 1/ 1) [ 9213]
ftp.kulnet.kuleuven.ac.be             9999 ms  30 hops    0% ok
ftp.cdrom.com                           94 ms   8 hops  100% ok (10/10) [  169]
ftp.debian.org                          46 ms  15 hops  100% ok (10/10) [  115]
ftp.de.debian.org                     9999 ms  30 hops    0% ok
  115 ftp.debian.org
  
For each host, it figures out the approximate ping time (though not as
accurately as "ping" does), the number of network "hops" to reach the
target, and the percentage of ping requests that got through successfully. 

The value in brackets is the "score" of each operational host based on these
values.  A lower score is better.  The last line shows the server with the
best score.  If we had not used '-vv' on the command line, only this last
line would have been printed.

Note that for ftp.kulnet.kuleuven.ac.be and ftp.de.debian.org in this case,
nothing got through at all.  That indicates that either the host doesn't
exist, or it is down.

For a bigger example, try netselect-apt to build your own sources.list for apt
with the (possibily) fastest debian mirror.


But Why?
========

Why do I want to know about my ping times to computers in Belgium?  Well,
the main reason for netselect -- and its name gives you a hint -- is to help
choose the "best" server for you from among a (possibly very large) list.

Starting with version 0.2, netselect can make these decisions for you using
its scoring mechanism.  If you want, however, you can still pass the raw
results and score the servers as you like.  Try this, for example:

	netselect -vv -s 0 $(cat <your_list_of_sites>)
	
The "-s 0" option disables printing of scores at the bottom of the list, and
"-vv" enables printing of the statistics.


How does it work?
=================

First:

 - decode each hostname into an IP address, and stores each IP address into
   a table.  In netselect 0.2, this code was rewritten to resolve hostnames
   much more quickly than before.
   
Now for all hosts at once:
   
 - start firing UDP packets with "random-guess" TTL values, much like
   traceroute does.  Actually, the code for this is derived from traceroute.
   
 - if an "ICMP TTL Expired" message comes back, then the TTL was too low:
   the host is farther away than that.  Increase TTL next time.  Otherwise,
   a "Port Unreachable" message comes back, meaning the TTL was large
   enough.  Try a smaller one.  We do this until we narrow down the TTL. 
   (This is where the "binary search" comes in.)
   
 - Meanwhile, collect timing statistics for all packets that reached the
   host.  Packets that don't come back are considered lost.

When all the hosts have had their TTL values narrowed down, and the "-t"
minimum tries have expired, we're done.  Close the sockets and dump
the statistics to stdout.



Command-line Options
====================

Not much right now.  

	-v	-- verbose mode.  Displays nameserver resolution messages to
		   stderr.  You probably want this so that you don't get
		   bored waiting for a hundred name resolutions to finish.
		   
	-vv	-- very verbose mode.  Displays nameserver resolution and
		   statistics (not just scores) to stderr and stdout.
		   
	-vvv	-- very very verbose mode.  Everything -vv prints, plus
		   print every packet received as it happens.  Good for
		   debugging or trying to figure out how it works.
	
	-vvvv   -- very very very verbose mode. Everything -vvv prints,
		   plus a trace of all packets sent.
		   
	-m #	-- maximum ttl.  Don't accept hosts with more hops than
		   this.
		   
	-t #	-- make sure at least 50% of the hosts get tested with this
		   many packets.  The more packets you use, the more
		   accurate the results... and the longer it takes to run.
		   The default is 10, which is usually okay.
	
	-s #	-- print this many "top-scoring" servers at the end of
		   the list.  "-s 0" disables printing of high scores.


The Future
==========


Here are some possible improvements:

	- try to estimate line bandwidth somehow.  The 'bing' program does
	  it using two different ping packet sizes.

	- try to improve 'ping time' estimate.  It's a problem right now
	  because netselect writes a lot of packets in a quick stream (for
	  speed reasons).  It's fair to each host, though:  they all put up
	  with an equal amount of lag :)

This program is highly experimental.  Please let me know what you think.

	- Avery Pennarun
	  <apenwarr@gmail.com>
ml10/netselect