revtr: A Python repository from littlespace

*** What is this project all about? ***

Primary Goal
* Make the reverse traceroute system better by minimizing the number of RR-pings required,
* and by maximizing the amount of information we get from each one.

Why Rank VPs?
* First of all, to do reverse traceroute, we need to find VPs that are RR-reachable
* i.e. there are empty slots in the RR header by the time that the VP->Dst ping reaches
* Dst. Second of all, we do not want to issue so many RR-pings into a network that
* rate-limiting is triggered. Third of all, it would be great if the VPs chosen were
* as close as possible to the destination, so as to maximize the number of free slots
* in the RR header.

VP Ranking Methods
* 1) Destination Cover
+ Overview:
- Say we have a new destination with prefix 'P'. Given a set 'S' of VPs for which we
- already have some RR-reachable measurements to other destinations within P, rank
- each VP 'v' in S in terms of how many such destinations it reaches. This is a greedy
- set-covering approximation algorithm which just picks the "set" v that has the largest
- number of RR-reachable destinations in P.
+ Drawbacks:
- 1) It has no "terminating condition"; i.e. just because it supplies us with a rank-ordering
- of VPs, we may still have to issue probes from each VP moving down the list towards our
- new target destination. This could end up having us send pings from every single VP in S!
- 2) It does not take into account the number of hops away from the destination a VP sits.
- By extension, it does not maximize the amount of information we can get back from an RR-ping.
* 2) Top-K / Restricted Set
+ Overview:
- A small number of M-Lab sites can to a large majority of RR-reachable destinations. As shown
- in Brian's paper, using just the top 10 M-Lab sites reached 95% of RR-reachable destinations
- , within the same number of hops, as reached by the full set of M-Lab + PlanetLab nodes.
- In light of this, restricting the set of VPs to just the top-K can give most of the results,
- while greatly reducing the amount of probing required.
+ Drawbacks:
- ???
* 3) Ingress-Based
+ Overview:
- Define some notion of "network" -- we choose BGP-routable prefix in our work -- and find for each
- RR-pingable measurement the point of "ingress." That could be either the last hop before entering
- the given prefix, or the first hop after entering the prefix -- as long as the choice is consistent.
- Now just deliver a ranking based on the number of hops away from an ingress that a VP sits. If several
- VPs enter the network via the same ingress, just keep the one with the smallest hop-count, and discard
- the rest. This is okay because Internet routing is destination-based: any distinct routes from S to D
- going through some common router R will likely be exactly the same from R to D.
+ Benefits:
- 1) This approach does have a "terminating condition" in that many VPs from the original set S will be
- eliminated from the delivered list of rankings. This contributes towards the overall goal of minimizing
- the number of probes sent overall during the RTR process.
- 2) The distance from VP to ingress is taken into account for these rankings. This effectively maximizes
- the amount of information we are likely to receive from a successful RR-ping when using top-ranked VPs.
+ Drawbacks:
- ???

Evaluation
* We will use existing measurements from 2011 in conjunction with BGP-dumps from the same time period to evaluate the
* efficacy of each of the three aforementioned methods. For each BGP-routable prefix for which the VPs in this
* dataset have RR-reachable measurements, the measurements will be split into test and training data. The training
* set is used to create VP rankings via each of the three approaches. The test set will be used to emulate actual
* RR-pings to never-before-seen destinations. In the process of emulation for each method we can record metrics:
* - NumProbes: the number of probes we had to send before receiving a successful stamping
* - NumHops: the number of hops it takes to reach the destination on a successful probe
* - VPChosen: ???
* - AltVPExists: ???
* With these metrics in hand, we'll be able to make a first-order judgement about the relative effectiveness of each
* ranking method. Results will include:
* - CDF showing the fraction of RR-reachable test-destinations for which k probes were required to receive a stamp;
* legend: ranking methods
* - CDF showing the fraction of RR-reachable test-destinations for which the destination was reached within k hops;
* legend: ranking methods
* - ...

Challenges
*) Large datasets can take a long time to process, so writing efficient fast code was an important consideration
*) ...

littlespace/revtr