TopoScope is an AS relationship inference algorithm that combines ensemble learning and Bayesian networks. It also supports hidden link inference. You can learn more about TopoScope in the IMC 2020 paper.
To get started using TopoScope, clone or download this GitHub repo.
TopoScope runs with Python 3.6.8.
Install Python dependencies
$ pip install --user -r requirements.txt
Download AS to Organization Mapping Dataset from CAIDA
https://www.caida.org/data/as-organizations/
Download PeeringDB Dataset from CAIDA
Before March 2016: http://data.caida.org/datasets/peeringdb-v1/
After March 2016: http://data.caida.org/datasets/peeringdb-v2/
Download Prefix2AS Dataset from CAIDA
http://data.caida.org/datasets/routing/routeviews-prefix2as/
Prepare BGP paths from Route Views and RIS
You can prepare BGP paths with BGPStream or download RIB files from Route Views and RIS.
Note that TopoScope only uses IPv4 AS paths. Here is an example that extracts AS paths from a RIB file:
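    import re
    output = []
    for block in blocks:  # blocks: text of each RIB entry (assumed already split)
        prefix_match = re.search(r'PREFIX: ([^\n]*)\n', block)
        aspath_match = re.search(r'ASPATH: ([^\n]*)\n', block)
        if not prefix_match or not aspath_match:
            continue
        prefix = prefix_match.group(1).strip()
        aspath = aspath_match.group(1).strip()
        # Skip paths that contain '{' or '(' (AS sets or confederation segments).
        if '{' in aspath or '(' in aspath:
            continue
        # Keep IPv4 prefixes only (IPv6 prefixes contain ':').
        if prefix and ':' not in prefix:
            output.append(aspath.replace(' ', '|'))
            # output.append(aspath.replace(' ', '|') + '&' + prefix)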
Prepare BGP paths from Isolario
You can download RIB files from Isolario and extract AS paths from them.
Note that Isolario data is only used for hidden link inference in the IMC paper, but you can also use it for basic inference.
The ASes on each BGP path should be delimited by '|' on each line, for example, AS1|AS2|AS3.
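As a quick sanity check on this format, a minimal sketch like the following could validate a line before it is passed to uniquePath.py. The function name `parse_path_line` is hypothetical and not part of TopoScope, and the sketch assumes that ASes are written as plain decimal AS numbers.

```python
def parse_path_line(line):
    """Split one 'AS1|AS2|AS3' line into a list of integer AS numbers.

    Returns None if any hop is not a plain decimal AS number.
    (Hypothetical helper for illustration, not part of TopoScope.)
    """
    hops = line.strip().split('|')
    if any(not hop.isdecimal() for hop in hops):
        return None
    return [int(hop) for hop in hops]


# Illustrative AS numbers only.
print(parse_path_line('1|2|3'))      # well-formed path
print(parse_path_line('1|{2,3}'))    # contains an AS set, rejected
```

Lines for which the helper returns None can be dropped or logged before further processing.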
Parse downloaded BGP paths
$ python uniquePath.py -i=<aspaths file> -p=<peeringdb file>
# e.g. python uniquePath.py -i=aspaths_2019.txt -p=peeringdb_2019.json
# Output is written to 'aspaths.txt'.
Run AS-Rank algorithm to bootstrap TopoScope
$ perl asrank.pl aspaths.txt > asrel.txt
Run TopoScope
$ python toposcope.py -o=<ASorg file> -p=<peeringdb file> -d=<temporary storage folder name>
# e.g. python toposcope.py -o=asorg_2019.txt -p=peeringdb_2019.json -d=tmp/
# Output is written to 'asrel_toposcope.txt'.
Output data format
<provider-as>|<customer-as>|-1
<peer-as>|<peer-as>|0
<sibling-as>|<sibling-as>|1
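To illustrate how this format can be consumed, here is a minimal sketch that reads relationship lines and counts each relationship type. The names `read_asrel` and `REL_NAMES` are hypothetical, the sample AS numbers are made up, and skipping '#' comment lines is an assumption rather than something the output is known to need.

```python
from collections import Counter

# Relationship codes as listed in the output format above.
REL_NAMES = {-1: 'provider-customer', 0: 'peer-peer', 1: 'sibling'}


def read_asrel(lines):
    """Yield (as1, as2, rel) tuples from '<as1>|<as2>|<rel>' lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines (assumption).
        if not line or line.startswith('#'):
            continue
        as1, as2, rel = line.split('|')
        yield int(as1), int(as2), int(rel)


# Illustrative data; in practice pass open('asrel_toposcope.txt').
sample = ['1|2|-1', '1|3|0', '2|4|1']
counts = Counter(REL_NAMES[rel] for _, _, rel in read_asrel(sample))
print(counts)
```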
Hidden link inference
The ASes on each BGP path should be delimited by '|' on each line, followed by '&' and prefix, for example, AS1|AS2|AS3&prefix.
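A minimal sketch of how one such line could be split into its AS path and prefix is shown below. The function name `parse_asprefix_line` is hypothetical and not part of TopoScope, and the prefix in the example is a documentation address chosen for illustration.

```python
def parse_asprefix_line(line):
    """Split 'AS1|AS2|AS3&prefix' into (list of AS hops, prefix string).

    Returns None if the line has no '&' or if either part is empty.
    (Hypothetical helper for illustration, not part of TopoScope.)
    """
    path, sep, prefix = line.strip().rpartition('&')
    if not sep or not path or not prefix:
        return None
    return path.split('|'), prefix


hops, prefix = parse_asprefix_line('1|2|3&192.0.2.0/24')
print(hops, prefix)
```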
Parse downloaded BGP paths
$ python cleanPrefix.py -i=<asprefix file> -p=<peeringdb file>
# e.g. python cleanPrefix.py -i=asprefix_2019.txt -p=peeringdb_2019.json
# Output is written to 'fullVP.txt', 'aspaths0.txt', 'aspaths1.txt', 'asprefix0.txt', 'asprefix1.txt', 'chooseVP0.txt' and 'chooseVP1.txt'.
Run AS-Rank algorithm to bootstrap TopoScope
$ perl asrank.pl aspaths0.txt > asrel0.txt
$ perl asrank.pl aspaths1.txt > asrel1.txt
You can also use the basic inference result of TopoScope instead of AS-Rank for this step.
Find missing edges and choose ASes similar to full VPs
$ python getMissEdge.py
# Output is written to 'triplet_miss0.txt' and 'triplet_miss1.txt'.
$ python chooseAS.py
# Output is written to 'chooseAS.txt'.
Run TopoScope to find hidden links
$ python newlink.py -f=<prefix2AS file>
# e.g. python newlink.py -f=pfx2as_2019.txt
# Output is written to 'futher0.txt' and 'futher1.txt'.
Infer AS relationships of hidden links
$ python linkRel.py
# Output is written to 'asrel_hidden.txt'
Output data format
<provider-as>|<customer-as>|-1
<peer-as>|<peer-as>|0
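One way to inspect the hidden-link output is to check which of its links are absent from the basic inference result. The sketch below does this on in-memory sample lines. The helper `link_set` is hypothetical, the sample data is made up, and skipping '#' comment lines is an assumption. Whether the two output files actually overlap is not stated here, so treat this as an exploratory check only.

```python
def link_set(lines):
    """Return the set of undirected AS links in '<as1>|<as2>|<rel>' lines."""
    links = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines (assumption).
        if not line or line.startswith('#'):
            continue
        as1, as2, _rel = line.split('|')
        # frozenset ignores the order in which the two ASes are written.
        links.add(frozenset((as1, as2)))
    return links


# Illustrative data; in practice pass open('asrel_toposcope.txt') and
# open('asrel_hidden.txt') instead of these lists.
basic = ['1|2|-1', '2|3|0']
hidden = ['3|2|0', '1|4|-1']
new_links = link_set(hidden) - link_set(basic)
print(sorted(sorted(link) for link in new_links))
```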
Example
You can download AS relationship inference results from 2017 to 2019 in /asrel/.
You can contact us at jinzt19@mails.tsinghua.edu.cn.
We will update the inference results later.