Since its first appearance in 2009, the term "Deep Web" has designated the parts of the World Wide Web that are not indexed by standard search engines. With the development of Tor, a FOSS anonymity network software, a whole digital world was born and has been growing ever since. Making the most of the anonymity it provides, Tor users have over time developed a complex infrastructure in this Deep Web that makes the discussion, advertisement and purchase of any service or item deemed illegal by local authorities accessible to all.
With this new kind of distribution, law enforcement had to adapt in order to regulate these illicit markets. On 5 and 6 November 2014, Operation Onymous, an international law enforcement operation targeting darknet markets and other hidden services operating on the Tor network, was launched. The operation involved the police forces of 17 countries; more than 400 sites were closed and 17 arrests were made.
Although the anonymity factor remains intact, tools have been developed to scrape and archive most services available on the Tor network, from forums and marketplaces to search engines, messaging services, etc. This project tries to get an overview of the impact of large raids such as Operation Onymous on the darknet.
During this project, we address several research questions regarding the impact of Operation Onymous on the darknet market Agora:
- How was the market impacted in terms of volumes and categories of products?
- How did prices evolve globally?
- How were the import/export flows impacted?
- How did the vendors' habits and operations security evolve?
- A Data story website relating our findings can be found here.
- The Notebook we relied on to make it is Final_Notebook.ipynb
- The archive contains mostly scraped HTML pages from the many marketplaces, forums and other services (e.g. the Grams search engine) that were active during the period mentioned in the title. This raw data is organized first by service, then by date (meaning that for every service, one can go to a specific date and see a list of HTML pages). Each archive follows the format of the platform it represents, so standard page types can be expected (e.g. item, profile, forum thread, list of items, etc.). However, it is expected to be highly incomplete and will most likely contain inconsistencies. All the directories are compressed as tar.gz. The whole archive is about 60 GiB compressed and is estimated at about 1 TiB once fully uncompressed.
- Given the enormous size of this archive, a large amount of processing work is expected in order to filter out all the HTML formatting data. Extracted data will most likely be placed into several Pandas DataFrames before being processed and prepared for statistical work.
- We rely on online resources such as the description of the dataset and tools from the provided papers and from papers we found ourselves. As mentioned in the source description, the incompleteness of the dataset requires a thorough study of the semantics behind the data as well as the use of adapted tools and methods.
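The processing step described above could start along the following lines. This is only a minimal sketch using the Python standard library, not the code from the project notebook: the names `TextExtractor`, `html_to_text` and `iter_html_records` are hypothetical, and real pages would need per-page-type field extraction rather than plain text stripping.

```python
import tarfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def iter_html_records(archive_path):
    """Yield one {'path', 'text'} record per .html member of a tar.gz archive.

    Members are read one at a time, so the archive never has to be
    fully extracted to disk.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".html")):
                continue
            raw = archive.extractfile(member).read()
            yield {
                "path": member.name,
                "text": html_to_text(raw.decode("utf-8", errors="replace")),
            }
```

The resulting records are plain dictionaries, so they can be handed to `pandas.DataFrame(list(iter_html_records(path)))` once the fields of interest have been extracted.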
data/
├── agora
│   └── YYYY-MM-DD
│       ├── cat                        # Directory containing the listing pages for every category
│       │   ├── cat_name_hash
│       │   │   ├── page_0.html        # Contains Title, Ships From, Ships To, Price in BTC, vendor_name, rating
│       │   │   ├── [...]
│       │   │   └── page_N.html
│       ├── p                          # Directory containing all individual listing pages
│       │   ├── listing_0_hash.html
│       │   ├── [...]
│       │   └── listing_N_hash.html
│       └── vendor                     # Directory containing all vendor profile pages
│           ├── vendor_0_name.html
│           ├── [...]
│           └── vendor_N_name.html
└── agora-forum
    └── YYYY-MM-DD
        ├── index.php
        │   ├── board,n.items_offset.html   # Each file contains a list of topics for a given board (title, authors, n_views, n_replies)
        │   ├── [...]
        │   └── board,N.10650.html
        └── index.php?action=stats     # Contains number of posts, replies, and other global stats
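Because the layout encodes the service, the scrape date and the page type in the path itself, each file can be classified before it is even opened. The sketch below assumes paths are given relative to `data/` and use `/` as separator; `ScrapedPage` and `classify_path` are hypothetical names introduced for illustration.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class ScrapedPage(NamedTuple):
    service: str      # e.g. "agora" or "agora-forum"
    scraped_on: date  # parsed from the YYYY-MM-DD directory
    kind: str         # third path component: "cat", "p", "vendor", "index.php", ...
    name: str         # file name of the page


def classify_path(relative_path: str) -> Optional[ScrapedPage]:
    """Map a path relative to data/ onto (service, date, kind, name).

    Returns None when the path is too short or the date directory
    is not a valid YYYY-MM-DD date.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 4:
        return None
    service, day, kind = parts[0], parts[1], parts[2]
    try:
        scraped_on = date.fromisoformat(day)
    except ValueError:
        return None
    return ScrapedPage(service, scraped_on, kind, parts[-1])
```

For example, `classify_path("agora/2014-11-06/cat/abc123/page_0.html")` yields a record with service `agora`, kind `cat` and the scrape date 6 November 2014, which makes it straightforward to group pages by date or page type later on.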
Because the data was collected by an automatic process that is prone to failure, it is inconsistent and part of it is unusable. Usually the web scraping failed, so the files are incomplete at best and missing altogether at worst. We were forced to discard many scraped dates for that reason, to avoid plots leading to wrong conclusions.
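One simple way to flag unusable dates is to compare the number of pages scraped on each date against a typical day. The sketch below is an illustrative heuristic, not necessarily the criterion used in the notebook: the function name `usable_dates` and the `min_fraction` threshold of half the median are assumptions.

```python
from collections import Counter
from statistics import median


def usable_dates(page_dates, min_fraction=0.5):
    """Return the sorted dates whose page count is at least
    min_fraction times the median page count per date.

    page_dates holds one entry per scraped page (the date it was scraped on).
    """
    counts = Counter(page_dates)
    if not counts:
        return []
    threshold = min_fraction * median(counts.values())
    return sorted(day for day, n in counts.items() if n >= threshold)
```

A date with only a handful of pages next to dates with hundreds would be dropped, while ordinary day-to-day variation is kept. The threshold would need to be tuned per page type, since a failed scrape can affect listings and vendor pages differently.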
We have seen that Operation Onymous did not have a large lasting impact on the market, which went back to normal shortly after the operation. However, it led to interesting changes on the vendors' side: small suppliers tended to quit the market while the bigger ones seemed to grow. One could then ask whether the operation was a success, since it reduced the number of vendors but did not disturb the bigger ones, who are supposedly the hardest to arrest. Either way, during this project we managed to extract information from a huge amount of data and make a nice Data Story out of it.
The darknet is a really interesting source of data, and one could imagine continuing this project with other research questions to highlight the impact of Operation Onymous or similar operations, typically by analysing the impact they had on other markets or by taking into account external parameters that could influence the market. Carrying out other data analysis projects on drug consumption and weapon trafficking and merging the results with darknet exchanges could also bring an interesting point of view on the subject.
- Arthur: Forum analysis, Grams pages parsing, vendors analysis
- François: Data story page, products analysis
- Florine: Data story texts, poster, presentation ?
- Quentin: Agora web pages parsing, product price analysis
- 'Dark Net Market archives, 2013-2015' - Gwern Branwen et al. - 2015 - https://www.gwern.net/DNM-archives
- 'The Dark Net: De-Anonymization, Classification and Analysis' - Rebecca Portnoff - 2018 - http://digitalassets.lib.berkeley.edu/techreports/ucb/text/EECS-2018-5.pdf
- 'Tools for Automated Analysis of Cybercriminal Markets' - Rebecca Portnoff et al. - 2017 - http://damonmccoy.com/papers/cyberforum-analysis-www17.pdf
- 'Do police crackdowns disrupt drug cryptomarkets? A longitudinal analysis of the effects of Operation Onymous' - Décary-Hétu and Giommoni - 2016 - http://damonmccoy.com/papers/cyberforum-analysis-www17.pdfe