MIME Diversity for TREC Polar Dataset
A visualization of the MIME diversity of the original dataset is available at http://cs-server.usc.edu:14596/visualization/index.html
In this project, we intend to extend Apache Tika's MIME detection capabilities, specifically for application/octet-stream files. In the course of this assignment, we will use the BFC, BFD, and FHT algorithms to generate a model for each MIME type, using the TREC Polar dataset as training data. We will then validate those models and suggest extensions to the Apache Tika MIME library.
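As a rough illustration of the kind of model involved, the sketch below computes a normalized byte frequency distribution for a file and averages several of them into a per-MIME-type fingerprint. It assumes that BFD stands for byte frequency distribution and that frequencies are scaled by the most common byte value; the function names are hypothetical and this is not the project's actual implementation.

```python
from collections import Counter


def byte_frequency_distribution(data: bytes) -> list[float]:
    """Return 256 frequencies scaled so the most common byte value is 1.0.

    Assumption: scaling by the peak count is one common normalization;
    the project may normalize differently.
    """
    counts = Counter(data)  # iterating bytes yields ints in 0..255
    peak = max(counts.values(), default=0)
    if peak == 0:
        return [0.0] * 256  # empty input: no bytes observed
    return [counts.get(b, 0) / peak for b in range(256)]


def average_fingerprint(samples: list[bytes]) -> list[float]:
    """Average the distributions of samples that share one MIME type."""
    if not samples:
        return [0.0] * 256
    dists = [byte_frequency_distribution(s) for s in samples]
    return [sum(col) / len(dists) for col in zip(*dists)]
```

In practice each sample would be the contents of a training file, for example `open(path, "rb").read()`, grouped by the MIME type that Tika reports for it.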
A visualization of BFA for all MIME types in the TREC Polar dataset is available at http://cs-server.usc.edu:14596/visualization/bfa.html