ContentCopyExtraction

MS Project work by Dan Stevents

Primary Language: Jupyter Notebook

Information Similarity: Scalable Method For Finding Very Similar News Articles

Use run.sh to run the model

To read in the data, you must be connected to the VPN or on the school Wi-Fi. First download the first page of the data from the API as a JSON file; the code then pulls the remaining pages by following the "next" pointer.
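
The pagination step can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names, the "results" key, and the idea that "next" holds the full URL of the following page (or null on the last page) are all assumptions about the API response.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download one page from the API and parse it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def load_all_articles(first_page_path, fetch=fetch_json):
    """Collect articles from the saved first page and every later page.

    Assumption: each page is a JSON object with a "results" list and a
    "next" field that holds the next page's URL, or null on the last page.
    """
    with open(first_page_path, encoding="utf-8") as f:
        page = json.load(f)

    articles = list(page.get("results", []))
    seen = set()
    next_url = page.get("next")
    while next_url and next_url not in seen:
        seen.add(next_url)  # guard against a pointer that loops back
        page = fetch(next_url)
        articles.extend(page.get("results", []))
        next_url = page.get("next")
    return articles
```

Passing the fetcher as a parameter makes it possible to try the loop with canned pages when the VPN is not available.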

There are three options for displaying the output: (1) give one data file to display its output; (2) give two data files with the flag "common" to display the articles they have in common; or (3) give two data files with the flag "difference" to display the results from the first data file that are not in the second.
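
The three display modes amount to simple set-style operations, which could look something like the sketch below. The function name and the assumption that each data file has been loaded into a dict keyed by an article identifier are hypothetical; the project may represent its results differently.

```python
def select_articles(first, second=None, flag=None):
    """Pick which records to display.

    first, second: dicts mapping a (hypothetical) article id to its record.
    flag: "common" or "difference" when a second file is given.
    """
    if second is None:
        # One data file: show everything from it.
        return list(first.values())
    if flag == "common":
        # Articles that appear in both files.
        return [rec for key, rec in first.items() if key in second]
    if flag == "difference":
        # Articles in the first file that are not in the second.
        return [rec for key, rec in first.items() if key not in second]
    raise ValueError('flag must be "common" or "difference"')
```

For example, with `first = {"a": 1, "b": 2}` and `second = {"b": 20}`, the "common" flag selects the record for "b" from the first file, and "difference" selects the record for "a".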