Authors: Jonathan Su, Kevin Liu, Filip Koperwas, Vicson Moses
nltk pip3 install nltk
pandas pip3 install pandas
mysql.connector pip3 install mysql-connector
prettytable pip3 install prettytable
Edit files/project1.py and change the database username and password to connect to your local database
May have to do this if you run into a download error error at nltk.download()
Change directory to the python folder: cd /Applications/Python 3.6/
Run the command: ./Install Certificates.command
Some information may be outdated
- use the 10 documents in the projects/temp_data
- find a tokenizer to turn every file into a tuple of (token_id, token, document_id)
- Use it to get the TFIDF ratios for each token.
- Make a table of tokens where the tuple is (doc_id, TFIDF ratio).
- recognize tokens by 1.1 'alphabet' , 1.2 'numeral' and 1.3 rest of them will be one byte one token
- Please look at the data in TT20