/code

code for DisCo project

Primary language: Python


stripcorpus.py takes the reuters21578 data files and writes one file per document with the SGML markup removed. Each file is named according to the document's topic, and its contents are taken from the title and body fields. Documents are placed in train and test directories according to the ModApte split.
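
The processing described above can be sketched roughly as follows. This is not the actual stripcorpus.py, only a minimal illustration under stated assumptions: the function names `parse_documents` and `write_documents`, the `<topic>_<NEWID>.txt` file naming, and the choice to write one copy per topic for multi-topic documents are all hypothetical. The ModApte rule used here is the one given in the Reuters-21578 distribution README: `TOPICS="YES"` with `LEWISSPLIT="TRAIN"` goes to train, `TOPICS="YES"` with `LEWISSPLIT="TEST"` goes to test, and everything else is left out.

```python
import html
import re
from pathlib import Path

# Regular expressions for the pieces of a Reuters-21578 record that matter here.
DOC_RE = re.compile(r"<REUTERS([^>]*)>(.*?)</REUTERS>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
TOPIC_RE = re.compile(r"<D>(.*?)</D>")
TITLE_RE = re.compile(r"<TITLE>(.*?)</TITLE>", re.DOTALL)
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)


def field(pattern, chunk):
    """Return the unescaped, stripped text of one field, or "" if absent."""
    match = pattern.search(chunk)
    return html.unescape(match.group(1)).strip() if match else ""


def parse_documents(sgml_text):
    """Yield (split, topic, newid, text) for each ModApte document and topic."""
    for attrs_text, chunk in DOC_RE.findall(sgml_text):
        attrs = dict(ATTR_RE.findall(attrs_text))
        # ModApte only uses documents whose TOPICS attribute is "YES".
        if attrs.get("TOPICS") != "YES":
            continue
        split = {"TRAIN": "train", "TEST": "test"}.get(attrs.get("LEWISSPLIT"))
        if split is None:
            continue  # e.g. LEWISSPLIT="NOT-USED"
        topics_match = TOPICS_RE.search(chunk)
        topics = TOPIC_RE.findall(topics_match.group(1)) if topics_match else []
        parts = (field(TITLE_RE, chunk), field(BODY_RE, chunk))
        text = "\n".join(part for part in parts if part)
        # Assumption: a document with several topics is emitted once per topic.
        for topic in topics:
            yield split, topic, attrs.get("NEWID", "unknown"), text


def write_documents(sgml_text, out_dir):
    """Write each parsed document to out_dir/<split>/<topic>_<newid>.txt."""
    written = []
    for split, topic, newid, text in parse_documents(sgml_text):
        target = Path(out_dir) / split / f"{topic}_{newid}.txt"  # hypothetical naming
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

A caller would read each data file of the collection into a string and pass it to `write_documents` together with an output directory. Documents without any topic produce no output in this sketch, since there would be nothing to name the file after.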