- Kevin Mu, Jonathan Miller, Stella Pantela, and Dianna Hu
- CS187 - Computational Linguistics
- Final Project (TextTiling) - Group Implementation
- README
Setup Instructions
1. If nltk is already installed, skip to step 5.
2. Run "python ez_setup.py"
3. Run (sudo) "easy_install pip"
4. Run (sudo) "pip install -U nltk"
5. Run "python", then type "import nltk"
6. Type "nltk.download()". A new window should open, showing the nltk Downloader.
7. Click the "corpora" tab.
8. Select "Stopwords Corpus" (stopwords) and "WordNet" (wordnet), and click Download.
9. Close the nltk downloader and exit python.
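
If the graphical downloader is not available (for example, on a remote machine), the two corpora can probably be fetched from a script instead. The following is a minimal sketch, not part of the project: it assumes Python 3, and the helper names nltk_installed and download_corpora, as well as the downloader parameter, are hypothetical. It relies on nltk.download accepting a corpus identifier and returning True on success.

```python
import importlib.util

# Corpus identifiers as shown in the nltk Downloader's "corpora" tab.
REQUIRED_CORPORA = ("stopwords", "wordnet")


def nltk_installed():
    """Return True if the nltk package can be found on this system."""
    return importlib.util.find_spec("nltk") is not None


def download_corpora(names=REQUIRED_CORPORA, downloader=None):
    """Download each corpus and report which downloads succeeded.

    `downloader` is a hypothetical hook that makes the function easy to
    test; when omitted, nltk.download is used.
    """
    if downloader is None:
        import nltk  # imported lazily so nltk_installed() works without nltk
        downloader = nltk.download
    return {name: bool(downloader(name)) for name in names}
```

Calling nltk_installed() first tells you whether steps 2 through 4 can be skipped; calling download_corpora() with no arguments then stands in for the Downloader window steps and needs network access.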
Running Instructions
1. cd into the project directory
2. Run: python texttiling.py <scores_outfile>
   a) The scores_outfile is the file where you want the results to be written,
      e.g., python texttiling.py outfile.txt
Scraping Articles
- To scrape other articles using scraper.py, first install BeautifulSoup.
- Then change the 'seed' value in the main() function of scraper.py
- You can also adjust the number of articles you scrape (N).
- Run: python scraper.py
- Verify that the articles were correctly scraped and placed in the articles folder.
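
The verification step can be partly automated. The sketch below is an illustration only: it assumes the scraped articles are stored as individual files in a folder named "articles" inside the project directory, and the function name check_articles and its parameters are hypothetical. It counts the files, lists any that are empty, and optionally compares the count against the N you configured.

```python
import os


def check_articles(folder="articles", expected=None):
    """Return (count, empty_files, ok) for the files in `folder`.

    `expected` is the number of articles you asked the scraper for (N);
    leave it as None to skip the count check.
    """
    names = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    # A zero-byte file usually means the article body was not saved.
    empty = [n for n in names if os.path.getsize(os.path.join(folder, n)) == 0]
    ok = not empty and (expected is None or len(names) == expected)
    return len(names), empty, ok
```

For example, check_articles("articles", expected=10) returns ok as True only when the folder holds exactly 10 non-empty files. It does not check that the text of each article is correct, so a quick manual look at a few files is still worthwhile.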