texttiling

Implementation of the TextTiling algorithm for CS187


  • Kevin Mu, Jonathan Miller, Stella Pantela, and Dianna Hu
  • CS187 - Computational Linguistics
  • Final Project (TextTiling) - Group Implementation
  • README

Setup Instructions


  1. If nltk is already installed, skip to step 5.

  2. Run "python ez_setup.py"

  3. Run "easy_install pip" (prefixed with sudo if your system requires it)

  4. Run "pip install -U nltk" (again with sudo if needed)

  5. Run "python", then type "import nltk"

  6. Type "nltk.download()". A new window should open, showing the nltk Downloader.

  7. Click the "corpora" tab.

  8. Select "Stopwords Corpus" (stopwords) and "WordNet" (wordnet), and click Download.

  9. Close the nltk downloader and exit python.
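If you would rather not use the graphical downloader (for example, on a machine without a display), the two corpora from steps 7 and 8 can also be fetched from a script. The following is a minimal sketch built on NLTK's nltk.data.find and nltk.download; the helper names missing_corpora and ensure_corpora are our own illustrations, not part of this project.

```python
# Corpus IDs from the setup steps, mapped to the resource paths nltk.data.find expects.
REQUIRED_CORPORA = {"stopwords": "corpora/stopwords", "wordnet": "corpora/wordnet"}


def missing_corpora(find):
    """Return the corpus IDs that `find` cannot locate.

    `find` should behave like nltk.data.find: return normally if the
    resource is installed and raise LookupError otherwise.
    """
    missing = []
    for corpus_id, resource_path in REQUIRED_CORPORA.items():
        try:
            find(resource_path)
        except LookupError:
            missing.append(corpus_id)
    return missing


def ensure_corpora():
    """Download (quietly) any required corpora that are not already installed."""
    import nltk  # imported here so missing_corpora can be used without nltk

    for corpus_id in missing_corpora(nltk.data.find):
        nltk.download(corpus_id, quiet=True)
```

After installing nltk, call ensure_corpora() once from a Python session. Passing the finder in as an argument keeps the check testable without touching the network.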


Running Instructions


  1. cd into the project directory

  2. Run: python texttiling.py scores_outfile, where scores_outfile is the file to which the results will be written.

    e.g., python texttiling.py outfile.txt
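This README does not say exactly what texttiling.py computes or writes to scores_outfile. For orientation only, here is a minimal sketch of the block-comparison step of Hearst's TextTiling: the text is split into fixed-size token sequences, each gap between sequences is scored by the cosine similarity of the word counts in the k sequences on either side, and each gap then gets a depth score measuring how far it sits below the nearest peaks. The function names and the default k=2 are illustrative assumptions, not this project's code, and we have not confirmed that its output matches these scores.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(pseudosentences, k=2):
    """Score the gap after each token sequence except the last.

    pseudosentences: list of token lists. For each gap, the k sequences
    before it are compared with the k sequences after it.
    """
    scores = []
    for gap in range(1, len(pseudosentences)):
        left = Counter(tok for seq in pseudosentences[max(0, gap - k):gap] for tok in seq)
        right = Counter(tok for seq in pseudosentences[gap:gap + k] for tok in seq)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """For each gap, climb to the highest point on each side and sum the two drops."""
    depths = []
    for i, s in enumerate(scores):
        left_peak = s
        for j in range(i - 1, -1, -1):
            if scores[j] < left_peak:
                break  # scores started falling again, so the peak was reached
            left_peak = scores[j]
        right_peak = s
        for j in range(i + 1, len(scores)):
            if scores[j] < right_peak:
                break
            right_peak = scores[j]
        depths.append((left_peak - s) + (right_peak - s))
    return depths
```

Gaps with the largest depth scores are the candidate topic boundaries. A full implementation would also tokenize, remove stopwords, possibly smooth the scores, and choose a cutoff for which depths count as boundaries.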


Scraping Articles


  1. To scrape other articles with scraper.py, first install BeautifulSoup.
  2. Then change the 'seed' value in the main() function of scraper.py.
  3. You can also adjust the number of articles you scrape (N).
  4. Run: python scraper.py
  5. Verify that the articles were correctly scraped and placed in the articles folder.
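scraper.py itself is not reproduced here, so the following is only a hypothetical sketch of how a seed and a limit N commonly drive a scraper: start from the seed page, follow links breadth-first, and stop once N pages have been collected. crawl_order and links_with_beautifulsoup are illustrative names we introduce; only the BeautifulSoup calls (BeautifulSoup(html, "html.parser") and find_all("a", href=True)) are real library API.

```python
from collections import deque


def crawl_order(seed, get_links, n):
    """Return up to n unique pages in breadth-first order, starting from seed.

    get_links(page) -> iterable of linked pages. This is a hypothetical
    stand-in for whatever fetching and parsing scraper.py actually does.
    """
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < n:
        page = queue.popleft()
        order.append(page)
        for link in get_links(page):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order


def links_with_beautifulsoup(html):
    """Extract href values from an HTML page using BeautifulSoup (bs4)."""
    from bs4 import BeautifulSoup  # installed via: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]
```

Keeping get_links as a parameter separates the crawl order from the network code, so the N limit and duplicate handling can be checked on a small in-memory link graph.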