This is a Python script that fetches news articles from Yahoo Finance and Google Finance RSS by stock symbol (e.g. MSFT, ORCL, USO), and groups the articles using non-negative matrix factorization (NMF). Sometimes the groupings are wonky, but relevant themes throughout the stories do arise. This can at least be used for a quick perusal of news articles, FWIW. Beautiful Soup is used to extract the text from each article. Stop words are removed and only English words are used in the word matrices for each article. The generated article groupings are rendered as an HTML file via a Cheetah template. The HTML output is sent to ./output/<symbol_name>.html by default. Usage: gen_features.py -s <symbol> [-o <outputfile>] [-n <numfeatures>] [-i <iterations>] Example: python gen_features.py -s msft -n 10 -i 50 Requirements: Beautiful Soup (http://www.crummy.com/software/BeautifulSoup/) Cheetah (http://www.cheetahtemplate.org/) feedparser (http://www.feedparser.org/) NumPy (http://numpy.scipy.org/) PyWordNet (http://osteele.com/projects/pywordnet/)
edtechre/Yahoo-Finance-and-Google-Finance-NMF
Groups a company's news articles from Yahoo Finance and Google Finance using non-negative matrix factorization (NMF)
Python