The goal of this program is to track and utilize data found from the news/reddit about hot stocks.
End game is to have something trading for us that will just be reading posts on WSB and yFinance. Currently this program finds word counts of stocks (or what it believes are stocks) and tracks the data.
Use Main.py not RedditScrapeEx.py for main operation
Important Classes
news.py : newsScraper
StonkRedditAPI.py : StonkRedditAPI
DataTracker : Manager
Class Descriptions
newsScraper
Description :
This class is used for finding word counts from news websites that do not have APIs. The flow of this class is as follows. Input a link to the root of a financial news site(Parent link). From that link find any additional links containing the seperator at or after the given tolerance. Visit each Child link and temporarily store all of the <p> tag data. Using the tag data and a CSV containing known symbols find and count every use of any symbol. Do this for every child link and then create a counter object called word_count that contains the final representation of the data found.
This is not a very useful field to view but it is relevant to the overall data produced and I used it a few times during debugging so it is not fully useless.
quotes
list
This is specific to YahooFinnances website structure. There are some random tickers included in the articles that are separate from the p_data
["SYMB1, "SYMB2",... "SYMBN"]
Only helpful for yahoo news links
word_count
Counter
Counts the amount of times a stock was mentioned in all of the articles.
url and separator must be set. While tol, nickname, and debug all have default values. The field summary is listed above but this constructor is straight forward.
Method SummaryThere are no methods in this class. Everything is done on initialization
StonkRedditAPI.py
Description :
This class uses the reddit API directly and runs it through the Spacy model en_core_web_trf. We are using the 'organizations' entity to find tickers and important terminology within subs.
Field Summary
Name
Type
Description
Example
Notes
sub
string
Subreddit to be searched
'r/Wallstreetbetsnew'
Choose a sub from reddit to view
listing
string
The group of posts to be searched
'hot', 'new', 'random', 'rising'
Categories of searches within reddit API
dataType
string
Tag for the data we want to look at
'title', 'selftext', 'all'
Only look at titles, selftext of a posr only look (Default - 'all')
searchAmount
int
Number of searches to perform
100
The amount of searches for the listing. (Default - 100)
sub, and listing are required while dataType, searchAmount, and headNum have default values. Parameters are specified above.
Method Summary
Name
Description
get_orgs(text)
Loads Spacy model en_core_web_trf and processes all the data taken from the reddit API into a list to be filtered by the model. and returns the list of organizations found.
Manager
Description :
This class is used to store our count data over longer periods of time. For each symbol that is found the Manager will first combine all lists of found symbols and counts, then create or append a file for each individual symbol. It is also able to read the count data for any symbols in our database
Field Summary
Name
Type
Description
Example
Notes
root
string
The root directory to store all of the data into
'StockData' or 'DataFolder'
Create this folder before hand or there will be issues. The read method is only able to read from data in the root folder.
Method Summary
Name
Return Type
Parameters
Parameter Types
Description
Example
Notes
storeDict(dict)
void
inDict
Counter/Dictionary
Takes an input list of counters containing (symbol,count) and then appends the count data to a file for that symbol
let someCounter = {'a':2,'b':3,...'z':n}, storeDict(someCounter)
Bread and butter of this class.
storeDataSet(list)
void
listOfDicts
List
Takes in a list of counter objects. Merges them together so there are no duplicates, and then runs storeDict() on the merged counter.
let li = [someCounter,someCounter1, ... someCounterN], storeDataSet(li)
Caller of the storeDict method
readData(str)
List
symbol
string
Looks through the root folder for the given symbols data. Then returns the tracked data.
readData('AMC') or readData('GE')
This only works if the files are in the root folder. If you are in a new root, or you have not tracked the symbol yet then it will return an error
Main.py
Description Main driver for the application. Has a UI to run the scrapers and data collection, and also allows for viewing of the collected data
Method Summary
Name
Return
Params
Param Type
Description
Example
Notes
runSingle()
void
none
N.A
Runs a single instance to collect data via news.py and StonkRedditAPI.py.
N.A
This function controls what is used in the runInBack function
runInBack()
void
none
N.A
Runs a runSingle() over a specified time period and frequency
N.A
look()
void
none
N.A
Allows user input to look at collected data
N.A
Going to improve this to show collected symbols and then let user pick
changeSettings()
void
none
N.A
Nothing yet, but plan is to let user add/change what scrapers are used to fine tune the data collection if wanted
N.A
processCMD
function
input
str,char
There is a command dictionary, the input is the key for the function to be executed in the dictionary
let CMDS = {'a':doSomething,'b':doAnotherThing}, processInp('a') -> doSomething
Helpful and fun thing. Happy I found this and I want to use this style more
printMenu()
void
none
N.A
Prints all the available options, then based on input it will run one of the above functions