Post analysis (and possibly prediction) for questions/answers from Stack Overflow.
-
Get data into a pandas DataFrame
-
Use the following as features
- Word count
- Readability index (specifics to be determined)
- Contains source code
- Contains LaTeX math
- Sentiment analysis (possibly predictive, but intended mainly for descriptive use)
-
Run a Predictive Analysis
-
Also consider descriptive analysis of different Stack Exchange communities.
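
The simpler features listed earlier (word count, contains source code, contains LaTeX math) could be sketched roughly as below. This is only an illustration under assumptions not stated in these notes: post bodies are assumed to be HTML with code inside `<pre>` or `<code>` tags, and LaTeX math is assumed to be delimited by `$...$` or `$$...$$`. The function names are hypothetical.

```python
import re

# Assumption: code in a post body is wrapped in <pre> or <code> tags.
CODE_RE = re.compile(r"<(?:pre|code)\b[^>]*>", re.IGNORECASE)
# Assumption: LaTeX math uses $...$ or $$...$$ delimiters.
# This is a heuristic; text such as "$5 and $10" would be a false positive.
LATEX_RE = re.compile(r"\$\$.+?\$\$|\$[^$\n]+\$", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def word_count(body: str) -> int:
    """Count word-like tokens after removing HTML tags (code text is included)."""
    text = TAG_RE.sub(" ", body)
    return len(WORD_RE.findall(text))


def contains_code(body: str) -> bool:
    """True if the body has at least one <pre> or <code> opening tag."""
    return bool(CODE_RE.search(body))


def contains_latex(body: str) -> bool:
    """True if the body has something that looks like delimited LaTeX math."""
    return bool(LATEX_RE.search(body))


def extract_features(body: str) -> dict:
    """Bundle the three features into one dict per post."""
    return {
        "word_count": word_count(body),
        "has_code": contains_code(body),
        "has_latex": contains_latex(body),
    }
```

If the posts already sit in a DataFrame `df` with a (hypothetical) `body` column, one option would be `pd.DataFrame(df["body"].map(extract_features).tolist())` to get one feature column per key.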
- Flesch-Kincaid Grade
- Gunning Fog Index
- Coleman-Liau Index (in use)
- SMOG Index
- Automated Readability Index
- Flesch Reading Ease (in use)
- Spache Score
- New Dale-Chall Score (in use)

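
For reference, two of the readability scores marked as in use above can be computed directly from their published formulas. The sketch below is not necessarily how this project computes them (a readability library may be used instead). The sentence splitter and syllable counter are rough heuristics introduced here for illustration, so the results will differ slightly from library output. The New Dale-Chall score is left out because it needs a word list.

```python
import re


def split_sentences(text: str) -> list:
    """Naive sentence split on ., ! and ? (abbreviations will be miscounted)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text: str) -> list:
    """Extract alphabetic words only."""
    return re.findall(r"[A-Za-z]+", text)


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    words = split_words(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        return float("nan")  # score is undefined for empty text
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def coleman_liau_index(text: str) -> float:
    """0.0588 * L - 0.296 * S - 15.8, with L and S measured per 100 words."""
    words = split_words(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        return float("nan")
    letters = sum(len(w) for w in words)
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = len(sentences) / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
```

Higher Flesch Reading Ease means easier text, while the Coleman-Liau Index approximates a US school grade level, so the two move in opposite directions.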
- Code Count (whether code is present)
- LaTeX Count (whether LaTeX code is present)
- Punctuation Count (how much punctuation is present)
- Cleaned Text (usable for sentiment analysis)

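
A minimal sketch of how the cleaned text and punctuation count could be derived from a raw post body follows. It assumes HTML bodies with code in `<pre>`/`<code>` tags and `$`-delimited LaTeX, and it assumes punctuation is counted on the cleaned text rather than the raw body. Neither detail is specified in these notes, and the function names are hypothetical.

```python
import html
import re
import string

# Assumption: code appears inside <pre>...</pre> or <code>...</code>.
CODE_BLOCK_RE = re.compile(
    r"<pre\b[^>]*>.*?</pre>|<code\b[^>]*>.*?</code>",
    re.DOTALL | re.IGNORECASE,
)
# Assumption: LaTeX math is delimited by $...$ or $$...$$.
LATEX_RE = re.compile(r"\$\$.+?\$\$|\$[^$\n]+\$", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(body: str) -> str:
    """Strip code blocks, LaTeX, and HTML tags, leaving plain prose."""
    text = CODE_BLOCK_RE.sub(" ", body)
    text = LATEX_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    # Decode entities last so that "&lt;" is not mistaken for a tag.
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def punctuation_count(text: str) -> int:
    """Count ASCII punctuation characters in the given text."""
    return sum(1 for ch in text if ch in string.punctuation)
```

Removing code and math before sentiment analysis keeps symbols and identifiers from being scored as if they were prose.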
Internet Archive: currently testing on AI and IoT data sets from the Internet Archive. More data should be added to extend the analysis once we have a working model.