A bachelor's thesis presenting an exploratory analysis of Stack Overflow posts, with general and user-centric analyses of the topics discussed.

Primary language: Python. License: MIT.

Insight Overflow

An exploratory analysis employing topic modeling: Tracking evolution and loyalty from Stack Overflow users' interests

Running this experiment requires downloading the Stack Overflow posts from the data dump and extracting the .7z file into src/data/. Because the extraction step uses a Redis database, Redis must be installed, configured, and running before the experiment starts (a tutorial is found here).

Extraction

Extraction started
  Extracted: 49598818
  Ignored: 739023
  Total: 50337841
Execution time: 04:11:27.56
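
The counters in the log above (extracted, ignored, total) suggest a single streaming pass over the dump. The following is a minimal sketch of such a pass, not the project's actual extractor. It assumes the dump is an XML file of `<row>` elements with attributes such as `Id` and `Body`. The `keep` predicate and `store` callback are hypothetical placeholders: the rule this project uses to ignore a post is not stated here, and in the real pipeline the storage step would write to Redis rather than to a list.

```python
import io
import xml.etree.ElementTree as ET


def extract_posts(source, keep, store):
    """Stream <row> elements from a Posts.xml-style source and count outcomes.

    `keep` decides whether a post is extracted or ignored (assumed rule),
    `store` receives each extracted post (stand-in for the Redis write).
    """
    extracted = ignored = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        post = dict(elem.attrib)  # copy attributes before clearing the element
        if keep(post):
            store(post)
            extracted += 1
        else:
            ignored += 1
        elem.clear()  # release the element's content to keep memory usage low
    return {"extracted": extracted, "ignored": ignored, "total": extracted + ignored}


# Tiny in-memory stand-in for the real dump file.
sample = io.BytesIO(
    b'<posts>'
    b'<row Id="1" PostTypeId="1" Body="How do I parse XML?" />'
    b'<row Id="2" PostTypeId="2" Body="" />'
    b'</posts>'
)

stored = []
counts = extract_posts(
    sample,
    keep=lambda post: bool(post.get("Body", "").strip()),  # assumed filter: non-empty body
    store=stored.append,
)
print(counts)
```

Streaming with `iterparse` avoids loading the whole file at once, which matters for a dump of roughly 50 million rows.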

Pre-processing

Pre-processing started
Execution time: 102:39:36.14

Topic modeling

Topic modeling started
  Corpus built: 00:00:01.65
  Experiment done: k=20 i=10 | p=4133.9019, cv=0.4946
  Experiment done: k=20 i=100 | p=1433.5471, cv=0.6330
  Experiment done: k=20 i=200 | p=1388.5460, cv=0.6343
  Experiment done: k=20 i=500 | p=1365.3670, cv=0.6341
  Experiment done: k=40 i=10 | p=5503.5514, cv=0.5449
  Experiment done: k=40 i=100 | p=1448.7289, cv=0.6046
  Experiment done: k=40 i=200 | p=1379.5958, cv=0.6051
  Experiment done: k=40 i=500 | p=1330.4556, cv=0.6072
  Experiment done: k=60 i=10 | p=6675.3963, cv=0.5221
  Experiment done: k=60 i=100 | p=1448.0626, cv=0.5874
  Experiment done: k=60 i=200 | p=1349.6507, cv=0.5940
  Experiment done: k=60 i=500 | p=1290.6926, cv=0.5880
  Experiment done: k=80 i=10 | p=7576.2664, cv=0.5115
  Experiment done: k=80 i=100 | p=1457.7716, cv=0.5800
  Experiment done: k=80 i=200 | p=1351.4062, cv=0.5866
  Experiment done: k=80 i=500 | p=1288.1277, cv=0.5892
  Experiment done: k=100 i=10 | p=8093.3122, cv=0.5114
  Experiment done: k=100 i=100 | p=1448.3062, cv=0.5762
  Experiment done: k=100 i=200 | p=1341.3547, cv=0.5787
  Experiment done: k=100 i=500 | p=1272.4512, cv=0.5794
Execution time: 00:54:22.32
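
To compare the runs, the experiment lines can be parsed and ranked by each metric. The sketch below is only an illustration of reading the log above. It assumes `k` is the number of topics, `i` the number of iterations, `p` the perplexity (lower is better), and `cv` the coherence score (higher is better). The helper name `parse_experiments` is hypothetical.

```python
import re

# Experiment lines copied from the topic modeling log.
LOG = """\
Experiment done: k=20 i=10 | p=4133.9019, cv=0.4946
Experiment done: k=20 i=100 | p=1433.5471, cv=0.6330
Experiment done: k=20 i=200 | p=1388.5460, cv=0.6343
Experiment done: k=20 i=500 | p=1365.3670, cv=0.6341
Experiment done: k=40 i=10 | p=5503.5514, cv=0.5449
Experiment done: k=40 i=100 | p=1448.7289, cv=0.6046
Experiment done: k=40 i=200 | p=1379.5958, cv=0.6051
Experiment done: k=40 i=500 | p=1330.4556, cv=0.6072
Experiment done: k=60 i=10 | p=6675.3963, cv=0.5221
Experiment done: k=60 i=100 | p=1448.0626, cv=0.5874
Experiment done: k=60 i=200 | p=1349.6507, cv=0.5940
Experiment done: k=60 i=500 | p=1290.6926, cv=0.5880
Experiment done: k=80 i=10 | p=7576.2664, cv=0.5115
Experiment done: k=80 i=100 | p=1457.7716, cv=0.5800
Experiment done: k=80 i=200 | p=1351.4062, cv=0.5866
Experiment done: k=80 i=500 | p=1288.1277, cv=0.5892
Experiment done: k=100 i=10 | p=8093.3122, cv=0.5114
Experiment done: k=100 i=100 | p=1448.3062, cv=0.5762
Experiment done: k=100 i=200 | p=1341.3547, cv=0.5787
Experiment done: k=100 i=500 | p=1272.4512, cv=0.5794
"""

PATTERN = re.compile(r"k=(\d+) i=(\d+) \| p=([\d.]+), cv=([\d.]+)")


def parse_experiments(log_text):
    """Return one dict per 'Experiment done' line found in the log text."""
    runs = []
    for match in PATTERN.finditer(log_text):
        k, i, p, cv = match.groups()
        runs.append({"k": int(k), "i": int(i), "p": float(p), "cv": float(cv)})
    return runs


runs = parse_experiments(LOG)
best_coherence = max(runs, key=lambda run: run["cv"])
lowest_perplexity = min(runs, key=lambda run: run["p"])

print("Highest coherence:", best_coherence)
print("Lowest perplexity:", lowest_perplexity)
```

On this log the two metrics point to different configurations: coherence peaks at k=20, i=200, while perplexity is lowest at k=100, i=500. Which configuration the thesis selects is not stated in this section.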

Post-processing

Post-processing started
  Extracting topics
  Creating coherence chart
  Creating perplexity chart
  Computing general popularity
    Posts covered: 49573604
    Number of posts with empty topics: 36085
    Computed metrics: 4410
  Creating general popularity charts
  Computing user popularity
    Posts covered: 49573604
    Number of users: 4943206
    Number of posts with empty topics: 36085
    Computed metrics: 534554010
  Creating user popularity charts
Execution time: 12:57:57.99
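
The post-processing counters can be put in proportion with a little arithmetic. The sketch below uses only the figures from the log above. The variable names are illustrative, and the ratios are derived here for orientation rather than reported by the project.

```python
# Figures copied from the post-processing log.
posts_covered = 49_573_604
empty_topic_posts = 36_085
users = 4_943_206

# Share of covered posts that ended up with no topic assigned.
empty_share = empty_topic_posts / posts_covered

# Average number of covered posts per user.
posts_per_user = posts_covered / users

print(f"Posts with empty topics: {empty_share:.4%} of covered posts")
print(f"Average covered posts per user: {posts_per_user:.2f}")
```

Posts with empty topics are well under a tenth of a percent of the covered posts, and each user accounts for about ten covered posts on average.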