2ch-arch-web-scrapper

A script that parses the entire 2ch.hk /b archive into a CSV file, plus a Jupyter Notebook to analyze that data.

Prerequisites

conda or miniconda must be installed (see the conda docs).

On macOS you can install it with Homebrew:

brew install --cask miniconda

Create environment from a file:

conda env create -f environment.yml python=3.9

Additional info:

If you create an environment without specifying an interpreter, as with conda create --name env-00, the environment won't appear in the list.

Run

Gather the dataset:

conda activate 2ch-arch-web-scrapper
python scrapper.py

Clean the dataset of every chunk header: id,date,title,link. Just replace each one with an empty row.
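The cleanup step above can also be scripted instead of done by hand. Below is a minimal sketch, not part of this repo: the function names blank_chunk_headers and clean_file and the file names in the usage comment are hypothetical, and it assumes each chunk header sits on its own line exactly as id,date,title,link.

```python
# Header text taken from this README; everything else here is an assumption.
HEADER = "id,date,title,link"


def blank_chunk_headers(lines):
    """Return the lines with every chunk-header line replaced by an empty row."""
    cleaned = []
    for line in lines:
        text = line.rstrip("\r\n")  # drop the line ending before comparing
        cleaned.append("" if text.strip() == HEADER else text)
    return cleaned


def clean_file(src_path, dst_path):
    """Read src_path, blank out chunk headers, and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        cleaned = blank_chunk_headers(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(cleaned) + "\n")


# Usage (hypothetical file names, adjust to whatever scrapper.py produces):
# clean_file("dataset.csv", "dataset_clean.csv")
```

Note that this blanks the first header too, matching the instruction above, so the column names have to be supplied when the file is loaded.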

Then you can run the notebook.

Motivation

I wanted to get into data science a little, so I decided to analyze 2ch threads with some metrics. Of course, first I needed a dataset.