Have you ever wondered which English word is used most frequently? Or whether people talk more often about dogs or about cats? This small project should help you answer questions like these.
First, clone this repo:
cd ~/code/
git clone https://github.com/kittipatkampa/english_word_frequency.git
You will find a new directory, english_word_frequency, created under your current directory, ~/code/. Now, go into the new directory:
cd english_word_frequency
Now, we will use pipenv to install all the dependencies:
pipenv install
We will use the English Word Frequency dataset made available by Rachael Tatman in this Kaggle article. For convenience, the zipped CSV file, unigram_freq.csv.zip, is already included in this repo.
Unzip the file with this command:
unzip unigram_freq.csv.zip
This gives you unigram_freq.csv in your project directory, which you can verify with:
ls -l *.csv
You should see the CSV file listed:
-rw-r--r--@ 1 kittipat.kampa staff 4956252 Sep 21 2019 unigram_freq.csv
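With the CSV in place, the two questions from the introduction can be explored with a few lines of Python. The following is only a minimal sketch and not necessarily how this repo does it: it assumes the file has a header row with columns named `word` and `count`, and the helper names `load_counts`, `most_frequent`, and `more_frequent` are made up for illustration. It uses only the standard library, so it does not depend on what `pipenv install` provides.

```python
import csv
from pathlib import Path


def load_counts(csv_file):
    """Read (word, count) rows from an open CSV file into a dict."""
    counts = {}
    for row in csv.DictReader(csv_file):
        # Assumption: the header row names the columns "word" and "count".
        counts[row["word"]] = int(row["count"])
    return counts


def most_frequent(counts, n=5):
    """Return the n (word, count) pairs with the highest counts, highest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


def more_frequent(counts, first, second):
    """Return whichever word has the higher count; a missing word counts as 0."""
    return first if counts.get(first, 0) >= counts.get(second, 0) else second


if __name__ == "__main__":
    csv_path = Path("unigram_freq.csv")
    # Only run against the real file if it has been unzipped already.
    if csv_path.exists():
        with csv_path.open(newline="", encoding="utf-8") as f:
            counts = load_counts(f)
        print(most_frequent(counts))
        print(more_frequent(counts, "dog", "cat"))
```

Saved under a name of your choice in the project directory, this can be run with `pipenv run python` followed by that file name. If the column names in your copy of the dataset differ, adjust the two keys in `load_counts`.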