/Messenger-analysis

Messenger chat analyzer. Take an in-depth look at your chat history.

Messenger Report Generator

[Figure: Number of messages in chats plot]

For many of us, Messenger is the main way we communicate. It holds a lot of information about ourselves and our relationships. This repository contains a script that generates a set of charts from your message history.

Charts generated by the script:
  • message count rank
  • overall activity over time
  • average activity over a day
  • average activity over a week
  • average message lengths in significant chats
  • word clouds of important phrases in chats
  • activity over time per chat
  • message length distributions in significant chats
  • language diversity rank (experimental)

Table of contents

  1. Usage
  2. Examples
  3. Contribute

Usage

Collecting data

Facebook lets its users download their Messenger message history.

Data requesting steps:

  1. Go to Facebook settings and proceed to downloading your data.
  2. Deselect all data and select only Messages.
  3. Set the data format to JSON.
  4. Set the media quality to low (all the media in chats are downloaded as well, but the script ignores them).
  5. Submit the data request.

Preparing the data file should not take more than 24 hours. You will be notified when your file is ready.
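Once the archive arrives, it can be useful to check what it contains before running the full analysis. The snippet below is a minimal sketch, not the script's own loader. It assumes the usual export layout (chat files named message_<n>.json, each with a "messages" list whose entries carry "sender_name", "timestamp_ms" and an optional "content"), and it undoes the Latin-1 mojibake that these exports commonly show for non-ASCII text. The function names are hypothetical.

```python
import io
import json
import zipfile


def fix_encoding(text):
    """Repair UTF-8 text that was stored as Latin-1 code points (assumed export quirk)."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # the text was not mojibake, leave it unchanged


def iter_messages(zip_file):
    """Yield (file_name, sender, timestamp_ms, content) for every text message."""
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            # Assumed layout: .../<chat folder>/message_<n>.json
            if not name.endswith(".json") or "/message_" not in name:
                continue
            chat = json.loads(archive.read(name))
            for message in chat.get("messages", []):
                if "content" not in message:
                    continue  # media-only entries carry no text
                yield (
                    name,
                    fix_encoding(message["sender_name"]),
                    message["timestamp_ms"],
                    fix_encoding(message["content"]),
                )
```

For example, `sum(1 for _ in iter_messages("zips/export.zip"))` would count the text messages in an archive, where the file name `export.zip` is a placeholder.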

Setting up the script

After cloning this repository, place the downloaded zip in the zips subdirectory and set up a virtual environment for Python 3.8.

On Linux you can use virtualenv.

On Windows you have to use a conda virtual environment. You can use either:

  • Miniconda: install it, open cmd via Anaconda Prompt (miniconda3), and cd to the cloned repository directory
  • Anaconda: install it, run Anaconda Navigator (anaconda3), go to Environments, set up a new environment, start it via cmd, and cd to the repository directory

After setting up the environment and opening the repository directory, run:

pip install -r requirements.txt
python -m spacy download pl_core_news_md
python -m spacy download en_core_web_sm

In params.json, set your "user", "language" and "timezone".

{
  "user": "Bartek Pogod",
  "language": "polish",
  "timezone": "Europe/Warsaw",

  [...]
}
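A quick check that the three settings are present can save a failed run. This is a minimal sketch and not part of the repository: `load_params` and `REQUIRED_KEYS` are hypothetical names, and the check only covers the three keys named above, not the elided rest of the file.

```python
import json

# The three settings this README asks you to fill in.
REQUIRED_KEYS = ("user", "language", "timezone")


def load_params(path="params.json"):
    """Load the parameter file and fail early if a required key is missing."""
    with open(path, encoding="utf-8") as handle:
        params = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise KeyError("params.json is missing: " + ", ".join(missing))
    return params
```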

Running the script

If everything is set up properly, the charts are generated by running:

python messages_analysis.py

After a couple of minutes, all the plots should appear in the figures folder (or another folder specified in params.json).

Examples

Activity in chats plot

This plot can show how your relationships changed over time, including when they started to form or to fade. The lines are smoothed to improve readability.

[Figure: Activity in chats]
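The README does not say which smoothing method the script applies. As an illustration only, the sketch below counts messages per calendar month and smooths the series with a centered moving average. Both the monthly granularity and the moving average are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def monthly_counts(timestamps_ms):
    """Count messages per calendar month (UTC) as a sorted list of (YYYY-MM, count).

    Months without any message do not appear in the result.
    """
    counts = Counter(
        datetime.fromtimestamp(ts / 1000, tz=timezone.utc).strftime("%Y-%m")
        for ts in timestamps_ms
    )
    return sorted(counts.items())


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A wider window gives a calmer line but hides short bursts of activity, so the choice depends on how long the chat history is.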

Average messages length in significant chats

This chart can say a lot about the interactions. Longer messages tend to be more formal, and possibly more personal. The title says "in significant chats" because some chats have too few messages to be considered important.

[Figure: Message length rank]
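The idea of filtering out insignificant chats before averaging can be sketched as follows. The threshold `min_messages=100` and the choice to measure length in characters are assumptions made for the example; the script's actual criteria are not stated here.

```python
from statistics import mean


def average_message_lengths(chats, min_messages=100):
    """Rank chats by mean message length, skipping chats with too few messages.

    chats maps a chat name to a list of message strings.
    Length is measured in characters (an assumption of this sketch).
    """
    threshold = max(min_messages, 1)  # never average an empty chat
    averages = {
        name: mean(len(message) for message in messages)
        for name, messages in chats.items()
        if len(messages) >= threshold
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```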

Chat keyword cloud

It is generated using the TextRank algorithm. The size of each word represents its importance in the chat. The example chart is in Polish because it is the author's first language.

[Figure: Chat wordcloud]
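To give a feel for how TextRank assigns importance, here is a simplified, unweighted version for single keywords. It links words that occur close to each other and then runs a PageRank-style iteration over that graph. It is a sketch under assumptions, not the implementation used by the script: the input is an already filtered list of tokens, and the parameter values are illustrative.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank tokens with a simplified TextRank over a co-occurrence graph."""
    # Link every token with the next `window` tokens (undirected edges).
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1: i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update: a word is important if important words link to it.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

In a word cloud, the resulting scores would drive the font size of each word.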

Language diversity rank

The language diversity score represents how diverse a speaker's vocabulary is in a chat.

To calculate the score, the messages sent by a chat participant are first cleaned: numbers, punctuation and entities are removed. All words are then lemmatized to get their base forms. Next, the messages sent by one person are divided into batches of 2000 words. For every batch, the quotient of the number of distinct lemmas and the batch size (2000) is calculated. The final score is the mean of those quotients.

[Figure: Language diversity rank]
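The scoring step described above can be sketched in a few lines. The function takes a list of lemmas that has already been cleaned and lemmatized, so that part is not shown. Two points are assumptions of this sketch: an incomplete final batch is dropped, and a speaker with fewer words than one batch gets no score.

```python
from statistics import mean


def language_diversity(lemmas, batch_size=2000):
    """Mean share of distinct lemmas per full batch of `batch_size` words."""
    # Only full batches are scored; a trailing partial batch is ignored (assumption).
    batches = [
        lemmas[start: start + batch_size]
        for start in range(0, len(lemmas) - batch_size + 1, batch_size)
    ]
    if not batches:
        return None  # not enough words for a single batch
    return mean(len(set(batch)) / batch_size for batch in batches)
```

Using fixed-size batches keeps scores comparable between people who wrote very different amounts of text, because the share of distinct words naturally falls as a text grows.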

Contribute

The possibilities are almost endless. Take a look at the Issues tab to share your own ideas or see how you can help! Let's make something great :D