Facebook-Data-Miner provides a set of tools that can help you analyze the data that Facebook has on you.
The vision is to support data extraction, data analysis, and data visualization through any of the interfaces.
All computation happens on your machine, so no data is sent to remote computers or third parties.
As of now the package has only been tested on Linux; however, with pipenv it should be easy to set the application up on Windows as well.
The application was tested on Debian 10 with Python v3.8.3. You will need at least Python 3.8, since the code relies on features introduced in that version.
To download Python, refer to the official Python distribution page.
This package works by analyzing your Facebook data, so you have to download it.
Please refer to the following link in order to do so.
IMPORTANT NOTE: you have to set Facebook's language to English (US) at the time you request your data. This change can of course be reverted later.
You will only need the zip file's absolute path later to use this software.
You have to change the DATA_PATH variable in the configuration.yml file.
NOTE: facebook-data-miner will extract your zip file into the same directory. For this you may need several GB of free space, depending on the volume of the original data.
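For illustration, the entry in configuration.yml might look like the line below; the path is only a placeholder, so substitute the absolute path of your own zip file:
DATA_PATH: /home/you/Downloads/facebook-yourname.zip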
Clone this repository using either SSH:
git clone git@github.com:tardigrde/facebook-data-miner.git
or HTTPS:
git clone https://github.com/tardigrde/facebook-data-miner.git
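By default the repository is cloned into a facebook-data-miner directory; change into it before continuing:
cd facebook-data-miner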
This project uses pipenv for dependency and virtual environment management.
Install it by typing:
pip install --user pipenv
In the project root (where Pipfile is) run:
pipenv install --dev
Make sure you run the application in this environment.
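You can do this by activating the environment with:
pipenv shell
or by prefixing individual commands with pipenv run (both are standard pipenv features).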
Lint with the makefile:
make lint
Run the tests with the makefile:
make test
The app has both a CLI and an API. For now, the API is the preferred way to use the app, since there is no database yet that would hold your Facebook data in memory; the CLI works, but it is slow.
I wrote two Jupyter notebooks to showcase the capabilities and features of the API and the CLI. The notebooks contain lots of comments to help you understand how the app is built, what kind of information you can access, and how.
To use them you have to start a Jupyter server. As mentioned in the notebooks, you have to set the $PYTHONPATH environment variable before starting it:
export PYTHONPATH="$PWD"
Then type the following in your terminal if you want to use jupyter notebook:
jupyter notebook
or for jupyter lab:
jupyter lab
Select notebooks/API.ipynb (or notebooks/CLI.ipynb) and start executing the cells.
As already described in the notebook, the entry point is the App class in miner/app.py. For now the docstring is the only documentation.
Call it from a script (after you have set the data path) like this:
from miner.app import App
app = App()
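If you save this as, for example, my_analysis.py (a hypothetical file name) in the project root, you can run it inside the pipenv environment with the $PYTHONPATH set, just as for the notebooks and the CLI:
export PYTHONPATH="$PWD"
python my_analysis.py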
The command-line interface has a lot of depth, as shown in notebooks/CLI.ipynb, but it is slow, because the data that gets read in does not stay in RAM between invocations.
For running the CLI:
export PYTHONPATH="$PWD"
python ./miner/app.py --help
Help is more than welcome. There is still a long way to go until v1.0.0.
Ideas are welcome too; feel free to open a new issue.