The goal of this project is to find out whether proficiency in certain programming languages influences your code style when writing in other programming languages.
A (constructed) example: if a developer has programmed a lot of Java in their career and now tries to write some Python code, will they make mistakes we can classify as "typical" for a Java developer? For example, certain naming conventions, unneeded wrapper classes, etc.
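To make the constructed example concrete, here is a small, purely hypothetical illustration (not taken from the project's data) of the same function written twice in Python: once with habits a Java developer might carry over, and once in idiomatic style. The names `NumberUtils`, `computeSum`, and `compute_sum` are made up for this sketch.

```python
# Java-flavoured Python: camelCase names and a wrapper class
# that only exists to hold a static method.
class NumberUtils:
    @staticmethod
    def computeSum(numberList):
        total = 0
        for i in range(len(numberList)):
            total += numberList[i]
        return total


# Idiomatic Python: snake_case name, plain module-level function,
# built-in sum() instead of an index-based loop.
def compute_sum(numbers):
    return sum(numbers)


if __name__ == "__main__":
    values = [1, 2, 3]
    # Both variants compute the same result; only the style differs.
    print(NumberUtils.computeSum(values), compute_sum(values))
```

Both versions are correct Python, which is the point: the difference is stylistic rather than functional.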
For a further introduction to the topic and a guide to how this works, have a look at the notebook.
After cloning the project, you should install the dependencies (for Python 3, make sure pip refers to the version for Python 3 or use pip3):
pip install -r requirements.txt
To enable linting for Python 2 files, you also need to install pylint. (This command assumes that python refers to a Python 2 version.)
python -m pip install pylint
Windows users can just use the py command:
py -2.7 -m pip install pylint
You need access to the GHTorrent database via a Postgres server. You can configure the access to it in the config-yml file (you have to create this file yourself).
Example file:
# The database name
dbname: github
# The database user
user: user
# The database password
password: password
# The host machine where the database is located
host: 127.0.0.1
# The port on which to reach the database (default)
port: 5432
# Your personal GitHub token for checking out the projects
github-token: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# Your GitHub account name (which generated your token)
github-user: ghuser
# The directory to where you want to clone the projects
repo-dir: /path/to/folder/
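Since the config file has to be created by hand, a quick sanity check can catch a missing entry before the pipeline runs. The following is a minimal sketch using only the standard library, not the project's own loading code (which presumably uses a YAML library). It handles only the flat `key: value` layout shown in the example, and it assumes that all eight keys from the example are required. The helper names `parse_flat_config` and `missing_keys` are hypothetical.

```python
# Assumption: every key from the example file is required.
REQUIRED_KEYS = {
    "dbname", "user", "password", "host", "port",
    "github-token", "github-user", "repo-dir",
}


def parse_flat_config(text):
    """Parse flat 'key: value' lines, skipping blank lines and '#' comments.

    This is not a YAML parser; it only covers the layout of the example file.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        config[key.strip()] = value.strip()
    return config


def missing_keys(config):
    """Return the required keys that are absent, in sorted order."""
    return sorted(REQUIRED_KEYS - config.keys())


if __name__ == "__main__":
    sample = """
    # The database name
    dbname: github
    # The host machine where the database is located
    host: 127.0.0.1
    port: 5432
    """
    config = parse_flat_config(sample)
    print("missing:", missing_keys(config))
```

All values come back as strings here, so a value such as the port would still need converting before it is passed to a database driver.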
Work in Progress
Recommended: Install Jupyter Notebook and serve this folder, then have a look at the notebook. It is an interactive version of this pipeline and includes a lot of explanations.
Deprecated: There is a main file you can run that demonstrates basic functionality:
python3 main.py