Repository for our machine learning project for the ML Seminar in the summer term 2023.

Primary language: Jupyter Notebook

The Genre Factor

Instructions to run the complete project

  1. Loop over Wikidata (caution: long runtime)
  2. Combine the data from script/run_add_to_data
  3. Select the data
  4. Hyperparameter optimization (HPO) for the NN (caution: long runtime)
  5. Merge the hyperparameters from the different searches into a single CSV file
  6. Train and evaluate the NN, SVM, and kNN
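Step 5 could look roughly like the sketch below. It is not the repository's actual script: the function name merge_csv_files, the extra source_file column, and the assumption that each search writes one CSV file with a header row are all hypothetical and only illustrate the idea of merging several result files into one.

```python
import csv
from pathlib import Path


def merge_csv_files(input_paths, output_path):
    """Concatenate CSV files (possibly with different columns) into one file.

    Hypothetical helper for illustration; returns the number of merged rows.
    """
    rows = []
    fieldnames = []
    for path in input_paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Collect the union of all column names, keeping first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                # Remember which search a row came from (assumed to be useful).
                row["source_file"] = Path(path).name
                rows.append(row)
    if "source_file" not in fieldnames:
        fieldnames.append("source_file")
    with open(output_path, "w", newline="") as handle:
        # restval fills columns that a given search did not record.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Columns that are missing in one of the inputs are left empty in the merged file, so searches over different hyperparameter sets can still be combined.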

Package Prerequisites

Poetry

The dependencies of this project are managed by poetry. From the poetry website:

Poetry - Python packaging and dependency management made easy

Develop - Poetry comes with all the tools you might need to manage your projects in a deterministic way.

Build - Easily build and package your projects with a single command.

Publish - Make your work known by publishing it to PyPI.

Track - Having an insight of your project’s dependencies is just one command away.

Dependency resolver - Poetry comes with an exhaustive dependency resolver, which will always find a solution if it exists.

Isolation - Poetry either uses your configured virtualenvs or creates its own to always be isolated from your system.

The last part is probably the most relevant here: poetry allows us to specify all dependencies in a text file (the pyproject.toml file). It then creates a Python virtualenv that matches these specifications. This way, we avoid clashes between different versions installed on different machines.

Installation of poetry is straightforward:

curl -sSL https://install.python-poetry.org | python3 -

The full installation manual is available in the official Poetry documentation. However, the command above should be enough.

To enable command completion in a Bash terminal, run

poetry completions bash >> ~/.bash_completion

Running scripts

To run a script, use

poetry run python your_script.py
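If you are unsure whether a script really runs inside the project's virtualenv, a small check like the following may help. It is only a sketch: the helper name in_virtualenv and the file name check_env.py are assumptions for illustration and are not part of this repository.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtualenv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtualenv:", in_virtualenv())
```

Saved as check_env.py (a hypothetical name), it can be started with poetry run python check_env.py and should then report the interpreter from the poetry-managed environment.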

Installing dependencies that another user added to the .toml file

When another user has added a dependency to the .toml file and pushed it via git, you can sync your own virtualenv by running

poetry install --sync

Adding packages to the .toml yourself

If you would like to add another package to the project, you can run

poetry add numpy

In this example I used numpy, which is already installed, but you get the point. You can always edit the pyproject.toml file manually; this might be handy if you want to have a specific version.

After adding a package to the project, you can install it by running

poetry install

which should sort out all version conflicts and install the specified packages in the virtualenv.
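To verify that a package actually ended up in the virtualenv, something like the following sketch can be used. It relies on importlib.metadata from the standard library; installed_version is a hypothetical helper name, not part of this repository.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # numpy is used because it is the example package in the text above.
    print("numpy:", installed_version("numpy"))
```

Run it through poetry run python so that the lookup happens in the project's environment rather than in the system Python.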