
cartesius

Benchmark & pretraining for Cartesian coordinates feature extraction

Description • Install • Usage • Documentation • Contribute

Description

This repository contains the data for training & benchmarking neural networks on various tasks, with the goal of evaluating the feature extraction capabilities of the benchmarked models.

Extracting features from 2D polygons is not a trivial task. Many models can be applied to it, and many approaches exist (learning from raw coordinates, learning from a raster image, etc.).

A benchmark is therefore needed to quantify these capabilities and compare models and approaches.

Install

Install cartesius by running:

pip install spwk-cartesius

Usage

In cartesius, the training data consists of randomly generated polygons.

Let's have a look. First, initialize the training set:

from cartesius.data import PolygonDataset

train_data = PolygonDataset(
    x_range=[-50, 50],          # Range for the center of the polygon (x)
    y_range=[-50, 50],          # Range for the center of the polygon (y)
    avg_radius_range=[1, 10],   # Average radius of the generated polygons: each polygon will have an average radius of either 1 or 10
    n_range=[6, 8, 11],         # Number of points in the polygon: each polygon will have either 6, 8 or 11 points
)

Then, let's take a look at a generated polygon:

import matplotlib.pyplot as plt
from cartesius.utils import print_polygon

def disp(*polygons):
    plt.clf()
    for p in polygons:
        print_polygon(p)
    plt.gca().set_aspect(1)
    plt.axis("off")
    plt.show()

polygon, labels = train_data[0]
disp(polygon)
print(labels)

The benchmark relies on various tasks: predicting the area of a polygon, its perimeter, its centroid, etc. (see the documentation for more details).
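
For intuition, here is what these labels correspond to, sketched with shapely (the exact label computation in cartesius may differ, refer to the documentation):

from shapely.geometry import Polygon

poly = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
print(poly.area)                   # 12.0 -> "area" label
print(poly.length)                 # 14.0 -> "perimeter" label
print(list(poly.centroid.coords))  # [(2.0, 1.5)] -> "centroid" label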

The goal of the benchmark is to write an encoder: a model that encodes a polygon's features into a vector.

After the feature vector is extracted from the polygon by the encoder, several heads (one per task) predict the labels. If the polygon is well represented by the extracted features, the task heads should have no problem predicting the labels.
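
As a minimal sketch of the idea (assuming a PyTorch setup; SimpleEncoder and its interface are illustrative, not the cartesius API), an encoder could consume raw point coordinates and pool them into a fixed-size feature vector:

import torch
from torch import nn

class SimpleEncoder(nn.Module):
    """Toy encoder: projects each (x, y) point, then mean-pools into one vector."""

    def __init__(self, d_model=64):
        super().__init__()
        self.proj = nn.Linear(2, d_model)  # each point is an (x, y) pair

    def forward(self, coords):  # coords: (batch, n_points, 2)
        h = self.proj(coords)   # (batch, n_points, d_model)
        return h.mean(dim=1)    # polygon feature vector: (batch, d_model)

The Transformer notebook mentioned below follows the same idea, with a more expressive architecture in place of the mean-pooling.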


The notebooks/ folder contains a notebook that implements a Transformer model, trains it on cartesius data, and evaluates it. You can use this notebook as a starting point for further research.

Note: at the end of the notebook, a file submission.csv is saved; you can use it for the Kaggle competition.

Contribute

To contribute, install the package locally, create your own branch, add your code/tests/documentation, and open a PR!
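
A typical local setup (assuming a standard setuptools layout; adjust to the repository's actual tooling) is an editable install on a feature branch:

# editable install, so local changes are picked up immediately
pip install -e .
# work on your own branch
git checkout -b my-feature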

Unit tests

When you add a feature, you should add tests for it and ensure the existing tests pass:

python -m pytest -W ignore::DeprecationWarning
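
While iterating, you can also run a single test file (the path below is illustrative):

python -m pytest tests/test_data.py -W ignore::DeprecationWarning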

Linters & formatters

Your code should be linted and properly formatted:

isort . && yapf -ri . && pylint cartesius && pylint tests --disable=redefined-outer-name

Documentation

The documentation should be kept up to date. You can visualize it locally by running:

mkdocs serve
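
To build the static site instead of serving it:

mkdocs build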