
Sold manuscripts: scripts and data.


Sold manuscripts

This repository contains the results of the manuscript clustering.

Workflow

Using the JSON file export.json, produced by the script extractor_json.py, we cluster all the entries in order to determine exactly how many manuscripts have been sold multiple times.

The entries are clustered based on the following traits:

  • author
  • format
  • number of pages
  • price
  • date
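
The idea of clustering on these traits can be illustrated with a minimal sketch. It is not the logic of reconciliator_all.py: it assumes that each entry is a dictionary, that the field names (author, format, pages, price, date) are as shown, and that two entries belong together only when every trait matches exactly. The real structure of export.json and the real comparison rules may differ, and the sample entries are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical field names: the real keys in export.json may differ.
TRAITS = ("author", "format", "pages", "price", "date")


def cluster_entries(entries, traits=TRAITS):
    """Group entry ids whose selected traits are all present and identical."""
    clusters = defaultdict(list)
    for entry_id, desc in entries.items():
        key = tuple(desc.get(trait) for trait in traits)
        if None in key:
            # An entry with a missing trait cannot be compared reliably.
            continue
        clusters[key].append(entry_id)
    return list(clusters.values())


def sold_multiple_times(clusters):
    """Keep only the clusters that contain more than one entry."""
    return [cluster for cluster in clusters if len(cluster) > 1]


# Invented sample data, for illustration only.
entries = {
    "cat1_e1": {"author": "Hugo", "format": 8, "pages": 2, "price": 10.0, "date": "1830"},
    "cat2_e7": {"author": "Hugo", "format": 8, "pages": 2, "price": 10.0, "date": "1830"},
    "cat3_e2": {"author": "Sand", "format": 4, "pages": 1, "price": 5.0, "date": "1845"},
    "cat3_e9": {"author": "Sand", "format": 4, "pages": 1, "price": 5.0},  # no date
}

clusters = cluster_entries(entries)
repeated = sold_multiple_times(clusters)
print(repeated)
```

Exact matching is the simplest possible rule. Passing a shorter tuple as traits, for example without price, relaxes the comparison for a trait that may vary between sales.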

Installation and use

To cluster all the entries of export.json, run the following commands:

* git clone https://github.com/katabase/soldMss.git
* cd soldMss
* python3 -m venv my_env
* source my_env/bin/activate
* pip install -r requirements.txt
* cd scripts
* python3 reconciliator_all.py ../export.json

Note that the output file of this clustering is available here.

You can then run some data analyses from the scripts folder:

  • about the price with
python3 price.py
  • about the authors with
python3 author.py
  • about the number of sales of each manuscript with
python3 mss_list.py

All the results will be written to the output folder.

Tests

You can test the script with:

python3 test.py

Credits

  • The scripts were created by Alexandre Bartz and Matthias Gille Levenson with the help of Simon Gabay.

Cite this repository

Alexandre Bartz, Simon Gabay, Matthias Gille Levenson, Ljudmila Petkovic and Lucie Rondeau du Noyer, Manuscript sale catalogues : clustering, Neuchâtel: Université de Neuchâtel, 2020, https://github.com/katabase/soldMss.

Licence

This work is licensed under a Creative Commons Attribution 4.0 International Licence.