Merging citations with catalog data.

Primary language: Python. License: MIT.

LABE

Merging citations and catalog data at SLUB Dresden.

Status: testing

Project

The project consists of a couple of command line tools, written in Python and Go.

  • ckit, a citation toolkit that contains an API server plus a few command line tools
  • python, an orchestration helper that assembles data files regularly (based on luigi)

Meeting Minutes

Project structure

$ tree -d
.
├── ansible
│   └── roles
│       ├── common
│       │   └── tasks
│       └── labe
│           ├── defaults
│           ├── tasks
│           └── templates
├── data
├── extra
│   └── perfstats
├── go
│   └── ckit
│       ├── cache
│       ├── cmd
│       │   ├── doisniffer
│       │   ├── labed
│       │   ├── makta
│       │   └── tabjson
│       ├── doi
│       ├── fixtures
│       ├── packaging
│       │   └── deb
│       │       └── ckit
│       │           └── DEBIAN
│       ├── set
│       ├── static
│       ├── tabutils
│       ├── testdata
│       └── xflag
├── notes
│   └── 2022_01_30_performance_report_files
├── python
│   ├── labe
│   ├── packaging
│   │   └── deb
│   │       └── labe
│   │           └── DEBIAN
│   └── tests
└── static

40 directories

SLOC

$ tokei -C -t=Go,Python,yaml
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Go                     19         2668         2136          341          191
 Python                 16         2606         2100          127          379
 YAML                    4          226          184           22           20
===============================================================================
 Total                  39         5500         4420          490          590
===============================================================================

Ideas

  • stats on combined oci, refcat graph; notes
  • stats on combined oci, openalex (mag), refcat graph
  • include a "cited by" count in documents; this may need a separate mapping database (about 70M rows) of (doi, cited by count) pairs. The count could also be computed with a COUNT on oci, but a separate lookup table may be preferable for performance (e.g. the result would be just an int; db is about 4GB)
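
The "cited by" lookup table idea could be sketched roughly as follows. This is a minimal illustration using SQLite from the Python standard library, not the project's actual schema: the table names `oci` and `cited_by_count`, the columns `citing`, `cited`, `doi` and `n`, and both helper functions are assumptions made up for this example.

```python
import sqlite3


def build_cited_by_table(conn):
    """Precompute one row per cited DOI from a citation edge table.

    Assumes a hypothetical table oci(citing TEXT, cited TEXT) with one
    row per citation link; the real data layout may differ.
    """
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS cited_by_count (
            doi TEXT PRIMARY KEY,
            n   INTEGER NOT NULL
        );
        DELETE FROM cited_by_count;
        INSERT INTO cited_by_count (doi, n)
            SELECT cited, COUNT(*) FROM oci GROUP BY cited;
        """
    )
    conn.commit()


def cited_by(conn, doi):
    """Return the precomputed count for a DOI, or 0 if it is never cited."""
    row = conn.execute(
        "SELECT n FROM cited_by_count WHERE doi = ?", (doi,)
    ).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    # Tiny in-memory example with made-up DOIs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE oci (citing TEXT, cited TEXT)")
    conn.executemany(
        "INSERT INTO oci VALUES (?, ?)",
        [("10.1/a", "10.1/b"), ("10.1/c", "10.1/b"), ("10.1/a", "10.1/c")],
    )
    build_cited_by_table(conn)
    print(cited_by(conn, "10.1/b"))
```

The trade-off is the one noted in the list: a primary-key lookup returns a single integer without scanning citation edges at request time, at the cost of rebuilding the table whenever the citation data is refreshed.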

Misc

A data web service: a lightning talk on the Go side of things, given at Leipzig Gophers on 2021-11-23