title for the pybor package (and paper)

Question

LinguList opened this issue 4 years ago · 2 comments

I think we should make it clearer what the whole pybor package (or whatever we decide to call it) will be titled. My proposal is: "A Python library for the identification of borrowed words with the help of lexical language models".

The major point is: a "lexical language model" is in fact what we are building here.

Question to @fractaldragonflies and @tresoldi: is it a "lexical language model" (as it would seem to me, since the models model lexical items), or is there another common term?

In any case, the major asset of this library is: take data, make a language model (even if it is just an SVM), use it to classify data.
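To make the "take data, make a model, classify" pipeline concrete, here is a minimal sketch. It is not pybor's actual API: the `BigramModel` class, the `classify` helper, and the toy word lists are all hypothetical. It assumes one character bigram model per class (native vs. borrowed) with add-one smoothing, and labels a word as borrowed when the borrowed model assigns it the lower per-symbol entropy.

```python
import math
from collections import Counter
from typing import Iterable


class BigramModel:
    """Character bigram model with add-one smoothing (illustrative only)."""

    def __init__(self, words: Iterable[str]) -> None:
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.alphabet = {"$"}  # "^" marks word start, "$" marks word end
        for word in words:
            chars = ["^"] + list(word) + ["$"]
            self.alphabet.update(chars[1:])
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[prev, cur] += 1
                self.contexts[prev] += 1

    def entropy(self, word: str) -> float:
        """Average negative log2 probability per character transition."""
        chars = ["^"] + list(word) + ["$"]
        size = len(self.alphabet) + 1  # one extra slot for unseen symbols
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            prob = (self.bigrams[prev, cur] + 1) / (self.contexts[prev] + size)
            total -= math.log2(prob)
        return total / (len(chars) - 1)


def classify(word: str, native: BigramModel, borrowed: BigramModel) -> int:
    """Return 1 (borrowed) if the borrowed model fits the word better, else 0."""
    return int(borrowed.entropy(word) < native.entropy(word))


# Toy data, invented purely for illustration.
native_model = BigramModel(["baba", "abab", "bab"])
borrowed_model = BigramModel(["kiki", "ikik", "kik"])
print(classify("kiki", native_model, borrowed_model))
```

Any other classifier (an SVM, a neural model) could stand in for the two bigram models, as long as it maps a word form to a label.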

Answer 1 · 2020-05-11T16:44:33.000Z

Name sounds good.

Even my direct neural model is lexically based, although I did include the borrowability score of WOLD in my model as well. My other Markov and neural net models were not just lexically based, but entropy based (from lexical forms) as well.

We are also evaluating our performance or the language performance (depending on perspective). So I would like to add that to your brief summary:

the major asset of this library is: take data, make a language model (even if it is just an SVM), use it to classify data,
adding...
evaluate and analyze language model performance.

The evaluation reports (whether individual, cross-language, or k-fold cross-validation) are important for a user to know how good the model and its predictions are.

The analysis (graphical) gives a first look from within the application as to what might be going on. Of course the user can do something more elaborate or prettier beyond this first look.

Answer 2 · 2020-05-11T17:28:32.000Z

Okay. I'd say: all graphical outputs should be done in examples/ or in some tutorial/. They can also be done in notebook style. Plotting is hard to test, will blow up the dependencies, and is better handled individually. For cross-validation, it would be good to have an implementation that uses only the dev-data and is tested there. It should also be generic: you pass a model and the parameters to the evaluator function.
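A generic evaluator of that kind could look roughly like the sketch below. Everything here is an assumption for illustration, not existing pybor code: the `cross_validate` name, the `fit`/`predict` interface, the `(form, label)` data format, and the `MajorityModel` stand-in. It uses only the standard library, so it adds no dependencies and is easy to test.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical data format: (word form, label), 1 = borrowed, 0 = inherited.
Example = Tuple[str, int]


class MajorityModel:
    """Toy stand-in for a real model: predicts the most frequent training label."""

    def __init__(self, default: int = 0) -> None:
        self.label = default

    def fit(self, data: Sequence[Example]) -> None:
        labels = [label for _, label in data]
        if labels:
            self.label = max(set(labels), key=labels.count)

    def predict(self, form: str) -> int:
        return self.label


def cross_validate(
    model_factory: Callable[..., Any],
    params: Dict[str, Any],
    data: Sequence[Example],
    k: int = 5,
    seed: int = 0,
) -> List[float]:
    """Return per-fold accuracy for models built as model_factory(**params)."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps folds reproducible
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = model_factory(**params)  # fresh model for every fold
        model.fit(train)
        correct = sum(model.predict(form) == label for form, label in test)
        scores.append(correct / len(test))
    return scores


# Toy data, invented purely for illustration.
dev_data = [("a", 0)] * 8 + [("b", 1)] * 2
print(cross_validate(MajorityModel, {"default": 0}, dev_data, k=5))
```

Passing a factory plus a parameter dict (rather than a ready-made instance) means each fold trains a fresh model, so nothing leaks between folds.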