Document data.json format for interface
avinashvarna opened this issue · 3 comments
Document the structure of data.json, or add a "How to add a new set of data" section to the README, to make it easy to contribute new aligned audio and text data.
Currently data.json is of the form:
```
{
  "data": [
    {
      "key": "text-used-to-refer-to-the-file",
      "name": "name displayed in the corpus list",
      "audio_url": "url that will be used as is (can still be relative)",
      "word_alignment": "path to the file containing word alignment",
      "sentence_alignment": "path to the file containing sentence alignment (unused)"
    },
    ...
  ]
}
```
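As a rough illustration of the format above, here is a minimal Python sketch that parses the text of a data.json file and checks each entry for the expected keys. It assumes key, name, audio_url and word_alignment are required and treats sentence_alignment as optional, since it is marked as unused; load_entries, REQUIRED_KEYS and the sample values are hypothetical, not part of the project.

```python
import json

# Assumption: these keys must be present in every entry; "sentence_alignment"
# is treated as optional because the format marks it as unused.
REQUIRED_KEYS = ("key", "name", "audio_url", "word_alignment")


def load_entries(text):
    """Parse the text of a data.json file and return its list of entries.

    Raises ValueError if the top level is not an object with a "data" list,
    or if an entry is not an object or lacks one of REQUIRED_KEYS.
    """
    doc = json.loads(text)
    entries = doc.get("data") if isinstance(doc, dict) else None
    if not isinstance(entries, list):
        raise ValueError('top level must be an object with a "data" list')
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not an object")
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            raise ValueError(f"entry {i} is missing keys: {missing}")
    return entries


# Hypothetical example entry (values are made up for illustration).
sample = (
    '{"data": [{"key": "sarga-1", "name": "Sarga 1", '
    '"audio_url": "audio/sarga-1.mp3", "word_alignment": "align/sarga-1.json"}]}'
)
print([e["key"] for e in load_entries(sample)])  # ['sarga-1']
```

Note that the file itself must be strict JSON (double-quoted strings), which is why json.loads is enough here.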
There's scope for adding corpus details such as name, description, etc. at the top level. Currently the name is "deduced" in the Flask file as the name of the parent directory of data.json, which is a bit clumsy.
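A minimal Python sketch of how top-level corpus details could coexist with the current behaviour: read optional "name" and "description" fields from data.json, and fall back to the parent directory name when "name" is absent. These top-level fields and the corpus_info helper are hypothetical; they are not part of the current format or the existing Flask code.

```python
import json
from pathlib import Path


def corpus_info(data_json_path):
    """Return (name, description) for the corpus whose data.json is given.

    Hypothetical extension: optional top-level "name" and "description"
    fields. Without "name", fall back to the current behaviour of using
    the name of the directory that contains data.json.
    """
    path = Path(data_json_path)
    doc = json.loads(path.read_text(encoding="utf-8"))
    name = doc.get("name") or path.parent.name
    description = doc.get("description", "")
    return name, description
```

For example, a file at ramayana/data.json with no top-level "name" would yield ("ramayana", ""), matching what the server deduces today.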
Adding new data basically amounts to adding a new directory at the top level (alongside existing ones such as ramayana and meghaduta) with a data.json file inside it.
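Under that convention, finding the available corpora comes down to scanning the top-level directory for subdirectories that contain a data.json. The Python sketch below assumes the directory to scan is passed in explicitly as data_dir, which would also make a configurable data directory straightforward; discover_corpora is a hypothetical helper, not the project's actual code.

```python
from pathlib import Path


def discover_corpora(data_dir):
    """Map corpus name -> path of its data.json.

    Only immediate subdirectories of data_dir that contain a data.json
    file count as corpora; everything else (e.g. the server directory)
    is skipped. Results are ordered by directory name.
    """
    corpora = {}
    for child in sorted(Path(data_dir).iterdir()):
        manifest = child / "data.json"
        if child.is_dir() and manifest.is_file():
            corpora[child.name] = manifest
    return corpora
```

With this approach, contributing a corpus needs no code change: create the directory, drop in data.json, and it is picked up on the next scan.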
I'll put this tentative information in a README later.
The entire process can definitely be made smoother, for example by:
- Specifying a data directory that will contain the corpora (instead of using the "parent directory of the server directory")
- Adding a corpus name / description (tweaking the front end to display them)
- Perhaps adding a way to go to the "next" or "previous" corpus
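For the "next"/"previous" idea, one possible Python sketch: given the list of corpus names, return the neighbours of the current one. The alphabetical ordering, the choice not to wrap around at the ends, and the neighbours helper itself are all assumptions for illustration.

```python
def neighbours(corpus_names, current):
    """Return (previous, next) corpus names around `current`.

    Assumption: corpora are ordered alphabetically, and None marks the
    ends of the list instead of wrapping around.
    """
    ordered = sorted(corpus_names)
    i = ordered.index(current)  # raises ValueError if current is unknown
    prev_name = ordered[i - 1] if i > 0 else None
    next_name = ordered[i + 1] if i + 1 < len(ordered) else None
    return prev_name, next_name


print(neighbours(["ramayana", "meghaduta"], "meghaduta"))  # (None, 'ramayana')
```

The front end would then only need to render links for whichever of the two names is not None.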
PR #4 handles this for the most part.
Closed via PR #4