INL/BlackLab

Which elements correspond to the contentViewable permission in each supported corpus file format?


For example, for a CoNLL-U file I guess the corresponding element is "text", but for TEI I can't even guess.

I think there might be some confusion here. The corpusConfig.contentViewable in the .blf.yaml file controls whether the CORPUSNAME/docs/PID/contents operation succeeds or fails. It has no relation to the input document format you're using, so it works the same for CoNLL-U and TEI.

As for what is indexed in an annotated field (usually only one, named contents), that is of course specified in the annotatedFields section of the config file. For example, in the file tei-p5.blf.yaml, what words get indexed for contents is determined by the documentPath and containerPath, so for that file it would be //TEI//text.
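To make the path idea concrete, here is a minimal Python sketch of selecting only the words inside a container element. It is not BlackLab's implementation. The sample document (namespace-free for brevity), the function name words_in_container, and the word element w are assumptions chosen for illustration; only the //TEI//text path comes from the explanation above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free TEI-like sample; real TEI files declare a namespace.
SAMPLE = """<TEI>
  <teiHeader><title><w>Ignored</w></title></teiHeader>
  <text><body><p><w>Hello</w> <w>world</w></p></body></text>
</TEI>"""


def words_in_container(xml_string, container_tag="text", word_tag="w"):
    """Collect the text of word elements found inside the container element.

    Roughly mirrors the idea of //TEI//text: words outside <text>
    (for example in the header) are not picked up.
    """
    root = ET.fromstring(xml_string)
    words = []
    for container in root.iter(container_tag):
        for word in container.iter(word_tag):
            words.append(word.text)
    return words


print(words_in_container(SAMPLE))
```

The word in the header is skipped because it does not sit under the text element, which is the effect the container path is described as having.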

Does that answer your question?

@jan-niestadt Thank you very much! Does the CORPUSNAME/docs/PID/contents operation return extracted plain text, or the document in its original format, such as CoNLL-U or TEI?

I checked: it returns the document in its original format, such as CoNLL-U or TEI.

Yes, BlackLab stores your input document and you can retrieve it at /contents. It can highlight the input document as well if you pass your query to that URL.
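For anyone scripting this, a small Python sketch that builds the contents URL for a document, optionally with a query attached for highlighting. The base URL, corpus name, PID, and the helper name contents_url are placeholders. The query parameter name patt is my assumption of how the query is passed, so check it against your BlackLab Server version.

```python
from urllib.parse import quote, urlencode


def contents_url(base_url, corpus, pid, patt=None):
    """Build the URL of the CORPUSNAME/docs/PID/contents operation.

    base_url, corpus and pid are placeholders supplied by the caller.
    patt (assumed parameter name) carries the query whose hits should be highlighted.
    """
    url = (
        f"{base_url.rstrip('/')}/{quote(corpus, safe='')}"
        f"/docs/{quote(pid, safe='')}/contents"
    )
    if patt is not None:
        url += "?" + urlencode({"patt": patt})
    return url


# Hypothetical server location and identifiers.
print(contents_url("http://localhost:8080/blacklab-server", "mycorpus", "doc1"))
print(contents_url("http://localhost:8080/blacklab-server", "mycorpus", "doc1", patt='"fox"'))
```

If contentViewable is off for the corpus, a request to this URL is expected to fail, as described earlier in the thread.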