Which elements correspond to the contentViewable permission in each supported corpus file format?
For example, for a CoNLL-U file I guess the corresponding element is "text", but for TEI I can't even guess.
I think there might be some confusion here. The `corpusConfig.contentViewable` setting in the `.blf.yaml` file controls whether the `CORPUSNAME/docs/PID/contents` operation succeeds or fails. It has no relation to the input document format you're using, so it works the same for CoNLL-U and TEI.
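To make the point concrete, here is a minimal Python sketch of a client calling that operation. The base URL, corpus name, and document PID are placeholders, and the assumption that a refused request surfaces as an HTTP error status is mine, not something stated in this thread.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Assumed location of a BlackLab Server instance; adjust for your setup.
BASE_URL = "http://localhost:8080/blacklab-server"


def contents_url(corpus, pid, base_url=BASE_URL):
    """Build the CORPUSNAME/docs/PID/contents URL for one document."""
    return f"{base_url}/{quote(corpus, safe='')}/docs/{quote(pid, safe='')}/contents"


def fetch_contents(corpus, pid, base_url=BASE_URL):
    """Return the stored document as text, or None if the server refuses.

    Assumption: when contentViewable is false, the refusal arrives as an
    HTTP error status, which urlopen raises as HTTPError.
    """
    try:
        with urlopen(contents_url(corpus, pid, base_url)) as resp:
            return resp.read().decode("utf-8")
    except HTTPError as err:
        print(f"Contents not available (HTTP {err.code})")
        return None
```

The same call is used whether the corpus was indexed from CoNLL-U or TEI; only the permission setting decides whether it succeeds.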
As for what is indexed in an annotated field (usually only one, named `contents`), that is of course specified in the `annotatedFields` section of the config file. For example, in the file `tei-p5.blf.yaml`, what words get indexed for `contents` is determined by the `documentPath` and `containerPath`, so for that file it would be `//TEI//text`.
Does that answer your question?
@jan-niestadt Thank you very much! Does the `CORPUSNAME/docs/PID/contents` operation return extracted plain text, or the document in its original format, such as CoNLL-U or TEI?
Checked: it returns the original format, such as CoNLL-U or TEI.
Yes, BlackLab stores your input document and you can retrieve it at `/contents`. It can highlight the input document as well if you pass your query to that URL.
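A small Python sketch of building such a request follows. It assumes the query is passed in a `patt` URL parameter as a Corpus Query Language pattern; the server address, corpus name, and PID are hypothetical, so check the parameter name against the documentation for your BlackLab version.

```python
from urllib.parse import quote, urlencode


def highlighted_contents_url(base_url, corpus, pid, cql):
    """Build a /contents URL that also carries a query to highlight.

    Assumption: the query goes in the `patt` parameter.
    """
    path = f"{base_url}/{quote(corpus, safe='')}/docs/{quote(pid, safe='')}/contents"
    return f"{path}?{urlencode({'patt': cql})}"


# Hypothetical server, corpus, and document PID.
url = highlighted_contents_url(
    "http://localhost:8080/blacklab-server", "my-corpus", "doc1", '"dog"'
)
print(url)
```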