INL/OpenConvert

Can OpenConvert convert a plain text file to TEI?

vanabel opened this issue · 5 comments

I just cloned the repository into a directory and ran the command
java -jar OpenConvert.jar -from text -to TEI test/test.txt test
where test.txt contains a single sentence:
Just a test
But it prints an error:
Could not find conversion from text to TEI
Did I do something wrong?

Or could you provide some example text files that convert to TEI properly?

You can do the conversion to TEI online here: http://openconvert.clarin.inl.nl/openconvert/tagger/ui#file

(you need a CLARIN account, which you should be able to get here: https://user.clarin.eu/user/register)

I didn't develop this code, so I'm not sure about the command-line tool, sorry.

@jan-niestadt Thanks. If I want to build my own corpus, how can I combine multiple TEI files into one? In practice, I would like to add one plain-text sentence containing a keyword at a time (each can be converted to TEI with the tool you mentioned above), then upload the TEI to my BlackLab server so that users can query it. This would be useful for scientific writing, since I could then query by keyword.

Hello all, sorry for catching up only today.

  • The right command line for conversion from txt to TEI is as follows (note: txt, not text):
    java -jar OpenConvert.jar -from txt -to TEI test/test.txt test/test.tei
  • For use with BlackLab, it is best to enable the tokenizer in OpenConvertWeb (only available in the online version).
  • There is no special tool for combining TEI files. The teiCorpus element (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiCorpus.html) may contain an arbitrary number of TEI elements, each containing a document. It also requires a corpus header, but for BlackLab indexing it should be sufficient to start with <teiCorpus>, then cat all the individual files, and then close the teiCorpus element.
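
The combining step from the last bullet could be sketched in Python roughly as follows. This is only an illustration, not part of OpenConvert: the function name combine_tei and the file names are made up, the input files are assumed to be UTF-8 without a DOCTYPE, and no corpus header is written. One detail that a plain cat would miss: an XML declaration (<?xml ...?>) is only allowed at the very start of a document, so the sketch strips it from each file before concatenating.

```python
import re
from pathlib import Path

# Matches a leading XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def combine_tei(tei_files, corpus_file):
    """Wrap the contents of several TEI files in a single <teiCorpus> element.

    Hypothetical helper: assumes UTF-8 input without a DOCTYPE and writes
    no corpus header, as suggested above for BlackLab indexing.
    """
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', "<teiCorpus>"]
    for path in tei_files:
        # utf-8-sig also drops a byte order mark if the file starts with one
        text = Path(path).read_text(encoding="utf-8-sig")
        # Remove the per-file XML declaration; it may only appear once,
        # at the very start of the combined document.
        parts.append(XML_DECL.sub("", text, count=1).strip())
    parts.append("</teiCorpus>")
    Path(corpus_file).write_text("\n".join(parts) + "\n", encoding="utf-8")
```

It could then be called as combine_tei(["a.tei", "b.tei"], "corpus.xml"), where all three file names are placeholders. Whether BlackLab accepts the result without a corpus header depends on the indexing configuration, so it is worth checking with a small test corpus first.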

Currently, I grab the data from the OpenConvert site (submit text, get TEI back). Since the site may change, I want a local version, which means I need a similar function for converting plain text to TEI. I noticed that you provide openconvert.client.jar; is it designed for this? (In fact, I can't run it on my server. Does it need the OpenConvert git project?)