data required for training
Siddhijain16 opened this issue · 1 comment
Siddhijain16 commented
Cabir40 commented
Hi @Siddhijain16,
To prepare a CoNLL dataset, you will usually need to follow these steps:
- Read and inspect the data, which may be in csv, tsv, txt, conll or another format (txt is the easiest to work with).
- The data should consist of 4 columns:
  - The first column contains the token (word)
  - The second column contains the part-of-speech tag (pos_tag)
  - The third column contains the chunk tag (chunk_tag)
  - The last column is the label column and contains the NER tag (NER_tag)
- If your data has only token and label columns (word and NER_tag), you need to add the other two columns yourself. Their values can simply be placeholders such as `-NN-` for the POS column and `-O-` for the chunk column.
- Here is some important information about the CoNLL format:
- You can save the data in CoNLL format as a txt file and read it with `CoNLL().readDataset(spark, path)` from `sparknlp.training`.
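For data that only has word and NER tag columns, the reading and column-filling steps can be sketched in plain Python. This is a minimal sketch, not part of Spark NLP: it assumes a hypothetical input with one `word<TAB>ner_tag` pair per line and a blank line between sentences, and the helper names `read_two_column` and `to_conll_rows` are illustrative. The missing POS and chunk columns are filled with the `-NN-` and `-O-` placeholders suggested above.

```python
from typing import List, Tuple

# Placeholder values for the missing POS and chunk columns.
POS_PLACEHOLDER = "-NN-"
CHUNK_PLACEHOLDER = "-O-"


def read_two_column(text: str, sep: str = "\t") -> List[List[Tuple[str, str]]]:
    """Parse 'word<sep>ner_tag' lines into sentences.

    Assumption (hypothetical input layout): blank lines separate sentences.
    Raises ValueError if a line does not have exactly two columns.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        word, tag = line.split(sep)
        current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences


def to_conll_rows(sentence: List[Tuple[str, str]]) -> List[str]:
    """Expand (word, ner_tag) pairs into 4-column 'token pos chunk ner' lines."""
    return [f"{word} {POS_PLACEHOLDER} {CHUNK_PLACEHOLDER} {tag}" for word, tag in sentence]


if __name__ == "__main__":
    raw = "John\tB-PER\nlives\tO\nin\tO\nParis\tB-LOC\n"
    for sentence in read_two_column(raw):
        print("\n".join(to_conll_rows(sentence)))
```

Splitting on a single separator and unpacking into exactly two values doubles as a quick check during the "read and inspect" step: a malformed line fails loudly instead of silently shifting columns.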
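Once every token has four columns, saving the txt file is mostly a matter of layout. The sketch below uses hypothetical helpers `format_conll` and `write_conll` (not Spark NLP functions) that write one space-separated `token pos chunk ner` line per token and a blank line after each sentence. Starting the file with a `-DOCSTART- -X- -X- O` line follows the CoNLL-2003 files; treat it as an assumed convention rather than a documented requirement.

```python
from typing import List, Tuple

# One row per token: (token, pos_tag, chunk_tag, ner_tag).
Row = Tuple[str, str, str, str]


def format_conll(sentences: List[List[Row]]) -> str:
    """Render sentences as CoNLL text: space-separated columns, blank line after each sentence.

    Assumption: a '-DOCSTART- -X- -X- O' header line, as in the CoNLL-2003 files.
    """
    lines = ["-DOCSTART- -X- -X- O", ""]
    for sentence in sentences:
        for row in sentence:
            # Columns are split on whitespace when read back, so each must be one non-empty word.
            if len(row) != 4 or any(not col or any(c.isspace() for c in col) for col in row):
                raise ValueError(f"expected 4 non-empty, whitespace-free columns, got {row!r}")
            lines.append(" ".join(row))
        lines.append("")  # blank line marks the end of the sentence
    return "\n".join(lines) + "\n"


def write_conll(sentences: List[List[Row]], path: str) -> None:
    """Save the formatted CoNLL text to a txt file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_conll(sentences))
```

After `write_conll(sentences, "train.txt")`, the file can be loaded with `CoNLL().readDataset(spark, "train.txt")` (using `from sparknlp.training import CoNLL` and an active Spark NLP session `spark`), which should return a DataFrame with `token`, `pos` and `label` annotation columns among others.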