The goal of this repository is to
- convert the text in one column of every row in a CSV file to CoNLL using spaCy
- create a dictionary indicating which rows contain the keywords you are interested in.
Python 3.6 was used to create this project. It might work with older versions of Python.
A number of external modules need to be installed; they are listed in requirements.txt. Depending on how you installed Python, you can probably install them using the following command:
pip install -r requirements.txt
Please run the following command to download external resources:
bash install.sh
Step 1: convert text to CoNLL
This can be achieved using the script at lib/convert_text_to_conll.py. Please call the following command for more information about how to use it:
python lib/convert_text_to_conll.py -h
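To give a rough idea of what this step involves, here is a minimal sketch, not the actual implementation in lib/convert_text_to_conll.py. The function name `column_to_conll`, the column name `text`, and the two-column output (token index and token) are assumptions made for illustration; the real script's output columns may differ. A whitespace tokenizer stands in for spaCy so the sketch runs without a model. With spaCy you could pass something like `lambda text: [token.text for token in nlp(text)]` instead.

```python
import csv
import io


def column_to_conll(csv_file, column, tokenize=str.split):
    """Yield (row_number, conll_text) for one column of every CSV row.

    `tokenize` is any callable that turns a string into a list of token
    strings. The default (whitespace splitting) is only a stand-in for
    a real tokenizer such as spaCy.
    """
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader):
        tokens = tokenize(row[column])
        # One token per line: 1-based token index, a tab, then the token.
        lines = [f"{i}\t{token}" for i, token in enumerate(tokens, start=1)]
        yield row_number, "\n".join(lines) + "\n"


# Hypothetical input with a "text" column.
sample = io.StringIO("id,text\n1,The cat sleeps\n2,Dogs bark loudly\n")
converted = dict(column_to_conll(sample, "text"))
print(converted[0])
```

Each row yields one block of CoNLL-style lines, which could then be written to one file per row.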
Step 2: index based on keywords
We use lib/keyword_index.py to obtain this information. Please call the following command for more information:
python lib/keyword_index.py -h
The result is an index from a label to the CoNLL basenames that belong to this label.
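The shape of such an index can be sketched as follows. This is an illustration under assumptions, not the code in lib/keyword_index.py: the function name `build_keyword_index`, the basenames `row_0` to `row_2`, and the labels are hypothetical, and matching is done by case-insensitive substring search, which may differ from what the script does.

```python
from collections import defaultdict


def build_keyword_index(texts, keywords_per_label):
    """Map each label to the sorted basenames whose text contains a keyword.

    texts: dict from basename to the text of that row (hypothetical input).
    keywords_per_label: dict from label to a list of keywords.
    """
    index = defaultdict(set)
    for basename, text in texts.items():
        lowered = text.lower()
        for label, keywords in keywords_per_label.items():
            # Assumption: a simple case-insensitive substring match.
            if any(keyword.lower() in lowered for keyword in keywords):
                index[label].add(basename)
    return {label: sorted(basenames) for label, basenames in index.items()}


texts = {
    "row_0": "The cat sleeps",
    "row_1": "Dogs bark loudly",
    "row_2": "A cat and a dog",
}
keywords_per_label = {"feline": ["cat"], "canine": ["dog"]}
index = build_keyword_index(texts, keywords_per_label)
print(index)
```

Because the match is on substrings, "dog" also matches "Dogs"; a token-based match on the CoNLL output would be stricter.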
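Step 3: package it
You can use the script at lib/package_sample.py. Please call the following command for more information about how to use it: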
python lib/package_sample.py -h
- Marten Postma (m.c.postma@vu.nl)
This project is licensed under the Apache 2.0 License; see the LICENSE.md file for details.