French-CRS is a machine-learning-based NLP framework for coreference resolution in French. It is trained on 25 syntactic and morphological features derived from ANCOR, a French oral corpus. French-CRS ships with pre-trained language models and is ready to be used on French text. Internally, it relies on other systems for mention detection and named entity detection. There are plans to enrich French-CRS with semantic features, which would make it suitable for other tasks such as nomination detection in social media contexts.
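To make the term concrete: a coreference chain groups all the mentions in a text that refer to the same entity. The sketch below is not French-CRS code. It only illustrates, assuming some classifier has already judged certain pairs of mentions to be coreferent, how such pairwise links can be merged into chains with a union-find structure. The function name `build_chains` and the example mentions are hypothetical.

```python
def build_chains(mentions, links):
    """Merge pairwise coreference links into chains (illustrative only).

    mentions: list of mention strings, in text order.
    links: iterable of (i, j) index pairs judged coreferent.
    Returns a list of chains; mentions with no link are left out.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the earliest mention as the representative of the chain.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    chains = {}
    for index, mention in enumerate(mentions):
        chains.setdefault(find(index), []).append(mention)
    return [chain for chain in chains.values() if len(chain) > 1]


# Hypothetical mentions from "Marie a vu Paul. Elle ... elle ..."
example_mentions = ["Marie", "elle", "Paul", "Elle"]
example_links = [(0, 1), (1, 3)]
print(build_chains(example_mentions, example_links))
# → [['Marie', 'elle', 'Elle']]
```

Here "Paul" is dropped from the output because no link connects it to another mention.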
Requirements:
- Python 3.7+
- Virtualenv (a useful resource for those unfamiliar with it: https://python.doctor/page-virtualenv-python-environnement-virtuel)
- Optional: DECOFRE (https://github.com/LoicGrobol/decofre/) for French mention detection.
For a quick start:
- Clone the repository from your command prompt
git clone https://github.com/mehdi-mirzapour/French-CRS
or download the zip from GitHub and unzip it.
- Create a new environment
virtualenv -p /path/to/python3.7 env4fcrs
replacing /path/to with your local Python path, which you can find by running
which python
If you are sure that your current Python version is 3.7, you can simply use
virtualenv env4fcrs
- Activate it
source env4fcrs/bin/activate
(remember to do this every time you want to use French-CRS).
- Change the directory to the cloned folder
cd French-CRS
- Let the setup file automatically install all the components
pip install -e .
(make sure the trailing dot "." is not accidentally removed).
- If you want to use Jupyter Notebook, add your virtual environment to it by typing
python -m ipykernel install --user --name=env4fcrs
- Install the following language model for spaCy
python -m spacy download fr_core_news_md
- Install an additional dependency: https://github.com/LoicGrobol/decofre/
- To deactivate the environment, run
deactivate
in the command prompt, or simply close the terminal.
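Once the steps above are done, a short script can confirm that the environment looks as expected. The following is a minimal sketch, not part of French-CRS: the function names `in_virtualenv` and `check_environment` are hypothetical, and the check only covers the Python version and whether the `spacy` and `fr_core_news_md` packages can be found.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def check_environment(min_version=(3, 7), modules=("spacy", "fr_core_news_md")):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append("Python %s+ is required" % wanted)
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append("module not found: %s" % name)
    return problems
```

Running `print(in_virtualenv(), check_environment())` inside the activated env4fcrs environment should ideally show `True` and an empty list; any listed problem points at the installation step to repeat.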
To run the resolver from the command line:
- Ensure you are in the virtualenv
source env4fcrs/bin/activate
- Modify the configuration file located in the root of the cloned folder.
- Run the crs-resolver
crs-resolver --text "..."
You can get more information with
crs-resolver --help
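If you would rather call the resolver from a Python script, one option is to wrap the command with `subprocess`. This is a sketch under assumptions: it uses only the `--text` option shown above, the helper names `build_command` and `resolve` are hypothetical, and because the output format of crs-resolver is not described here, the output is returned as raw text without parsing.

```python
import shutil
import subprocess


def build_command(text):
    """Build the argument list for the crs-resolver command."""
    # Passing a list (no shell) avoids quoting problems in the input text.
    return ["crs-resolver", "--text", text]


def resolve(text):
    """Run crs-resolver on the given text and return its standard output."""
    if shutil.which("crs-resolver") is None:
        raise FileNotFoundError(
            "crs-resolver was not found on PATH; activate env4fcrs first"
        )
    completed = subprocess.run(
        build_command(text),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

Inside the activated environment, `print(resolve("Marie a dit qu'elle viendrait."))` would print whatever the command writes to standard output.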
To open the demo notebook:
- Ensure you are in the virtualenv
source env4fcrs/bin/activate
- Ensure you have previously run
python -m ipykernel install --user --name=env4fcrs
at the command prompt.
- At the command prompt, type
jupyter notebook
- Open the "demo" folder and click on the file "Text2CoreferenceChains_Stanza_spaCy.ipynb".
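If the env4fcrs kernel does not show up in the notebook, it can help to check whether it was actually registered. The sketch below is one possible check, not part of French-CRS: it reads the output of `jupyter kernelspec list --json`, and the helper names `kernel_names` and `is_kernel_registered` are hypothetical.

```python
import json
import subprocess


def kernel_names(kernelspec_json):
    """Extract the sorted kernel names from `jupyter kernelspec list --json` output."""
    data = json.loads(kernelspec_json)
    return sorted(data.get("kernelspecs", {}))


def is_kernel_registered(name="env4fcrs"):
    """Return True if a Jupyter kernel with the given name is registered."""
    completed = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return name in kernel_names(completed.stdout)
```

If `is_kernel_registered()` returns False, rerunning the ipykernel install command from the activated environment is the first thing to try.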
To open the ANCOR training notebook:
- Ensure you are in the virtualenv
source env4fcrs/bin/activate
- Ensure you have previously run
python -m ipykernel install --user --name=env4fcrs
at the command prompt.
- At the command prompt, type
jupyter notebook
- Open the "demo" folder and click on the file "fast_Model_ANCOR_Training.ipynb".
ANCOR can be downloaded here: http://www.info.univ-tours.fr/~antoine/parole_publique/
Note: Downloading the ANCOR corpus is not mandatory for running the CRS system. Pre-trained models are already provided in the "pre-trained language models" folder. The Jupyter notebook "/demo/Text2CoreferenceChains.ipynb" also describes how to integrate them.
@inproceedings{desoyer2016coreference,
title={Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR},
author={D{\'e}soyer, Ad{\`e}le and Landragin, Fr{\'e}d{\'e}ric and Tellier, Isabelle and Lefeuvre, Ana{\"\i}s and Antoine, Jean-Yves and Dinarelli, Marco},
booktitle={International Conference on Intelligent Text Processing and Computational Linguistics},
pages={507--519},
year={2016},
organization={Springer}
}
@inproceedings{muzerelle:hal-01075679,
TITLE = {{ANCOR\_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures}},
AUTHOR = {Muzerelle, Judith and Lefeuvre, Ana{\"i}s and Schang, Emmanuel and Antoine, Jean-Yves and Pelletier, Aurore and Maurel, Denis and Eshkol, Iris and Villaneau, Jeanne},
BOOKTITLE = {{LREC'2014, 9th Language Resources and Evaluation Conference.}},
PAGES = {843-847},
YEAR = {2014}
}
French-CRS is BSD-licensed, as found in the LICENSE file.