French Coreference Resolution System

French-CRS is a machine-learning-based NLP framework for coreference resolution in French. It is trained on 25 syntactic and morphological features derived from ANCOR, a French oral corpus. French-CRS ships with pre-trained language models and is ready to be applied to French text. Internally, it relies on other systems for mention detection and named entity detection. There are plans to enrich French-CRS with semantic features, which should make it suitable for other tasks such as nomination detection in a social media context.

Prerequisites:

Python 3.7+
Virtualenv (a useful introduction for those unfamiliar with it: https://python.doctor/page-virtualenv-python-environnement-virtuel)
Optional: DECOFRE (https://github.com/LoicGrobol/decofre/) for French mention detection.

Quick Installation Instruction

For a quick start:

At the command prompt, run git clone https://github.com/mehdi-mirzapour/French-CRS, or download the zip archive from GitHub and unzip it.
Create a new environment with virtualenv -p /path/to/python3.7 env4fcrs, replacing /path/to with the local path to your Python interpreter (the which python command prints it). If you are sure that your current Python version is 3.7, you can simply run virtualenv env4fcrs.
Activate it with source env4fcrs/bin/activate (remember to do this every time you want to use French-CRS).
Change to the cloned folder: cd French-CRS
Let the setup file install all the components automatically: pip install -e . (make sure the trailing "." is not accidentally dropped).
If you want to use Jupyter Notebook, add your virtual environment to it by running python -m ipykernel install --user --name=env4fcrs
Install the following language model for spaCy:
```
python -m spacy download fr_core_news_md
```
Optionally, install DECOFRE (https://github.com/LoicGrobol/decofre/) for French mention detection.
To deactivate the environment, run deactivate at the command prompt or simply close the terminal.
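
After working through the steps above, it can be handy to confirm that the environment looks as expected before running anything. The following is a minimal sketch using only the Python standard library; the function name check_environment and the keys it reports are hypothetical and not part of French-CRS. It only checks that the packages named in the steps can be found, not that they work.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the pieces named in the installation steps are visible.

    Hypothetical helper for illustration; French-CRS does not provide it.
    """
    in_virtual_env = (
        # Older virtualenv releases set sys.real_prefix.
        getattr(sys, "real_prefix", None) is not None
        # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
        or sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    return {
        "python_3_7_or_newer": sys.version_info >= (3, 7),
        "inside_virtual_environment": in_virtual_env,
        # find_spec returns None when a top-level package cannot be found.
        "spacy_installed": importlib.util.find_spec("spacy") is not None,
        "fr_core_news_md_installed": importlib.util.find_spec("fr_core_news_md") is not None,
        "ipykernel_installed": importlib.util.find_spec("ipykernel") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it with the env4fcrs environment activated. Any line marked MISSING points back to the corresponding installation step; ipykernel only matters if you plan to use the notebooks.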

Running a pre-trained model pipeline from the command line

Ensure you are in the virtualenv: source env4fcrs/bin/activate
Modify the configuration file located in the root of the cloned folder.
Run the resolver: crs-resolver --text "...". For more information, run crs-resolver --help.
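
The same command can also be driven from a Python script. The sketch below is an assumption-laden illustration rather than an API offered by French-CRS: the helper names build_command and resolve_text are hypothetical, only the --text option mentioned above is used, and the output is returned as raw text because its format is not described here.

```python
import shutil
import subprocess


def build_command(text):
    """Build the argument list for crs-resolver (hypothetical helper)."""
    return ["crs-resolver", "--text", text]


def resolve_text(text):
    """Run crs-resolver on `text` and return whatever it prints to stdout."""
    if shutil.which("crs-resolver") is None:
        raise RuntimeError("crs-resolver not found; activate env4fcrs first")
    completed = subprocess.run(
        build_command(text),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


if __name__ == "__main__":
    if shutil.which("crs-resolver") is None:
        print("crs-resolver not found; activate env4fcrs first")
    else:
        print(resolve_text("Marie a dit qu'elle viendrait demain."))
```

Passing the arguments as a list avoids shell quoting problems with apostrophes and accents in French text.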

Running a pre-trained model pipeline in a Jupyter notebook

Ensure you are in the virtualenv: source env4fcrs/bin/activate
Ensure you have already run python -m ipykernel install --user --name=env4fcrs at the command prompt.
At the command prompt, type jupyter notebook
Open the "demo" folder and click on the file "Text2CoreferenceChains_Stanza_spaCy.ipynb".

Training a new model in a Jupyter notebook

Ensure you are in the virtualenv: source env4fcrs/bin/activate
Ensure you have already run python -m ipykernel install --user --name=env4fcrs at the command prompt.
At the command prompt, type jupyter notebook
Open the "demo" folder and click on the file "fast_Model_ANCOR_Training.ipynb".

Downloading ANCOR and training with it

ANCOR can be downloaded here: http://www.info.univ-tours.fr/~antoine/parole_publique/

Note: Downloading the ANCOR corpus is not required to run the CRS system. Pre-trained models are already provided in the "pre-trained language models" folder. The Jupyter notebook "/demo/Text2CoreferenceChains.ipynb" also describes how to integrate them.

Citations

@inproceedings{desoyer2016coreference,
  title={Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR},
  author={D{\'e}soyer, Ad{\`e}le and Landragin, Fr{\'e}d{\'e}ric and Tellier, Isabelle and Lefeuvre, Ana{\"\i}s and Antoine, Jean-Yves and Dinarelli, Marco},
  booktitle={International Conference on Intelligent Text Processing and Computational Linguistics},
  pages={507--519},
  year={2016},
  organization={Springer}
}

@inproceedings{muzerelle:hal-01075679,
  TITLE = {{ANCOR\_Centre, a Large Free Spoken French Coreference Corpus:  description of the Resource and Reliability Measures}},
  AUTHOR = {Muzerelle, Judith and Lefeuvre, Ana{\"i}s and Schang, Emmanuel and Antoine, Jean-Yves and Pelletier, Aurore and Maurel, Denis and Eshkol, Iris and Villaneau, Jeanne},
  BOOKTITLE = {{LREC'2014, 9th Language Resources and Evaluation Conference.}},
  PAGES = {843-847},
  YEAR = {2014}
}

License

French-CRS is BSD-licensed, as found in the LICENSE file.
