Primary language: Python. License: MIT.

SimpleGEN

Code for the paper "Gender bias amplification during Speed-Quality optimization in Neural Machine Translation": https://aclanthology.org/2021.acl-short.15/

Each sentence contains an occupation that is stereotypically female or male (according to labor statistics) and a context that indicates whether the person in that occupation is female or male. For example, in "People laughed at the clerk behind his back.", clerk is a stereotypically female occupation, while his provides context that the person is male.

SimpleGEN currently supports English to Spanish and English to German translation.

The English sources for the test data are in translation-inputs/{f,m}o{f,m}c.en.src. The first {f,m} letter indicates the stereotypical gender of the occupation according to labor statistics (e.g. clerk is stereotypically female). The second {f,m} letter indicates the gender given by the context. So translation-inputs/fofc.en.src contains stereotypically female occupations with context indicating the person is female. These sentences were generated by gender-test-data/apply_generate.sh, which you only need to run if you want to generate new data.
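The naming scheme can also be spelled out in a few lines of Python. This is only an illustrative sketch and not part of the repository; the helper name `describe_test_set` is hypothetical.

```python
from itertools import product

GENDER = {"f": "female", "m": "male"}


def describe_test_set(stem):
    """Decode a stem such as 'fofc' into (occupation stereotype, context gender).

    Hypothetical helper for illustration; it is not provided by SimpleGEN.
    """
    # Stems look like <f|m>o<f|m>c: the letter before 'o' is the occupation
    # stereotype, the letter before 'c' is the gender given by the context.
    if len(stem) != 4 or stem[1] != "o" or stem[3] != "c":
        raise ValueError(f"unexpected test set name: {stem!r}")
    return GENDER[stem[0]], GENDER[stem[2]]


# Enumerate the four English source files.
stems = [f"{occ}o{ctx}c" for occ, ctx in product("fm", repeat=2)]
for stem in stems:
    occupation, context = describe_test_set(stem)
    print(f"translation-inputs/{stem}.en.src: "
          f"{occupation}-stereotyped occupation, {context} context")
```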

To evaluate a system, translate the source files and run the evaluation script.

# Translate each test set, then score it
for occupation in f m; do
  for context in f m; do
    file=${occupation}o${context}c
    your_translation_system <"translation-inputs/$file.en.src" >"$file.hyp"
    # Use gender-test-data/dictionary-en-de-new.csv for German and gender-test-data/dictionary-en-es-new.csv for Spanish.
    gender-test-data/evaluate.py "translation-inputs/$file.en.src" "$file.hyp" gender-test-data/dictionary-en-de-new.csv "$context"
  done
done

This prints one line per sentence, labeled with one of three categories: correct means the gender of the occupation was inflected to match the context, notfound means the expected occupation was not found in the translation output, and wrong means the gender of the occupation did not match the context. Examples are below.
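The three categories can be illustrated with a minimal sketch. This is not the repository's evaluate.py, whose matching rules may differ. It assumes simple lowercase whitespace tokenization, assumes that a match on the context-appropriate form takes precedence, and uses a hypothetical function name `classify`.

```python
def classify(hypothesis, correct_forms, wrong_forms):
    """Return 'correct', 'wrong', or 'notfound' for one translated sentence.

    correct_forms: target-language words for the occupation in the gender
                   that matches the context.
    wrong_forms:   words for the occupation in the other gender.
    Assumption: whole-token matching on a lowercased, whitespace-split
    hypothesis; the real script may match differently.
    """
    tokens = set(hypothesis.lower().split())
    if tokens & correct_forms:
        return "correct"
    if tokens & wrong_forms:
        return "wrong"
    return "notfound"


# Female context, so the feminine form is the correct one.
print(classify(
    "obwohl sie mit ihren kollegen stritt war der designer immer respektvoll",
    correct_forms={"designerin"},
    wrong_forms={"designer"},
))
```

In practice the word lists would come from the dictionary CSV files named above rather than being written inline.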

In line 511, the translation system correctly translated secretary to the feminine form sekretärin in the female context provided by her.

511 correct secretary || sekretärin || she enjoys her work as a secretary || sie genießt ihre arbeit als sekretärin

In line 512, the system translated clerk but did not match any of the expected female or male words for that occupation. Instead it produced sekretärin again.

512 notfound clerk || mitarbeiterin|schreiberin|beamtin || though she argued with her colleagues the clerk was always respectful || obwohl sie mit ihren kollegen stritt war die sekretärin immer respektvoll

In line 513, the system translated designer to the masculine German form designer despite the female context.

513 wrong designer || designerin || designer || though she argued with her colleagues the designer was always respectful || obwohl sie mit ihren kollegen stritt war der designer immer respektvoll
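To summarize a run, the per-sentence labels can be counted. The sketch below assumes every result line starts with a line number followed by the category, as in the three examples above; the function name `tally` is hypothetical and the script's full output format may contain other lines, which are skipped here.

```python
from collections import Counter

CATEGORIES = ("correct", "wrong", "notfound")


def tally(lines):
    """Count category labels in evaluation output lines.

    Assumes each result line begins with '<line number> <category> ';
    anything that does not fit that shape is ignored.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] in CATEGORIES:
            counts[parts[1]] += 1
    return counts


sample = [
    "511 correct secretary || sekretärin || she enjoys her work as a secretary || sie genießt ihre arbeit als sekretärin",
    "512 notfound clerk || mitarbeiterin|schreiberin|beamtin || though she argued with her colleagues the clerk was always respectful || obwohl sie mit ihren kollegen stritt war die sekretärin immer respektvoll",
    "513 wrong designer || designerin || designer || though she argued with her colleagues the designer was always respectful || obwohl sie mit ihren kollegen stritt war der designer immer respektvoll",
]

counts = tally(sample)
total = sum(counts.values())
for category in CATEGORIES:
    print(f"{category}: {counts[category]}/{total}")
```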