3.0.10 failing terminology extraction

Question

dcram opened this issue 7 years ago · 4 comments

TermSuite version 3.0.10 (on Linux): the JSON files containing the terminology extraction results are written as expected, but an error message is shown both in the Progress panel and in the central panel, which contains no results at all.

Answer 1 · 2018-10-05T06:18:45.000Z

Hi @dcram,

It seems that the latest version, 3.0.10, has a different interface and different options from those shown on the documentation page for the GUI version.

I am trying to do bilingual alignment, but I don't see any results under the 'Alignment Results' tab.
I also don't have the “Build context for SWT terms only” option, for example, and the user interface shown at http://termsuite.github.io/documentation/gui/#running-alignment is different from the current version. Please see the screenshot.

Any idea how I can do bilingual alignment the right way?

Thanks,
MZ

Answer 2 · 2018-10-05T06:27:21.000Z

If I select the options available in the pipeline and try to do Align, I get this error:

Corpus IndexedCorpus[tetxt-en......] are not contextualized.

I think this is caused by the absence of the SWT option in this version.

I see this piece of code in src/main/java/fr/univnantes/termsuite/api/BilingualAligner.java:

    Preconditions.checkArgument(!contextualizedSwts.isEmpty(),
        "Corpus %s are not contextualized",
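
For context, Guava's `Preconditions.checkArgument(boolean, String, Object...)` throws an `IllegalArgumentException` when its condition is false and fills each `%s` in the template with the supplied arguments. The sketch below mimics that behaviour in plain Java so it runs without Guava. The `checkArgument` method here is a simplified stand-in, and the corpus label and term list are placeholder values, not TermSuite's actual data.

```java
import java.util.List;

class PreconditionSketch {

    // Simplified stand-in for Guava's Preconditions.checkArgument (assumption:
    // one %s per argument, so String.format gives the same result here).
    static void checkArgument(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    // Returns the error message raised for the given term list,
    // or null when the precondition holds.
    static String messageFor(List<String> contextualizedSwts, String corpusLabel) {
        try {
            checkArgument(!contextualizedSwts.isEmpty(),
                    "Corpus %s are not contextualized", corpusLabel);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical corpus label, used only for illustration.
        String label = "IndexedCorpus[example]";
        System.out.println(messageFor(List.of(), label));          // empty list: check fails
        System.out.println(messageFor(List.of("turbine"), label)); // non-empty list: check passes
    }
}
```

Read this way, the error only says that the list of contextualized single-word terms was empty when alignment started. It does not by itself say why the list was empty.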

Answer 3 · 2018-10-05T10:22:00.000Z

Hi @mzeidhassan

In theory, you should not be able to run an alignment when requirements are not met.

Requirements are:

1. a source terminology extracted with term contexts,
2. a target terminology extracted with term contexts,
3. a bilingual source-to-target dictionary.

Your configuration parameters for terminology extraction (items 1 and 2) look good. I would suspect either an empty terminology, or a non-empty terminology with no extracted contexts...

Are you sure you can see multiple extracted terms in both your source and target terminologies?
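
The checks suggested above can be sketched as a small diagnostic. This is only an illustration: it assumes a terminology can be viewed as a map from each term to its list of context words and a dictionary as a map from source word to target word. These shapes and all names below are hypothetical and are not TermSuite's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AlignmentRequirementsSketch {

    // Adds a problem description if the terminology is empty or has no contexts.
    static void checkTerminology(String name, Map<String, List<String>> contextsByTerm,
                                 List<String> problems) {
        if (contextsByTerm.isEmpty()) {
            problems.add(name + " terminology is empty");
        } else if (contextsByTerm.values().stream().allMatch(List::isEmpty)) {
            problems.add(name + " terminology has no term contexts");
        }
    }

    // Lists every unmet requirement; an empty result means all three are satisfied.
    static List<String> unmetRequirements(Map<String, List<String>> source,
                                          Map<String, List<String>> target,
                                          Map<String, String> dictionary) {
        List<String> problems = new ArrayList<>();
        checkTerminology("source", source, problems);
        checkTerminology("target", target, problems);
        if (dictionary.isEmpty()) {
            problems.add("dictionary is empty");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up sample data: the source term has no contexts.
        Map<String, List<String>> source = Map.of("wind", List.of());
        Map<String, List<String>> target = Map.of("vent", List.of("énergie"));
        Map<String, String> dictionary = Map.of("wind", "vent");
        System.out.println(unmetRequirements(source, target, dictionary));
    }
}
```

Distinguishing "empty terminology" from "terms without contexts" matters because the two cases point to different steps of the extraction configuration.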

Answer 4 · 2018-10-09T06:34:34.000Z

Hi @dcram

I am simply using the 'wind energy' dataset. I have dictionaries stored in the 'dicos' directory. The dictionaries are tab-delimited text files such as 'en-fr.txt'. The file format is as follows:

en fr
agreement accord
between entre
the la
community communauté
economic économique
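
One quick way to rule out a formatting problem is to confirm that every line really contains a tab and not spaces. The sketch below reports lines that do not split into exactly two non-empty tab-separated fields. It only checks the tab-delimited layout described above. Whether TermSuite expects a header line or any further constraints is not confirmed here, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class DictionaryFormatSketch {

    // Returns the 1-based numbers of lines that are not "source<TAB>target".
    // Blank lines are ignored.
    static List<Integer> malformedLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isBlank()) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            if (fields.length != 2 || fields[0].isBlank() || fields[1].isBlank()) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, check that file; otherwise check a small sample
        // in which the second line uses a space instead of a tab.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Path.of(args[0]), StandardCharsets.UTF_8)
                : List.of("agreement\taccord", "between entre", "the\tla");
        System.out.println("Malformed lines: " + malformedLines(lines));
    }
}
```

Running it against a dictionary file, for example 'en-fr.txt', lists any line numbers worth inspecting by hand.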

Attached is a screenshot showing what I am seeing. Do you see anything wrong?

Thanks again for your support!
Kind regards,
Mohamed