3.0.10 failing terminology extraction
dcram opened this issue · 4 comments
Hi @dcram,
It seems that the latest version, 3.0.10, has a different interface and options, which differ even from those on the documentation page for the GUI version.
I am trying to do bilingual alignment, but I don't see any results under the 'Alignment Results' tab.
For example, I don't have the “Build context for SWT terms only” option, and the user interface at http://termsuite.github.io/documentation/gui/#running-alignment is different from the current version. Please see the screenshot.
Any idea how I can do bilingual alignment the right way?
Thanks,
MZ
If I select the options available in the pipeline and try to run Align, I get this error:
Corpus IndexedCorpus[tetxt-en......] are not contextualized.
I think this is caused by the absence of the SWT option in this version.
I see this piece of code in src/main/java/fr/univnantes/termsuite/api/BilingualAligner.java (lines 82-83):

    Preconditions.checkArgument(!contextualizedSwts.isEmpty(),
            "Corpus %s are not contextualized",
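To make the error concrete, here is a minimal, self-contained sketch of what a precondition like the one quoted above amounts to. It uses neither TermSuite nor Guava: the `Term` record, its `contextSize` field and the `requireContextualized` helper are hypothetical stand-ins, and a plain `IllegalArgumentException` plays the role of what Guava's `checkArgument` throws when its condition is false.

```java
import java.util.List;

/** Hypothetical stand-in for an extracted term and the size of its context vector. */
record Term(String lemma, boolean singleWord, int contextSize) {}

final class ContextCheck {
    private ContextCheck() {}

    /**
     * Mimics the precondition: keep only single-word terms (SWTs) that carry
     * a non-empty context, and fail if none are left.
     */
    static List<Term> requireContextualized(String corpusName, List<Term> terms) {
        List<Term> contextualizedSwts = terms.stream()
                .filter(Term::singleWord)
                .filter(t -> t.contextSize() > 0)
                .toList();
        if (contextualizedSwts.isEmpty()) {
            // Same message template as the TermSuite precondition quoted above.
            throw new IllegalArgumentException(
                    String.format("Corpus %s are not contextualized", corpusName));
        }
        return contextualizedSwts;
    }
}
```

In this sketch, a terminology whose single-word terms all have `contextSize` 0 fails exactly like an empty one, which is why the message alone does not say which of the two situations occurred.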
Hi @mzeidhassan
In theory, you should not be able to run an alignment when the requirements are not met.
The requirements are:
- a source terminology extracted with term contexts,
- a target terminology extracted with term contexts,
- a bilingual source-to-target dictionary.
Your configuration parameters for terminology extraction (items 1 and 2) look good. I would suspect either an empty terminology or a non-empty terminology with no extracted contexts.
Are you sure you can see multiple extracted terms in both your source and target terminologies?
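The two suspected causes (an empty terminology versus terms without contexts) can be told apart with a quick count. This is a minimal sketch, assuming each extracted term can be reduced to a hypothetical `TermStats` record holding its word count and the number of context entries attached to it; reading those values out of a real TermSuite terminology is not shown and would depend on the actual API.

```java
import java.util.List;

/** Hypothetical summary of one extracted term. */
record TermStats(String text, int wordCount, int contextEntries) {}

enum Diagnosis { EMPTY_TERMINOLOGY, NO_CONTEXTS, OK }

final class TerminologyDiagnostics {
    private TerminologyDiagnostics() {}

    static Diagnosis diagnose(List<TermStats> terms) {
        if (terms.isEmpty()) {
            return Diagnosis.EMPTY_TERMINOLOGY; // nothing was extracted at all
        }
        // The alignment check looks at contextualized single-word terms, so count only those.
        long contextualizedSwts = terms.stream()
                .filter(t -> t.wordCount() == 1 && t.contextEntries() > 0)
                .count();
        return contextualizedSwts == 0 ? Diagnosis.NO_CONTEXTS : Diagnosis.OK;
    }
}
```

Running it on both the source and the target terminology covers requirements 1 and 2 from the list above; the bilingual dictionary (requirement 3) still has to be checked separately.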
Hi @dcram
I am simply using the 'wind energy' dataset. I have dictionaries stored in the 'dicos' directory. The dictionaries are tab-delimited text files such as 'en-fr.txt'. The file format is as follows:

    en	fr
    agreement	accord
    between	entre
    the	la
    community	communauté
    economic	économique

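Since a dictionary whose columns are separated by spaces instead of tabs can easily slip in when copying text around, it may be worth checking the file before running an alignment. This is a minimal sketch, assuming one `source<TAB>target` pair per line and treating the first line (`en fr` above) as a language header; the `DictionaryCheck` class and its `parse` method are hypothetical and not part of TermSuite.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DictionaryCheck {
    private DictionaryCheck() {}

    /** Translations per source word, plus 1-based numbers of lines that did not split cleanly. */
    record Parsed(Map<String, List<String>> entries, List<Integer> badLines) {}

    static Parsed parse(List<String> lines) {
        Map<String, List<String>> entries = new LinkedHashMap<>();
        List<Integer> badLines = new ArrayList<>();
        // Start at index 1: the first line is assumed to be a language header such as "en\tfr".
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isBlank()) {
                continue; // ignore empty lines
            }
            String[] fields = line.split("\t", -1);
            if (fields.length != 2 || fields[0].isBlank() || fields[1].isBlank()) {
                badLines.add(i + 1); // e.g. a line separated by spaces instead of a tab
                continue;
            }
            // One source word may map to several translations.
            entries.computeIfAbsent(fields[0].strip(), k -> new ArrayList<>())
                   .add(fields[1].strip());
        }
        return new Parsed(entries, badLines);
    }
}
```

The real file could be loaded with `java.nio.file.Files.readAllLines(Path.of("dicos/en-fr.txt"))` before calling `parse`; a non-empty `badLines` list points at entries that are not tab-separated.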
Attached is a screenshot showing what I am seeing. Do you see anything wrong?
Thanks again for your support!
Kind regards,
Mohamed