termsuite/termsuite-core

Error while looking for alignments


Hello Damien,

I'm doing the evaluation using the GUI, and for some terms an error occurs.
The error message doesn't explain what went wrong.

[Screenshot: ttc_problem]
The error occurred when processing "résine" (French). The alignments in the screenshot were made for another term. The system didn't find any translations for "résine".

I checked the data and found that the English term "resin" is in the term list:
[Screenshot: ttc_problem_2]

For some other terms, the system suggests correct translations, so it is not a general issue.

Thank you in advance!

Sincerely yours,
Yuliya

dcram commented

Hi Yuliya,

You should find the detailed stack trace in the logs. The termsuite-*.log files can be found in your TermSuite install dir, but most of the time the actual stack trace is only found in another log file, located at workspace/.metadata/.log in your TermSuite install (I don't really understand why).

Don't be scared if you see many stack traces in that last log file. Most of them are Eclipse RCP silent bugs, i.e. not TermSuite bugs.

Can you tell me if you find the error stack trace in these files?

Thx

Aha, I wouldn't have guessed to look there :)
You are right, it's a Java error:
!ENTRY fr.univnantes.termsuite.ui 4 0 2016-11-23 11:18:08.196
!MESSAGE An error occurred during alignment
!STACK 0
java.lang.NullPointerException
at eu.project.ttc.utils.AlignerUtils.translateVector(AlignerUtils.java:75)
at eu.project.ttc.engines.BilingualAligner.translateWithDico(BilingualAligner.java:326)
at eu.project.ttc.engines.BilingualAligner.alignDistributional(BilingualAligner.java:150)
at eu.project.ttc.engines.BilingualAligner.align(BilingualAligner.java:219)
at fr.univnantes.termsuite.ui.services.impl.AlignmentServiceImpl.align(AlignmentServiceImpl.java:213)
at fr.univnantes.termsuite.ui.menu.AlignHandler$1.run(AlignHandler.java:71)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
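
Since workspace/.metadata/.log mixes TermSuite errors with unrelated Eclipse RCP ones, a small filter can help isolate the relevant entries. The following is only a sketch: it assumes that every entry starts with an `!ENTRY <plug-in id> ...` line, as in the excerpt above, and that the TermSuite plug-in ids start with `fr.univnantes.termsuite`. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class EclipseLogFilter {

    /** Returns the log entries whose !ENTRY line names a plug-in id starting with pluginPrefix. */
    static List<String> entriesFor(List<String> lines, String pluginPrefix) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null; // entry being collected, or null while skipping
        for (String line : lines) {
            if (line.startsWith("!SESSION") || line.startsWith("!ENTRY ")) {
                // A new session or entry begins: store the entry collected so far, if any.
                if (current != null) {
                    entries.add(current.toString());
                }
                current = null;
                if (line.startsWith("!ENTRY ")) {
                    String rest = line.substring("!ENTRY ".length()).trim();
                    if (rest.startsWith(pluginPrefix)) {
                        current = new StringBuilder(line);
                    }
                }
            } else if (current != null) {
                // !MESSAGE, !STACK and stack frame lines belong to the current entry.
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Default path is the one mentioned in this thread, relative to the TermSuite install dir.
        Path log = Path.of(args.length > 0 ? args[0] : "workspace/.metadata/.log");
        for (String entry : entriesFor(Files.readAllLines(log), "fr.univnantes.termsuite")) {
            System.out.println(entry);
            System.out.println();
        }
    }
}
```

Run from the TermSuite install dir, it prints only the entries logged by plug-ins whose id matches the prefix, each with its message and stack trace.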

dcram commented

OK, that is a bug. Thank you for reporting it. Is it a blocking issue for you?

I'll try to fix it soon anyway.

Damien
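
For readers unfamiliar with the step named in the stack trace: in dictionary-based distributional alignment, the context vector of the source term is translated into the target language through a bilingual dictionary before being compared with target context vectors. The sketch below is not the TermSuite code, and the stack trace alone does not say which reference was null at AlignerUtils.java:75. It only illustrates, under assumed data structures (plain maps, hypothetical class name), how such a translation step can skip context words that have no dictionary entry instead of dereferencing a null lookup result. Splitting the weight evenly among candidate translations is an illustrative choice.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VectorTranslationSketch {

    /**
     * Translates a source-language context vector (context word -> weight)
     * into the target language using a bilingual dictionary
     * (source word -> candidate translations).
     */
    static Map<String, Double> translateVector(Map<String, Double> sourceVector,
                                               Map<String, List<String>> dictionary) {
        Map<String, Double> translated = new HashMap<>();
        if (sourceVector == null || dictionary == null) {
            return translated; // nothing to translate
        }
        for (Map.Entry<String, Double> entry : sourceVector.entrySet()) {
            Double weight = entry.getValue();
            List<String> candidates = dictionary.get(entry.getKey());
            // Skip words with no weight or no dictionary entry rather than failing on null.
            if (weight == null || candidates == null || candidates.isEmpty()) {
                continue;
            }
            // Illustrative choice: split the weight evenly among the candidate translations.
            double share = weight / candidates.size();
            for (String target : candidates) {
                translated.merge(target, share, Double::sum);
            }
        }
        return translated;
    }
}
```

With this kind of guard, a term whose context words are all missing from the dictionary yields an empty translated vector, which the caller can report as "no translation found" rather than as an error.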

While comparing my test pipeline with the current one, I found only one difference: in the current (buggy) one, I enabled MWT in contexts. Can it cause the problem? Moreover, it seems that I used "SWT only" and "allow MWT" together :/

dcram commented

Yes it might be, but I have to further investigate and try to reproduce the error before I can tell definitely.

Regarding the contextualizer options, some of them will be deprecated because they are error-prone and not efficient at all. Actually, everything regarding MWTs has been tested and proved not relevant for contexts. Please configure your contextualizer this way:

  • Do not allow MWT in contexts,
  • Do not compute context for MWT, i.e. compute contexts for SWT only.
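
The two recommendations above can be expressed as a small sanity check run before launching a pipeline. This is a sketch only: the `Options` class and its field names are hypothetical stand-ins for the two GUI options discussed in this thread, not part of the TermSuite API.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextualizerOptionsCheck {

    /** Hypothetical holder for the two contextualizer options discussed in this thread. */
    static final class Options {
        final boolean allowMwtInContexts; // "allow MWT" in the GUI
        final boolean contextsForSwtOnly; // "SWT only" in the GUI

        Options(boolean allowMwtInContexts, boolean contextsForSwtOnly) {
            this.allowMwtInContexts = allowMwtInContexts;
            this.contextsForSwtOnly = contextsForSwtOnly;
        }
    }

    /** Returns one warning per option that deviates from the recommended configuration. */
    static List<String> warnings(Options options) {
        List<String> warnings = new ArrayList<>();
        if (options.allowMwtInContexts) {
            warnings.add("MWTs are allowed in contexts; the recommended setting is to disallow them.");
        }
        if (!options.contextsForSwtOnly) {
            warnings.add("Contexts are computed for MWTs; the recommended setting is SWT only.");
        }
        return warnings;
    }
}
```

An empty list means the configuration matches the recommendation (no MWT in contexts, contexts for SWT only).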

The problem occurred in 26 of 50 cases, so it was quite an important issue. I fixed it in two steps:

  1. Modify the pipeline so that either "SWT only" or "allow MWT" is selected, not both.
  2. Run on one corpus at a time (previously, I had launched two corpora together).

Hope it was just a user behaviour bug :) I'm not really sure whether it was the fact of running two corpora together or combining two options.

Ooops, sorry. I closed the issue. You might still want to fix it.

dcram commented

I am pretty sure that the MWT configs were the issue.

In principle, running multiple corpora together should not affect anything.

dcram commented

I tried to have a look at it, but I could not find a significant instruction at line 75 in AlignerUtils (cf. your stack trace above) in termsuite-core-2.3.3.jar.

Could you please give me the exact version of TermSuite you are using?

Thanks!

Hello! This is the info from the GUI:
Current version: fr.univnantes.termsuite.ui_2.3.1.201610061343 [123]

dcram commented

Fixed in TermSuite 3.0.

http://termsuite.github.io/#gui

Bilingual alignment has been improved and now supports compound terms, multi-word terms of size > 2, and neoclassical terms.

Best,

Damien