MediaWikiScanner#docElements() raises an unexpected ParseException when encountering <D_TABLE_CAPTION> tokens
Closed this issue · 12 comments
*What steps will reproduce the problem?
1. Process the page provided as an attachment
*What is the expected output? What do you see instead?
We were not able to pinpoint the cause of this error, so we could not patch it.
We did, however, find out a couple of things, listed below:
- The grammar reference file doesn't mention table captions anywhere.
- The getTABLE_CAPTION() method isn't called at all in the MediaWikiScanner class.
- In the docElements() method, when a D_TABLE_CAPTION token is encountered, we don't step into the table() method. This is probably why the token isn't consumed, which leads to the call to jj_consume_token(-1) that throws the exception.
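To make the failure mode described above concrete, here is a heavily simplified, hypothetical model of a JavaCC-style choice point. It is NOT the generated MediaWikiScanner code: the class CaptionDispatchSketch, the Kind enum, the captionStartsTable flag and the stand-in ParseException are all invented for illustration, and the token names are only borrowed from the report. It sketches how, when no alternative in docElements() accepts D_TABLE_CAPTION, control reaches the error branch (where JavaCC-generated parsers call jj_consume_token(-1), which can never succeed) and a ParseException is thrown, whereas routing the caption token into table() lets it be consumed.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of a JavaCC-style choice point.
 * Not the real MediaWikiScanner; token names are illustrative only.
 */
public class CaptionDispatchSketch {

    enum Kind { D_TABLE_BEGIN, D_TABLE_CAPTION, D_TABLE_ROW, D_TABLE_END, EOF }

    /** Stand-in for JavaCC's ParseException (unchecked here for brevity). */
    static class ParseException extends RuntimeException {
        ParseException(String msg) { super(msg); }
    }

    private final List<Kind> tokens;
    private final boolean captionStartsTable; // false = unpatched, true = patched
    private int pos = 0;

    CaptionDispatchSketch(List<Kind> tokens, boolean captionStartsTable) {
        this.tokens = tokens;
        this.captionStartsTable = captionStartsTable;
    }

    private Kind peek() { return tokens.get(pos); }

    private void consume(Kind expected) {
        if (peek() != expected) {
            throw new ParseException("Expected " + expected + ", found " + peek());
        }
        pos++;
    }

    /** Top-level loop: every lookahead token must select some production. */
    void docElements() {
        while (peek() != Kind.EOF) {
            Kind next = peek();
            if (next == Kind.D_TABLE_BEGIN
                    || (captionStartsTable && next == Kind.D_TABLE_CAPTION)) {
                table();
            } else {
                // No alternative matches: JavaCC emits jj_consume_token(-1)
                // here, which always fails and throws a ParseException.
                throw new ParseException("Unexpected token " + next);
            }
        }
    }

    /** table := [BEGIN] [CAPTION] ROW* END */
    private void table() {
        if (peek() == Kind.D_TABLE_BEGIN) consume(Kind.D_TABLE_BEGIN);
        if (peek() == Kind.D_TABLE_CAPTION) consume(Kind.D_TABLE_CAPTION);
        while (peek() == Kind.D_TABLE_ROW) consume(Kind.D_TABLE_ROW);
        consume(Kind.D_TABLE_END);
    }

    /** Returns true if the token stream parses without a ParseException. */
    static boolean parses(List<Kind> input, boolean patched) {
        try {
            new CaptionDispatchSketch(input, patched).docElements();
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A caption token arrives where docElements() expects a new block.
        List<Kind> input = Arrays.asList(
                Kind.D_TABLE_CAPTION, Kind.D_TABLE_ROW, Kind.D_TABLE_END, Kind.EOF);
        System.out.println("unpatched: " + parses(input, false)); // false
        System.out.println("patched:   " + parses(input, true));  // true
    }
}
```

In this model the fix amounts to adding the caption token to the set of lookaheads that select table(); whether the actual patch does exactly this in the grammar is an assumption, not something stated in the thread.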
*What version of the product are you using? On what operating system?
We use the latest version checked out from the SVN repository, on Mac OS X Snow Leopard with the 1.6 JVM.
*Please provide any additional information below.
The input data is an extract of the French Wikipedia export collected through MWDumper. We use the WEM component of your project to generate a CAS data structure that is supplied to the Apache UIMA framework. We encounter the problem with a particular page, supplied as an attachment.
Original issue reported on code.google.com by Maxime.B...@gmail.com
on 4 Jun 2010 at 3:31
Attachments:
A much simpler example, causing the same bug
Original comment by mki...@portolancs.com
on 5 Aug 2010 at 10:01
Attachments:
This is the exception thrown when parsing the "simple_table_with_caption.txt" example.
Original comment by mki...@portolancs.com
on 5 Aug 2010 at 10:03
Attachments:
This patch fixes support for the D_TABLE_CAPTION token.
A JUnit test is also included.
After applying the patch, you have to run "RebuildScanners.launch" to rebuild all the scanners.
Original comment by mki...@portolancs.com
on 5 Aug 2010 at 10:07
Attachments:
Thanks a lot Maxime, I will test your patch right now.
Original comment by thomas.m...@gmail.com
on 5 Aug 2010 at 12:18
/me is not called Maxime, but you're welcome ;-)
Original comment by mki...@portolancs.com
on 5 Aug 2010 at 12:23
Indeed, I only looked at the first name and thought it was the same person in each message. Sorry about that ;)
Original comment by thomas.m...@gmail.com
on 5 Aug 2010 at 12:30
Patch applied and committed without any modification. Thanks.
Original comment by thomas.m...@gmail.com
on 5 Aug 2010 at 12:31
- Changed state: Accepted
Original comment by thomas.m...@gmail.com
on 5 Aug 2010 at 12:35
- Changed state: Fixed
Sorry, but my patch was 'cross-project'.
Could you please check that 'MediaWikiParserTest' is in the correct project?
I think it would be better to put it into "org.wikimodel.wem.test".
Original comment by mki...@portolancs.com
on 5 Aug 2010 at 12:51
All the tests have actually been in org.wikimodel.wem for a very long time now; I think org.wikimodel.wem.test is more of a leftover. Also, these unit tests are there to validate the parser, which lives in org.wikimodel.wem, so I don't see what moving them to org.wikimodel.wem.test would bring.
Original comment by thomas.m...@gmail.com
on 5 Aug 2010 at 12:59
All right. Thanks.
Original comment by mki...@portolancs.com
on 5 Aug 2010 at 1:15
<offtopic>
Hi Thomas,
I've made one more patch for wikimodel and put it into an issue.
I have even more in my workspace (e.g. inline macro support), but it's getting harder to separate them, and I want to avoid huge monster patches.
Is there any chance you could commit them?
Thanks, Martin
</offtopic>
Original comment by mki...@portolancs.com
on 10 Aug 2010 at 11:22