html5lib/html5lib-tests

adoption01.dat:187 expects 'unexpected-implied-end-tag-in-table-view'

dangibson opened this issue · 3 comments

At line 187 adoption01.dat has this test:

#data
<table><a>1<td>2</td>3</table>
#errors
(1,7): expected-doctype-but-got-start-tag
(1,10): unexpected-start-tag-implies-table-voodoo
(1,11): unexpected-character-implies-table-voodoo
(1,15): unexpected-cell-in-table-body
(1,30): unexpected-implied-end-tag-in-table-view
#document
| <html>
|   <head>
|   <body>
|     <a>
|       "1"
|     <a>
|       "3"
|     <table>
|       <tbody>
|         <tr>
|           <td>
|             "2"

html5lib-python doesn't give any such error - the error text doesn't exist in the entire html5lib-python source code. I'm wondering if the test is wrong?

html5lib-python has for a long time ignored the errors sections of the tests. Also, the actual content of those lines is meaningless, per the (pretty poor) documentation of the test syntax:

[#errors] must be followed by one line per parse error that a conformant checker would return. It doesn't matter what those lines are, although they can't be "#document-fragment", "#document", "#script-off", "#script-on", or empty, the only thing that matters is that there be the right number of parse errors.

Someday they'll go back to being meaningful, though probably only insofar as the position is meaningful. (What error is thrown is actually a hard question, given it frequently makes sense to coalesce multiple parse errors into one.)
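To make the "only the count matters" rule concrete, here is a rough sketch of how a harness could check the errors section. It is not how html5lib-python or any other parser does it; the function names are made up for illustration, and it only knows about the section terminators named in the README text quoted above.

```python
# Lines that end the #errors section, per the README text quoted above.
SECTION_END = {"#document-fragment", "#document", "#script-off", "#script-on", ""}


def count_expected_errors(test_text):
    """Count the lines in the #errors section of a single test (hypothetical helper)."""
    count = 0
    in_errors = False
    for line in test_text.splitlines():
        if not in_errors:
            in_errors = line == "#errors"
            continue
        if line in SECTION_END:
            break
        # The text of the line is ignored; only its presence is counted.
        count += 1
    return count


def error_count_matches(test_text, actual_errors):
    """True if the parser reported as many parse errors as the test lists."""
    return count_expected_errors(test_text) == len(actual_errors)


SAMPLE = """#data
<table><a>1<td>2</td>3</table>
#errors
(1,7): expected-doctype-but-got-start-tag
(1,10): unexpected-start-tag-implies-table-voodoo
(1,11): unexpected-character-implies-table-voodoo
(1,15): unexpected-cell-in-table-body
(1,30): unexpected-implied-end-tag-in-table-view
#document
| <html>
"""

print(count_expected_errors(SAMPLE))
```

Under this reading, a parser that reports five parse errors for the test above passes the errors check whatever it calls them, including the one the test labels unexpected-implied-end-tag-in-table-view.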

Do you know where that error came from? Without something actually throwing that error, how do we know the test is correct and html5lib-python is wrong vs html5lib-python is correct and the test is wrong?

I'm implementing my own parser and using html5lib-tests for testing, with html5lib-python as a reference for the tests. But since html5lib-python doesn't throw that error, where did it come from? Was the test copied from somewhere else or just made up?

git grep might be useful (I presume we had that error at some point!); otherwise look at https://github.com/nolanw/HTMLReader, which the test results were last based on. Working out what's correct is normally done by hand-executing the spec.

There's little impetus to do anything about the strings the lines contain, given that, per the README for the tests, the content of each line is meaningless. Yes, ideally we'd have some useful string, but we don't, nor do we have any way to ensure they stay current.