Dialect specifier breakage
kylebgorman opened this issue · 4 comments
kylebgorman commented
Investigate this breakage:
```
@pytest.mark.skipif(not can_connect_to_wiktionary(), reason="need Internet")
def test_american_english_dialect_selection():
    # Pick a word for which Wiktionary has dialect-specified pronunciations
    # for both US and non-US English.
    word = "mocha"
    html_session = requests_html.HTMLSession()
    response = html_session.get(
        _PAGE_TEMPLATE.format(word=word), headers=HTTP_HEADERS
    )
    # Construct two configs to demonstrate the US dialect (non-)selection.
    config_only_us = config_factory(key="en", dialect="US | American English")
    config_any_dialect = config_factory(key="en")
    # Apply each config's XPath selector.
    results_only_us = response.html.xpath(config_only_us.pron_xpath_selector)
    results_any_dialect = response.html.xpath(
        config_any_dialect.pron_xpath_selector
    )
>   assert (
        len(results_any_dialect)  # containing both US and non-US results
        > len(results_only_us)  # containing only the US result
        > 0
    )
E   AssertionError: assert 2 > 2
E   +  where 2 = len([<Element 'li' >, <Element 'li' >])
E   +  and   2 = len([<Element 'li' >, <Element 'li' >])

tests/test_wikipron/test_config.py:202: AssertionError
```
kylebgorman commented
The breakage indicates that even with dialect selection enabled via `US | American English`, all pronunciations are returned. E.g., for the page used in the tests, we grab both elements under the Pronunciation header even though the second one does not match the dialect specification.
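To make the expected behavior concrete, here is a minimal sketch using lxml directly, with hypothetical markup standing in for the real Wiktionary page (the class names mirror the ones discussed below, but the HTML and the XPath expressions are illustrative, not wikipron's actual selectors). A selector carrying a dialect-qualifier predicate should narrow the match set, which is exactly what the failing assertion checks:

```python
from lxml import etree

# Hypothetical snippet mimicking a Wiktionary "Pronunciation" list:
# one <li> carries a US dialect qualifier, the other a UK one.
html = """
<ul>
  <li><span class="ib-content qualifier-content">US</span> /mo.ka/</li>
  <li><span class="ib-content qualifier-content">UK</span> /mok.a/</li>
</ul>
"""
tree = etree.fromstring(html)

# Without a dialect predicate, both <li> elements match.
all_prons = tree.xpath("//li")

# With a predicate restricting to the US qualifier, only one should match.
us_prons = tree.xpath(
    '//li[span[@class = "ib-content qualifier-content"][text() = "US"]]'
)

print(len(all_prons), len(us_prons))  # prints "2 1"
```

The failure above corresponds to both counts coming back as 2, i.e. the dialect predicate in the real selector is no longer filtering anything out.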
kylebgorman commented
This is currently blocking #509.
kylebgorman commented
Hi @jacksonllee sorry to bother, any intuitions about what's going on here? I suspect the failure of Latin to grab anything in #509 is related too.
kylebgorman commented
The issue seems to be that the dialect selector expects `@class = "ib-content qualifier-content"`, but the attribute is now just `@class = "ib-content"`. I'll try this fix out and report back in a few days.
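For illustration, here is a small sketch of why an exact-match class predicate breaks when the attribute value changes, and how a `contains()` predicate tolerates both variants (the markup is hypothetical; whether `contains()` is the right fix for wikipron's selector is an open question, since it would also match any other class value embedding `ib-content`):

```python
from lxml import etree

# Two hypothetical markup variants: the old class value and the new one.
old_html = '<li><span class="ib-content qualifier-content">US</span> x</li>'
new_html = '<li><span class="ib-content">US</span> x</li>'

# An exact-match predicate only finds the old variant ...
exact = '//span[@class = "ib-content qualifier-content"]'
# ... while a contains() predicate matches both.
loose = '//span[contains(@class, "ib-content")]'

for html in (old_html, new_html):
    tree = etree.fromstring(html)
    print(len(tree.xpath(exact)), len(tree.xpath(loose)))
# prints "1 1" then "0 1"
```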