Dialect specifier breakage
kylebgorman opened this issue · 4 comments
kylebgorman commented
Investigate this breakage:
```
@pytest.mark.skipif(not can_connect_to_wiktionary(), reason="need Internet")
def test_american_english_dialect_selection():
    # Pick a word for which Wiktionary has dialect-specified pronunciations
    # for both US and non-US English.
    word = "mocha"
    html_session = requests_html.HTMLSession()
    response = html_session.get(
        _PAGE_TEMPLATE.format(word=word), headers=HTTP_HEADERS
    )
    # Construct two configs to demonstrate the US dialect (non-)selection.
    config_only_us = config_factory(key="en", dialect="US | American English")
    config_any_dialect = config_factory(key="en")
    # Apply each config's XPath selector.
    results_only_us = response.html.xpath(config_only_us.pron_xpath_selector)
    results_any_dialect = response.html.xpath(
        config_any_dialect.pron_xpath_selector
    )
>   assert (
        len(results_any_dialect)  # containing both US and non-US results
        > len(results_only_us)  # containing only the US result
        > 0
    )
E   AssertionError: assert 2 > 2
E   +  where 2 = len([<Element 'li' >, <Element 'li' >])
E   +  and   2 = len([<Element 'li' >, <Element 'li' >])

tests/test_wikipron/test_config.py:202: AssertionError
```
kylebgorman commented
The breakage indicates that even with dialect selection enabled via `US | American English`, all pronunciations are returned. E.g., for the page used in the tests, we grab both elements under the Pronunciation header even though the second one does not match the dialect specification.
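To make the expected behavior concrete, here is a minimal sketch using lxml directly, with hypothetical markup standing in for the real Wiktionary page (the class names mirror the ones discussed below, but the HTML and the XPath expressions are illustrative, not wikipron's actual selectors). A selector carrying a dialect-qualifier predicate should narrow the match set, which is exactly what the failing assertion checks:

```python
from lxml import etree

# Hypothetical snippet mimicking a Wiktionary "Pronunciation" list:
# one <li> carries a US dialect qualifier, the other a UK one.
html = """
<ul>
  <li><span class="ib-content qualifier-content">US</span> /mo.ka/</li>
  <li><span class="ib-content qualifier-content">UK</span> /mok.a/</li>
</ul>
"""
tree = etree.fromstring(html)

# Without a dialect predicate, both <li> elements match.
all_prons = tree.xpath("//li")

# With a predicate restricting to the US qualifier, only one should match.
us_prons = tree.xpath(
    '//li[span[@class = "ib-content qualifier-content"][text() = "US"]]'
)

print(len(all_prons), len(us_prons))  # prints "2 1"
```

The failure above corresponds to both counts coming back as 2, i.e. the dialect predicate in the real selector is no longer filtering anything out.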
kylebgorman commented
This is currently blocking #509.
kylebgorman commented
Hi @jacksonllee sorry to bother, any intuitions about what's going on here? I suspect the failure of Latin to grab anything in #509 is related too.
kylebgorman commented
The issue seems to be that the dialect selector expects `@class = "ib-content qualifier-content"`, but the attribute is now just `@class = "ib-content"`. I'll try this fix out and report back in a few days.
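For illustration, here is a small sketch of why an exact-match class predicate breaks when the attribute value changes, and how a `contains()` predicate tolerates both variants (the markup is hypothetical; whether `contains()` is the right fix for wikipron's selector is an open question, since it would also match any other class value embedding `ib-content`):

```python
from lxml import etree

# Two hypothetical markup variants: the old class value and the new one.
old_html = '<li><span class="ib-content qualifier-content">US</span> x</li>'
new_html = '<li><span class="ib-content">US</span> x</li>'

# An exact-match predicate only finds the old variant ...
exact = '//span[@class = "ib-content qualifier-content"]'
# ... while a contains() predicate matches both.
loose = '//span[contains(@class, "ib-content")]'

for html in (old_html, new_html):
    tree = etree.fromstring(html)
    print(len(tree.xpath(exact)), len(tree.xpath(loose)))
# prints "1 1" then "0 1"
```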