Tests fail using BeautifulSoup4 v4.12.2
Vekhir opened this issue · 2 comments
Hi,
I recently upgraded bs4 from 4.11.2 to 4.12.2 and now pycaption fails during testing.
I'm not exactly sure where the issue is, so I'm starting here and maybe you can reproduce it or give some insight into the kind of error.
Can you verify the issue?
Additional information
Going through the bs4 changelog, I didn't notice any change related to this issue.
I have also rebuilt pycaption with bs4 4.11.2 and the tests ran through just fine. My distro doesn't package 4.12.0 or 4.12.1, but if necessary, I might do some bisecting.
Edit: The issue is changed behaviour in BeautifulSoup 4.12.1 with regards to Tags.prettify, which adds newlines to some outputs. The tests don't account for these newlines, which is why they fail. The most straight-forward solution is to adapt the tests and bump the requirements (See also #305). More details below.
System information
OS: Arch Linux
Kernel: 6.3.2-arch1-1
Python: 3.11.3-1
Pycaption: 2.1.0/2.1.1 (AUR package / manually updated to 2.1.1)
BeautifulSoup4: 4.11.2-3/4.12.2-1 (Official package)
Appendix 1 (full error log)
==> Starting check()...
============================= test session starts ==============================
platform linux -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0
rootdir: /build/python-pycaption/src/pycaption-2.1.0
plugins: lazy-fixture-0.6.3
collected 256 items
tests/test_base.py ....... [ 2%]
tests/test_dfxp.py ....................................... [ 17%]
tests/test_dfxp_conversion.py F
=================================== FAILURES ===================================
________________________ TestDFXPtoDFXP.test_conversion ________________________
self = <tests.test_dfxp_conversion.TestDFXPtoDFXP object at 0x7fbfff47abd0>
sample_dfxp_output = '<?xml version="1.0" encoding="utf-8"?>\n<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3....00:00:36.200" region="bottom" style="default">\n <LAUGHING & WHOOPS!>\n </p>\n </div>\n </body>\n</tt>'
sample_dfxp = '<?xml version="1.0" encoding="utf-8"?>\n<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"\n xmlns:tts="http://ww...00:00:36.200" region="bottom" style="default">\n <LAUGHING & WHOOPS!>\n </p>\n </div>\n </body>\n</tt>'
def test_conversion(self, sample_dfxp_output, sample_dfxp):
caption_set = DFXPReader().read(sample_dfxp)
results = DFXPWriter().write(caption_set)
assert isinstance(results, str)
> self.assert_dfxp_equals(sample_dfxp_output, results)
tests/test_dfxp_conversion.py:25:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <tests.test_dfxp_conversion.TestDFXPtoDFXP object at 0x7fbfff47abd0>
first = '<?xml version="1.0" encoding="utf-8"?>\n<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3....00:00:36.200" region="bottom" style="default">\n <LAUGHING & WHOOPS!>\n </p>\n </div>\n </body>\n</tt>'
second = '<?xml version="1.0" encoding="utf-8"?>\n<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3....:00:36.200" region="bottom" style="default">\n <LAUGHING & WHOOPS!>\n </p>\n </div>\n </body>\n</tt>\n'
ignore_styling = False, ignore_spans = False
def assert_dfxp_equals(self, first, second,
ignore_styling=False,
ignore_spans=False):
first_soup = BeautifulSoup(first, 'lxml')
second_soup = BeautifulSoup(second, 'lxml')
if ignore_styling:
self._remove_styling(first_soup)
self._remove_styling(second_soup)
if ignore_spans:
self._remove_spans(first_soup)
self._remove_spans(second_soup)
self._trim_text(first_soup)
self._trim_text(second_soup)
> assert first_soup == second_soup
E AssertionError
tests/mixins.py:140: AssertionError
=========================== short test summary info ============================
FAILED tests/test_dfxp_conversion.py::TestDFXPtoDFXP::test_conversion - Asser...
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
================== 1 failed, 46 passed, 18 warnings in 0.86s ===================
==> ERROR: A failure occurred in check().
So apparently, the issue is that starting with BeautifulSoup 4.12.1, Tag.prettify adds some newlines at the end. From the changelog:
* Tag.prettify() will now consistently end prettified markup with
a newline.
prettify is used in the DFXPWriter.write method, so the output changes, and the tests make a strict comparison where the added newline leads to failure.
Either the tests somehow ignore the newline/check both cases, or the test fixture adds the newline and the requirements go up to 4.12.1.
A PR for the latter case may follow shortly.