dgtlmoon/changedetection.io

Filters - "Remove elements" doesn't work when there are multiple indexes in the CSS (or XPath) selectors: removing an element shifts the indexes of the remaining elements, so they are not removed.

Closed this issue · 6 comments

Describe the bug
Already mentioned in the discussion #2710

TL;DR, to save people from having to work through this issue in their heads: the reporter only wants to see the first column of data.

Since 0.47.03, the "remove elements" feature no longer seems to work correctly. I use the following settings to remove two columns of a table on the page. It seems that only the elements from column 2 are removed, not those from column 3.

body > table > tbody > tr:nth-child(1) > th:nth-child(2)
body > table > tbody > tr:nth-child(2) > td:nth-child(2)
body > table > tbody > tr:nth-child(3) > td:nth-child(2)
body > table > tbody > tr:nth-child(1) > th:nth-child(3)
body > table > tbody > tr:nth-child(2) > td:nth-child(3)
body > table > tbody > tr:nth-child(3) > td:nth-child(3)

On the example page it should remove "Person 2" and "Person 3". Sadly only "Person 2" is removed.
Example page: https://test-changedetection.tiiny.site/
Here you can import my settings: https://changedetection.io/share/f3H3x5x7b3Aa
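
The index shift named in the title can be reproduced in isolation. The sketch below is only an illustration: it assumes each selector is applied, and its matches removed, before the next selector is evaluated. That is an assumption about the failure mode, not a copy of the changedetection.io code. The inline HTML, the shortened selectors, and both helper names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the example page (two rows are enough here).
HTML = """
<table><tbody>
<tr><th>Person 1</th><th>Person 2</th><th>Person 3</th></tr>
<tr><td>Emil</td><td>Tobias</td><td>Linus</td></tr>
</tbody></table>
"""

# Shortened equivalents of the per-row selectors from the report.
SELECTORS = [
    "table > tbody > tr > *:nth-child(2)",
    "table > tbody > tr > *:nth-child(3)",
]


def remove_one_at_a_time(html, selectors):
    """Apply each selector separately, removing matches before the next one runs."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in selectors:
        for element in soup.select(selector):
            element.decompose()
    return soup.get_text(" ", strip=True)


def remove_all_at_once(html, selectors):
    """Select with one combined selector first, then remove every match."""
    soup = BeautifulSoup(html, "html.parser")
    for element in soup.select(", ".join(selectors)):
        element.decompose()
    return soup.get_text(" ", strip=True)


# After column 2 is gone, the old column 3 is the new column 2,
# so the nth-child(3) selector no longer matches anything.
print(remove_one_at_a_time(HTML, SELECTORS))  # Person 1 Person 3 Emil Linus
print(remove_all_at_once(HTML, SELECTORS))    # Person 1 Emil
```

With one-at-a-time removal, "Person 3" survives, which matches the symptom described above.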

Version
First version I noticed this bug: v0.47.03
Still present in version: v0.47.04

To Reproduce

Steps to reproduce the behavior:

  1. Import https://changedetection.io/share/f3H3x5x7b3Aa
  2. Click on "Preview/History" and see that column 3 / Person 3 isn't removed.

Expected behavior
All mentioned elements should be removed. In my example columns "Person 2" and "Person 3" should be removed.

Screenshots
Here you can see that "Person 3" isn't removed:
[two screenshots]

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Chrome
  • Version: 130.0.6723.70

Additional context

I don't think this ever worked; it has nothing to do with 0.47.03.

#!/usr/bin/python3

from bs4 import BeautifulSoup
import requests

# Fetch the HTML content
html_content = requests.get("https://test-changedetection.tiiny.site/").text
print(f"Before\n-----------\n{html_content}\n--------------\n\n")
# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html_content, "html.parser")

# CSS selector string (corrected with proper quotes)
css_selector = ("body > table > tbody > tr:nth-child(1) > th:nth-child(2), "
                "body > table > tbody > tr:nth-child(2) > td:nth-child(2), "
                "body > table > tbody > tr:nth-child(3) > td:nth-child(2), "
                "body > table > tbody > tr:nth-child(1) > th:nth-child(3), "
                "body > table > tbody > tr:nth-child(2) > td:nth-child(3), "
                "body > table > tbody > tr:nth-child(3) > td:nth-child(3)")

# Remove selected elements
for item in soup.select(css_selector):
    item.decompose()

# Print the modified HTML content
print(str(soup))

You can see that the before and after output is the same; this code hasn't been changed since 2022.

instead of removing the elements, just use table tr > *:nth-child(1) to select the first column
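
As a rough illustration of that suggestion, the sketch below runs the selector table tr > *:nth-child(1) against a hypothetical inline copy of the example table. It uses BeautifulSoup with html.parser, as in the script above; the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical inline copy of the example table (no tbody, like the real page).
HTML = """
<table>
  <tr><th>Person 1</th><th>Person 2</th><th>Person 3</th></tr>
  <tr><td>Emil</td><td>Tobias</td><td>Linus</td></tr>
  <tr><td>16</td><td>14</td><td>10</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Keep only the first cell of every row, whether it is a th or a td.
first_column = [
    cell.get_text(strip=True)
    for cell in soup.select("table tr > *:nth-child(1)")
]

print(first_column)  # ['Person 1', 'Emil', '16']
```

Selecting what to keep avoids the removal step entirely, so there are no indexes left to shift.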

There is no tbody in the HTML of https://test-changedetection.tiiny.site/, but every one of these selectors requires one!!

body > table > tbody > tr:nth-child(1) > th:nth-child(2)
body > table > tbody > tr:nth-child(2) > td:nth-child(2)
body > table > tbody > tr:nth-child(3) > td:nth-child(2)
body > table > tbody > tr:nth-child(1) > th:nth-child(3)
body > table > tbody > tr:nth-child(2) > td:nth-child(3)
body > table > tbody > tr:nth-child(3) > td:nth-child(3)
<!DOCTYPE html>
<html><head><script defer data-domain="test-changedetection.tiiny.site" src="https://analytics.tiiny.site/js/plausible.js"></script></head>
<style>
table, th, td {
  border:1px solid black;
}
</style>
<body>

<h2>TH elements define table headers</h2>

<table style="width:100%">
  <tr>
    <th>Person 1</th>
    <th>Person 2</th>
    <th>Person 3</th>
  </tr>
  <tr>
    <td>Emil</td>
    <td>Tobias</td>
    <td>Linus</td>
  </tr>
  <tr>
    <td>16</td>
    <td>14</td>
    <td>10</td>
  </tr>
</table>

<p>To understand the example better, we have added borders to the table.</p>

</body>
</html>
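
The missing tbody can be checked directly. This is a minimal sketch, assuming the page is parsed with html.parser as in the script above; that parser builds the tree from the tags as written and does not add a tbody the way a browser's DOM does. The inline HTML is a shortened, hypothetical copy of the page.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical copy of the page source: table rows without a tbody.
HTML = """
<body>
<table>
  <tr><th>Person 1</th><th>Person 2</th><th>Person 3</th></tr>
  <tr><td>Emil</td><td>Tobias</td><td>Linus</td></tr>
</table>
</body>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Selector as reported: requires a tbody that is not in the source.
with_tbody = soup.select("body > table > tbody > tr:nth-child(1) > th:nth-child(2)")

# Same selector with the tbody step dropped.
without_tbody = soup.select("body > table > tr:nth-child(1) > th:nth-child(2)")

print(len(with_tbody))                          # 0
print([cell.get_text() for cell in without_tbody])  # ['Person 2']
```

This would explain why the before and after output of the script are identical: none of the tbody selectors match the raw source.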

Instead of a list of selectors with child(n), can you put them all on one line and tell me if it works?

body > table > tbody > tr:nth-child(1) > th:nth-child(2), body > table > tbody > tr:nth-child(2) > td:nth-child(2), body > table > tbody > tr:nth-child(3) > td:nth-child(2), body > table > tbody > tr:nth-child(1) > th:nth-child(3), body > table > tbody > tr:nth-child(2) > td:nth-child(3), body > table > tbody > tr:nth-child(3) > td:nth-child(3)

https://github.com/dgtlmoon/changedetection.io/blob/master/changedetectionio/html_tools.py#L71-L80

Yes, this is working as expected. 👍

I don't think this ever worked; it has nothing to do with 0.47.03.

That is strange, as I have been using the mentioned "remove elements" setting for a few years on another webpage. The issue started after I updated to "0.47.03". I think I updated from "0.46.04" to "0.47.03".

I'm fine with rewriting my settings. From my side, the issue can be closed.

No, please don't close it. There is a PR in progress.

That is strange, as I have been using the mentioned "remove elements" setting for a few years on another webpage. The issue started after I updated to "0.47.03". I think I updated from "0.46.04" to "0.47.03".

That's because it now reprocesses every check even if the checksum of the content is the same, I think.
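
To make that guess concrete, here is a purely hypothetical sketch of a checksum gate. None of these names come from changedetection.io, and the hash choice is arbitrary. It only shows how skipping unchanged content could hide a filter problem until reprocessing becomes unconditional.

```python
import hashlib


def content_checksum(text: str) -> str:
    """Hypothetical helper: checksum of the fetched page text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def should_run_filters(previous_checksum: str, fetched_text: str, skip_unchanged: bool) -> bool:
    """Return True when the filters would be re-applied to the fetched text.

    skip_unchanged=True models the assumed older behaviour (skip when the
    checksum is the same); False models reprocessing on every check.
    """
    unchanged = previous_checksum == content_checksum(fetched_text)
    return not (skip_unchanged and unchanged)


page = "<table><tr><th>Person 1</th></tr></table>"
stored = content_checksum(page)

print(should_run_filters(stored, page, skip_unchanged=True))   # False
print(should_run_filters(stored, page, skip_unchanged=False))  # True
```

Under this assumption, a page that never changed would keep its previously stored filtered output, and the selectors would only be exercised again once every check is reprocessed.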

I just installed the newest release. It's working great.
Thank you for the fast fix 👍