muety/website-watcher

Add the ability to exclude HTML tags from being monitored


A useful feature would be the ability to exclude certain HTML tags from being monitored. For example, when checking a page I might want <script> tags to be ignored.

Or, when filtering a <div>, I might want to ignore a nested <div> with a different class or id.

muety commented

The first solution that comes to mind is a separate config file (e.g. blacklist.txt) containing a set of XPath queries that address the elements to exclude.

For example: watcher.py -u https://github.com --ignore-elements blacklist.txt, where blacklist.txt contains:

//script
//hr
/html/body/div[@class="some-list"]/div[@class="useless-element"]
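A minimal Python sketch of how such a blacklist could be applied, assuming the page is parsed with lxml. load_blacklist and strip_ignored are hypothetical helper names for illustration, not existing website-watcher code:

```python
from pathlib import Path

import lxml.html


def load_blacklist(path):
    """Read one XPath query per line from the blacklist file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def strip_ignored(html, xpaths):
    """Remove every element matched by any of the XPath queries and
    return the remaining markup as a string (hypothetical helper)."""
    root = lxml.html.fromstring(html)
    for query in xpaths:
        # Copy the matches into a list before modifying the tree.
        for element in list(root.xpath(query)):
            # Only elements can be dropped; skip text or attribute results.
            if isinstance(element, lxml.html.HtmlElement):
                # drop_tree() removes the element but keeps its tail text.
                element.drop_tree()
    return lxml.html.tostring(root, encoding="unicode")
```

The cleaned markup, rather than the raw page, would then be what gets diffed between checks.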

What do you think? Do you have any preference about how to realize this feature?

I think that's a good idea. You could also include it in the example/many.json file: one XPath selecting the part to diff, plus a list of XPaths to ignore.
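A rough sketch of how one such config entry might be evaluated, again assuming lxml. The keys "url", "xpath" and "ignore" are a hypothetical shape for an entry in example/many.json, not the file's actual format, and extract_watched_text is an illustrative name:

```python
import json

import lxml.html

# Hypothetical entry shape; the real example/many.json keys may differ.
ENTRY = json.loads("""
{
  "url": "https://github.com",
  "xpath": "//div[@id='content']",
  "ignore": ["//script", "//div[@class='useless-element']"]
}
""")


def extract_watched_text(html, entry):
    """Drop the ignored elements first, then return the text of the
    watched element(s) so that only the relevant part is diffed."""
    root = lxml.html.fromstring(html)
    for query in entry.get("ignore", []):
        for element in list(root.xpath(query)):
            if isinstance(element, lxml.html.HtmlElement):
                element.drop_tree()
    matches = root.xpath(entry["xpath"])
    return "\n".join(m.text_content().strip() for m in matches)
```

Removing the ignored elements before evaluating the main XPath means an ignore rule works whether or not it sits inside the watched element.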

muety commented

Good point, shouldn't forget about that.

An alternative to having a separate file would be to simply allow multiple --ignore parameters. I think I like that option better.
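With Python's argparse, repeatable --ignore parameters map naturally onto action="append". A minimal sketch, assuming the CLI is built with argparse; build_parser is a hypothetical name, and only the -u flag is taken from the example above:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="watcher.py")
    parser.add_argument("-u", dest="url", required=True, help="URL of the page to watch")
    # action="append" collects every --ignore occurrence into one list;
    # the value stays None if the flag is never given.
    parser.add_argument(
        "--ignore",
        action="append",
        default=None,
        metavar="XPATH",
        help="XPath of elements to exclude; may be given multiple times",
    )
    return parser
```

For example, watcher.py -u https://github.com --ignore //script --ignore //hr would yield ["//script", "//hr"], and args.ignore or [] gives an empty list when no --ignore is passed.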

Hope I can work on this soon. If you have some time, of course, feel free to contribute!