anaskhan96/soup

Find by single class

ghandic opened this issue · 16 comments

Currently, Find("a", "class", "message") only matches <a class="message"></a>. It does not match <a class="message input-message"></a>, even though both elements have the class message.

Could this be added?
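To make the request concrete, here is a minimal sketch of the matching rule being asked for. It uses only the Go standard library; hasClass is a hypothetical helper written for illustration and is not part of soup's API.

```go
package main

import (
	"fmt"
	"strings"
)

// hasClass reports whether class appears as a whole token in the
// whitespace-separated class attribute value.
// Hypothetical helper for illustration; not part of soup.
func hasClass(attrValue, class string) bool {
	for _, c := range strings.Fields(attrValue) {
		if c == class {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasClass("message", "message"))               // true
	fmt.Println(hasClass("message input-message", "message")) // true
	fmt.Println(hasClass("input-message", "message"))         // false
}
```

Comparing whole tokens rather than substrings keeps input-message from being mistaken for message.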

Good find. I'll look into this.

This no longer works with multiple classes, as it did in the previous build.
In the source HTML, the tags look like this:

<div class="rasp_place s-place"/>

And I searched for them with this code:

block.Find("div", "class", "rasp_place s-place")

In the current build my old code doesn't work anymore; only this works:

block.Find("div", "class", "rasp_place")

Also, I think you should revert to the previous logic, because the new behavior is not obvious:

<li class="btn_rasp inactive">

In the current build all items with the class btn_rasp are selected, but I don't need the items that also have the class inactive. I have no option to exclude them other than filtering via the attributes map.

The only way to find them in the current version is like this:

// Inside the loop over the found items: skip the ones marked inactive.
classes := showtimeBlock.Attrs()["class"]
if strings.Contains(classes, "inactive") {
	continue
}
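One caveat with this workaround: strings.Contains is a substring check, so it would also skip an item whose class merely contains the text inactive. Below is a rough sketch of a token-based filter, assuming only the standard library. The helpers classSet and wanted, and the class name inactive-hint, are made up for illustration and are not part of soup.

```go
package main

import (
	"fmt"
	"strings"
)

// classSet splits a class attribute value into a set of class tokens.
// Hypothetical helper for illustration; not part of soup.
func classSet(attrValue string) map[string]bool {
	set := make(map[string]bool)
	for _, c := range strings.Fields(attrValue) {
		set[c] = true
	}
	return set
}

// wanted reports whether the attribute value has the include class
// and does not have the exclude class, comparing whole tokens.
func wanted(attrValue, include, exclude string) bool {
	set := classSet(attrValue)
	return set[include] && !set[exclude]
}

func main() {
	// Sample class attribute values; "inactive-hint" is an invented class.
	values := []string{"btn_rasp", "btn_rasp inactive", "btn_rasp inactive-hint"}
	for _, v := range values {
		fmt.Printf("%q keep=%v\n", v, wanted(v, "btn_rasp", "inactive"))
	}
}
```

In a real scraper the string passed to wanted would be the value read from the attributes map, as in the snippet above.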

@Salmondx You have a point. The best way right now is to revert the commit and look into the issue from another angle.

However, I feel like we should first discuss the significance of finding by a single class, and then proceed from there.

This is meant to emulate BeautifulSoup, so I guess that's the importance 😄

That may be true, but as I say, it's trying to emulate it in Go, so the API should be similar

@anaskhan96 do you have any ideas? I could make a PR if we discuss the API

I looked across other scrapers' implementations and concluded that the Find function should remain as it is. This package was started to emulate the interface of BeautifulSoup (similar function names, conjoined function calls), as mentioned in the README, not how its functions work internally, so I'll stick with that.

However, this issue shall remain open for further discussion and input from other users.

Maybe we should make a Select method with the desired functionality, like in the link above?

It's a good idea, although we would have to discuss ways to parse the different kinds of strings that Select would receive as an argument.

I think that we should use a string as before:

soup.Select("div", "class", "class-1 class-2 class-3")

With strict matching

I think we should go about adding FindStrict and FindAllStrict which do the job, while retaining the interface similar to Find and FindAll.

soup.FindStrict("div", "class", "class-1 class-2 class-3")

Let's reserve Select for a possible future implementation of BeautifulSoup's select.
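What "strict" should mean is still open to interpretation, so here is a small sketch of two possible readings, using only the standard library. Both function names are hypothetical and neither is claimed to be how FindStrict would be implemented: matchExact treats the attribute value as a literal string, while matchSameClasses ignores token order and extra whitespace.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchExact is one reading of "strict": the attribute value must equal
// the requested string character for character.
// Hypothetical helper; not part of soup.
func matchExact(attrValue, want string) bool {
	return attrValue == want
}

// matchSameClasses is a looser reading: both strings must contain the same
// class tokens, compared as sorted lists, so order and extra whitespace are
// ignored while repeated tokens still count.
// Hypothetical helper; not part of soup.
func matchSameClasses(attrValue, want string) bool {
	a := strings.Fields(attrValue)
	b := strings.Fields(want)
	if len(a) != len(b) {
		return false
	}
	sort.Strings(a)
	sort.Strings(b)
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	want := "class-1 class-2 class-3"
	fmt.Println(matchExact("class-1 class-2 class-3", want))       // true
	fmt.Println(matchExact("class-3 class-2 class-1", want))       // false
	fmt.Println(matchSameClasses("class-3 class-2 class-1", want)) // true
	fmt.Println(matchSameClasses("class-1 class-2", want))         // false
}
```

The literal reading is the simplest to explain and matches how the old code in this thread was used; the token-based reading is more forgiving of markup that lists the same classes in a different order.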

Ok, sounds good!