philss/floki

Floki.find/2 support for array of css_selectors?

aegatlin opened this issue · 1 comment

Hello,

First of all, thank you for making Floki, I love this tool! <3

I want to use Floki.find/2 to find elements matching more than one CSS selector, and I'm not sure how to do it, or whether I should.

The problem I'm trying to solve is selecting a "whitelist" of HTML elements for further processing. For example, I'd like to call Floki.find(html_node, ["p", "a", "pre"]) and get all the desired HTML elements back, in order. Do you have any thoughts on how I should or could approach this with Floki?
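For comparison, the whitelist idea can be sketched in plain Elixir, without selectors, by walking the `{tag, attributes, children}` tuples Floki returns (the same shape that appears in the iex output below). `Whitelist.collect/2` is a hypothetical helper, not part of Floki's API; it does a depth-first, pre-order walk, so matches come back in document order, and a whitelisted element nested inside another whitelisted element is returned after its parent.

```elixir
defmodule Whitelist do
  @moduledoc """
  Hypothetical helper (not part of Floki): collects every element whose tag
  is in an allowed list, in depth-first pre-order (document order).
  """

  # A list of sibling nodes: process each one from left to right.
  def collect(nodes, tags) when is_list(nodes) do
    Enum.flat_map(nodes, &collect(&1, tags))
  end

  # An element node: keep it if its tag is whitelisted, then scan its children.
  def collect({tag, _attrs, children} = node, tags)
      when is_binary(tag) and is_list(children) do
    matches_below = collect(children, tags)
    if tag in tags, do: [node | matches_below], else: matches_below
  end

  # Text, comments, doctypes and anything else never match.
  def collect(_other, _tags), do: []
end

tree = [
  {"section", [{"id", "content"}],
   [
     {"p", [{"class", "headline"}], ["Floki"]},
     {"span", [{"class", "headline"}], ["Enables search using CSS selectors"]},
     {"a", [{"href", "https://github.com/philss/floki"}], ["Github page"]}
   ]},
  {"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}
]

found = Whitelist.collect(tree, ["p", "a", "pre"])

Enum.map(found, fn {tag, _attrs, _children} -> tag end)
# => ["p", "a", "a"]
```

Because this walks the tree directly, the result order does not depend on how a selector engine orders matches for a selector group.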

Notes

I saw that the css_selector() type can also be a list of Floki.Selector.t():

@type css_selector :: String.t() | Floki.Selector.t() | [Floki.Selector.t()]

and, inspecting the code, I found its definition:

@type t :: %__MODULE__{
        id: String.t() | nil,
        type: String.t() | nil,
        classes: [String.t()],
        attributes: [AttributeSelector.t()],
        namespace: String.t() | nil,
        pseudo_classes: [PseudoClass.t()],
        combinator: Selector.Combinator.t() | nil
      }

but I couldn't work out how to use it, or whether it would solve my problem, since I'm used to just passing a string as my css_selector.

This next bit is probably not helpful, but for context, here's some iex output from when I tried passing a list:

iex(7)> parsed_html |> Floki.find("a")
[
  {"a", [{"href", "https://github.com/philss/floki"}], ["Github page"]},
  {"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}
]
iex(8)> parsed_html |> Floki.find(["a"])
** (FunctionClauseError) no function clause matching in Floki.Finder.get_matches/3    
    
    The following arguments were given to Floki.Finder.get_matches/3:
    
        # 1
        %Floki.HTMLTree{
          node_ids: [13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
          nodes: %{
            1 => %Floki.HTMLTree.HTMLNode{
              attributes: [],
              children_nodes_ids: [2],
              node_id: 1,
              parent_node_id: nil,
              type: "html"
            },
            2 => %Floki.HTMLTree.HTMLNode{
              attributes: [],
              children_nodes_ids: [12, 3],
              node_id: 2,
              parent_node_id: 1,
              type: "body"
            },
            3 => %Floki.HTMLTree.HTMLNode{
              attributes: [{"id", "content"}],
              children_nodes_ids: [10, 8, 6, 4],
              node_id: 3,
              parent_node_id: 2,
              type: "section"
            },
            4 => %Floki.HTMLTree.HTMLNode{
              attributes: [{"class", "headline"}],
              children_nodes_ids: [5],
              node_id: 4,
              parent_node_id: 3,
              type: "p"
            },
            5 => %Floki.HTMLTree.Text{content: "Floki", node_id: 5, parent_node_id: 4},
            6 => %Floki.HTMLTree.HTMLNode{
              attributes: [{"class", "headline"}],
              children_nodes_ids: '\a',
              node_id: 6,
              parent_node_id: 3,
              type: "span"
            },
            7 => %Floki.HTMLTree.Text{
              content: "Enables search using CSS selectors",
              node_id: 7,
              parent_node_id: 6
            },
            8 => %Floki.HTMLTree.HTMLNode{
              attributes: [{"href", "https://github.com/philss/floki"}],
              children_nodes_ids: '\t',
              node_id: 8,
              parent_node_id: 3,
              type: "a"
            },
            9 => %Floki.HTMLTree.Text{
              content: "Github page",
              node_id: 9,
              parent_node_id: 8
            },
            10 => %Floki.HTMLTree.HTMLNode{
              attributes: [{"data-model", "user"}],
              children_nodes_ids: '\v',
              node_id: 10,
              parent_node_id: 3,
              type: "span"
            },
            11 => %Floki.HTMLTree.Text{ 
              content: "philss",
              node_id: 11,
              parent_node_id: 10
            },
            12 => %Floki.HTMLTree.HTMLNode{
              attributes: [{"href", "https://hex.pm/packages/floki"}],
              children_nodes_ids: '\r',
              node_id: 12,
              parent_node_id: 2,
              type: "a"
            },
            13 => %Floki.HTMLTree.Text{
              content: "Hex package",
              node_id: 13,
              parent_node_id: 12
            }
          },
          root_nodes_ids: [1]
        }
    
        # 2
        %Floki.HTMLTree.HTMLNode{
          attributes: [],
          children_nodes_ids: [2],
          node_id: 1,
          parent_node_id: nil,
          type: "html"
        }
    
        # 3
        "a"
    
    Attempted function clauses (showing 2 out of 2): 
    
        defp get_matches(tree, html_node, selector = %Floki.Selector{combinator: nil})
        defp get_matches(tree, html_node, selector = %Floki.Selector{combinator: combinator})
    
    (floki 0.31.0) lib/floki/finder.ex:67: Floki.Finder.get_matches/3
    (elixir 1.12.2) lib/enum.ex:3894: Enum.flat_map_list/2
    (floki 0.31.0) lib/floki/finder.ex:51: Floki.Finder.find_selectors/2
    (floki 0.31.0) lib/floki.ex:248: Floki.find/2

Hey @aegatlin 👋
You can do this by separating the selectors with commas inside a single String.t(), like this:

Floki.find(html_node, "p, a, pre")

This is standard CSS (a selector list, also called grouping): you can combine any number of selectors by separating them with commas.
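A minimal end-to-end sketch of the comma-separated approach, assuming Elixir 1.12 or later (for `Mix.install/1`), network access to fetch Floki, and Floki 0.31 as in the stack trace above. The HTML is made up for illustration, and the sketch only checks which elements come back, not their order.

```elixir
# Fetch Floki for this script (requires Elixir >= 1.12 and network access).
Mix.install([{:floki, "~> 0.31"}])

html = """
<p>Intro</p>
<a href="https://github.com/philss/floki">Github page</a>
<pre>mix deps.get</pre>
<a href="https://hex.pm/packages/floki">Hex package</a>
"""

{:ok, document} = Floki.parse_document(html)

# One selector string containing three comma-separated selectors.
found = Floki.find(document, "p, a, pre")

# Each match is a {tag, attributes, children} tuple.
found
|> Enum.map(fn {tag, _attrs, _children} -> tag end)
|> IO.inspect(label: "matched tags")
```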
Passing a [Floki.Selector.t()] list is meant for building complex selectors by hand, which is not a common use case.
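For completeness, here is a rough sketch of what building selectors by hand might look like. It assumes Floki 0.31, that `Floki.find/2` accepts a list of `%Floki.Selector{}` structs (as the `css_selector()` type suggests), and that every struct field left unset keeps a default meaning "no constraint". The stack trace above shows that list elements are matched directly against `%Floki.Selector{}`, which is why the plain strings in `["a"]` failed.

```elixir
# Assumes Floki ~> 0.31; Mix.install/1 needs Elixir >= 1.12 and network access.
Mix.install([{:floki, "~> 0.31"}])

{:ok, document} =
  Floki.parse_document(
    ~s(<p class="headline">Floki</p><a href="https://hex.pm/packages/floki">Hex package</a>)
  )

# Roughly the hand-built counterpart of the string "p.headline, a".
# Fields not set here (id, attributes, pseudo_classes, combinator, ...)
# keep the struct defaults, assumed to impose no extra constraint.
selectors = [
  %Floki.Selector{type: "p", classes: ["headline"]},
  %Floki.Selector{type: "a"}
]

found = Floki.find(document, selectors)
IO.inspect(found, label: "hand-built selectors")
```

In practice, the comma-separated string is easier to read and is parsed into equivalent structs for you.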

I hope this can help :)