Floki.find/2 support for array of css_selectors?
aegatlin opened this issue · 1 comments
Hello,
First of all, thank you for making Floki, I love this tool! <3
I want to use Floki.find/2 to find elements matching more than one CSS selector, and I'm not sure how, or whether I should.
The problem I'm trying to solve is selecting a "whitelist" of HTML elements for further processing. For example, I'd like to call Floki.find(html_node, ["p", "a", "pre"]) and get all the desired HTML elements back, in order. Do you have any thoughts on how I should/could approach this with Floki?
Notes
I saw that the type css_selector() can take an array of type Floki.Selector.t() (Line 76 in a30a7b8), and I found it here when inspecting the code (Lines 19 to 27 in f1f6fa5), but I couldn't understand how to use it, or if it would solve my problem, since I'm used to just passing in a string for my css_selector.
This next bit is probably not helpful, but, for context, here's some iex output of me trying to pass an array:
iex(7)> parsed_html |> Floki.find("a")
[
{"a", [{"href", "https://github.com/philss/floki"}], ["Github page"]},
{"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}
]
iex(8)> parsed_html |> Floki.find(["a"])
** (FunctionClauseError) no function clause matching in Floki.Finder.get_matches/3
The following arguments were given to Floki.Finder.get_matches/3:
# 1
%Floki.HTMLTree{
node_ids: [13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
nodes: %{
1 => %Floki.HTMLTree.HTMLNode{
attributes: [],
children_nodes_ids: [2],
node_id: 1,
parent_node_id: nil,
type: "html"
},
2 => %Floki.HTMLTree.HTMLNode{
attributes: [],
children_nodes_ids: [12, 3],
node_id: 2,
parent_node_id: 1,
type: "body"
},
3 => %Floki.HTMLTree.HTMLNode{
attributes: [{"id", "content"}],
children_nodes_ids: [10, 8, 6, 4],
node_id: 3,
parent_node_id: 2,
type: "section"
},
4 => %Floki.HTMLTree.HTMLNode{
attributes: [{"class", "headline"}],
children_nodes_ids: [5],
node_id: 4,
parent_node_id: 3,
type: "p"
},
5 => %Floki.HTMLTree.Text{content: "Floki", node_id: 5, parent_node_id: 4},
6 => %Floki.HTMLTree.HTMLNode{
attributes: [{"class", "headline"}],
children_nodes_ids: '\a',
node_id: 6,
parent_node_id: 3,
type: "span"
},
7 => %Floki.HTMLTree.Text{
content: "Enables search using CSS selectors",
node_id: 7,
parent_node_id: 6
},
8 => %Floki.HTMLTree.HTMLNode{
attributes: [{"href", "https://github.com/philss/floki"}],
children_nodes_ids: '\t',
node_id: 8,
parent_node_id: 3,
type: "a"
},
9 => %Floki.HTMLTree.Text{
content: "Github page",
node_id: 9,
parent_node_id: 8
},
10 => %Floki.HTMLTree.HTMLNode{
attributes: [{"data-model", "user"}],
children_nodes_ids: '\v',
node_id: 10,
parent_node_id: 3,
type: "span"
},
11 => %Floki.HTMLTree.Text{
content: "philss",
node_id: 11,
parent_node_id: 10
},
12 => %Floki.HTMLTree.HTMLNode{
attributes: [{"href", "https://hex.pm/packages/floki"}],
children_nodes_ids: '\r',
node_id: 12,
parent_node_id: 2,
type: "a"
},
13 => %Floki.HTMLTree.Text{
content: "Hex package",
node_id: 13,
parent_node_id: 12
}
},
root_nodes_ids: [1]
}
# 2
%Floki.HTMLTree.HTMLNode{
attributes: [],
children_nodes_ids: [2],
node_id: 1,
parent_node_id: nil,
type: "html"
}
# 3
"a"
Attempted function clauses (showing 2 out of 2):
defp get_matches(tree, html_node, selector = %Floki.Selector{combinator: nil})
defp get_matches(tree, html_node, selector = %Floki.Selector{combinator: combinator})
(floki 0.31.0) lib/floki/finder.ex:67: Floki.Finder.get_matches/3
(elixir 1.12.2) lib/enum.ex:3894: Enum.flat_map_list/2
(floki 0.31.0) lib/floki/finder.ex:51: Floki.Finder.find_selectors/2
(floki 0.31.0) lib/floki.ex:248: Floki.find/2
Hey @aegatlin 👋
This can be done by using a comma as a separator in a String.t(). Something like this:
Floki.find(html_node, "p, a, pre")
Comma-separated selector lists are standard in CSS, and you can combine any number of selectors this way.
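For the "whitelist" use case from the question, one option is to keep the tag names in a list and join them into a single comma-separated selector string before calling Floki.find/2. The following is only a minimal sketch: the sample markup and the variable names (whitelist, selector, document) are made up for illustration, and it assumes Elixir 1.12 or later so that Mix.install/1 is available. It does not make any claim about the order in which matches come back; check that against your Floki version if ordering matters.

```elixir
# Assumption: Elixir >= 1.12 (for Mix.install/1) and network access to fetch Floki.
Mix.install([{:floki, "~> 0.31"}])

# Made-up sample markup, for illustration only.
html = """
<html><body>
  <p class="headline">Floki</p>
  <span>ignored</span>
  <a href="https://github.com/philss/floki">Github page</a>
  <pre>some code</pre>
</body></html>
"""

document = Floki.parse_document!(html)

# Hypothetical whitelist of tag names, joined into one CSS selector string.
whitelist = ["p", "a", "pre"]
selector = Enum.join(whitelist, ", ")

# A node is returned if it matches any of the comma-separated selectors.
matches = Floki.find(document, selector)

# Each match is a {tag, attributes, children} tuple; collect the tag names.
tags = Enum.map(matches, fn {tag, _attrs, _children} -> tag end)
IO.inspect(Enum.sort(tags))
```

Joining the list keeps the whitelist easy to edit while still passing Floki the plain string form that the comma-separated example relies on.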
The case of using [Floki.Selector.t()] is when you want to make complex selectors by hand. It's not a common use case though.
I hope this can help :)