How to ignore child text elements
WillBishop opened this issue · 1 comments
WillBishop commented
Hello,
Say you have a document that looks like this:
<!DOCTYPE html>
<html>
  <head></head>
  <body>
    <div>
      <div>
        <span>I'm the first element</span>
        <p>I'm the second element <strong>with bold text</strong></p>
      </div>
      <div>
        <p>I'm the fourth element.</p>
        <p>I'm the fifth element</p>
      </div>
    </div>
  </body>
</html>
I'm trying to get an array containing the text of each of those elements. Obviously I'm simplifying the page, but I'm working with pages that may use span as the parent for all text, or p. If I use getAllElements, then call .text() on each, I'll end up with duplicates, as <strong>with bold text</strong> is its own element within another.
So in the end I want an array looking like:
[
"I'm the first element",
"I'm the second element with bold text",
"I'm the fourth element.",
"I'm the fifth element"
]
I know this isn't a support forum but I wasn't sure where else to go.
scinfu commented
Hi,
Try this for your specific document:
import SwiftSoup
// `html` is the document string from the question.
let doc: Document = try SwiftSoup.parse(html)
let array = try doc.select("p, span").map { try $0.text() }
result:
["I\'m the first element", "I\'m the second element with bold text", "I\'m the fourth element.", "I\'m the fifth element"]
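The selector above works for the sample page because no span sits inside a p (or the other way round). On pages where the text-holding tags can nest, the inner element's text would be returned twice: once on its own and once as part of its parent. A minimal sketch of one way around that, assuming you know which tags act as text containers, is to keep only matches that have no matching ancestor. The function name topLevelTexts and its tags parameter are hypothetical, not part of SwiftSoup.

```swift
import SwiftSoup

// Hypothetical helper: returns the text of every element whose tag is in `tags`,
// skipping elements nested inside another element whose tag is also in `tags`.
func topLevelTexts(in html: String, tags: Set<String> = ["p", "span"]) throws -> [String] {
    let doc = try SwiftSoup.parse(html)
    // Build a selector such as "p, span" from the tag set.
    let selector = tags.sorted().joined(separator: ", ")
    var result: [String] = []

    for element in try doc.select(selector).array() {
        // Walk up the ancestor chain looking for an enclosing text container.
        var ancestor = element.parent()
        var isNested = false
        while let current = ancestor {
            if tags.contains(current.tagName()) {
                isNested = true
                break
            }
            ancestor = current.parent()
        }
        // text() on the outer element already includes the text of its descendants.
        if !isNested {
            result.append(try element.text())
        }
    }
    return result
}

// Usage: the inner <span> is folded into its parent <p> instead of being listed twice.
let sample = "<div><p>outer <span>inner</span></p><span>alone</span></div>"
print(try topLevelTexts(in: sample))
```

For the page in the question this should give the same four strings as the one-line select, since none of its p or span elements are nested in each other.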