scinfu/SwiftSoup

How to ignore child text elements

WillBishop opened this issue · 1 comment

Hello,

Say you have a document that looks like this:

<!DOCTYPE html>
<html>
   <head></head>
   <body>
      <div>
         <div>
            <span>I'm the first element</span>
            <p>I'm the second element <strong>with bold text</strong></p>
         </div>
         <div>
            <p>I'm the fourth element.</p>
            <p>I'm the fifth element</p>
         </div>
      </div>
   </body>
</html>

I'm trying to get an array containing the text of each of those elements. This page is simplified; the real pages I'm working with may use either span or p as the parent for all of their text. If I use getAllElements() and then call .text() on each element, I'll end up with duplicates, because <strong>with bold text</strong> is its own element nested inside another.
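To make the duplication concrete, here is a minimal sketch of the getAllElements() approach described above, run against a compacted copy of the sample page. The names `sampleHTML` and `allElementTexts` are illustrative only (not part of SwiftSoup), and the code assumes SwiftSoup is available as a package dependency.

```swift
import SwiftSoup

// Illustrative, compacted copy of the page from the question.
let sampleHTML = """
<div><div><span>I'm the first element</span>\
<p>I'm the second element <strong>with bold text</strong></p></div>\
<div><p>I'm the fourth element.</p><p>I'm the fifth element</p></div></div>
"""

/// Hypothetical helper: calls text() on every element, as described in the question.
func allElementTexts(_ html: String) throws -> [String] {
    let doc: Document = try SwiftSoup.parse(html)
    // getAllElements() also returns the document root, <html>, <body>, the
    // wrapper <div>s and the nested <strong>. text() on each one includes the
    // text of all its descendants, so the same words show up repeatedly.
    return try doc.getAllElements().map { try $0.text() }
}

do {
    for text in try allElementTexts(sampleHTML) {
        print(text)
    }
} catch {
    print("Parsing failed: \(error)")
}
```

Both the `<p>` and its `<strong>` child contribute an entry, so "with bold text" appears once on its own and again inside the paragraph's text.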

So in the end I want an array that looks like this:

[
"I'm the first element",
"I'm the second element with bold text",
"I'm the fourth element.",
"I'm the fifth element"
]

I know this isn't a support forum, but I wasn't sure where else to ask.

Hi,
Try this code for your specific example:

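import SwiftSoup
let doc: Document = try SwiftSoup.parse(html)  // html holds the page source from the question
// Select only <p> and <span>; text() includes nested markup such as <strong>.
let array = try doc.select("p,span").map { try $0.text() }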

Result of print(array):
["I\'m the first element", "I\'m the second element with bold text", "I\'m the fourth element.", "I\'m the fifth element"]
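The selector above works because the sample page only uses `p` and `span` as text containers. If a page nests a `<span>` inside a `<p>`, both elements match, so the span's text would appear twice. A more tag-agnostic alternative (a sketch, not something proposed in this thread) is to keep the outermost elements that directly own non-whitespace text, using `ownText()`, and skip anything nested inside an element already collected. `textBlocks(in:)` and `sampleHTML` are hypothetical names; the code assumes SwiftSoup as a dependency and uses Foundation for trimming.

```swift
import Foundation
import SwiftSoup

// Illustrative, compacted copy of the page from the question.
let sampleHTML = """
<div><div><span>I'm the first element</span>\
<p>I'm the second element <strong>with bold text</strong></p></div>\
<div><p>I'm the fourth element.</p><p>I'm the fifth element</p></div></div>
"""

/// Hypothetical helper: returns the full text of each outermost element
/// that directly contains non-whitespace text.
func textBlocks(in html: String) throws -> [String] {
    let doc: Document = try SwiftSoup.parse(html)
    var kept = Set<ObjectIdentifier>()
    var result: [String] = []
    // getAllElements() walks the tree in document order, so an ancestor is
    // always visited before its descendants.
    for element in try doc.getAllElements() {
        // Skip elements with no text of their own (wrappers such as <div>).
        let own = element.ownText().trimmingCharacters(in: .whitespacesAndNewlines)
        if own.isEmpty { continue }
        // Skip elements nested inside an element we already collected
        // (e.g. the <strong> inside the second <p>).
        var ancestor = element.parent()
        var insideKept = false
        while let current = ancestor {
            if kept.contains(ObjectIdentifier(current)) {
                insideKept = true
                break
            }
            ancestor = current.parent()
        }
        if insideKept { continue }
        kept.insert(ObjectIdentifier(element))
        // text() gathers the element's own text plus all nested text.
        result.append(try element.text())
    }
    return result
}

do {
    print(try textBlocks(in: sampleHTML))
} catch {
    print("Parsing failed: \(error)")
}
```

With the sample page this should yield the same four strings as the selector approach, and `<p>Hello <span>world</span></p>` yields a single "Hello world" entry. One trade-off: a wrapper that mixes its own text with block children (for example `<div>Intro <p>More</p></div>`) comes back as one entry, because the outer `<div>` is collected first and its descendants are skipped.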