scinfu/SwiftSoup

Is there a way to parse part of HTML instead of parsing full HTML?

alphonse1234 opened this issue · 1 comment

Using Swift 5, Xcode 11.2.1, SwiftSoup 2.3.1.

Currently I'm using XMLParser to get RSS URLs from XML (about 34 URLs). Then I parse the full HTML document from each URL I got from the XML. Lastly, I search for the needed element and retrieve its data.

But it takes too much time to parse the full HTML and search for the element.

I'm always extracting the same element:

<meta property="#element what I'm searching for#" content="#text I'm retrieving#">
which is located in

<head></head>
So I don't want to wait for the whole HTML file to be converted. I want to parse only part of the HTML, for example <head></head> without the body.

Is there any way to do it?

or

Is there any way to do it faster?

This is my code for parsing the full HTML and getting the element:

    func getStringFromHtml(urlString : String) -> String {
        
        let url = URL(string: urlString)!
        
        var result = ""
        
        do {
            let html = try String(contentsOf: url)
            let doc: Document = try SwiftSoup.parse(html)

            let meta: Element = try doc.select("meta[property=og:title]").first()!
            let text: String = try meta.attr("content")
            result = text
        } catch {
            print("error")
        }
        return result
    }

You can try selecting the head first with doc.head() and then running the query on it:

    func getStringFromHtml(urlString : String) -> String {
        let url = URL(string: urlString)!
        var result = ""
        do {
            let html = try String(contentsOf: url)
            let doc: Document = try SwiftSoup.parse(html)
            let meta: Element? = try doc.head()?.select("meta[property=og:title]").first()
            let text: String? = try meta?.attr("content")
            result = text ?? ""
        } catch {
            print("error")
        }
        return result
    }
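
Note that the snippet above still calls SwiftSoup.parse on the whole page, so it only narrows the selector search, not the parsing. If parsing itself is the slow part, one thing you could try is cutting the string at the first `</head>` before handing it to SwiftSoup. Below is a minimal sketch of that idea. `headSection(of:)` and `ogTitle(fromHTML:)` are hypothetical helper names, not part of SwiftSoup, and the sketch assumes the page has a literal `</head>` closing tag that does not also appear earlier inside a script or comment.

```swift
import Foundation
import SwiftSoup

/// Hypothetical helper (not a SwiftSoup API): returns the prefix of `html`
/// up to and including the first "</head>", matched case-insensitively.
/// Falls back to the whole string when no closing head tag is found.
func headSection(of html: String) -> String {
    guard let end = html.range(of: "</head>", options: .caseInsensitive) else {
        return html
    }
    return String(html[..<end.upperBound])
}

/// Hypothetical helper: parses only the head portion of `html` and returns
/// the content of the og:title meta tag, or nil if it is missing or
/// parsing fails.
func ogTitle(fromHTML html: String) -> String? {
    do {
        // SwiftSoup now only sees the truncated string, not the body.
        let doc = try SwiftSoup.parse(headSection(of: html))
        guard let meta = try doc.select("meta[property=og:title]").first() else {
            return nil
        }
        return try meta.attr("content")
    } catch {
        return nil
    }
}
```

In getStringFromHtml you would then replace the parse and select lines with `result = ogTitle(fromHTML: html) ?? ""`. This does not reduce download time: `String(contentsOf:)` still fetches the full page synchronously, so if the network is the bottleneck for your 34 URLs, truncating before parsing will help less.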