JuliaWeb/Gumbo.jl

How to extract the contents of elements such as div

Closed this issue · 3 comments

How can I extract the contents of elements such as div?
I expected to get the "body" of each div, but I only see the element types:

julia> divy = []

julia> for elem in preorder(body)
           # keep every div element found while walking the tree
           if elem isa HTMLElement{:div}
               push!(divy, elem)
           end
       end

julia> unique(divy)
289-element Array{Any,1}:
Gumbo.HTMLElement{:div}
Gumbo.HTMLElement{:div}
Gumbo.HTMLElement{:div}
Gumbo.HTMLElement{:div}
Gumbo.HTMLElement{:div}
Gumbo.HTMLElement{:div}

Paul

If you have an HTMLElement elem, children(elem) will return the array of its child nodes. So concretely, in your example, children(divy[1]) would get the children of the first div, and so on. Does that work for you?
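
A minimal sketch of that idea, assuming a small made-up HTML string (the `html` value and the helper name `collect_divs` are hypothetical, not part of Gumbo). It walks the tree using only `children`, so it does not depend on how `preorder` is provided:

```julia
using Gumbo

# Hypothetical sample page, only for illustration.
html = """
<html><body>
  <div><p>first</p><span>second</span></div>
  <div><p>third</p></div>
</body></html>
"""

doc = parsehtml(html)

# Recursively collect every <div> element at or below `node`.
# `collect_divs` is a made-up helper, not a Gumbo function.
function collect_divs(node, acc = HTMLElement{:div}[])
    if node isa HTMLElement
        node isa HTMLElement{:div} && push!(acc, node)
        for child in children(node)
            collect_divs(child, acc)
        end
    end
    return acc
end

divs = collect_divs(doc.root)

# Child nodes of the first div. Whitespace-only text may or may not
# appear as HTMLText nodes, so keep only the element children here.
kids = [c for c in children(divs[1]) if c isa HTMLElement]
println(map(tag, kids))
```

Storing the elements themselves (rather than their children) keeps both options open: `children(divs[i])` is available whenever the child nodes are needed.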

OK, now I have this (with children added):
divy = []
for elem in preorder(body)
    # store the child nodes of each div instead of the div itself
    if elem isa HTMLElement{:div}
        push!(divy, children(elem))
    end
end

If I now evaluate divy[1], I get:
julia> divy[1]
3-element Array{Gumbo.HTMLNode,1}:
Gumbo.HTMLElement{:p}
Gumbo.HTMLElement{:span}
Gumbo.HTMLElement{:script}

But if I evaluate divy[1:1], I get the body of the first div.
Question: is this the best way to get the body of elements?

julia> divy[1:1]
1-element Array{Any,1}:
Gumbo.HTMLNode[Gumbo.HTMLElement{:p}:

Prawdopodobnie dawno nie było Cię w Wirtualnej Polsce. Zobacz jak się zmieniła!

,Gumbo.HTMLElement{:span}:

,Gumbo.HTMLElement{:script}:

<script> $('.mail-info--close').click(function () { $('.mail-info').slideUp(); }); (function($){var getUrlParam=function(paramName){var regEx=new RegExp("[?&]"+paramName+"=([^&#]*)"),currSearchUrl=document.location.search,resultsArray=currSearchUrl.match(regEx);if(resultsArray){return resultsArray[1]}return false},wzp=getUrlParam("wzp");if (wzp=='wp'){$('.mail-info--content p').prepend('Wylogowano z Poczty WP. ');$('aside.mail-info').show();} else if (wzp=='o2') {$('.mail-info--content p').prepend('Wylogowano z Poczty o2. ');$('aside.mail-info').show();};return}(WP.$)); </script>

]

OK, I see :)
After running
divy = []
for elem in preorder(body)
    # keep the div element itself
    if elem isa HTMLElement{:div}
        push!(divy, elem)
    end
end
divy[1] returns the body of this div. Nice.
julia> divy[1]
Gumbo.HTMLElement{:div}:

Prawdopodobnie dawno nie było Cię w Wirtualnej Polsce. Zobacz jak się zmieniła!

<script> $('.mail-info--close').click(function () { $('.mail-info').slideUp(); }); (function($){var getUrlParam=function(paramName){var regEx=new RegExp("[?&]"+paramName+"=([^&#]*)"),currSearchUrl=document.location.search,resultsArray=currSearchUrl.match(regEx);if(resultsArray){return resultsArray[1]}return false},wzp=getUrlParam("wzp");if (wzp=='wp'){$('.mail-info--content p').prepend('Wylogowano z Poczty WP. ');$('aside.mail-info').show();} else if (wzp=='o2') {$('.mail-info--content p').prepend('Wylogowano z Poczty o2. ');$('aside.mail-info').show();};return}(WP.$)); </script>

julia>
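
If only the text of a div is wanted, rather than the printed element, one option is to walk its children and keep the HTMLText nodes. A rough sketch, assuming a made-up HTML string and a hypothetical helper `collect_text!` (not part of Gumbo), and assuming HTMLText exposes its string through the `text` field:

```julia
using Gumbo

# Hypothetical input, only for illustration.
html = "<div><p>Hello</p><span>world</span><script>var x = 1;</script></div>"
doc = parsehtml(html)

# Made-up helper: gather the text found at or below `node`,
# skipping anything inside <script> elements.
function collect_text!(parts, node)
    if node isa HTMLText
        push!(parts, node.text)   # assumes the `text` field of HTMLText
    elseif node isa HTMLElement && !(node isa HTMLElement{:script})
        for child in children(node)
            collect_text!(parts, child)
        end
    end
    return parts
end

parts = collect_text!(String[], doc.root)
println(join(parts, " "))
```

Skipping the script element is a choice made for this sketch; drop that condition if the inline JavaScript should be included as well.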