CRI seems to be working differently outside of iex shell
pedroseabra1091 opened this issue · 3 comments
I am using Chroxy and ChromeRemoteInterface to fetch HTML and then parse it with Floki. If I get the outer HTML inside iex, I can fetch the page and find the elements I want. Outside of iex, however, Floki does not find anything.
Here is a sample of the code I currently have:
ws_addr = Chroxy.connection()
{:ok, page} = ChromeRemoteInterface.PageSession.start_link(ws_addr)
ChromeRemoteInterface.RPC.Page.enable(page)
ChromeRemoteInterface.PageSession.subscribe(page, "Page.loadEventFired", self())
ChromeRemoteInterface.RPC.Page.navigate(page, %{url: url})
{:ok, dom} = ChromeRemoteInterface.RPC.DOM.getDocument(page)
nodeId = dom["result"]["root"]["backendNodeId"]
{:ok, %{"result" => result}} = ChromeRemoteInterface.RPC.DOM.getOuterHTML(page, %{backendNodeId: nodeId})
pre_selected_content = Floki.find(result["outerHTML"], "div.productBoxTop")
Any suggestions?
One thing is that ChromeRemoteInterface.PageSession.subscribe(page, "Page.loadEventFired", self()) subscribes to the event and forwards it to the subscribed process, but doesn't block. (This library is fairly bare-bones and low-level. I know it isn't ideal, and I'd like to add a better option for synchronous execution.)
You can add a receive block right after navigation that listens for this Page.loadEventFired event:
receive do
  {:chrome_remote_interface, "Page.loadEventFired", _} -> :ok
after
  10_000 -> {:error, :timeout}
end
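As a rough sketch, the wait can be wrapped in a small helper so that the DOM calls only run once the load event has arrived. The module name LoadWaiter and the function await_event/2 are hypothetical and not part of chrome_remote_interface; the only thing assumed about the library is the message shape shown in the receive block above. A likely reason the original code works in iex is that typing each line by hand gives the page time to load, whereas a script calls DOM.getDocument immediately after navigate.

```elixir
defmodule LoadWaiter do
  @moduledoc """
  Hypothetical helper, not part of chrome_remote_interface.
  Assumes subscribed events arrive in the calling process as
  {:chrome_remote_interface, event_name, payload}.
  """

  # Blocks the calling process until the named event arrives,
  # or returns {:error, :timeout} after `timeout` milliseconds.
  def await_event(event, timeout \\ 10_000) do
    receive do
      {:chrome_remote_interface, ^event, payload} -> {:ok, payload}
    after
      timeout -> {:error, :timeout}
    end
  end
end

# Simulate the event so the sketch runs without a browser.
send(self(), {:chrome_remote_interface, "Page.loadEventFired", %{}})
result = LoadWaiter.await_event("Page.loadEventFired", 1_000)
IO.inspect(result)

# In the code from the question, the call would sit between
# navigate and getDocument:
#
#   ChromeRemoteInterface.RPC.Page.navigate(page, %{url: url})
#   {:ok, _event} = LoadWaiter.await_event("Page.loadEventFired")
#   {:ok, dom} = ChromeRemoteInterface.RPC.DOM.getDocument(page)
```

Matching on `{:ok, _event}` makes a timeout fail loudly with a MatchError, rather than continuing on to parse a page that has not finished loading.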
The synchronous API discussion is at #11. I'd like to hear your thoughts, if you have any :)
Thanks for the help! Unfortunately, I don't have any suggestions.
However, I do think this information would be helpful in the README.
I agree! There's really only a little bit of documentation in https://hexdocs.pm/chrome_remote_interface/ChromeRemoteInterface.PageSession.html#subscribe/3. Not a pleasant API to work in right now, apologies.
If it's okay with you, I'm going to mark this closed for now.