Node for found text
rodneyboyd opened this issue · 10 comments
After a successful frame.find_text, frame.html.current_node is None. Is there a way to get the node for the selected text? Thanks.
Hi! Not at the moment, but I can add it if needed. What are you trying to do?
Hi. I'm not sure how much detail you need, but basically I have some information stored in attributes that I'd like to be able to access without having to click on the text after finding it.
If you're using find_text as a way to filter elements and get their data, I would instead use frame.html.search(css_selector) to get the nodes. For large documents this would be faster and more foolproof than using find_text. For instance,
self.frame.load_html("<p>Some text</p><p wantthis>Some other text</p>")
self.frame.html.search("[wantthis]")  # CSS attribute selector: every element with a wantthis attribute
would return all nodes that match.
However, if for whatever reason you need to get the nodes found strictly by find_text, let me know and I will add it!
Hi, I do need to use find_text because it's for a user find/replace operation. I'm not sure if I could refactor it to use search(css) instead ... maybe? Btw, if you're curious about the app, you can download it at https://picardy-indexing.ca/downloads. It's an index-editing app. It uses TkinterWeb both for preview and Help delivery.
Hi!
No, it is not possible to use search to mimic the functionality of find_text. find_text takes the text content of the page, uses a regular expression to find matches, and then finds the corresponding nodes. You could write your own similar function, but there's no need to reinvent the wheel.
I tweaked find_text so you can get the selected node. Passing detailed=True to find_text makes it return a tuple containing the number of matches, the selected match, and a tuple of all other matches.
Each match is returned as a tuple of four values: the start node, the text offset from the start of that node, the end node, and the text offset within the end node. Two nodes are returned because a match can span multiple nodes: the start node contains the beginning of the found text and the end node contains its end. In most cases they are the same node. The offsets are largely for internal use.
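A minimal sketch of consuming the detailed=True result, assuming it has exactly the (number of matches, selected match, other matches) shape described above and that each match is a four-value tuple. The helper name summarize_detailed_result is hypothetical, and what the selected match looks like when nothing is found is an assumption, so the sketch checks the count before unpacking it.

```python
def summarize_detailed_result(result):
    """Unpack a find_text(..., detailed=True) result (shape assumed from this thread).

    Returns (count, selected_start_node, spans_multiple_nodes, other_start_nodes).
    """
    count, selected, others = result
    if not count:
        # Assumption: with no matches there is no usable selected match.
        return 0, None, False, []
    # Each match: (start_node, start_offset, end_node, end_offset).
    start_node, _start_offset, end_node, _end_offset = selected
    spans_multiple_nodes = start_node != end_node
    # Start node of every other match, e.g. for a find/replace pass.
    other_start_nodes = [match[0] for match in others]
    return count, start_node, spans_multiple_nodes, other_start_nodes
```

In an app this would be called as summarize_detailed_result(self.frame.find_text("index", detailed=True)), giving the node where the selected text starts without the user having to click on it.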
I hope this helps! Let me know if you have any questions.
Thanks very much! It works as expected and provides the information I need.
By the way, something seems to have changed that causes find_text('') to fail with the following error:
TypeError: cannot unpack non-iterable int object
Happy to help!
Thanks for noticing that bug; I just fixed it.
By the way, is there a reason why tkhtml/Linux/32-bit/Tkhtml3.0.so was renamed to libTkhtml3.0.so? (Ditto for 64-bit.) I don't think it makes any difference, but I initially got an error when building my installer because it was expecting the old name.
I renamed some of the files to match the output filename when compiling Tkhtml. I had someone ask why they were named differently and I figured I would rename some of the files used here to save folks having to rename their files after compiling. Sorry about your installer; it never occurred to me it would be an issue!
No worries ... I figured it out after a few minutes :-)
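For installer or build scripts like the one mentioned above, one way to avoid breaking on a rename like this is to accept either filename. The helper below (find_tkhtml_library) is hypothetical and not part of TkinterWeb; it only checks for the two names that appear in this thread and prefers the newer one.

```python
from pathlib import Path

# The newer name (matching the compiler's output) first, then the old one.
TKHTML_LIBRARY_NAMES = ("libTkhtml3.0.so", "Tkhtml3.0.so")


def find_tkhtml_library(directory):
    """Return the path of the Tkhtml shared library in directory, under either name."""
    directory = Path(directory)
    for name in TKHTML_LIBRARY_NAMES:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No Tkhtml library ({' or '.join(TKHTML_LIBRARY_NAMES)}) in {directory}"
    )
```

A build script would call it as find_tkhtml_library("tkhtml/Linux/64-bit"), adjusting the directory to wherever the bundled files live in that build.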