A Chrome plugin that generates simpler XPath expressions for elements, for use in standalone HTML extractors.
##Prototype
In single mode, XPathMagic tries to extract the XPath of the selected element E using the algorithm below (Algorithm I):
1. If the element has an id attribute, generate an XPath segment of the form tag[@id="xxx"].
2. Apply the XPath to the whole HTML document. If only E is selected, return the XPath as the result; otherwise go to step 3.
3. Switch the attribute to class and retry steps 1-2.
4. If step 3 also fails, move to the parent of the current element and retry steps 1-3. The parent's XPath segment is prepended to the path built so far.
5. If the root element (/html) has been processed and the XPath still selects more than one element, choose the path that starts at the lowest element and yields the same number of results.
   e.g. /html/body/div[@class="a"] will be converted to //div[@class="a"]
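
The steps above can be sketched in Python as follows. This is only an illustration under several assumptions, not the plugin's actual code: it uses the standard library's `xml.etree.ElementTree` (so the page must be well-formed, and paths start with `.//` instead of `//`), the names `step_for` and `simple_xpath` are made up, an element with neither attribute contributes a bare tag step, and attribute values are assumed to contain no single quotes.

```python
import xml.etree.ElementTree as ET


def step_for(el, attr):
    """Return one XPath step such as div[@id='main'], or None if attr is missing."""
    value = el.get(attr)
    return None if value is None else f"{el.tag}[@{attr}='{value}']"


def simple_xpath(root, target):
    """Sketch of the single-mode search; returns (xpath, number of matches)."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Wrap the root so that ".//" searches can also match the root element.
    doc = ET.Element("doc")
    doc.append(root)

    below = ""     # steps already fixed for the elements under `current`
    tried = []     # (xpath, match count), lowest element first
    current = target
    while current is not None:
        # Steps 1 and 3: try @id first, then @class (assumption: bare tag if neither).
        steps = [s for s in (step_for(current, a) for a in ("id", "class")) if s]
        for step in steps or [current.tag]:
            xpath = ".//" + step + below
            matches = doc.findall(xpath)
            # Step 2: stop as soon as only the target is selected.
            if matches == [target]:
                return xpath, 1
        # Step 4: keep the last step tried and move up to the parent.
        tried.append((xpath, len(matches)))
        below = "/" + step + below
        current = parent_of.get(current)

    # Step 5: the full path is still ambiguous, so return the shortest path
    # (starting at the lowest element) with the same number of matches.
    full_count = tried[-1][1]
    return next((x, n) for x, n in tried if n == full_count)


page = ET.fromstring(
    "<html><body>"
    "<div id='main'><p class='t'>A</p><p class='t'>B</p></div>"
    "<div class='side'><p class='t'>C</p><span class='x'>D</span></div>"
    "</body></html>"
)
print(simple_xpath(page, page.find(".//span")))
print(simple_xpath(page, page.find(".//div[@id='main']/p")))
```

In the second call no unique path exists, so the sketch falls through to step 5 and returns the shortest path that still selects the same two paragraphs as the full path from the root.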
Sometimes an element's attributes depend on the data of the page.
For example, `<div id="OSChina_News_43391">` in the page http://www.oschina.net/news/43391/encryption-is-less-secure-than-we-thought uses the news ID as part of the element ID. On this page, the extracted XPath //div[@id="OSChina_News_43391"] is correct, but it will not work on other pages.
To handle this, users can add a second page for extraction. The two pages should be of the same kind but have different content.
When an element is selected in the left page, its XPath is extracted by Algorithm I and then used to select elements in the right page. If it matches, the matching elements are highlighted and the result is printed. If it does not match, the result is treated as a FAILURE and the search continues from step 4 of Algorithm I.
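
A minimal sketch of this two-page check, again in Python with `xml.etree.ElementTree`. It is a simplification rather than the plugin's implementation: the successive XPaths that Algorithm I would produce are passed in as a ready-made list, the function name `first_portable_xpath` is hypothetical, and the two tiny pages with their `News_1` / `News_2` ids are made up for illustration.

```python
import xml.etree.ElementTree as ET


def first_portable_xpath(candidates, left, target, right):
    """Return (xpath, matches in right page) for the first usable candidate, else None."""
    for xpath in candidates:
        # The candidate must select exactly the chosen element in the left page.
        if left.findall(xpath) != [target]:
            continue
        # It must also select something in the right page; otherwise it is a FAILURE
        # and the next candidate is tried.
        matches = right.findall(xpath)
        if matches:
            return xpath, matches
    return None


# Two pages of the same kind whose element ids embed page-specific data.
left = ET.fromstring("<html><body><div id='News_1' class='news'>first</div></body></html>")
right = ET.fromstring("<html><body><div id='News_2' class='news'>second</div></body></html>")
target = left.find(".//div")

# Assumed order of candidates: the id-based XPath first, then the class-based one.
candidates = [".//div[@id='News_1']", ".//div[@class='news']"]
result = first_portable_xpath(candidates, left, target, right)
print(result[0], [el.text for el in result[1]])
```

The id-based XPath is unique in the left page but selects nothing in the right page, so it is rejected and the class-based XPath is returned instead.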
##License