gawel/pyquery

Different selection results gained within different environments


When I first started learning pyquery on Windows 10 with Python 3.9.5 and pyquery 1.4.3, the selection results seemed weird:

from pyquery import PyQuery as pq

html = '''
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html">second item</a></li>
        <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
'''
doc=pq(html)
print(doc('li'))

The result was

<li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul>
 </div>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul>
 </div>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul>
 </div>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul>
 </div>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul>
 </div>

When I ran the same code on Colab with Python 3.7 and pyquery 1.4.3, the result looked right:

<li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>

This may be the same scenario as the closed issue #223, but that issue has no solution.
Is the difference caused by the different environments?
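To compare the two environments, it may help to print the same details on each machine. Below is a minimal sketch using only the standard library. The function name `environment_report` is made up for illustration, and the default package list assumes that lxml and cssselect (which pyquery depends on) are the relevant ones. `importlib.metadata` requires Python 3.8 or newer, so it would not run as-is on the Python 3.7 Colab runtime.

```python
import platform
from importlib import metadata


def environment_report(packages=("pyquery", "lxml", "cssselect")):
    """Collect interpreter, platform, and package versions (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),  # CPU architecture as reported by the OS
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Posting this output from both a working and a failing machine would show whether the installed versions or the CPU architecture differ.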

I have the same issue - whatever I select, something bleeds through at the end:

(from above)

print(doc('div ul li:nth-child(3)').html())
returns

<a href="link3.html"><span class="bold">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>

Getting the text of an element works, but I need to automate taking a specific <div> from one HTML document and appending it to another, and that doesn't work because of the extra trailing markup.

I'm using Python 3.8 and pyquery 1.4.3.

It seems I was able to get what I need with the .outer_html() method, although this too bleeds through with some newline characters. Not ideal, but it'll do.

Edit: It worked for one element I was trying to extract, but then when trying to extract the destination element, it didn't work.
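For anyone blocked on the same task, here is a rough sketch of copying one element into another document with the standard library's `xml.etree.ElementTree` instead of pyquery. It is not a fix for the behaviour reported here, and it only works on well-formed markup such as the sample above, not on arbitrary HTML. The helper name `move_fragment` and the destination markup are invented for illustration. The copied element's `tail` (the text that follows its closing tag) is cleared so that no trailing whitespace is carried over.

```python
import copy
import xml.etree.ElementTree as ET

SOURCE = """<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html">second item</a></li>
    </ul>
</div>"""

# Hypothetical destination document, not taken from the thread.
TARGET = '<div id="destination"></div>'


def move_fragment(source_markup, target_markup, path):
    """Copy the first element matching `path` into the target's root element."""
    source_root = ET.fromstring(source_markup)
    target_root = ET.fromstring(target_markup)
    found = source_root.find(path)
    if found is None:
        raise ValueError(f"no element matches {path!r}")
    fragment = copy.deepcopy(found)
    fragment.tail = None  # drop the text that followed the closing tag
    target_root.append(fragment)
    return ET.tostring(target_root, encoding="unicode")


print(move_fragment(SOURCE, TARGET, ".//li[2]"))
```

The path `.//li[2]` uses ElementTree's limited XPath support to select the second `<li>` among its siblings.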

If it helps those working on the issue, I've got a reproducer for something very similar to this at https://github.com/rjsparks/test_pyquery. All tox tests pass on macOS on an Intel Mac. All fail on an M1 Pro.