carj/pyPreservica

xml.etree.ElementTree.findall('.//{*}ElementName') (and .find('.//{*}ElementName')) does not match namespaces?

Closed this issue · 3 comments

Hi James!

I have had an issue since upgrading to v6.2 and updating pyPreservica via pip to match.

Using EntityAPI functions children() or descendants() throws an error for me:
builtins.AttributeError: 'NoneType' object has no attribute 'text'

Example code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# I need to set environment variable for Requests to connect with self-signed certificates
import os
os.environ['REQUESTS_CA_BUNDLE'] = os.path.join(os.getcwd(), 'preservica.crt')
# Example:
import pyPreservica
print(pyPreservica.__version__) # prints "0.8.6"
client = pyPreservica.EntityAPI(username="rauno", password="redacted", tenant="EE", server="preservica62.rauno")
root_folders = client.children()
print(root_folders)

Output (error):

File "C:\Users\rauno\Desktop\2020-11-13 pyPreservica\02_pyPreservica.py", line 9, in <module>
  root_folders = client.children()
File "c:\Users\rauno\AppData\Local\Programs\Python\Python36-32\Lib\site-packages\pyPreservica\entityAPI.py", line 1027, in children
  return PagedSet(result, has_more, total_hits.text, url)

builtins.AttributeError: 'NoneType' object has no attribute 'text'

So I tried tracking the issue down and I think it's related to the xml.etree.ElementTree.findall() usage in entityAPI.py line 1010.
I don't think "{*}" as a namespace is working.
My testing:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# For using a self-signed certificate
import os
os.environ['REQUESTS_CA_BUNDLE'] = os.path.join(os.getcwd(), 'preservica.crt')
# Fetching XML
import requests
import lxml.etree # For pretty-printing

myobj = {"username" : "rauno",
         "password" : "redacted",
         "tenant" : "ee"}
base_url = "preservica62.rauno"
login_request = requests.post(f'https://{base_url}/api/accesstoken/login', data = myobj)
pat = login_request.json()["token"]
header_pat = {"Preservica-Access-Token" : pat}
# Paging options go in the query string, so pass them as params rather than data
root_children_request = requests.get(f'https://{base_url}/api/entity/root/children',
                                     params={"start": 0, "max": 50}, headers=header_pat)
# Response-XML as Bytes
xml_content_response = root_children_request.content
print("XML Response:")
print(lxml.etree.tostring(lxml.etree.fromstring(xml_content_response), pretty_print=True).decode("UTF-8"))

# Showing problem
import xml.etree.ElementTree
xml_fromstring = xml.etree.ElementTree.fromstring(xml_content_response.decode("utf-8"))
# "{*}" namespace wildcard
children = xml_fromstring.findall(".//{*}Child")
print("With {*}:", children)
# Fully qualified namespace URI in braces
children_with_namespace = xml_fromstring.findall(".//{http://preservica.com/EntityAPI/v6.2}Child")
print("With namespace:", children_with_namespace)
# Prefix mapped to the namespace URI through the namespaces argument
children_with_prefix_namespace = xml_fromstring.findall(".//ent:Child", namespaces={"ent": "http://preservica.com/EntityAPI/v6.2"})
print("With prefix:", children_with_prefix_namespace)

Output (no errors, but I cropped out SubjectAltNameWarnings):

[...]
XML Response:
<ChildrenResponse xmlns="http://preservica.com/EntityAPI/v6.2" xmlns:xip="http://preservica.com/XIP/v6.2">
  <Children>
    <Child title="Testijuurikas" ref="bb5f4634-7610-49d7-8f1a-235b69f3eae2" type="SO">https://preservica62.rauno/api/entity/structural-objects/bb5f4634-7610-49d7-8f1a-235b69f3eae2</Child>
    <Child title="Vastuv&#245;tujuurikas" ref="61d64c16-6b74-4bb7-bd82-8b786782b868" type="SO">https://preservica62.rauno/api/entity/structural-objects/61d64c16-6b74-4bb7-bd82-8b786782b868</Child>
  </Children>
  <Paging>
    <TotalResults>2</TotalResults>
  </Paging>
  <AdditionalInformation>
    <Self>https://preservica62.rauno/api/entity/root/children</Self>
  </AdditionalInformation>
</ChildrenResponse>

With {*}: []
With namespace: [<Element '{http://preservica.com/EntityAPI/v6.2}Child' at 0x03FAF5D0>, <Element '{http://preservica.com/EntityAPI/v6.2}Child' at 0x03FAF600>]
With prefix: [<Element '{http://preservica.com/EntityAPI/v6.2}Child' at 0x03FAF5D0>, <Element '{http://preservica.com/EntityAPI/v6.2}Child' at 0x03FAF600>]

For me, {*} does not match anything, so findall() returns an empty list and find() returns None, which is where the error comes from. The {*} wildcard is used in other find() and findall() calls too.
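One way to sidestep the wildcard is to compare local names by hand. This is only a minimal sketch, not what pyPreservica does: the helper names `local_name` and `find_all_local` and the trimmed sample XML are made up for illustration. Unlike './/', `iter()` also visits the root element itself.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the ChildrenResponse shown above
SAMPLE = """<ChildrenResponse xmlns="http://preservica.com/EntityAPI/v6.2">
  <Children>
    <Child ref="a">one</Child>
    <Child ref="b">two</Child>
  </Children>
  <Paging><TotalResults>2</TotalResults></Paging>
</ChildrenResponse>"""


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag."""
    # Comment and processing-instruction nodes have non-string tags
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def find_all_local(root, name):
    """Return every element (root included) whose local name equals name."""
    return [el for el in root.iter() if local_name(el.tag) == name]


root = ET.fromstring(SAMPLE)
children = find_all_local(root, "Child")
total = find_all_local(root, "TotalResults")[0].text
print(len(children), total)
```

Because it only relies on `iter()` and string handling, this does not depend on which namespace version the server returns.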

Or if it's fine on your end, could this be an environmental issue on my side?

All the best
Rauno

carj commented

Hi!
You're absolutely correct! I'm using Python 3.6 and I also have 3.7 set up, where I've worked with xml.etree.ElementTree and variants, but the docs say:

Changed in version 3.8: Support for star-wildcards was added.

1) Python 3.8 docs on xml.etree.ElementTree
2) What's new in Python 3.8
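
To illustrate the version dependence, here is a rough sketch that uses the "{*}" wildcard on Python 3.8 and newer and falls back to a manual local-name comparison on older interpreters. The names `SUPPORTS_STAR` and `find_any_ns` and the sample XML are assumptions for this example only, not part of pyPreservica.

```python
import sys
import xml.etree.ElementTree as ET

# Star-wildcards in ElementTree paths were added in Python 3.8
SUPPORTS_STAR = sys.version_info >= (3, 8)


def find_any_ns(root, name):
    """Find descendants called name in any (or no) namespace."""
    if SUPPORTS_STAR:
        return root.findall(".//{*}" + name)
    # Fallback for older interpreters: compare local names manually,
    # skipping the root so the result matches './/' semantics
    return [
        el for el in root.iter()
        if el is not root
        and isinstance(el.tag, str)
        and el.tag.rsplit("}", 1)[-1] == name
    ]


# Hypothetical sample with a default namespace
doc = ET.fromstring(
    '<Response xmlns="http://preservica.com/EntityAPI/v6.2">'
    "<Children><Child>a</Child><Child>b</Child></Children>"
    "</Response>"
)
matches = find_any_ns(doc, "Child")
print(len(matches))
```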

When searching for solutions I searched for Python .//{*}, but if I had searched just Python {*} I would've seen my mistake.
Time to upgrade for me :)

Sorry for opening this issue, and thanks for the quick response!
Rauno

carj commented

Hi,

Glad you found the issue. I would like to get some older Python distributions installed at some point to check functionality. I have a suite of system tests which I run on my 3.8.5 system before I create new releases.

For the time being I can only really support 3.8.x.
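
Until older versions are tested, a script could fail early with a clear message instead of the AttributeError above. The sketch below is hypothetical: `MIN_PYTHON` and `check_python` are made-up names, the minimum is taken from the "3.8.x" statement, and nothing here is part of pyPreservica.

```python
import sys

# Assumption: minimum version taken from the "3.8.x" statement above
MIN_PYTHON = (3, 8)


def check_python(version_info=None, minimum=MIN_PYTHON):
    """Raise RuntimeError if the interpreter is older than minimum."""
    current = tuple((version_info or sys.version_info)[:2])
    if current < minimum:
        raise RuntimeError(
            "Python %d.%d or newer is required, found %d.%d"
            % (minimum + current)
        )
    return current


# Passing an explicit version tuple makes the check easy to try out
print(check_python((3, 8, 5)))
```

Calling `check_python()` with no arguments checks the running interpreter.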

~
James