HDFGroup/hsds

POST_Links does not return link information when following links recursively

mattjala opened this issue · 5 comments

When using POST_Links to retrieve a link by name with follow_links enabled, if the link is retrieved from a sub-group, the returned JSON is empty.

Implementing the follow_links parameter for POST_Links raises some questions - if multiple links are found by the same name, which should be returned? The first in iteration order? Do the domain crawlers guarantee any particular iteration order, and do we want to expose that behavior?

It might make sense to remove the follow_links param from POST_Links. Similarly, CreateOrder, Limit, and Marker seem to be holdovers from GET_Links and should probably be removed as well.

Test to elicit the bug:

    def testPostLinksFollowLinks(self):
        domain = self.base_domain + "/testPostLinksFollowLinks.h5"
        helper.setupDomain(domain)
        print("testPostLinksFollowLinks", domain)
        headers = helper.getRequestHeaders(domain=domain)

        req = helper.getEndpoint() + "/"
        rsp = self.session.get(req, headers=headers)
        self.assertEqual(rsp.status_code, 200)
        rspJson = json.loads(rsp.text)
        root_id = rspJson["root"]

        # create group "g1" in root group
        req = helper.getEndpoint() + "/groups"
        body = {"link": {"id": root_id, "name": "g1"}}
        rsp = self.session.post(req, data=json.dumps(body), headers=headers)
        self.assertEqual(rsp.status_code, 201)
        rspJson = json.loads(rsp.text)
        group_id = rspJson["id"]

        path = "/dummy_target"

        # create link "link1" in g1
        req = helper.getEndpoint() + "/groups/" + group_id + "/links/link1"
        body = {"h5path": path}
        rsp = self.session.put(req, data=json.dumps(body), headers=headers)
        self.assertEqual(rsp.status_code, 201)

        # make POST_Links request to root group, expect 404
        body = {"titles": ["link1"]}
        req = helper.getEndpoint() + "/groups/" + root_id + "/links"
        rsp = self.session.post(req, data=json.dumps(body), headers=headers)
        self.assertEqual(rsp.status_code, 404)

        # make POST_Links request to root group with follow_links, expect to find link1
        req = helper.getEndpoint() + "/groups/" + root_id + "/links?follow_links=1"
        rsp = self.session.post(req, data=json.dumps(body), headers=headers)
        self.assertEqual(rsp.status_code, 200)
        rspJson = json.loads(rsp.text)
        self.assertTrue("links" in rspJson)
        links = rspJson["links"]
        self.assertTrue(len(links) == 1) # fails due to empty return
        link = links[0]
        self.assertTrue("title" in link)
        self.assertEqual(link["title"], "link1")

Spoke to John about this: with follow_links, POST_Links should return all links that match the name in the provided groups. Limit and CreateOrder are used when titles is not provided, so they should stay. Marker isn't actually checked by POST_Links, so no change is needed there.

@jreadey I'm not sure I understand how this endpoint is supposed to work with the pattern parameter. testGetPattern expects group ids to be returned that don't contain any links that fit the pattern. Is it meant to return all searched group ids (empty or not) when using a pattern, and not without a pattern?

Here's the (expected) response in testGetPattern with empty groups:

    {'links': {
        'g-f81913f9-909037d3-7091-9b7118-18bf5b': [],
        'g-f81913f9-909037d3-0b41-8c15a0-ba4fad': [],
        'g-f81913f9-909037d3-2304-9163c1-15d503': [
            {'id': 'd-f81913f9-909037d3-cd25-bd6a92-8700e7', 'class': 'H5L_TYPE_HARD', 'created': 1707845214.4165003, 'title': 'dset2.1'},
            {'id': 'd-f81913f9-909037d3-be98-2bdfa6-91fa48', 'class': 'H5L_TYPE_HARD', 'created': 1707845214.4451396, 'title': 'dset2.2'}
        ],
        'g-f81913f9-909037d3-f6a8-00d8e6-4046b6': [
            {'id': 'd-f81913f9-909037d3-39d1-9c7ecc-a8b46b', 'class': 'H5L_TYPE_HARD', 'created': 1707845214.270042, 'title': 'dset1.1.1'},
            {'id': 'd-f81913f9-909037d3-697f-a976fe-b8c909', 'class': 'H5L_TYPE_HARD', 'created': 1707845214.2981899, 'title': 'dset1.1.2'}
        ],
        'g-f81913f9-909037d3-6cba-d68216-f3d0b1': [],
        'g-f81913f9-909037d3-cc63-497daf-2bf556': []
    }}

Yes, that is the intent. You could make the case to not return group ids that contain no links, but I like the explicit empty list.
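
For clients that would rather not see the empty entries, filtering them out on the client side is cheap. A minimal sketch, assuming the response shape shown in the testGetPattern output (drop_empty_groups is a hypothetical helper, not part of HSDS, and the ids below are made-up placeholders):

```python
def drop_empty_groups(rsp_json):
    """Return the "links" mapping without groups that have no matching links."""
    links_by_group = rsp_json["links"]
    # keep only group ids whose link list is non-empty
    return {grp_id: links for grp_id, links in links_by_group.items() if links}


# placeholder response with shortened, invented ids
rsp_json = {
    "links": {
        "g-aaaa": [],
        "g-bbbb": [{"id": "d-1111", "class": "H5L_TYPE_HARD", "title": "dset2.1"}],
        "g-cccc": [],
    }
}

non_empty = drop_empty_groups(rsp_json)
print(list(non_empty))
```

Keeping the empty lists in the server response still lets a caller tell "group was searched, nothing matched" apart from "group was never visited".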

I tried out titles with follow_links and it doesn't return a sensible result -- links to subgroups won't get followed if the subgroup link is not in titles. Rather than fix this up, I think it makes more sense to return a 400 if titles is used with follow_links -- I've submitted a PR for this change. The PR also raises 400 if CreateOrder is used with titles or Limit is used with titles.
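
Roughly, the rule described above looks like the following. This is only a sketch of the check, not the code from the PR: check_post_links_args is a hypothetical name, and treating a parameter as "used" whenever it is present in the query string is an assumption.

```python
def check_post_links_args(params, body):
    """Return an error message if the request should get a 400, else None.

    params: query parameters as a dict (e.g. {"follow_links": "1"})
    body:   parsed JSON request body (may contain "titles")
    """
    if "titles" not in body:
        # without titles, follow_links, CreateOrder and Limit are all allowed
        return None
    for name in ("follow_links", "CreateOrder", "Limit"):
        if name in params:  # assumption: presence alone counts as "used"
            return f"{name} can't be used with titles"
    return None


# titles + follow_links is rejected
print(check_post_links_args({"follow_links": "1"}, {"titles": ["link1"]}))
# titles alone is fine
print(check_post_links_args({}, {"titles": ["link1"]}))
```

With this in place, the test at the top of the issue would expect a 400 rather than a 200 for the follow_links request.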

Resolved in #314