stac-utils/stac-fastapi-elasticsearch-opensearch

BUG with links in version 2.3.0

Closed this issue ยท 12 comments

Describe the bug
In version 2.1.0 of stac-fastapi.core when creating an item this item would automatically be assigned links (although one could not add aditional links) Example:

"links": [
		{
			"rel": "self",
			"type": "application/geo+json",
			"href": "http://localhost:8080/collections/test-collection/items/test-item96"
		},
		{
			"rel": "parent",
			"type": "application/json",
			"href": "http://localhost:8080/collections/test-collection"
		},
		{
			"rel": "collection",
			"type": "application/json",
			"href": "http://localhost:8080/collections/test-collection"
		},
		{
			"rel": "root",
			"type": "application/json",
			"href": "http://localhost:8080/"
		}
	]

But in version 2.3.0 when creating an item links is empty unless the user passes aditional links.

Passing the self root collection parent links has no effect aswell. They are ignored.

To Reproduce
Steps to reproduce the behavior:
Run versions 2.1.0 and 2.3.0 of stac-fastapi-elasticsearch and create 1 test collection and 1 item collection in both environments without providing links.

Expected behavior
Creating an item should auto generate links.

Screenshots
image

image
image

Desktop:

  • OS: WSL2
  • Version: Ubuntu 22

Guessing it's related to:

def stac_to_db(cls, stac_data: stac_types.Item, base_url: str) -> stac_types.Item:
"""Transform STAC item to database-ready STAC item.
Args:
stac_data (stac_types.Item): The STAC item object to be transformed.
base_url (str): The base URL for the STAC API.
Returns:
stac_types.Item: The database-ready STAC item object.
"""
item_links = resolve_links(stac_data.get("links", []), base_url)
stac_data["links"] = item_links
now = now_to_rfc3339_str()
if "created" not in stac_data["properties"]:
stac_data["properties"]["created"] = now
stac_data["properties"]["updated"] = now
return stac_data

and this new function from stac-fastapi:
https://github.com/stac-utils/stac-fastapi/blob/e7f82d6996af0f28574329d57f5a5e90431d66bb/stac_fastapi/types/stac_fastapi/types/links.py#L21-L26

def resolve_links(links: list, base_url: str) -> List[Dict]:
    """Convert relative links to absolute links."""
    filtered_links = filter_links(links)
    for link in filtered_links:
        link.update({"href": urljoin(base_url, link["href"])})
    return filtered_links

I believe I found the issue.

Version 2.1.0 of ItemSerializer.stac_to_db:

class ItemSerializer(Serializer):
    """Serialization methods for STAC items."""

    @classmethod
    def stac_to_db(cls, stac_data: stac_types.Item, base_url: str) -> stac_types.Item:
        """Transform STAC item to database-ready STAC item.

        Args:
            stac_data (stac_types.Item): The STAC item object to be transformed.
            base_url (str): The base URL for the STAC API.

        Returns:
            stac_types.Item: The database-ready STAC item object.
        """
        item_links = ItemLinks(
            collection_id=stac_data["collection"],
            item_id=stac_data["id"],
            base_url=base_url,
        ).create_links()
        stac_data["links"] = item_links

        now = now_to_rfc3339_str()
        if "created" not in stac_data["properties"]:
            stac_data["properties"]["created"] = now
        stac_data["properties"]["updated"] = now
        return stac_data

In this version any custom item links passed by stac_data are simply ignored and "base links" (self/parent/collection/root) are generated.

Version 2.3.0 of ItemSerializer.stac_to_db:

class ItemSerializer(Serializer):
"""Serialization methods for STAC items."""
@classmethod
def stac_to_db(cls, stac_data: stac_types.Item, base_url: str) -> stac_types.Item:
"""Transform STAC item to database-ready STAC item.
Args:
stac_data (stac_types.Item): The STAC item object to be transformed.
base_url (str): The base URL for the STAC API.
Returns:
stac_types.Item: The database-ready STAC item object.
"""
item_links = resolve_links(stac_data.get("links", []), base_url)
stac_data["links"] = item_links
now = now_to_rfc3339_str()
if "created" not in stac_data["properties"]:
stac_data["properties"]["created"] = now
stac_data["properties"]["updated"] = now
return stac_data

https://github.com/stac-utils/stac-fastapi/blob/e7f82d6996af0f28574329d57f5a5e90431d66bb/stac_fastapi/types/stac_fastapi/types/links.py#L21-L26

INFERRED_LINK_RELS = ["self", "item", "parent", "collection", "root", "items"]


def filter_links(links: List[Dict]) -> List[Dict]:
    """Remove inferred links."""
    return [link for link in links if link["rel"] not in INFERRED_LINK_RELS]


def resolve_links(links: list, base_url: str) -> List[Dict]:
    """Convert relative links to absolute links."""
    filtered_links = filter_links(links)
    for link in filtered_links:
        link.update({"href": urljoin(base_url, link["href"])})
    return filtered_links

In this version the "base links" are no longer generated and the custom links are added and "resolved"...

This is a possible solution:

def stac_to_db(cls, stac_data: stac_types.Item, base_url: str) -> stac_types.Item:
    item_links = resolve_links(stac_data.get("links", []), base_url)

    base_item_links = ItemLinks(
        collection_id=stac_data["collection"],
        item_id=stac_data["id"],
        base_url=base_url,
    ).create_links()

    stac_data["links"] = base_item_links + item_links

    now = now_to_rfc3339_str()
    if "created" not in stac_data["properties"]:
        stac_data["properties"]["created"] = now
    stac_data["properties"]["updated"] = now
    return stac_data

Hi Pedro. Thanks for identifying this issue! Would you be able to make a pr to fix this?

Hi Pedro. Thanks for identifying this issue! Would you be able to make a pr to fix this?

I'm not sure why this function was changed this way before, so I just want to make sure this fix makes sense.

Only noticed this now:

https://github.com/stac-utils/stac-fastapi/blob/e7f82d6996af0f28574329d57f5a5e90431d66bb/stac_fastapi/types/stac_fastapi/types/links.py#L10-14
image

So the endpoints that return data are intended to generate these links ["self", "item", "parent", "collection", "root"] ?
Guessing this affects any CREATE & UPDATE aswell as READS since they return the actual item when you create/update item.

Yes I think you're right. Trying to refresh my memory here too. When you create an item it should only store links that aren't in the INFERRED_LINK_RELS list. Then these links should be created when an item or collection is returned.

If I create an item in main branch right now, it only stores links not in INFERRED_LINK_RELS and then when I GET that item, I can see all the links created dynamically plus the additional links stored in the db.

I think one issue here is that if you POST a new item it returns a version of the item that is not the same as the version of the item you will get from a GET item request. POSTing an item will return a version without the links generated. This makes it look like the item is missing links when you create an item. This should be fixed I think.

I think one issue here is that if you POST a new item it returns a version of the item that is not the same as the version of the item you will get from a GET item request. POSTing an item will return a version without the links generated. This makes it look like the item is missing links when you create an item. This should be fixed I think.

When we CREATE/UPDATE a collection, the collection returned in the request response contains links and the collection links are also not stored in the database.

Any idea how we can fix this? Where is the piece of code that controls the response from the CREATE/UPDATE of items? @jonhealy1

So items and collections are behaving differently. I have to dig into this to remember.

This is the create collection function in core - it looks like it's using collection serializer. If you look in create_item above, it's using prep_create_item.

Ok the issue seems to be that in the this stac-fastapi.core.TransactionsClient.create_item the return is the stac_types.Item in Line 680

async def create_item(
self, collection_id: str, item: stac_types.Item, **kwargs
) -> Optional[stac_types.Item]:
"""Create an item in the collection.
Args:
collection_id (str): The id of the collection to add the item to.
item (stac_types.Item): The item to be added to the collection.
kwargs: Additional keyword arguments.
Returns:
stac_types.Item: The created item.
Raises:
NotFound: If the specified collection is not found in the database.
ConflictError: If the item in the specified collection already exists.
"""
base_url = str(kwargs["request"].base_url)
# If a feature collection is posted
if item["type"] == "FeatureCollection":
bulk_client = BulkTransactionsClient(
database=self.database, settings=self.settings
)
processed_items = [
bulk_client.preprocess_item(item, base_url, BulkTransactionMethod.INSERT) for item in item["features"] # type: ignore
]
await self.database.bulk_async(
collection_id, processed_items, refresh=kwargs.get("refresh", False)
)
return None
else:
item = await self.database.prep_create_item(item=item, base_url=base_url)
await self.database.create_item(item, refresh=kwargs.get("refresh", False))
return item

when it should be returning ItemSerializer.db_to_stac(item, base_url)

just like it does in create_collection/update_collection and update_item (I was wrong update_item actually returns the proper item with links)

async def update_item(
self, collection_id: str, item_id: str, item: stac_types.Item, **kwargs
) -> stac_types.Item:
"""Update an item in the collection.
Args:
collection_id (str): The ID of the collection the item belongs to.
item_id (str): The ID of the item to be updated.
item (stac_types.Item): The new item data.
kwargs: Other optional arguments, including the request object.
Returns:
stac_types.Item: The updated item object.
Raises:
NotFound: If the specified collection is not found in the database.
"""
base_url = str(kwargs["request"].base_url)
now = datetime_type.now(timezone.utc).isoformat().replace("+00:00", "Z")
item["properties"]["updated"] = now
await self.database.check_collection_exists(collection_id)
await self.delete_item(item_id=item_id, collection_id=collection_id)
await self.create_item(collection_id=collection_id, item=item, **kwargs)
return ItemSerializer.db_to_stac(item, base_url)

Updating stac-fastapi.core.TransactionsClient.create_item with:

            item = await self.database.prep_create_item(item=item, base_url=base_url)
            await self.database.create_item(item, refresh=kwargs.get("refresh", False))
            # return item
            return ItemSerializer.db_to_stac(item, base_url)

does seem to fix the issue, but I'm not sure if it creates bugs elsewhere... even though tests seem to pass.

Definitely makes sense I think