stac-utils/pgstac

How to create a STAC from a Postgres table based on the provided samples (S3 storing SAFE and pg dump)

MathewNWSH opened this issue · 1 comment

Hello,

I'm having trouble starting my adventure with STAC - I find it quite challenging to begin.

At first, I thought I would be able to build a common base for two types of data (Sentinel1, Sentinel2) using stactools (https://github.com/stactools-packages/sentinel2), targeting each SAFE directory and doing this in a loop over all my rasters (stored on S3). Each iteration would create a new record in the database storing metadata and S3 paths for my rasters.
Then, I would use pgstac to make the database serve as a catalog, and stac-fastapi (https://github.com/stac-utils/stac-fastapi) to expose it.
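
Roughly what I had in mind - a minimal sketch, assuming the stactools-sentinel2 package exposes a stac.create_item() entry point (please correct me if the API is different) and that the SAFE directories are readable via s3:// hrefs:

```python
import json
from stactools.sentinel2 import stac  # assumed entry point of stactools-sentinel2

# Hypothetical SAFE locations on my bucket
safe_hrefs = [
    "s3://demo_data/S2A_MSIL2A_20230701T095031_EXAMPLE_T34UDC.SAFE",
]

with open("sentinel2-items.ndjson", "w") as f:
    for href in safe_hrefs:
        item = stac.create_item(href)   # build a STAC Item from the SAFE metadata
        # item.validate()               # optional, needs pystac's validation extras
        f.write(json.dumps(item.to_dict()) + "\n")
```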

I'm having trouble understanding the exact structure the database should have. So far, I've created something like a GDAL tile index. But if the database is in this form, how do I proceed to create a catalog containing data from different missions? In its current form, I am unable to handle this. Until now, I have been using the CREODIAS resto API to extract metadata, geometry, and product location information. A sample API request:
https://datahub.creodias.eu/resto/api/collections/Sentinel2/search.json?productType=S2MSI2A&startDate=2023-07-01T00:00:00Z&completionDate=2023-07-01T23:59:59Z&maxRecords=1000&box=13.7,44.0,41.0,55.0&page=1
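
(The same request from Python, just for completeness - the parameters are copied from the URL above, and as far as I can tell resto returns a GeoJSON-style FeatureCollection:)

```python
import requests

resp = requests.get(
    "https://datahub.creodias.eu/resto/api/collections/Sentinel2/search.json",
    params={
        "productType": "S2MSI2A",
        "startDate": "2023-07-01T00:00:00Z",
        "completionDate": "2023-07-01T23:59:59Z",
        "maxRecords": 1000,
        "box": "13.7,44.0,41.0,55.0",
        "page": 1,
    },
    timeout=60,
)
features = resp.json()["features"]  # product metadata, geometry and S3 locations
```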

I think the questions can be summarized as follows:
Is there a tool for creating such a database - one that extracts metadata for the different missions (S1, S2) stored on an S3 bucket - or do I have to create the database schema myself? If so, I was thinking of something like the following columns:

timestamp
geometry 
mission identifier
product identifier
attributes (jsonb column type)

This way, theoretically, I could include multiple missions in one database. But is this a good approach?
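
To make the idea concrete, a single row would look something like this (all values made up):

```python
# Hypothetical example of one row under the columns proposed above;
# the attributes column would hold the remaining product metadata as JSONB.
row = {
    "timestamp": "2023-07-01T09:50:31Z",
    "geometry": "SRID=4326;POLYGON((13.7 44.0, 14.8 44.0, 14.8 45.0, 13.7 45.0, 13.7 44.0))",
    "mission": "SENTINEL-2",
    "product_id": "S2A_MSIL2A_20230701T095031_EXAMPLE_T34UDC",
    "attributes": {
        "productType": "S2MSI2A",
        "s3_path": "s3://demo_data/S2A_MSIL2A_20230701T095031_EXAMPLE_T34UDC.SAFE",
    },
}
```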

Also attaching the sample data I mentioned:
the pg_dump: https://s3.waw3-1.cloudferro.com/swift/v1/pg_dump/s2_jsonb_demo
sample S2 SAFE catalogues: https://s3.waw3-2.cloudferro.com/swift/v1/demo_data/

Sorry for a little bit of chaos and possible misunderstanding of basic concepts - as I mentioned, I'm just starting and looking for guidance.

bitner commented

PgSTAC has an internal schema and provides mechanisms to maintain partitions and indexes for the data that is added. Data should not be managed directly in this table structure; instead, STAC-compliant records should be added as JSON, either with the pypgstac load utility, by enabling the Transactions API in STAC-FastAPI, or via the data management functions that PgSTAC exposes as SQL functions (create_item, update_item, delete_item, upsert_item, create_items, upsert_items).
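
For example, a minimal sketch of calling those SQL functions from Python with psycopg - this assumes create_item/upsert_item take the item as a single jsonb argument and that you connect with a role that can see the pgstac schema (check the PgSTAC docs for your version):

```python
import json
import psycopg  # psycopg 3

# A STAC-compliant Item produced beforehand with stactools/pystac
with open("item.json") as f:
    item = json.load(f)

with psycopg.connect("postgresql://user:pass@localhost:5432/postgis") as conn:
    with conn.cursor() as cur:
        # upsert_item works the same way but replaces an existing Item with the same id
        cur.execute("SELECT pgstac.create_item(%s::jsonb);", (json.dumps(item),))
```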

PgSTAC does not manage the creation or validation of STAC JSON records. That should be done with tools like stactools or pystac. Once you have STAC records, you can load them into PgSTAC to expose them via the STAC API.
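
As a sketch of that last step, using pypgstac's Python loader (class and method names follow recent pypgstac releases - Loader, PgstacDB, Methods - but check your installed version; the `pypgstac load` CLI is equivalent):

```python
from pypgstac.db import PgstacDB
from pypgstac.load import Loader, Methods

db = PgstacDB(dsn="postgresql://user:pass@localhost:5432/postgis")
loader = Loader(db=db)

# Collections must exist before their Items are loaded.
loader.load_collections("sentinel2-collection.json", insert_mode=Methods.upsert)
loader.load_items("sentinel2-items.ndjson", insert_mode=Methods.upsert)
```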