sonata-nfv/son-catalogue-repos

How is a package (the file) identified in the Catalogue?

jbonnet opened this issue · 15 comments

By the trio name+version+vendor? If so, is the filename in the metadata?

Hi @jbonnet,
Package files (son-packages) are currently identified by a UUID (the same way descriptors are). This UUID is now included in the package descriptor meta-data when it is stored in the Catalogue, in a new field called "son_package_uuid". Thus, every stored package descriptor is now linked to its stored package file.

Regarding son-packages metadata, it includes the filename, but the filename is not necessarily built from the trio name+vendor+version. It is the sender (in this case the GK) who defines the filename when filling in the HTTP headers of the submission, as shown below:

headers[HTTP_CONTENT_DISPOSITION] = "attachment; filename=<filename>"

So, if the file is named 'tc.sonata.1.0.son', then that is the filename stored in its metadata. The Catalogue stores the filename internally in a field called "grid_fs_name" (due to the GridFS lib).
Here is an example of the metadata for the sonata-demo.son file (used in integration tests):

[{"created_at":"2017-02-16T13:04:46.048+00:00","grid_fs_id":"58a5a36eaf8bef220b000000","grid_fs_name":"sonata-demo.son","md5":"9c8a66d5caa756bbb5d2ed4c56e27214","updated_at":"2017-02-16T13:04:46.048+00:00","uuid":"6dc04296-ea9b-4113-bf4b-8e9d2d42c9b3"}]
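The mapping from the Content-Disposition header to the stored filename could be sketched roughly as follows. This is only an illustration: the helper name `filename_from_disposition` is hypothetical, and the actual Catalogue code may parse the header differently.

```ruby
# Hypothetical helper (not taken from the Catalogue code base): extract the
# filename from a Content-Disposition value such as
# "attachment; filename=sonata-demo.son". Returns nil when no filename is given.
def filename_from_disposition(value)
  return nil if value.nil?

  # Accept both quoted and unquoted filenames.
  match = value.match(/filename="?([^";]+)"?/i)
  match && match[1].strip
end

# Assumption: the extracted value is what ends up in "grid_fs_name".
disposition = 'attachment; filename=sonata-demo.son'
metadata = { 'grid_fs_name' => filename_from_disposition(disposition) }
```

With the header from the integration-test example, `metadata['grid_fs_name']` would hold `sonata-demo.son`.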

The son-packages API does not yet support queries based on filename (it does for "grid_fs_name"). This will be addressed soon.

Best regards

Hello again, @dang03

The son-packages API does not yet support queries based on filename (it does for "grid_fs_name").
You mean that it does support searches on grid_fs_id, though?
Also: who is generating the value for the md5 field shown in the above example? The Catalogue?

Hello @jbonnet,

Exactly, right now the API supports queries based on the current fields, such as grid_fs_id for the file uuid or grid_fs_name for the filename.

However, the file itself is currently only available through /son-packages/:id/? using grid_fs_id (named son_package_uuid in the package descriptor).
If we need to retrieve a file by name, then some changes are required.

The 'md5' is generated by the Catalogue, using GridFS lib.
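A client that downloads the file could use that field to check integrity. The sketch below assumes the "md5" value is the hexadecimal MD5 digest of the complete file content, which is how the 32-character value in the example above reads; the helper name is hypothetical.

```ruby
require 'digest'

# Hypothetical client-side check: compare the MD5 of the downloaded bytes
# with the "md5" field of the file meta-data returned by the Catalogue.
def md5_matches?(file_bytes, metadata)
  Digest::MD5.hexdigest(file_bytes) == metadata['md5']
end
```

For a real package, `file_bytes` would come from something like `File.binread('sonata-demo.son')`.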

@dang03
Fetching by name might get the wrong package file, due to the wrong file name being used, no? What happens when a file is uploaded with the same name as one existing file (but from another package)?

Better would be to support fetching package files through the trio vendor/name/version...
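To make the concern concrete, here is a small in-memory sketch. The records and field values are invented for illustration, and the file meta-data does not currently carry vendor, name and version; the point is only that a filename lookup can return more than one file, while the trio singles out one.

```ruby
# Invented records: two different packages uploaded under the same filename.
FILES = [
  { vendor: 'eu.sonata-nfv', name: 'demo-a', version: '0.1', grid_fs_name: 'demo.son' },
  { vendor: 'eu.sonata-nfv', name: 'demo-b', version: '0.1', grid_fs_name: 'demo.son' }
].freeze

# Lookup by filename: may be ambiguous.
def find_by_filename(files, filename)
  files.select { |f| f[:grid_fs_name] == filename }
end

# Lookup by the trio: expected to identify a single package.
def find_by_trio(files, vendor, name, version)
  files.select { |f| f[:vendor] == vendor && f[:name] == name && f[:version] == version }
end
```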

@jbonnet
Package file meta-data does not have these fields yet (vendor, name, version), but it is not intended to stay like this. The problem we found is that when you POST a file as binary data in an HTTP call, the filename must be defined in the "attachment; filename" (Content-Disposition) HTTP header.
Thus, if you need vendor, name, version fields on the file meta-data, there's no other option than passing these as query parameters, e.g.:

POST /api/v2/son-packages?vendor=x&name=y&version=z
Header HTTP_content_disposition: attachment; filename=<filename.son>

To enable this, some work is required on Catalogue side.

@dang03
Hmm...
What about passing those query parameters (vendor, name and version) as HTTP headers as well? Not the most beautiful thing in the world, but a POST with query parameters looks... strange (it's not really a query, is it?).
From the outside (GK's API), I was thinking of providing an interface like:

  • POST /api/v2/packages package file (already in place), returning the package meta-data, including its UUID;
  • GET /api/v2/packages/:uuid gets the meta-data
  • GET /api/v2/packages/:uuid/file (or .../download) gets the package file

What do you think? By the way, is it son-access who needs to use this API?

Hi @jbonnet,
Yes, using custom HTTP headers for these fields (vendor, name and version) is feasible. Let's do it that way then.
Having everything package-related under the same API makes sense (instead of using a separate one for /api/v2/son-packages).

Indeed, son-access is the component that requires this API. To stay in line, son-access will have to be updated to work with the packages GK API.

@dang03
Still another option would be to include those parameters in the URL, as in:
POST /api/v2/packages/vendor/:vendor_name/name/:name/version/:version

@dang03
I'm back to this, sorry...
Should we have the package file name in the package meta-data? Or even in the data?

@dang03
Forget it, we've got it in the grid_fs_name field, sorry...

@jbonnet
No problem. I'll be back to this too, as vendor, name, version meta-data fields are not yet implemented.
I need to know which way (of the ones we discussed) you prefer for sending vendor, name, version with the package file:

a) POST /api/v2/son-packages/vendor/:vendor_name/name/:name/version/:version

b) Included in form of custom headers:
att = request.env['HTTP_CONTENT_DISPOSITION']
vendor = request.env['HTTP_VENDOR']
name = request.env['HTTP_NAME']
version = request.env['HTTP_VERSION']
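For reference, option b) relies on the Rack/CGI convention that a request header appears in `request.env` upper-cased, with dashes turned into underscores and an `HTTP_` prefix (Content-Type and Content-Length are the exceptions and get no prefix). A minimal sketch, with hypothetical helper names:

```ruby
# Convert an HTTP header name into the key under which Rack exposes it.
# Content-Type and Content-Length are excluded: Rack stores them without
# the HTTP_ prefix.
def rack_env_key(header_name)
  'HTTP_' + header_name.upcase.tr('-', '_')
end

# Read the trio the way option b) proposes. Caveat (an assumption worth
# checking against the server in use): some Rack servers have populated
# HTTP_VERSION with the protocol version, so a header named "Version"
# could collide with it.
def trio_from_env(env)
  {
    vendor: env[rack_env_key('Vendor')],
    name: env[rack_env_key('Name')],
    version: env[rack_env_key('Version')]
  }
end
```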

@dang03
Sorry, I've missed this one...
Now that I know a bit more of the Catalogue's API, I don't think we should have POST with the trio... because it is already within the package, right? What should happen if we POST a package with one vendor+name+version into a different trio?

Hi @jbonnet,
It makes sense to avoid POSTing with the trio, because if you save a package with a different trio it will create an inconsistent package.
However, this was meant more as an option: not to store the package file itself under the name trio, but to add some more info to its Catalogue meta-data.
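If the trio were ever supplied alongside the file, the inconsistency discussed above could be detected with a check like the one below. It is only a sketch: it assumes the package descriptor has already been extracted from the file into a hash with "vendor", "name" and "version" keys, and the helper name is hypothetical.

```ruby
TRIO_FIELDS = %w[vendor name version].freeze

# Return the trio fields whose supplied value differs from the value found
# in the package descriptor. An empty array means the two are consistent.
def trio_mismatches(supplied, descriptor)
  TRIO_FIELDS.reject { |field| supplied[field] == descriptor[field] }
end
```

A caller could reject the submission whenever the returned list is not empty.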