clearlydefined/service

Nuget packages namespace

The vast majority of nuget packages have "-" as namespace, and rightfully so, because nuget doesn't support namespacing (short of id prefix reservation: https://learn.microsoft.com/en-us/nuget/nuget-org/id-prefix-reservation). However, there are about 900 packages of nuget type in which a namespace is present and not equal to "-". I've put those packages into a gist here: https://gist.github.com/RomanIakovlev/d6e3e36175c184c802d17f088c829d1b.

I think this is a bug in the crawler. I could try finding and fixing the problem, if given some guidance as to where to start.

Given any coordinates, the crawler attempts to fetch the component as specified and marks it missing when the package is not found. In NuGetFetch (in the crawler), only the name and version are used, so the crawler is able to fetch the package even when the provided namespace does not exist. There is a mechanism for the crawler to rewrite the coordinates to match what was actually fetched (casedSpec). This casedSpec determines how the harvested information is stored. The drawback of using casedSpec to fix this issue is that components nuget/nuget/a/alphafs/2.0.1 and nuget/nuget/b/alphafs/2.0.1 will still trigger two separate harvests, which is not ideal.
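To illustrate the duplicate-harvest drawback, here is a minimal Python sketch. It is not the crawler's code: `Coordinates` and `fetch_key` are hypothetical names, and `fetch_key` only stands in for a fetch that looks a package up by name and version, as described for NuGetFetch above.

```python
from typing import NamedTuple, Tuple


class Coordinates(NamedTuple):
    type: str
    provider: str
    namespace: str
    name: str
    revision: str

    @classmethod
    def parse(cls, path: str) -> "Coordinates":
        # Expects exactly five segments: type/provider/namespace/name/revision.
        return cls(*path.split("/"))


def fetch_key(c: Coordinates) -> Tuple[str, str]:
    # Hypothetical stand-in for a fetch that ignores the namespace
    # and identifies the package by name and version only.
    return (c.name, c.revision)


requests = [
    Coordinates.parse("nuget/nuget/a/alphafs/2.0.1"),
    Coordinates.parse("nuget/nuget/b/alphafs/2.0.1"),
]

# One harvest is triggered per distinct set of requested coordinates...
harvest_requests = set(requests)
# ...even though both requests resolve to the same package.
fetched_packages = {fetch_key(c) for c in requests}
```

Here `harvest_requests` holds two entries while `fetched_packages` holds one, so rewriting the coordinates only after the fetch does not avoid the second harvest.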

There is a CoordinatesMapper in the service that normalizes coordinates to what actually exists in the component registry. With this approach, all coordinates are corrected before they are sent to the crawler, so the crawler can simply do its job of harvesting components. No change to the crawler is necessary with this approach.
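A rough sketch of what such a normalization step could look like for nuget, in Python for illustration only. This is not the actual CoordinatesMapper: `Coordinates` and `normalize` are hypothetical names, and the rule that every nuget component gets the namespace "-" is an assumption based on the issue description above.

```python
from typing import NamedTuple


class Coordinates(NamedTuple):
    type: str
    provider: str
    namespace: str
    name: str
    revision: str

    @classmethod
    def parse(cls, path: str) -> "Coordinates":
        # Expects exactly five segments: type/provider/namespace/name/revision.
        return cls(*path.split("/"))

    def to_path(self) -> str:
        return "/".join(self)


def normalize(c: Coordinates) -> Coordinates:
    # Assumption: nuget has no namespaces, so any namespace is replaced by "-".
    if c.type == "nuget":
        return c._replace(namespace="-")
    # Other component types are left untouched in this sketch.
    return c


a = normalize(Coordinates.parse("nuget/nuget/a/alphafs/2.0.1"))
b = normalize(Coordinates.parse("nuget/nuget/b/alphafs/2.0.1"))
```

After normalization, `a` and `b` are equal and both map to `nuget/nuget/-/alphafs/2.0.1`, so only one harvest request would reach the crawler.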