siderolabs/image-factory

[bug] http 500 on some pulls

Closed this issue · 15 comments

When the listed systemExtensions have a different order than the one from the GUI factory, they have a different hash.

However, when this is the case, they also often fail with HTTP error 500:

◰ watching nodes: [192.168.10.141]
    * 192.168.10.141: rpc error: code = Unknown desc = error validating installer image "factory.talos.dev/installer/f7ae046f90d0ff4ac315a1810c77e5cf7357fe68e2181aad4a6175f5d6defc0b:v1.5.5": 2 error(s) occurred:
    failed to pull image "factory.talos.dev/installer/f7ae046f90d0ff4ac315a1810c77e5cf7357fe68e2181aad4a6175f5d6defc0b:v1.5.5": failed to resolve reference "factory.talos.dev/installer/f7ae046f90d0ff4ac315a1810c77e5cf7357fe68e2181aad4a6175f5d6defc0b:v1.5.5": pulling from host factory.talos.dev failed with status code [manifests v1.5.5]: 500 Internal Server Error

I've seen this happen multiple times, with multiple different selections of extensions.

Note:
Previously filed as: siderolabs/talos#7986

When the listed systemExtensions have a different order than the one from the GUI factory, they have a different hash.

This is expected, order of system extensions might matter as they are layered one on top of another, this is not a bug.
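
To illustrate why the order changes the ID, here is a minimal sketch. It assumes the schematic ID is a content hash (the 64 hex characters in the error above are consistent with SHA-256) over some serialized form of the schematic. The JSON serialization and the two extension names are illustrative choices only, so these IDs will not match the ones Image Factory produces.

```python
import hashlib
import json


def schematic_id(extensions):
    """Hash an ordered extension list into a hex ID (illustrative only)."""
    # Assumption: the real ID is a hash over the serialized schematic.
    # JSON is used here purely for illustration.
    payload = json.dumps(
        {"customization": {"systemExtensions": {"officialExtensions": list(extensions)}}}
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Same two extensions, listed in a different order.
gui_order = ["siderolabs/iscsi-tools", "siderolabs/qemu-guest-agent"]
other_order = list(reversed(gui_order))

print(schematic_id(gui_order))
print(schematic_id(other_order))
print(schematic_id(gui_order) == schematic_id(other_order))  # False
```

Any tool that reorders the list therefore ends up with an ID that differs from the one the GUI shows.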

The error might have been more helpful, but it looks like f7ae046f90d0ff4ac315a1810c77e5cf7357fe68e2181aad4a6175f5d6defc0b has never been registered as a schematic.

When the listed systemExtensions have a different order than the one from the GUI factory, they have a different hash.

This is expected, order of system extensions might matter as they are layered one on top of another, this is not a bug.

Correct, that's just the context, not the bug.
Hence the "however"

The error might have been more helpful, but it looks like f7ae046f90d0ff4ac315a1810c77e5cf7357fe68e2181aad4a6175f5d6defc0b has never been registered as a schematic.

There is nothing in the docs about something needing to be "registered" as a schematic.

The way it's written in the README, registration happens at pull (since images get built to order, as listed in the section about docker pulls), not on a GET request.

If the schematic is not registered, it's invalid: https://github.com/siderolabs/image-factory#post-schematics

If the bug reproduces with registered schematics, please feel free to re-open.
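
For anyone landing here, the intended order (register first, pull second) could be sketched roughly as follows. The `/schematics` path is taken from the README anchor linked above, and the `{"id": ...}` response shape is an assumption about the API, as are the helper names `register_schematic` and `installer_image`. The image reference format is copied from the error message in this issue.

```python
import json
import urllib.request

FACTORY = "https://factory.talos.dev"  # public instance named in the error above


def register_schematic(schematic_yaml: str, base_url: str = FACTORY) -> str:
    """POST a schematic and return the ID the factory assigns to it."""
    request = urllib.request.Request(
        f"{base_url}/schematics",
        data=schematic_yaml.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumption: the response body is JSON of the form {"id": "<schematic id>"}.
        return json.loads(response.read())["id"]


def installer_image(schematic_id: str, talos_version: str,
                    host: str = "factory.talos.dev") -> str:
    """Build an installer image reference in the format shown in the error above."""
    return f"{host}/installer/{schematic_id}:{talos_version}"
```

The ID passed to `installer_image` should be the one returned by `register_schematic`, not one computed locally, since only a posted schematic is known to the factory.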

Can this at least get documented?
The link you posted doesn't state this anywhere.

:schematic is a schematic ID returned by POST /schematic

I don't know how it can be more clear? How that ID got acquired in the first place?

I don't know how it can be more clear?

That doesn't say anything about registration, just that it's returned by something.
A lot of API docs say "returned by x", even for simple iterative numbers that exist with or without calling that function.

I'll PR a change to make this clearer.

This is the call which registers the schematic, if it's not registered, the ID is meaningless, as it's not clear where it came from. I'm probably missing something, but I don't see how it can be otherwise. You're like asking for schematic 5, while 5 only exists in your imagination, how would image factory know about it?
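
A toy model of the point, for readers who hit the same confusion. `SchematicStore` is a hypothetical in-memory stand-in, not Image Factory's actual storage, and the SHA-256 hashing is an assumption. It only shows that an ID computed somewhere else cannot be turned back into a schematic.

```python
import hashlib


class SchematicStore:
    """Toy in-memory stand-in for the factory's schematic storage (hypothetical)."""

    def __init__(self):
        self._by_id = {}

    def register(self, schematic_text: str) -> str:
        # Registration is the only step that ever sees the schematic contents.
        schematic_id = hashlib.sha256(schematic_text.encode("utf-8")).hexdigest()
        self._by_id[schematic_id] = schematic_text
        return schematic_id

    def resolve(self, schematic_id: str) -> str:
        # A hash cannot be reversed into the extension list it was computed from,
        # so an ID that was never registered has nothing to build from.
        if schematic_id not in self._by_id:
            raise KeyError(f"schematic {schematic_id} is not registered")
        return self._by_id[schematic_id]


store = SchematicStore()
unregistered = hashlib.sha256(b"computed locally, never posted").hexdigest()

try:
    store.resolve(unregistered)
except KeyError as err:
    print("lookup failed:", err)

registered = store.register("customization: {}")
print(store.resolve(registered))
```

In this model, "built on demand" can only apply to images for an ID that `register` has already seen.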

Talhelper generated it in this case, which is their mistake tbh, see above referenced issue.
But how do we, as users, know from the docs that this registration is a thing? We don't, because it isn't explicitly stated. If some syntax or registration is a requirement, it should be explicitly stated.

But the mistake is specifying this in the docker section:
If the image hasn't been created yet, it will be built on demand automatically.

So, it's not odd to read this and think "oh, if I call a certain ID, it gets built automatically".
Combined with the above POST/GET docs not specifying that this is, in fact, a registration system and that IDs do not exist before registration, this leads readers to assume the docker pull IS the registration.

I've already filed a PR to make the above POST/GET docs clearer.

I think you should redirect this to talhelper, Image Factory docs are clear about what :schematic is.

I get the feeling you're either very busy and not reading carefully...

I think you should redirect this to talhelper

As you would've seen above the issue was already forwarded to talhelper, even linked this issue.
The thing is: both the dev of talhelper and I didn't read your text the way you intended it to be read.

Image Factory docs are clear about what :schematic is.

It's clear what :schematic refers to. However, your docs currently nowhere mention that schematic IDs don't exist before being generated, and the docker section even heavily suggests the opposite.

To be clear: I get it from your docs, now that you explain what you meant.
But the text is vague at best and can be read in multiple different ways. You basically assume that people read it the way you do.

There's no shame in writing docs that can be read in multiple ways; it happens to all of us. But you can at least take feedback that it wasn't clear to at least two other developers, and not be defensive about it. It turns out not to be as clear as you assumed.

See:
#62

Talos has extensive documentation for you to follow. I hope that helps to understand things.

As I already wrote:
To be clear: I get it from your docs, now that you explain what you meant.


Anyway, that article has some of the same problematic language that caused my confusion here:

generates Talos boot assets on-demand

"On-demand" allows people to assume the same about the schematic. Yes, it's referring to the boot assets, but it's easy to misunderstand.

to the Image Factory to retrieve its ID:

Retrieve != create/register
Combined with "on demand" it might lead to people assuming that it exists without calling the API, hence it's retrieved not created.


TLDR:
It's not that I don't understand now that you explain it, but your choice of language and documentation is NOT clear on this for everyone. You can at least take seriously that for some people this isn't clear enough.

I perfectly got your point, I'm leaving links to other people who might hit this issue so that the things are more clear.