juhaku/utoipa

Collecting recursive schemas fails with a stack overflow in 5.x

jrudolph opened this issue · 9 comments

We have an api that uses a setup that boils to this:

use utoipa::{OpenApi, ToSchema};

#[derive(ToSchema)]
pub struct Instance {
    kind: Kind,
}

#[derive(ToSchema)]
pub enum Kind {
    MultipleNested(Vec<Instance>),
}

#[derive(ToSchema)]
pub struct Error {
    instance: Instance,
}

#[derive(OpenApi)]
#[openapi(components(schemas(Error)))]
pub struct ApiDoc {}

fn main() {
    let api = ApiDoc::openapi();
    println!("{:?}", api.to_json().unwrap());
}

This fails like that:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow

in 5.x while it worked in 4.x.

Here's a backtrace from gdb:

[...]
#3762 0x0000555555565f1e in utoipa_recursion::{impl#3}::schemas (schemas=0x7fffffffced0) at src/main.rs:8
#3763 0x00005555555657be in utoipa_recursion::{impl#1}::schemas (schemas=0x7fffffffced0) at src/main.rs:3
#3764 0x0000555555565f1e in utoipa_recursion::{impl#3}::schemas (schemas=0x7fffffffced0) at src/main.rs:8
#3765 0x00005555555657be in utoipa_recursion::{impl#1}::schemas (schemas=0x7fffffffced0) at src/main.rs:3
#3766 0x000055555556653e in utoipa_recursion::{impl#5}::schemas (schemas=0x7fffffffced0) at src/main.rs:13
#3767 0x00005555555668d4 in utoipa_recursion::{impl#6}::openapi () at src/main.rs:18
#3768 0x0000555555564f66 in utoipa_recursion::main () at src/main.rs:23

Was there any change in how (bi-)recursive schemas are handled?

Cargo.toml for reference:

[package]
name = "utoipa-recursion"
version = "0.1.0"
edition = "2021"

[dependencies]
utoipa = "5.0.0"

Looking at the cargo expand output, this is probably a direct consequence of now recursively collecting subschemas in ToSchema::schemas? It could probably be fixed by keeping track of which schemas are in the process of being defined and pass them around while collecting subschemas to break the recursion. But that sounds like a breaking change again.

Oh no, this needs some further experimenting to see what can be done with the recursion issue.

#[derive(ToSchema)]
pub struct Instance {
   kind: Kind,
}

#[derive(ToSchema)]
pub enum Kind {
   MultipleNested(Vec<Instance>),
}

Like to code above, the issue is that there is a loop. Kind -> Instance -> Kind. And since utoipa 5.0.0 the schemas will be collected recursively, which causes forever loop. This worked in 4 because there was no automatic schema collection.

I need to see is there way to recognize compile time whether schema references is already collected thus not trying to collect it again.

It could probably be fixed by keeping track of which schemas are in the process of being defined and pass them around while collecting subschemas to break the recursion. But that sounds like a breaking change again.

Yeah fingers crossed, not necessarily, I have a hunch that I can check before calling T::schemas(....) that schemas.contains(T::name()) if it does not, then calling T::schemas(...). But this still needs some testing and experimenting with also generic types as well. Namely whether the T::name() will give correct result for looking for schemas.

Yeah fingers crossed, not necessarily, I have a hunch that I can check before calling T::schemas(....) that schemas.contains(T::name()) if it does not, then calling T::schemas(...).

Interesting idea, I think right now, the Vec is empty during the recursing call, because the calling schema is added only after all the dependencies.

(scrap that, it wasn't correct)

There's also some concern about code size blow-up because the calling schema has to do these kinds of recursion checks for all of its fields. It would probably be better if the schemas call would only add itself to the list of collected schemas and then recurse to its children. That way only a single check (for itself) would have to be added to the list. (That change could keep the interface of schemas strictly compatible but the behavior would still be incompatible).

As long as the user facing behavior / API does not change it's all good. But hard to say as of yet, this still needs some investigation. But quick "fix" for now is to avoid looped references.

Maybe one alternative would be to provide some sort of attribute e.g. schema(no_recurse) which could be used to break from the recursion.

#[derive(ToSchema)]
pub struct Instance {
   #[schema(no_recurse)]
    kind: Kind,
}

#[derive(ToSchema)]
pub enum Kind {
   MultipleNested(Vec<Instance>),
}

Maybe one alternative would be to provide some sort of attribute e.g. schema(no_recurse) which could be used to break from the recursion.

That would work for us.

@jrudolph Here is a PR for this feature #1137 Perhaps automated solution can be experimented in future, and optionally enabled via config flag eventually (so people get to choose whether they want to pay the performance cost of checking schema existence). But for now it seems to do the job very well.