FasterXML/jackson-modules-base

Metaspace/class leak using blackbird

Closed this issue · 40 comments

In a standard application, we see class loading level out around ~10-20k classes; however, after applying the Blackbird module we've seen our loaded-classes metric grow until it reaches ~2 million and the JVM is restarted due to an OOM. This is running on Java 15, although I don't know whether it's exclusive to Java 15, which codepath is creating the classes, or why they're not unloaded. I'm still working through debugging the problem.

What I know so far:
- Encountered only when Blackbird is in use.
- Metaspace is filled over time.
- Loaded classes as reported by the JVM approach 2 million (where I'd expect tens of thousands).
- Impacts Java 15; possibly others, but I don't have a reproducer to validate whether only Java 15 is affected.

Hi @carterkozak,

Was the OOM you received java.lang.OutOfMemoryError: Compressed class space? If so, I can confirm that we too saw similar behavior with an app that we attempted to move from Afterburner to Blackbird.

Thanks,
Jeremy

Hi @carterkozak and @norrisjeremy , any possibility of supplying a reproducer case? We run blackbird in production for a long time now and haven't noticed any such leaks. Do you generate classes dynamically at runtime and serialize them, or reload apps in an application server, or anything like that?

we are seeing this consistently in our application (a bunch of services, all affected) in java 17.0.3 with kotlin 1.6.21

> Hi @carterkozak and @norrisjeremy , any possibility of supplying a reproducer case? We run blackbird in production for a long time now and haven't noticed any such leaks. Do you generate classes dynamically at runtime and serialize them, or reload apps in an application server, or anything like that?

FYI, in our case, our application strictly deserializes to Jackson annotated POJOs.
Not sure if it makes any difference, but we are also strictly dealing with CSV content utilizing jackson-dataformat-csv.
We do not generate any classes dynamically at runtime, nor do we make use of any sort of application server.

Any chance of a reproducer case, or maybe a heap dump of an affected application? I would be happy to look into this, but without some idea of where to look, I am not sure what the problem might be.

I should be able to get a heap dump of one of our services next week.

I've increased reuse of ObjectReader/ObjectWriter instances and also removed Spring's MessageConverter layer; my load tests are currently unable to reproduce the issue. I might test again with the older version from a month ago.

Could not closing JsonParser/JsonGenerator objects have such an effect? The kotlin module might have an issue there. A few functions in the kotlin module's Extensions.kt seem to not close parsers/generators they create.

Hi @stevenschlansker,

I was able to finally write a simple test case to demonstrate the issue.

If I execute the following example with BlackbirdModule and -XX:CompressedClassSpaceSize=128m then it crashes fairly rapidly with a java.lang.OutOfMemoryError: Compressed class space, whilst it runs to completion with AfterburnerModule.

Thanks,
Jeremy

package bb147;

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.Module;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;
import com.fasterxml.jackson.module.blackbird.BlackbirdModule;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
import lombok.Data;

@SuppressWarnings("UnusedVariable")
public class Bb147 {

  private static final String csv =
      """
      a,b,c
      1,"2,3",4
      5,6,7
      8,,9
      ,"10,11",""";
  private static final int iterations = 1_000_000;

  public static void main(String[] args) throws Exception {
    Module mod;
    if (args.length == 0) {
      throw new IllegalArgumentException("no module");
    } else if (args[0].equalsIgnoreCase("afterburner")) {
      mod = new AfterburnerModule();
    } else if (args[0].equalsIgnoreCase("blackbird")) {
      mod = new BlackbirdModule();
    } else {
      throw new IllegalArgumentException("invalid module");
    }

    // Note: a brand-new mapper per iteration is what triggers the leak; each
    // new mapper causes Blackbird to generate a fresh set of hidden classes.
    for (int i = 0; i < iterations; ++i) {
      var mapper = new CsvMapper();
      mapper.enable(CsvParser.Feature.TRIM_SPACES);
      mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
      mapper.enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_AS_NULL);
      mapper.disable(DeserializationFeature.ADJUST_DATES_TO_CONTEXT_TIME_ZONE);
      mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
      mapper.registerModule(mod);

      var schema =
          mapper.schemaWithHeader().withColumnSeparator(',').withArrayElementSeparator(",");
      var reader = mapper.readerFor(Pojo.class).with(schema);

      MappingIterator<Pojo> iterator = reader.readValues(csv);
      Iterable<Pojo> iterable = () -> iterator;
      var list = StreamSupport.stream(iterable.spliterator(), false).collect(Collectors.toList());
    }
  }

  @Data
  public static class Pojo {

    private Long a;
    private List<Integer> b;
    private String c;
  }
}

In our case this might have been caused by accidentally creating an ObjectMapper instance per request, similar to @norrisjeremy's example

Reuse is certainly ideal, but I wonder if we could/should hold a static cache to prevent runaway memory utilization when non-performance-sensitive codepaths create per-use mappers over time?

Based on some testing I performed, it appears that every CallSite instance created by LambdaMetafactory.metafactory() causes the JVM to instantiate a dynamically generated Class.
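This is easy to observe with a stdlib-only sketch (the class and method names here are illustrative, not Blackbird's code): each call to LambdaMetafactory.metafactory() produces a CallSite whose target is an instance of a freshly generated class, so two calls yield two distinct classes.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Supplier;

public class LambdaClassDemo {

    static String hello() {
        return "hello";
    }

    // Spin up a Supplier<String> via the metafactory, the same mechanism
    // Blackbird uses for its generated accessors.
    static Supplier<String> makeLambda() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(
                    LambdaClassDemo.class, "hello", MethodType.methodType(String.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "get",                                  // Supplier.get
                    MethodType.methodType(Supplier.class),  // factory signature
                    MethodType.methodType(Object.class),    // erased SAM signature
                    impl,                                   // implementation handle
                    MethodType.methodType(String.class));   // instantiated signature
            return (Supplier<String>) site.getTarget().invoke();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Supplier<String> a = makeLambda();
        Supplier<String> b = makeLambda();
        if (!"hello".equals(a.get())) throw new AssertionError("unexpected value");
        // Each metafactory invocation generated a distinct (hidden) class:
        if (a.getClass() == b.getClass()) throw new AssertionError("classes were shared");
    }
}
```

Running this shows a.getClass() and b.getClass() differ, consistent with one generated class per metafactory call.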

I'll see if we can reorganize our application to maintain persistent ObjectMapper & ObjectReader instances and confirm that it prevents us from hitting the Compressed class space OOM.

I wonder if they aren't unloaded b/c the generated classes are attached to the POJOs (Pojo$$Lambda...), and since the POJO classes aren't ever unloaded, they end up remaining for the life of the application?

I should have been more precise, sorry about that. Static cache in the sense that we should be able to reuse the lambdas in some fashion that’s tied to statics (classes themselves) as opposed to individual mappers. I agree that we don’t want to prevent class unloading.
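One JDK mechanism that fits this idea is ClassValue: a per-class cache whose entries do not prevent their key class from being unloaded. This is a hypothetical sketch, not Blackbird code; the accessor body just stands in for "generate a lambda for this type":

```java
import java.util.function.Function;

// Hypothetical sketch of a static, class-keyed cache. ClassValue entries are
// released when the key class becomes unloadable, which is the property
// discussed above: reuse across mappers without blocking class unloading.
public class AccessorCache {

    private static final ClassValue<Function<Object, String>> ACCESSORS =
            new ClassValue<Function<Object, String>>() {
                @Override
                protected Function<Object, String> computeValue(Class<?> type) {
                    // Stand-in for "generate an accessor lambda for this type";
                    // computed at most once per class, then cached.
                    return obj -> type.getSimpleName() + ":" + obj;
                }
            };

    public static Function<Object, String> accessorFor(Class<?> type) {
        return ACCESSORS.get(type);
    }
}
```

Repeated calls for the same class return the identical cached function, regardless of how many mappers ask for it.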

@norrisjeremy Makes sense; just one clarification -- while use of a small number of ObjectMappers makes sense (singleton, static, or whatever), there is no need to worry about ObjectReaders/ObjectWriters, since they are created from an ObjectMapper and are designed to be cheap to create and throw away (you can of course retain them for reuse too). So they can be (but don't have to be) constructed on a per-call basis, unlike mappers. All actual caching state is shared from the ObjectMapper; ObjectReaders/ObjectWriters do not own any of it.

@cowtowncoder You are correct, I confirmed with a tweaked version of my test case that repeatedly creating ObjectReaders didn't result in an OOM, only when repeatedly creating ObjectMappers.

In the case of our application, it was easier for me to modify it to keep a small set of ObjectReaders rather than ObjectMappers.
I can report that with this change (maintaining the ObjectReaders), our application has run without issue using Blackbird on a QA instance for 24h, whereas prior it threw an OOM within minutes.

@norrisjeremy cool, yeah that's fine. Just thought I'd mention it in case it helped in some way. So the construction (and throwing away) of ObjectMappers is the issue, but if you only create a limited number, then construct ObjectReaders and/or ObjectWriters to hold on to (as surrogates of sorts), you don't need to keep a ref to the mapper.
And you can also create new readers and writers from existing instances too.
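The pattern described in this thread can be sketched roughly as follows (assumes jackson-databind on the classpath; the Json holder class is illustrative, not part of Jackson): one long-lived mapper, with cheap per-call readers/writers derived from it.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;

// Hypothetical holder: a single application-wide ObjectMapper. Readers and
// writers derived from it share its caches and are cheap to create per call,
// so no new Blackbird classes are generated after the mapper is built.
public final class Json {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    private Json() {}

    public static ObjectReader reader(Class<?> type) {
        return MAPPER.readerFor(type); // cheap; safe to create per call
    }

    public static ObjectWriter writer() {
        return MAPPER.writer(); // likewise cheap
    }
}
```

Module registration (e.g. new BlackbirdModule()) would happen once on MAPPER at startup; it is omitted here to keep the sketch minimal.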

Hi!
We have a similar memory-leak issue.
Is it legal to have one static instance of BlackbirdModule per JVM to resolve this?

@john9x Usually it's not the Module in question that matters but ObjectMappers. But I'll let @stevenschlansker confirm.

It is legal to reuse the BlackbirdModule itself but it does not help the problem at all. The Module holds no state.

The right solution is to reuse ObjectMapper (or the ObjectReader/ObjectWriter instances per above) for a long time in your application. If you create a new Mapper every request, or every document, you will lose all performance benefits and in fact likely have much worse performance than if you didn't even use Blackbird at all.

We may make improvements so that such misconfiguration doesn't crash your application, but it is not likely to ever be fast.

Should some documentation be added indicating the potential pitfalls of using BlackbirdModule whenever users are creating large numbers of ObjectMappers (since it is pretty easy to trigger an OOM if you aren't careful), especially since it wouldn't happen with AfterburnerModule?

It'd definitely be useful to outline specific problems. READMEs can be updated with PRs, so if anyone could come up with something I'd be happy to merge.
(and/or give access to Wiki if need be).

It is true that Afterburner did not run OOM when used this way, but it still looks like it was never very performant. In particular for each new ObjectMapper, Afterburner still needs to re-do all reflective operations and generate the bytecode (although it does skip loading a new class). So even if you continue to use Afterburner (and still even if you only ever use vanilla), it is much better to re-use ObjectMapper instances.

I think everyone is in agreement that it's best to reuse object mappers. However, it's common for infrequent, non-performance-sensitive operations to create object mappers in library code for one-off tasks (for example, a new client is created with an updated configuration, perhaps changing json ser/de settings). It's entirely reasonable for such operations to be slow, however currently blackbird effectively puts a limit on the number of times that can happen, thereby limiting the maximum application uptime.

I think the fact that the generated classes stay loaded beyond the lifetime of an ObjectMapper is still a bug worth fixing.

+1 for ObjectMapper reuse; not sure how much to emphasize it, but possibly worth mentioning for both modules -- if you use a throw-away/one-use ObjectMapper, you almost certainly should not register Afterburner either; there's plenty of overhead for single use. Ditto for Blackbird.

@pschichtel I don't think anyone disagrees that ideally lifetime would not extend past that of mappers. But in case that is not easily achievable (or until this can be done which may be ways off at any rate, for already published versions etc), an additional warning would make sense.

Thank you!
We have a limited number of cached ObjectMappers, but also:

> it's common for infrequent, non-performance-sensitive operations to create object mappers in library code for one-off tasks

I'm pretty sure this behavior must be called out in the README in bold.

Just an update that this issue is not lost nor forgotten, I did manage to reproduce it but didn't find an obvious reason that the loaded lambdas are not eligible for GC. I intend to take another look soonish :)

> I think everyone is in agreement that it's best to reuse object mappers

Yes, but there are legitimate use cases for creating one-off ObjectMappers even for frequent operations. For example, Apache Druid creates a new ObjectMapper per response depending on the serialization options specified in the query.

> I think everyone is in agreement that it's best to reuse object mappers

> Yes, but there are legitimate use cases for creating one-off ObjectMappers even for frequent operations. For example, Apache Druid creates a new ObjectMapper per response depending on the serialization options specified in the query.

That example seems to use one of 2 pre-constructed mappers (or actually one of 4, for the json/smile variants) though, and does NOT create mappers on a per-request basis.

But I agree that there are often cases where there is a need to use a larger if bounded number of mappers.

And for some cases even throw-away ones.

> That example seems to use one of 2 pre-constructed mappers (or actually one of 4, for the json/smile variants) though, and does NOT create mappers on a per-request basis.

I think decorateObjectMapper() can (and should) return a new ObjectMapper in some cases. See here

> That example seems to use one of 2 pre-constructed mappers (or actually one of 4, for the json/smile variants) though, and does NOT create mappers on a per-request basis.

> I think decorateObjectMapper() can (and should) return a new ObjectMapper in some cases. See here

Ah. Yes indeed. I don't doubt that some usage like this is needed. Nor that it'd be best to avoid unintended memory retention. Even if ideally number of mappers should be bounded most of the time.

I think I have a lead on why this is happening. Unfortunately, the current implementation of LambdaMetafactory explicitly asks (by passing the STRONG option) that the generated classes stay around as long as their defining classloader, in InnerClassLambdaMetafactory.generateInnerClass:

Lookup lookup;
if (useImplMethodHandle) {
    lookup = caller.defineHiddenClassWithClassData(classBytes, implementation, !disableEagerInitialization,
                                                   NESTMATE, STRONG);
} else {
    lookup = caller.defineHiddenClass(classBytes, !disableEagerInitialization, NESTMATE, STRONG);
}
return lookup.lookupClass();

So, this is why the generated lambdas are never collected. I think this means that, unfortunately, to fix this problem Blackbird might have to remember generated classes across ObjectMappers.

Hmmh. Holding on to state across all mappers is a pretty big no-no (not done for anything else). But needs must I suppose.

Or, we could just decide that this issue will not be fixed, and tell affected users that if they do not use a limited number of ObjectMappers, then Blackbird is not right for them.

Right, that could be the answer as well.

@carterkozak @norrisjeremy @pschichtel @john9x @sirianni ,

Per the above comment, I am inclined to say that this issue will not be fixed in Blackbird, because the Java libraries insist that constructed lambdas belong to the class they purport to be in. This means that each different Blackbird instance will construct lambdas that "belong to" your type. Fixing this will not be simple and would likely involve static caches that coordinate between different ObjectMappers, which we would like to avoid.

In the case where you are repeatedly constructing ObjectMapper instances, you do not have much to gain from Blackbird anyway, since each new instance will have to re-do codegen and compilation, which is much more expensive than reflection in the first place.

I would propose that we document this caveat in the readme and close this issue. Please let us know if you think there's anything we've missed here, or have a reason to think we should explore such static caching.

I'm ok with documenting it as a requirement for Blackbird.
We've since adjusted our code to maintain lifetime ObjectMapper instances instead of creating them repeatedly.

john9x commented

Thank you for your investigation! I'm sure that documenting it well will be enough.