FasterXML/jackson-modules-base

Metaspace/class leak using blackbird

Closed this issue · 40 comments

In a standard application, we see class loading level out around ~10-20k classes; however, after applying the Blackbird module we've seen our loaded-classes metric grow until it reaches ~2 million and the JVM is restarted due to an OOM. This is running on Java 15, although I don't know whether it's exclusive to Java 15, which codepath is creating the classes, or why they're not unloaded. I'm still working through debugging the problem.

What I know so far:
- Encountered only when Blackbird is in use.
- Metaspace is filled over time.
- Loaded classes as reported by the JVM approach 2 million (where I'd expect tens of thousands).
- Impacts Java 15; possibly others, but I don't have a reproducer to validate whether only Java 15 is affected.

Hi @carterkozak,

Was the OOM you received java.lang.OutOfMemoryError: Compressed class space? If so, I can confirm that we too saw similar behavior with an app that we attempted to move from Afterburner to Blackbird.

Thanks,
Jeremy

Hi @carterkozak and @norrisjeremy , any possibility of supplying a reproducer case? We run blackbird in production for a long time now and haven't noticed any such leaks. Do you generate classes dynamically at runtime and serialize them, or reload apps in an application server, or anything like that?

we are seeing this consistently in our application (a bunch of services, all affected) in java 17.0.3 with kotlin 1.6.21

> Hi @carterkozak and @norrisjeremy , any possibility of supplying a reproducer case? We run blackbird in production for a long time now and haven't noticed any such leaks. Do you generate classes dynamically at runtime and serialize them, or reload apps in an application server, or anything like that?

FYI, in our case, our application strictly deserializes to Jackson annotated POJOs.
Not sure if it makes any difference, but we are also strictly dealing with CSV content utilizing jackson-dataformat-csv.
We do not generate any classes dynamically at runtime, nor do we make use of any sort of application server.

Any chance of a reproducer case, or maybe a heap dump of an affected application? I would be happy to look into this, but without some idea of where to look, I am not sure what the problem might be.

I should be able to get a heap dump of one of our services next week.

I've increased reuse of ObjectReader/ObjectWriter instances and also removed Spring's MessageConverter layer; my load tests are currently unable to reproduce the issue. I might test again with the older version from a month ago.

Could not closing JsonParser/JsonGenerator objects have such an effect? The kotlin module might have an issue there. A few functions in the kotlin module's Extensions.kt seem to not close parsers/generators they create.

Hi @stevenschlansker,

I was able to finally write a simple test case to demonstrate the issue.

If I execute the following example with BlackbirdModule and -XX:CompressedClassSpaceSize=128m then it crashes fairly rapidly with a java.lang.OutOfMemoryError: Compressed class space, whilst it runs to completion with AfterburnerModule.

Thanks,
Jeremy

package bb147;

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.Module;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;
import com.fasterxml.jackson.module.blackbird.BlackbirdModule;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
import lombok.Data;

@SuppressWarnings("UnusedVariable")
public class Bb147 {

  private static final String csv =
      """
      a,b,c
      1,"2,3",4
      5,6,7
      8,,9
      ,"10,11",""";
  private static final int iterations = 1_000_000;

  public static void main(String[] args) throws Exception {
    Module mod;
    if (args.length == 0) {
      throw new IllegalArgumentException("no module");
    } else if (args[0].equalsIgnoreCase("afterburner")) {
      mod = new AfterburnerModule();
    } else if (args[0].equalsIgnoreCase("blackbird")) {
      mod = new BlackbirdModule();
    } else {
      throw new IllegalArgumentException("invalid module");
    }

    // Note: a brand-new mapper per iteration is what triggers the leak; each
    // new mapper causes Blackbird to generate a fresh set of hidden classes.
    for (int i = 0; i < iterations; ++i) {
      var mapper = new CsvMapper();
      mapper.enable(CsvParser.Feature.TRIM_SPACES);
      mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
      mapper.enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_AS_NULL);
      mapper.disable(DeserializationFeature.ADJUST_DATES_TO_CONTEXT_TIME_ZONE);
      mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
      mapper.registerModule(mod);

      var schema =
          mapper.schemaWithHeader().withColumnSeparator(',').withArrayElementSeparator(",");
      var reader = mapper.readerFor(Pojo.class).with(schema);

      MappingIterator<Pojo> iterator = reader.readValues(csv);
      Iterable<Pojo> iterable = () -> iterator;
      var list = StreamSupport.stream(iterable.spliterator(), false).collect(Collectors.toList());
    }
  }

  @Data
  public static class Pojo {

    private Long a;
    private List<Integer> b;
    private String c;
  }
}

In our case this might have been caused by accidentally creating an ObjectMapper instance per request, similar to @norrisjeremy's example

Reuse is certainly ideal, but I wonder if we could/should hold a static cache to prevent runaway memory utilization when non-performance-sensitive codepaths create per-use mappers over time?

Based on some testing I performed, it appears that every CallSite instance created by LambdaMetafactory.metafactory() causes the JVM to instantiate a dynamically generated Class.
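This is easy to observe with a stdlib-only sketch (the class and method names here are illustrative, not Blackbird's code): each call to LambdaMetafactory.metafactory() produces a CallSite whose target is an instance of a freshly generated class, so two calls yield two distinct classes.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Supplier;

public class LambdaClassDemo {

    static String hello() {
        return "hello";
    }

    // Spin up a Supplier<String> via the metafactory, the same mechanism
    // Blackbird uses for its generated accessors.
    static Supplier<String> makeLambda() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(
                    LambdaClassDemo.class, "hello", MethodType.methodType(String.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "get",                                  // Supplier.get
                    MethodType.methodType(Supplier.class),  // factory signature
                    MethodType.methodType(Object.class),    // erased SAM signature
                    impl,                                   // implementation handle
                    MethodType.methodType(String.class));   // instantiated signature
            return (Supplier<String>) site.getTarget().invoke();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Supplier<String> a = makeLambda();
        Supplier<String> b = makeLambda();
        if (!"hello".equals(a.get())) throw new AssertionError("unexpected value");
        // Each metafactory invocation generated a distinct (hidden) class:
        if (a.getClass() == b.getClass()) throw new AssertionError("classes were shared");
    }
}
```

Running this shows a.getClass() and b.getClass() differ, consistent with one generated class per metafactory call.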

I'll see if we can reorganize our application to maintain persistent ObjectMapper & ObjectReader instances and confirm that it prevents us from hitting the Compressed class space OOM.

I wonder if they aren't unloaded b/c the generated classes are attached to the POJOs (Pojo$$Lambda...), and since the POJO classes aren't ever unloaded, they end up remaining for the life of the application?

I should have been more precise, sorry about that. Static cache in the sense that we should be able to reuse the lambdas in some fashion that’s tied to statics (classes themselves) as opposed to individual mappers. I agree that we don’t want to prevent class unloading.
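One JDK mechanism that fits this idea is ClassValue: a per-class cache whose entries do not prevent their key class from being unloaded. This is a hypothetical sketch, not Blackbird code; the accessor body just stands in for "generate a lambda for this type":

```java
import java.util.function.Function;

// Hypothetical sketch of a static, class-keyed cache. ClassValue entries are
// released when the key class becomes unloadable, which is the property
// discussed above: reuse across mappers without blocking class unloading.
public class AccessorCache {

    private static final ClassValue<Function<Object, String>> ACCESSORS =
            new ClassValue<Function<Object, String>>() {
                @Override
                protected Function<Object, String> computeValue(Class<?> type) {
                    // Stand-in for "generate an accessor lambda for this type";
                    // computed at most once per class, then cached.
                    return obj -> type.getSimpleName() + ":" + obj;
                }
            };

    public static Function<Object, String> accessorFor(Class<?> type) {
        return ACCESSORS.get(type);
    }
}
```

Repeated calls for the same class return the identical cached function, regardless of how many mappers ask for it.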

@norrisjeremy Makes sense; just one clarification -- while use of a small number of ObjectMappers makes sense (singleton, static, or whatever), there is no need to worry about ObjectReaders/ObjectWriters, since they are created from an ObjectMapper and are designed to be cheap to create and throw away (you can of course retain them for reuse too). So they can be (but don't have to be) constructed on a per-call basis, unlike mappers. All actual caching state is shared from the ObjectMapper; ObjectReaders/ObjectWriters do not own any of it.

@cowtowncoder You are correct, I confirmed with a tweaked version of my test case that repeatedly creating ObjectReaders didn't result in an OOM, only when repeatedly creating ObjectMappers.

In the case of our application, it was easier for me to modify it to keep a small set of ObjectReaders rather than ObjectMappers.
I can report that with this change (maintaining the ObjectReaders), our application has run without issue using Blackbird on a QA instance for 24h, whereas prior it threw an OOM within minutes.

@norrisjeremy cool, yeah that's fine. Just thought I'd mention it in case it helped in some way. So the construction (and throwing away) of ObjectMappers is the issue, but if you only create a limited number, then construct ObjectReaders and/or ObjectWriters to hold on to (as surrogates of sorts), you don't need to keep a ref to the mapper.
And you can also create new readers and writers from existing instances too.
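The pattern described in this thread can be sketched roughly as follows (assumes jackson-databind on the classpath; the Json holder class is illustrative, not part of Jackson): one long-lived mapper, with cheap per-call readers/writers derived from it.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;

// Hypothetical holder: a single application-wide ObjectMapper. Readers and
// writers derived from it share its caches and are cheap to create per call,
// so no new Blackbird classes are generated after the mapper is built.
public final class Json {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    private Json() {}

    public static ObjectReader reader(Class<?> type) {
        return MAPPER.readerFor(type); // cheap; safe to create per call
    }

    public static ObjectWriter writer() {
        return MAPPER.writer(); // likewise cheap
    }
}
```

Module registration (e.g. new BlackbirdModule()) would happen once on MAPPER at startup; it is omitted here to keep the sketch minimal.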

Hi!
We have a similar memory-leak issue.
Is it legal to have one static instance of BlackbirdModule per JVM to resolve this?

@john9x Usually it's not the Module in question that matters but ObjectMappers. But I'll let @stevenschlansker confirm.

It is legal to reuse the BlackbirdModule itself but it does not help the problem at all. The Module holds no state.

The right solution is to reuse ObjectMapper (or the ObjectReader/ObjectWriter instances per above) for a long time in your application. If you create a new Mapper every request, or every document, you will lose all performance benefits and in fact likely have much worse performance than if you didn't even use Blackbird at all.

We may make improvements so that such misconfiguration doesn't crash your application, but it is not likely to ever be fast.

Should some documentation be added indicating the potential pitfalls of using BlackbirdModule whenever users are creating large numbers of ObjectMappers (since it is pretty easy to trigger an OOM if you aren't careful), especially since it wouldn't happen with AfterburnerModule?

It'd definitely be useful to outline specific problems. READMEs can be updated with PRs, so if anyone could come up with something I'd be happy to merge.
(and/or give access to Wiki if need be).

It is true that Afterburner did not run OOM when used this way, but it still looks like it was never very performant. In particular for each new ObjectMapper, Afterburner still needs to re-do all reflective operations and generate the bytecode (although it does skip loading a new class). So even if you continue to use Afterburner (and still even if you only ever use vanilla), it is much better to re-use ObjectMapper instances.

I think everyone is in agreement that it's best to reuse object mappers. However, it's common for infrequent, non-performance-sensitive operations to create object mappers in library code for one-off tasks (for example, a new client is created with an updated configuration, perhaps changing json ser/de settings). It's entirely reasonable for such operations to be slow, however currently blackbird effectively puts a limit on the number of times that can happen, thereby limiting the maximum application uptime.

I think the fact that the generated classes stay loaded beyond the lifetime of an ObjectMapper is still a bug worth fixing.

+1 for ObjectMapper reuse; not sure how much to emphasize it, but possibly worth mentioning for both modules -- if you use a throw-away/one-use ObjectMapper, you almost certainly should not register Afterburner either; there's plenty of overhead for single use. Ditto for Blackbird.

@pschichtel I don't think anyone disagrees that ideally lifetime would not extend past that of mappers. But in case that is not easily achievable (or until this can be done which may be ways off at any rate, for already published versions etc), an additional warning would make sense.

Thank you!
We have a limited number of cached ObjectMappers, but also:

> it's common for infrequent, non-performance-sensitive operations to create object mappers in library code for one-off tasks

I'm pretty sure this behavior must be called out in the README in bold.

Just an update that this issue is not lost nor forgotten, I did manage to reproduce it but didn't find an obvious reason that the loaded lambdas are not eligible for GC. I intend to take another look soonish :)

> I think everyone is in agreement that it's best to reuse object mappers

Yes, but there are legitimate use cases for creating one-off ObjectMappers even for frequent operations. For example, Apache Druid creates a new ObjectMapper per response depending on the serialization options specified in the query.

> I think everyone is in agreement that it's best to reuse object mappers

> Yes, but there are legitimate use cases for creating one-off ObjectMappers even for frequent operations. For example, Apache Druid creates a new ObjectMapper per response depending on the serialization options specified in the query.

That example seems to use one of 2 pre-constructed mappers (or actually one of 4, for the json/smile variants) though, and does NOT create mappers on a per-request basis.

But I agree that there are often cases where there is a need to use a larger if bounded number of mappers.

And for some cases even throw-away ones.

> That example seems to use one of 2 pre-constructed mappers (or actually one of 4, for the json/smile variants) though, and does NOT create mappers on a per-request basis.

I think decorateObjectMapper() can (and should) return a new ObjectMapper in some cases. See here

> That example seems to use one of 2 pre-constructed mappers (or actually one of 4, for the json/smile variants) though, and does NOT create mappers on a per-request basis.

> I think decorateObjectMapper() can (and should) return a new ObjectMapper in some cases. See here

Ah. Yes indeed. I don't doubt that some usage like this is needed. Nor that it'd be best to avoid unintended memory retention. Even if ideally number of mappers should be bounded most of the time.

I think I have a lead on why this is happening. Unfortunately, the current implementation of LambdaMetafactory explicitly asks (by passing the STRONG option) that the generated classes stay around as long as their defining classloader, in InnerClassLambdaMetafactory.generateInnerClass:

Lookup lookup;
if (useImplMethodHandle) {
    lookup = caller.defineHiddenClassWithClassData(classBytes, implementation, !disableEagerInitialization,
                                                   NESTMATE, STRONG);
} else {
    lookup = caller.defineHiddenClass(classBytes, !disableEagerInitialization, NESTMATE, STRONG);
}
return lookup.lookupClass();

So, this is why the generated lambdas are never collected. I think this means that, unfortunately, to fix this problem Blackbird might have to remember generated classes across ObjectMappers.

Hmmh. Holding on to state across all mappers is a pretty big no-no (not done for anything else). But needs must I suppose.

Or, we could just decide that this issue will not be fixed, and tell affected users that if they do not use a limited number of ObjectMappers, then Blackbird is not right for them.

Right, that could be the answer as well.

@carterkozak @norrisjeremy @pschichtel @john9x @sirianni ,

Per the above comment, I am inclined to say that this issue will not be fixed in Blackbird, because the Java libraries insist that constructed lambdas belong to the class they purport to be in. This means that each different Blackbird instance will construct lambdas that "belong to" your type. Fixing this will not be simple and would likely involve static caches that coordinate between different ObjectMappers, which we would like to avoid.

In the case where you are repeatedly constructing ObjectMapper instances, you do not have much to gain from Blackbird anyway, since each new instance will have to re-do codegen and compilation, which is much more expensive than reflection in the first place.

I would propose that we document this caveat in the readme and close this issue. Please let us know if you think there's anything we've missed here, or have a reason to think we should explore such static caching.

I'm ok with documenting it as a requirement for Blackbird.
We've since adjusted our code to maintain lifetime ObjectMapper instances instead of creating them repeatedly.

john9x commented

Thank you for your investigation! I'm sure that documenting it well will be enough.