fastify/fast-json-stringify

v3.0.0 leaks memory

mattbishop opened this issue · 9 comments

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the regression has not already been reported

Last working version

2.7.13

Stopped working in version

3.0.0

Node.js version

16.13.2

Operating system

macOS

Operating system version (i.e. 20.04, 11.3, 10)

12.1

💥 Regression Report

I have been using fast-json-stringify directly in a web app that streams JSON data as ndjson. The app is about a year old, runs in 512 MB of RAM, and has been stable, memory-wise, for the entire year.

Earlier this week I upgraded to fast-json-stringify 3.0.0 and my load tests have been causing the app to fail with OOM errors. I profiled the memory using Chrome dev tools and found that the validateSchema method was consuming more and more RAM as the test progressed before crashing due to inability to allocate memory.

I reverted to 2.7.13, reran my tests, and they passed as before.

Steps to Reproduce

I tried using leakage (https://www.npmjs.com/package/leakage) to write a test, but it would not compile in my environment.
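Since `leakage` would not compile, a rough alternative using only Node built-ins is to run the suspect function in a batch and compare `process.memoryUsage().heapUsed` before and after; a delta that keeps growing across repeated batches suggests a leak. This is a hedged sketch, and `measureHeapGrowth` is a hypothetical helper, not part of fast-json-stringify:

```javascript
// Rough stand-in for the `leakage` package, using only Node built-ins.
function measureHeapGrowth (fn, iterations = 5000) {
  if (global.gc) global.gc() // start node with --expose-gc for stabler numbers
  const before = process.memoryUsage().heapUsed
  for (let i = 0; i < iterations; i++) fn()
  if (global.gc) global.gc()
  const after = process.memoryUsage().heapUsed
  return after - before // bytes retained after the batch
}

// Example: a deliberately leaky function retains memory on every call
const retained = []
const delta = measureHeapGrowth(() => retained.push(new Array(100).fill(0)))
console.log(typeof delta) // 'number'
```

Numbers from a single batch are noisy; the signal is the trend across many batches.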

Expected Behavior

Same memory usage as 2.7.13.

Eomm commented

Clinic.js will help you debug this:

https://clinicjs.org/documentation/heapprofiler/

It's nearly impossible to help without a way to reproduce the problem you are facing. Could you create a small server that reproduces it?

How are you using this module? You should work to create a minimal, reproducible example of your leak, just a simplified case, so that we can help.

Note that heap profiling shows you which function is allocating this much memory and what kind of data is leaking.

The major change in 3.0.0 is the update from ajv@6 to ajv@8, so I am wondering whether we are suffering from this issue: ajv-validator/ajv#1413

We may need to identify a new way to initialize the ajv instance.

https://www.poberezkin.com/posts/2021-02-11-ajv-version-7-big-changes-and-improvements.html#caching-compiled-schemas

But if you pass the new instance of the schema, even if the contents of the object is deeply equal, ajv would compile it again. In version 6 Ajv used a serialized schema as a cache key, and it partially protected from the incorrect usage of compiled validation functions, but it had both performance and memory costs. Some users had this problem when migrating to version 7.
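The caching change described in that quote can be modeled in a few lines of plain JS (this is a sketch, not real ajv internals): when compiled validators are cached by schema object reference, a deep-equal but freshly created schema object misses the cache and gets "compiled" again.

```javascript
// Models ajv@8's reference-keyed cache of compiled validators.
function makeCompilerCache () {
  const cache = new Map() // keyed by object reference, like ajv@8
  let compilations = 0
  return {
    getValidator (schema) {
      if (!cache.has(schema)) {
        compilations++ // stand-in for an expensive ajv compile
        cache.set(schema, (data) => true)
      }
      return cache.get(schema)
    },
    get compilations () { return compilations }
  }
}

const compiler = makeCompilerCache()
const schema = { type: 'string' }
compiler.getValidator(schema)             // compiled
compiler.getValidator(schema)             // cache hit (same reference)
compiler.getValidator({ type: 'string' }) // deep-equal but new object: compiled again
console.log(compiler.compilations) // 2
```

ajv@6 keyed its cache on the serialized schema instead, which is why deep-equal schemas did not recompile there.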

I think we can confirm the issue is related to how we use ajv.

valid = $validateWithAjv(${JSON.stringify(i)}, obj)

${index === 0 ? 'if' : 'else if'}($validateWithAjv(${JSON.stringify(location.schema)}, ${testValue}))

${index === 0 ? 'if' : 'else if'}($validateWithAjv(${JSON.stringify(location.schema)}, ${testValue}))

Based on the lines above and the information in that issue, we are actually compiling the validation function again and again.
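The leak mechanism can be sketched in isolation (hypothetical names; the real generated code calls ajv): because `${JSON.stringify(...)}` embeds the schema as an object literal in the generated source, every invocation evaluates the literal and produces a brand-new object, so a reference-keyed cache never hits and grows without bound.

```javascript
// Hypothetical model of a generated serializer line such as:
//   valid = $validateWithAjv({"type":"string"}, obj)
const referenceCache = new Set() // stands in for ajv@8's internal cache

function generatedValidate (obj) {
  const schema = { type: 'string' } // the inlined literal: a fresh object each call
  referenceCache.add(schema)        // reference-keyed cache never hits
  return typeof obj === 'string'
}

for (let i = 0; i < 1000; i++) generatedValidate('hello')
console.log(referenceCache.size) // 1000 entries instead of 1
```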

Good spot! Looks like we should definitely fix this.

It seems this function is really incorrect too:

const $validateWithAjv = (function () {
  const cache = new Set()
  return function (schema, target) {
    const id = schema.$id
    if (!id) {
      return ajv.validate(schema, target)
    }
    const cached = cache.has(id)
    if (cached) {
      return ajv.validate(id, target)
    } else {
      cache.add(id)
      return ajv.validate(schema, target)
    }
  }
})()

I think the solution should be to move the JSON.stringify(location.schema) elsewhere and just keep the references around. We could use the same logic we used for:

const arrayItemsReferenceSerializersMap = new Map()

MSE99 commented

@mcollina I wrote that function; at the time it was a hack to get around ajv throwing if a schema is compiled twice with the same id, I'm sorry. A better solution would be to load all schemas at build time using ajv.addSchema(schema, key). We would use the schema's $id value as the key argument; if the schema does not have an $id we could generate one using crypto.randomUUID(). This way we don't do any additional bookkeeping and ajv.validate does all the work.