Roave/BetterReflection

Prevent memory leaks by not holding onto AST nodes

ondrejmirtes opened this issue ยท 21 comments

Hi, I'm in the process of reducing the memory amount consumed by PHPStan, and I have some promising results.

Current stable version consumes ~850 MB when runing PHPStan on PHPStan.

After I removed the NodeConnectingVisitor (this isn't related to BetterReflection) that creates cycles between nodes by setting parent/previous/next attributes, I saved around 100 MB. So now we're at ~750 MB.

I continued to debug the problem by using php-meminfo but there was a lot of noise because of CachedParser (equivalent of MemoizingParser). So I removed parser caching to see what else holds onto AST nodes. I also removed static reflection capabilities altogether to see if there's something else in PHPStan that leaks. I consider the floor of what's possible to achieve at around ~200 MB.

When I returned static reflection capabilities, it went back to ~750 MB. And most of the memory is held in AST nodes held by various reflection classes like ReflectionClass etc.

So I refactored some classes to extract all of the needed information in constructor so that the AST node is not saved in a property: ondrejmirtes/BetterReflection@cc581ec...35db84f (the work isn't done, there's still more to do)

Right now PHPStan consumes around 340 MB without a CachingParser and around 430 MB with a CachingParser.

The best part is that there's very little performance impact time-wise so it looks really promising. I don't have production-ready code yet as there are some problems to work out but I hope this will make PHPStan much more resource-effective.

One downside is that I had to remove some features that need the AST after a Reflection class is created - it can be seen in the diff.

Is this something you'd like to do here too?

AFAIK, BR nodes are acyclic: the CachedParser was introduced on purpose, to prevent double-parsing of data (which happens a lot when jumping around a codebase, since we don't cache reflected symbols either).

IMO, a better approach would be to make the codebase as @immutable as possible:

  • ReflectionClass
  • ReflectionProperty
  • ReflectionType
  • ...

Besides some rare edge cases in their API, we can likely make these completely immutable, by deferring creation of stuff that requires I/O to the reflector itself.

This may potentially change the reflection API radically though ๐Ÿค”

I suppose the code removals you mentioned are mostly around getAst()?

BR nodes are acyclic

It might be. What I want to solve is when MemoizingReflector is involved, Reflection instances aren't freed from memory on purpose, but alongside they're pulling the AST nodes which also cannot be freed. So we have caching of Reflection instances which is a good thing, but the memory occupied is more than it could be - we could hold the same amount of Reflection information and free some memory.

What's also interesting that with my refactoring, there's also no difference in performance and occupied memory with gc_disable() called or not - everything is freed with refcounting and garbage collector doesn't have anything to do, which is what I meant with my tweet a few days ago (https://twitter.com/OndrejMirtes/status/1510904762711089157).

What I also do in PHPStan successfully and need to try to do it on various Memoizing* things in BR is to cap the cache size at certain limit. Same as with server-side development, it's not wise to treat cache as unlimited resource, so I evict the oldest items from the cache with this array_slice trick in my CachedParser: https://github.com/phpstan/phpstan-src/blob/6d25967ca2b174895bb8e40f7bcffad874d88d6a/src/Parser/CachedParser.php#L33-L42

I suppose the code removals you mentioned are mostly around getAst()?

Yeah. @kukulich noted we don't have to lose those features - we still have LocatedSource and the AST can be recreated if necessary.

Overall, I'm on board with changing library performance, but not as hacks/shuffling, but rather by sitting back and checking architecture with more care.

I'll put a new major as milestone, as I feel like such a feature/improvement is also leaving us space for major BC breaks.

Alright, so what do you think about removing the methods that won't have the data anymore, like getAst() and getDeclaringNamespaceAst()? I don't think I removed anything else, checking my diff at ondrejmirtes/BetterReflection@cc581ec...e594f06.

Also - it's still useful to keep the value expr of constants and default expr of parameters and properties for later access by library users. Also it's useful because it's beneficial to have CompileNodeToValue evaluated lazily, otherwise you run into infinite recursion problems in constructors.

Do you agree? @kukulich is interested in starting doing this :)

They would most likely be removed in an immutable version of the library.

We can also decide to drop them in a new major (without going through the architectural change).

Also: OK with dropping problematic functionality ๐Ÿ‘

Ok, it looks I will prepare PR with removing getAst() (and dependant) methods first.

Only ReflectionFunction* left, AFAIK!

Unfortunatelly also ReflectionClass, ReflectionMethod, ReflectionEnum and ReflectionObject :)

Ah, we reduced our MT score while working on this:

Error: ] The minimum required MSI percentage should be 97%, but actual is       
         96.87%. Improve your tests!     

Oh well ๐Ÿคท

Ah, we reduced our MT score while working on this:

Will look at it later.

What I'm missing in the current PRs are getters for the underlying expression values. These were previously accessible via the removed getAst() methods which were removed at my request, but the reflection classes should still give access to things like:

  • Constant value expression
  • Class constant value expression
  • Property default expression
  • Parameter default expression
  • Attributes arguments expressions
  • Enum case value expression

These are already saved in private properties because they're needed for CompileNodeToValue in the lazy getValue() etc. methods.

I need them because PHPStan completely avoids using CompileNodeToValue. Previously it used CompileNodeToValue to go "Expr -> runtime value -> Type" but it's no longer possible because these expressions can contain new and enum case references which means that obtaining runtime value relies on autoloading.

PHPStan now uses its own code (https://github.com/phpstan/phpstan-src/blob/b0babd082514303ec62deb5f16cf330f845c1821/src/Reflection/InitializerExprTypeResolver.php) to go "Expr -> Type" directly.

@Ocramius Are you fine with @kukulich adding these getters? Thank you.

Yes, it's OK to add them, but beware that we still cannot support objects as default values, such as in function foo($bar = new Baz()) {} ๐Ÿค”

This code is fine as long as the user doesn't call getValue() ๐Ÿ˜‚

@ondrejmirtes could you create a rough issue to track that? What API you want / how it should look like?

BTW, getAst() needs to be removed from docs too

Done: #1222

@ondrejmirtes waiting for your feedback for a 6.0.0 release BTW :)

Awesome, please give me a few days, I need to rebase my fork and make PHPStan work with that, after that I'll make some memory measurements to make sure it's the same or better :)

Creating a signpost issue for that ๐Ÿ‘