New architecture proposal to reduce memory usage
Hello.
I noticed that the unicode-emoji gem takes more memory than I expected from such a library. Just requiring the gem takes 7-8MB. I know that by today's standards this isn't a huge amount, but if many such gems were used, it could add up to unnecessary memory usage.
Here is the method I used to measure memory usage (I use the get_process_mem gem):
require 'get_process_mem'

# Returns the change in process memory (in MB) caused by running the block.
def mem(&block)
  raise ArgumentError, 'missing block' unless block
  mem = GetProcessMem.new
  before = mem.mb
  block.call
  after = mem.mb
  after - before
end

puts mem { require "unicode/emoji" }
Running this script multiple times gives me numbers between 7 and 9MB (most of the time around 7.6MB).
I also used memory_profiler:
require 'memory_profiler'

report = MemoryProfiler.report do
  require 'unicode/emoji'
end
report.pretty_print
It shows where memory is allocated and also how much of it is retained (which in most cases means it will never be freed).
I'm not sure exactly how people use this gem, but looking at its contents I suspect most of them use one of the provided regex constants. That is the case in the application I work on: we literally use a single regex from this library (Unicode::Emoji::REGEX).
What can be done to lower memory usage? Here is the idea:
- instead of generating all the Regex constants at load time, they could be generated offline and included directly in a file
- every constant could go into a different file, and autoload :FOO, File.expand_path('emoji/foo', __dir__) could be used to lazy load it when it is used
- INDEX could be lazy loaded too. If all regexes were generated offline, then it would only be used by methods like properties. No method calls? No constant loaded.
- some regexes are quite big (require "objspace"; ObjectSpace.memsize_of(...)), for example REGEX_VALID_INCLUDE_TEXT is almost 0.5MB. I didn't look closely, but I think that some big unions like "char1|char2|..." could be replaced by a range "[char1-charN]" (if it is a sequence of consecutive characters, of course).
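To make the first two points concrete, here is a minimal sketch of a pre-generated constant file combined with autoload. Everything in it is hypothetical and only for illustration: the EmojiSketch module, the REGEX_BASIC constant, the file name and the toy pattern are not taken from the gem, and the file that a build step would generate offline is simply written to a temporary directory here.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical namespace standing in for the gem's module.
module EmojiSketch
end

dir  = Dir.mktmpdir("emoji_sketch")
path = File.join(dir, "regex_basic.rb")

# Stand-in for a file that would be generated offline and shipped with the gem.
# The pattern is a toy example, not one of the gem's real regexes.
File.write(path, <<~'RUBY')
  module EmojiSketch
    REGEX_BASIC = /[\u{1F600}-\u{1F64F}]/
  end
RUBY

# Register the constant; the file is not loaded yet.
EmojiSketch.autoload(:REGEX_BASIC, path)

pending_before = EmojiSketch.autoload?(:REGEX_BASIC)        # non-nil while the load is still pending
matched        = EmojiSketch::REGEX_BASIC.match?("\u{1F600}") # first access requires the file
pending_after  = EmojiSketch.autoload?(:REGEX_BASIC)        # nil once the constant is loaded

puts "pending before access: #{!pending_before.nil?}"
puts "matched: #{matched}"
puts "pending after access: #{!pending_after.nil?}"

FileUtils.remove_entry(dir)
```

With this layout, a program that never references the constant never pays for compiling its regex, which is the behaviour the list above is aiming for.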
If done properly, then for a usage scenario like mine (a single constant), memory usage would be reduced from 7-8MB to the size of that constant (in our case, 120kB).
Do you think it is worth looking into it?
Hi @radarek,
thanks for bringing this up and doing the research. I think it would be great to optimize the index structure and memory behavior for typical use cases.
Some feedback on your thoughts:
> instead of generating all the Regex constants, they could be generated offline and included directly in a file
Sounds good.
> every constant could go to different file and autoload :FOO, File.expand_path('emoji/foo', __dir__) could be used to lazy load it when it is used
> INDEX could be lazy loaded too. If all regexes were generated offline then it would be only used by methods like properties. No method calls? No constant loaded.
Is autoload currently encouraged? I have always liked it, and if concurrency issues can be ruled out, I am all for it.
> some regexes are quite big (require "objspace"; ObjectSpace.memsize_of(...)), for example REGEX_VALID_INCLUDE_TEXT is almost 0.5MB. I didn't look closely but I think that some big unions like "char1|char2|..." could be replaced by range "[char1-charN]" (if it is sequence of subsequent characters of course).
I haven't looked into optimizing the generated regexes, so this sounds exciting.
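As a rough illustration of the range idea, the sketch below collapses a list of codepoints into a character class, merging runs of consecutive codepoints into ranges. The helper name codepoints_to_char_class and the sample codepoints are made up for this example and are not part of the gem. It only covers single-codepoint alternatives, so multi-codepoint emoji sequences would still need a separate alternation.

```ruby
require "objspace"

# Hypothetical helper: builds a character class such as "[\u{2764}\u{1F600}-\u{1F603}]"
# from a list of codepoints, turning consecutive runs into ranges.
def codepoints_to_char_class(codepoints)
  raise ArgumentError, "no codepoints given" if codepoints.empty?

  runs = codepoints.uniq.sort.slice_when { |a, b| b != a + 1 }
  body = runs.map do |run|
    if run.size == 1
      format("\\u{%X}", run.first)
    else
      format("\\u{%X}-\\u{%X}", run.first, run.last)
    end
  end.join
  "[#{body}]"
end

# Sample input chosen for illustration only.
codepoints = [0x1F600, 0x1F601, 0x1F602, 0x1F603, 0x1F680, 0x2764]

char_class = codepoints_to_char_class(codepoints)
compact    = Regexp.new(char_class)
union      = Regexp.union(codepoints.map { |cp| [cp].pack("U") })

puts char_class
# Sizes depend on the Ruby version and platform, so they are printed, not assumed.
puts "union:   #{ObjectSpace.memsize_of(union)} bytes"
puts "compact: #{ObjectSpace.memsize_of(compact)} bytes"
```

Both regexes accept the same single characters. Whether the compact form actually saves memory for the gem's real data would have to be measured with ObjectSpace.memsize_of on the generated constants.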
Hi @janlelis
> thanks for bringing this up and doing the research. I think it would be great to optimize the index structure and memory behavior for typical use cases.
So far I'm focused on lazy-loaded constants and optimizing the size of the regexes. The index could indeed be optimized too.
> Is the autoload currently encouraged? I have always liked it, and if concurrency issues can be ruled out, I am all for it
I had the same concern, but it looks like it is still a valid way to load Ruby code:
- https://bugs.ruby-lang.org/issues/921 - an old issue about autoload not being thread-safe in Ruby, but it was resolved more than 10 years ago
- there was a plan to remove it from the language, but Matz decided not to do this (at least not in the near future). See https://bugs.ruby-lang.org/issues/5653
- it is used by Rails itself (Zeitwerk uses autoload under the hood)
- autoload is used in bundler https://github.com/rubygems/bundler/search?q=autoload
- autoload is used in Ruby's core/stdlib
Given all of the above, I think it is safe to use.
Released with v3.0.0!