zombocom/heapy

Count string length as memory footprint

stereobooster opened this issue · 2 comments

memsize for a string is 40 bytes (as for any Ruby object), but memory is also consumed storing the actual string content.

Maybe use something like this?

x["memsize"] + x["value"].to_s.bytes.count 

That's a really good point; however, we need to be careful with the calculations.

Ruby strings of 23 bytes or fewer (on a 64-bit build) can store their entire contents inside the Ruby object, which takes up 40 bytes. Anything longer triggers additional storage outside the object.

The problem is that neither memsize nor bytes gives us this info easily:

require 'objspace'
ObjectSpace.memsize_of("foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo")
# => 40

Uses more than 40, but only lists 40.

"foo".bytes.count
# => 3

Takes up more than 3 bytes.

The suggested calculation isn't correct out of the box. For example, this is a heap dump line of a string object that has fewer than 23 chars:

require 'json'

hash = JSON.parse('{"address":"0x7fb473814578", "type":"STRING", "class":"0x7fb4740dced0", "frozen":true, "embedded":true, "fstring":true, "bytesize":18, "value":"block in decrement", "memsize":40, "flags":{"wb_protected":true, "old":true, "long_lived":true, "marked":true}}')
hash["memsize"] + hash["value"].to_s.bytes.count
# => 58

It should only be reported as taking up 40 bytes, because the string is embedded, but the suggested calculation gives 58 bytes, which is incorrect.
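The dump line above carries an "embedded" flag, and the same flag can be read for a live string. This is only a minimal sketch: `embedded_string?` is a hypothetical helper, not something heapy provides, and it assumes `ObjectSpace.dump` emits `"embedded":true` only for embedded strings, as it does in the dump line shown.

```ruby
require 'objspace'
require 'json'

# Hypothetical helper (not part of heapy): reports whether Ruby stores the
# string's bytes inside the object slot, based on the "embedded" flag that
# ObjectSpace.dump emits for embedded strings.
def embedded_string?(str)
  JSON.parse(ObjectSpace.dump(str)).fetch("embedded", false)
end

short = "block in decrement" # 18 bytes, same value as the dump line above
long  = "x" * 1_000          # built at runtime, well past the 23-byte limit

puts embedded_string?(short)
puts embedded_string?(long)
```

For an embedded string, adding the byte count on top of memsize counts the content twice.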

I generated an example to see what the memsize of a large string would be in Ruby 2.7.1:

require 'objspace'
ObjectSpace.trace_object_allocations_start


value = "helllllllllllllllllllllllooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo"

# The block form closes the file automatically once the dump is written.
File.open("/tmp/my_dump", "w") do |io|
  ObjectSpace.dump_all(output: io)
end

This gives a line like this:

{"address":"0x7f8ce5034278", "type":"STRING", "class":"0x7f8ce10b7578", "frozen":true, "fstring":true, "bytesize":298, "value":"helllllllllllllllllllllllooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo", "encoding":"UTF-8", "memsize":339, "flags":{"wb_protected":true}}

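You can see the memsize is 339 and not 40, which is consistent with the 40-byte object plus 299 bytes of external storage (298 bytes of content and one terminator byte). So the reported memsize already includes the string content, and I believe the current calculations are correct.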
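The same check can be made without writing a dump file. The sketch below assumes `ObjectSpace.dump` reports the same "bytesize" and "memsize" fields for a single object as `dump_all` does per line; `dump_entry` is a hypothetical helper, and the exact memsize will vary between Ruby versions.

```ruby
require 'objspace'
require 'json'

# Hypothetical helper: returns the parsed heap-dump entry for one object.
def dump_entry(obj)
  JSON.parse(ObjectSpace.dump(obj))
end

# Built at runtime (298 bytes) so the string owns its own storage.
large = "h" + "o" * 297
entry = dump_entry(large)

puts "bytesize: #{entry["bytesize"]}"
puts "memsize:  #{entry["memsize"]}"
puts "memsize covers content: #{entry["memsize"] >= entry["bytesize"]}"
```

If memsize is at least as large as bytesize here, adding the byte count again would overstate the footprint.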