awslabs/damo

Is there a reason to choose the largest System RAM region for paddr monitoring?

honggyukim opened this issue · 7 comments

Hi SeongJae,

I have set the qemu memory configuration with NUMA node0=4G, node1=4G, node2=8G.

More details about the memory layout are as follows.

# cat /proc/iomem | grep System.RAM  
00001000-0009fbff : System RAM  
00100000-bffdffff : System RAM  
100000000-43fffffff : System RAM 

In the above output, there are three System RAM regions, but default_paddr_region in _damo_paddr_layout.py and damon_find_biggest_system_ram in mm/damon/core.c find only the largest System RAM region and set its start and end at

  • /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/targets/0/regions/0/start
  • /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/targets/0/regions/0/end

In this case, it looks like everything outside the largest System RAM region is not monitored. As a result, the first two ranges below are not monitored, and only the last one is.

# cat /proc/iomem | grep System.RAM  
00001000-0009fbff : System RAM (NOT monitored)  
00100000-bffdffff : System RAM (NOT monitored)  
100000000-43fffffff : System RAM (MONITORED)
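
As a rough illustration of the gap, the sketch below takes the three ranges from the listing above and computes how much System RAM falls outside the largest one. The helper names are made up for this example and are not damo or kernel functions; it only mimics the "pick the biggest region" policy described above.

```python
# System RAM ranges copied from the /proc/iomem output above.
# /proc/iomem prints inclusive end addresses.
SYSTEM_RAM = [
    (0x00001000, 0x0009fbff),
    (0x00100000, 0xbffdffff),
    (0x100000000, 0x43fffffff),
]


def region_size(region):
    """Size in bytes of an inclusive (start, end) range."""
    start, end = region
    return end - start + 1


def largest_region(regions):
    """Return the single biggest range, as the default selection does."""
    return max(regions, key=region_size)


def unmonitored_bytes(regions):
    """Bytes of System RAM that lie outside the biggest range."""
    biggest = largest_region(regions)
    return sum(region_size(r) for r in regions if r != biggest)


if __name__ == "__main__":
    start, end = largest_region(SYSTEM_RAM)
    print(f"monitored by default: {start:#x}-{end:#x}")
    print(f"unmonitored bytes:    {unmonitored_bytes(SYSTEM_RAM)}")
```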

However, the unmonitored ranges belong to NUMA node 0, as shown below.

# dmesg | grep NUMA 
[    0.008144] NUMA: Initialized distance table, cnt=3 
[    0.008146] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff] 
[    0.008148] NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0x00000000-0x13fffffff] 
 
# dmesg | grep "node  " 
[    0.009121]   node   0: [mem 0x0000000000001000-0x000000000009efff] 
[    0.009124]   node   0: [mem 0x0000000000100000-0x00000000bffdffff] 
[    0.009125]   node   0: [mem 0x0000000100000000-0x000000013fffffff] 
[    0.009126]   node   1: [mem 0x0000000140000000-0x000000023fffffff] 
[    0.009127]   node   2: [mem 0x0000000240000000-0x000000043fffffff]

Could you please explain why the start and end of regions are set in this way?

I'm asking this because I see some damo schemes are not properly applied in paddr mode and the logic for physical address region selection looks suspicious.

sj-aws commented

Hi Honggyu,

There was no deep discussion about that; I just wanted to keep it simple. Maybe we could change the default region selection mechanism, but I'm wondering if it could make unexpected behavioral changes to other users. I think you could specify the target region address ranges using the --regions option in that case. Would that work for you?

Thanks for the confirmation. The --regions option sounds good to me. I can parse /proc/iomem and then pass the range info to the option.
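
A minimal sketch of that workaround might look like the following. It assumes that --regions accepts space-separated "<start>-<end>" pairs in decimal and that the end address is exclusive; both points should be checked against damo's own help output. The function names are hypothetical. Note that /proc/iomem shows real addresses only to root.

```python
import re

# Matches lines such as "100000000-43fffffff : System RAM".
_SYSTEM_RAM_LINE = re.compile(
    r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : System RAM\s*$")


def parse_system_ram(iomem_text):
    """Return [(start, end), ...] with inclusive ends, as /proc/iomem prints."""
    regions = []
    for line in iomem_text.splitlines():
        match = _SYSTEM_RAM_LINE.match(line)
        if match:
            regions.append((int(match.group(1), 16), int(match.group(2), 16)))
    return regions


def to_regions_arg(regions, exclusive_end=True):
    """Format ranges as "<start>-<end> ..." (assumed --regions format).

    With exclusive_end=True, one is added to each inclusive end address.
    """
    return " ".join(
        f"{start}-{end + 1 if exclusive_end else end}"
        for start, end in regions)


if __name__ == "__main__":
    # Reading /proc/iomem needs root to see non-zero addresses.
    with open("/proc/iomem") as f:
        print(to_regions_arg(parse_system_ram(f.read())))
```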

But I think damo and DAMON in kernel have to keep a list of the System RAM regions and set nr_regions properly at /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/targets/0/regions/nr_regions. Then iterate the list of regions and set each start and end inside of /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/targets/0/regions/N/{start,end}.
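
The sequence of sysfs writes proposed here could be sketched as below. This is only an illustration of the ordering (nr_regions first, then each N/start and N/end), not damo's implementation. plan_region_writes and apply_writes are hypothetical names, and the sketch assumes the kdamond, context, and target directories already exist.

```python
import os

# Target directory taken from the paths quoted in this issue.
TARGET_DIR = "/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/targets/0"


def plan_region_writes(regions, target_dir=TARGET_DIR):
    """Return the ordered (path, value) writes for a list of (start, end)."""
    regions_dir = os.path.join(target_dir, "regions")
    # nr_regions goes first so that the numbered directories exist
    # before their start and end files are written.
    writes = [(os.path.join(regions_dir, "nr_regions"), str(len(regions)))]
    for idx, (start, end) in enumerate(regions):
        region_dir = os.path.join(regions_dir, str(idx))
        writes.append((os.path.join(region_dir, "start"), str(start)))
        writes.append((os.path.join(region_dir, "end"), str(end)))
    return writes


def apply_writes(writes):
    """Perform the planned writes; needs root on a DAMON-enabled kernel."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)
```

Keeping the plan separate from the actual writes makes it possible to inspect or log the sequence before touching sysfs.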

I will use the --regions option until the ideal solution is implemented. Thanks.

Maybe we could change the default region selection mechanism, but I'm wondering if it could make unexpected behavioral changes to other users.

It will make behavioral changes, but the fact that the current region covers only part of the system memory causes different execution behaviors whenever the same program is executed.

For example, consider a simple program that allocates a large block of memory and then prefaults it so that it resides in physical memory. If the pageout DAMOS action is used in damo, the amount of memory that is swapped out will differ depending on whether the allocated block is inside the monitoring region or not. The block could also be only partially monitored, in which case only part of it is swapped out.
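
To make the scenario concrete, the sketch below counts how many of a block's physical pages fall inside the monitored ranges. The page addresses are invented for illustration, since the kernel decides where pages actually land on each run. Ranges use exclusive ends here, and count_monitored_pages is a hypothetical helper.

```python
def count_monitored_pages(page_addrs, monitored):
    """Count pages whose physical address lies in any [start, end) range."""
    return sum(
        1 for addr in page_addrs
        if any(start <= addr < end for start, end in monitored))


# Only the largest System RAM range, as selected by default.
LARGEST_ONLY = [(0x100000000, 0x440000000)]

# All three System RAM ranges from /proc/iomem (ends made exclusive).
ALL_SYSTEM_RAM = [
    (0x00001000, 0x0009fc00),
    (0x00100000, 0xbffe0000),
    (0x100000000, 0x440000000),
]

# Hypothetical physical addresses of four pages backing one allocation:
# two land below 4 GiB, two land in the largest range.
PAGES = [0x00200000, 0x00201000, 0x100000000, 0x100001000]

if __name__ == "__main__":
    print(count_monitored_pages(PAGES, LARGEST_ONLY), "pages can be paged out")
    print(count_monitored_pages(PAGES, ALL_SYSTEM_RAM), "pages with full coverage")
```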

So I think this should be fixed.

sj-aws commented

But I think damo and DAMON in kernel have to keep a list of the System RAM regions and set nr_regions properly at /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/targets/0/regions/nr_regions. Then iterate the list of regions and set each start and end inside of /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/targets/0/regions/N/{start,end}.

Thank you for the nice suggestion.

It will make behavioral changes, but the fact that the current region covers only part of the system memory causes different execution behaviors whenever the same program is executed.

I think this makes sense. I will fix this. Thank you for patiently helping me understand the real issue! 👍

I think this makes sense. I will fix this. Thank you for patiently helping me understand the real issue!

Thanks very much for your support. That helps us find the right way to solve our problems.