awslabs/damo

damo record for storing the history of applied DAMOS actions

honggyukim opened this issue · 26 comments

Hi SeongJae,

Sorry for throwing many draft ideas, but I'm just wondering if it's possible to keep a history of applied DAMOS actions.

The current damo schemes command is very useful for running our custom DAMOS actions, but we would like to keep the history when and how some regions are affected by the registered actions.

I don't have a clear idea of how to display such data yet, but I'm leaving the idea here for future reference.

Thanks!

sj-aws commented

Hi Honggyu,

Sorry for throwing many draft ideas

Your brilliant ideas are really helpful, please don't say that.

we would like to keep the history when and how some regions are affected by the registered actions.

Partly for this purpose, we have developed damo status and damo show --tried_regions_of. As I always say, the features are not yet stable, so the interface could change, but we will support such capabilities anyway. I guess you are also aware of the features, right? And if so, I guess you think those are insufficient for your case, because those show only a snapshot? For recording, I think you could use the feature repeatedly and save the output. I believe this will work if you are using a sufficiently large aggregation interval.

I agree that it might not be sufficient for your case. That is, you might want to use a small aggregation interval, and/or find the snapshot retrieval overhead too high. If that's the case, I think we could make DAMOS apply the actions at its own time interval rather than the aggregation interval, or add yet another tracepoint for DAMOS tried regions, so that you can record entire tried regions for every trial.
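A minimal Python sketch of the snapshot-and-save approach, under stated assumptions: it reruns a snapshot command at a fixed interval and keeps each output with a wall-clock timestamp. The command is passed in as an argv list (for example a damo show --tried_regions_of invocation; its exact arguments are not spelled out here), and record_snapshots is a hypothetical helper, not part of damo. Every iteration pays the full snapshot cost, which is why this only makes sense with a sufficiently large interval.

```python
import subprocess
import time


def record_snapshots(cmd, interval_s, count):
    """Hypothetical helper: run `cmd` `count` times, roughly `interval_s` apart,
    and return a list of (unix_timestamp, stdout) pairs."""
    snapshots = []
    for i in range(count):
        started = time.monotonic()
        # check=True raises CalledProcessError if the snapshot command fails.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        snapshots.append((time.time(), result.stdout))
        if i + 1 < count:
            # Sleep only for the part of the interval the command did not use.
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval_s - elapsed))
    return snapshots
```

If the command takes longer than interval_s, the snapshots simply run back to back, so the effective recording rate is bounded by the snapshot cost itself.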

Hi SeongJae,

And if so, I guess you think those are insufficient for your case, because those show only a snapshot?

Yes, I would like to keep the record and examine it when needed.

For recording, I think you could use the feature repeatedly and save the output.

I'm afraid that it affects the performance, because writing commit to the state file looked a bit costly.

I think it would be useful to capture data only when regions are affected by DAMOS actions, along with detailed information about those regions.

or add yet another tracepoint for DAMOS tried regions, so that you can record entire tried regions for every trial.

That might be the one I was looking for. But it looks like we'd have to insert custom tracepoints into the DAMON kernel code.

Thanks very much for your explanation!

sj-aws commented

I'm afraid that it affects the performance

Agreed. Unless max_nr_regions is small enough, the overhead could be significant when aggr_interval is small. We're planning more snapshot overhead control for that reason, but the feature is obviously not designed for recording every finding in such cases.

That might be the one I was looking for. But it looks like we'd have to insert custom tracepoints into the DAMON kernel code.

Glad to hear that. And sure, it needs a DAMON (kernel) change. I will start working on it.

Glad to hear that. And sure, it needs a DAMON (kernel) change. I will start working on it.

Thanks very much for your support. It will be very helpful for collecting stats on how the DAMOS actions are applied, and for examining them alongside the final performance results.

sj-aws commented

An RFC patch for the kernel-side change has been posted: https://lore.kernel.org/damon/20230827004045.49516-1-sj@kernel.org/

Thanks very much for your support. I will apply the patch and see how I can use it in damo as well.

sj-aws commented

Support for the feature in damo is still a TODO item. I'm planning to add an option to damo record for specifying which scheme's tried regions to report, like damo show --tried_regions_of. At the moment, you could use perf to record the tracepoint and show the results via a command like perf script.

_damon_result would need some updates to handle the trace event format.
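To give an idea of what handling the trace event format involves, here is a minimal Python sketch (not damo's _damon_result code) that extracts damos_before_apply events from perf script text, e.g. after recording with perf record -e damon:damos_before_apply. It assumes the tracepoint prints ctx_idx=, scheme_idx=, target_idx= and nr_regions= fields followed by "start-end: nr_accesses age" with decimal addresses, and that perf script prefixes each event with "<time>: damon:damos_before_apply:"; both layouts should be checked against the kernel and perf versions in use. DamosEvent and parse_damos_events are hypothetical names.

```python
import re
from typing import NamedTuple

# ASSUMED perf script layout for one event (verify against your kernel/perf):
#   <comm> <pid> [<cpu>] <time>: damon:damos_before_apply: ctx_idx=%u
#       scheme_idx=%u target_idx=%lu nr_regions=%u %lu-%lu: %u %u
EVENT_RE = re.compile(
    r"(\d+\.\d+):\s+damon:damos_before_apply:\s+"
    r"ctx_idx=(\d+) scheme_idx=(\d+) target_idx=(\d+) nr_regions=(\d+) "
    r"(\d+)-(\d+): (\d+) (\d+)"
)


class DamosEvent(NamedTuple):
    time_s: float      # perf timestamp, in seconds
    ctx_idx: int
    scheme_idx: int    # index of the scheme that tried this region
    target_idx: int
    nr_regions: int
    start: int         # region start address
    end: int           # region end address (exclusive)
    nr_accesses: int
    age: int


def parse_damos_events(perf_script_text, scheme_idx=None):
    """Collect damos_before_apply events, optionally only for one scheme."""
    events = []
    for line in perf_script_text.splitlines():
        m = EVENT_RE.search(line)
        if m is None:
            continue  # not a damos_before_apply event line
        ev = DamosEvent(float(m.group(1)), *(int(g) for g in m.groups()[1:]))
        if scheme_idx is None or ev.scheme_idx == scheme_idx:
            events.append(ev)
    return events
```

Filtering by scheme_idx roughly corresponds to selecting one scheme's tried regions, as damo show --tried_regions_of does for snapshots.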

Thanks. I think it'd be useful if the tracepoint could be recorded with a --record option of the schemes command, as follows.

$ damo schemes --record -c config.json ...

We could also think about supporting running damo schemes and damo record separately.

sj-aws commented

it'd be useful if the tracepoint could be recorded with a --record option of the schemes command

I'm not sure if it would be a good option, since it might make the roles of schemes and record a little bit confusing. Adding a --record option to the start command instead might seem to make sense. That said, the record command already supports --damos_* options, so adding a --record option to the start command might not make much sense either. One question might follow from here: why is letting schemes do recording not good, while letting record do DAMOS control is OK? That's because DAMOS is part of DAMON, while "recording" relates to DAMON-external components, including perf and damo's own logic.

I agree this is somewhat confusing, and I was willing to even deprecate schemes command. I recently changed my mind, and I'm now thinking of keeping the schemes command, with limited capabilities, only for beginners or for people who have seen past demonstrations of the command.

I think the lack of documentation for the commands might have confused you. Sorry for the inconvenience.

supporting running damo schemes and damo record separately

Good point. Actually, this is already supported by damo. Executing damo record without a monitoring target argument makes it check whether DAMON is running, and record its monitoring results. This should be clearly documented, but we don't have such a document yet. Sorry for the inconvenience. I'm gonna write some.

sj-aws commented

Implemented basic support for this feature via [1]. It has passed only a minimal test, though, and the interface (option names, etc.) might change in the near future.

[1] 15fc065

Implemented basic support for this feature via [1]. It has passed only a minimal test, though, and the interface (option names, etc.) might change in the near future.

Sorry for the late response and thanks very much for the support.

I was willing to even deprecate schemes command.

I have a problem when running damo schemes in the background: it breaks the terminal. So it'd be much better if there were a way to run damo start as the equivalent of damo schemes -c action.json, but non-blocking, i.e., returning immediately so that I can run other commands right away.

This is especially needed when writing an automated script, because damo schemes blocks, waiting for Ctrl-C to be pressed.
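Until a non-blocking start is available, one way to drive a blocking command like damo schemes from an automated script is to start it as a child process and later send it SIGINT, which is what pressing Ctrl-C delivers. A minimal Python sketch, assuming the script itself already runs as root (so there is no per-command sudo, whose process an unprivileged caller may not be allowed to signal); start_background and stop_background are hypothetical helper names. The child gets /dev/null as stdin and its own session, so it does not read from the terminal or receive terminal-generated signals; whether that avoids the terminal breakage described below is untested.

```python
import signal
import subprocess


def start_background(cmd):
    """Start `cmd` without blocking; stdin is /dev/null and the child runs in
    its own session, detached from the terminal's process group."""
    return subprocess.Popen(cmd, stdin=subprocess.DEVNULL, start_new_session=True)


def stop_background(proc, timeout_s=5.0):
    """Emulate Ctrl-C with SIGINT; fall back to SIGTERM if the process does
    not exit within `timeout_s`. Returns the exit status."""
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()
        return proc.wait()
```

For the script below, that would be roughly proc = start_background(["./damo", "schemes", "-c", "pageout.json"]), then the workload, then stop_background(proc).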

I have a problem when running damo schemes in the background: it breaks the terminal.

Let's say there is a script file as follows.

$ cat script.sh
#!/bin/bash -x

sudo ./damo schemes -c pageout.json &
DAMO_PID=$!
echo "damo pid: $DAMO_PID"
sleep 3     # Do something here!
sudo kill $DAMO_PID

If I run it, it terminates, and I saw that it stops the running kdamond properly.

$ ./script.sh
+ DAMO_PID=1256368
+ echo 'damo pid: 1256368'
damo pid: 1256368
+ sleep 3
+ sudo ./damo schemes -c pageout.json
Press Ctrl+C to stop
+ sudo kill 1256368

signal 15 received

However, it breaks my terminal, and the output is displayed strangely, as follows. I ran ps and pwd, but the characters I typed were not shown, and the output is broken.

    PID TTY          TIME CMD
                             1254435 pts/4    00:00:00 bash
                                                           1256463 pts/4    00:00:00 ps
                                                                                       $ 
/home/honggyu/work/damo
                       $ 

To avoid this problem, I think it'd be useful if there were a way to run damo schemes in a non-blocking way, so that I can avoid running the command in the background in my shell script.

sj-aws commented

Hi Honggyu,

it'd be much better if there were a way to run damo start as the equivalent of damo schemes -c action.json, but non-blocking

damo start supports the -c option, so you should be able to do that with it, e.g., damo start -c action.json. Please let me know if it doesn't work.

I have a problem when running damo schemes in the background: it breaks the terminal.

I tried your script on my test machine, but the issue doesn't reproduce. I guess something else is involved?

Hello,

I also had the same problem. I'm not 100% sure, but it seems that the problem is related to https://askubuntu.com/questions/1459049/bash-script-launching-background-process-breaks-terminal-output-and-kills-backgr, not to damo. How about trying sudo -b instead of &?

Right. The screen-breaking problem is not caused by damo and is not even related to this issue, so we don't have to discuss it here.

I was willing to even deprecate schemes command.

I brought it up because of this comment, and I also think that we don't need a separate schemes command.

sj-aws commented

Hi Honggyu, have you had a chance to test the feature [1] that we implemented for this issue? If so, could you please confirm whether it works, or whether you found any bugs?

[1] 15fc065

Hi Honggyu, have you had a chance to test the feature [1] that we implemented for this issue? If so, could you please confirm whether it works, or whether you found any bugs?

[1] 15fc065

Hi SeongJae, I thought that the following RFC patch was needed to test this feature.

An RFC patch for the kernel-side change has been posted: https://lore.kernel.org/damon/20230827004045.49516-1-sj@kernel.org/

If not, then I need to know the command sequence for testing. Could you please give more guidance, or update the documentation with this usage and the expected output? Thanks.

sj-aws commented

Hi Honggyu, sorry for the late response.

I thought that the following RFC patch was needed to test this feature.

You're right. I was thinking that you could test it from the damon/next tree or the mm tree. The pull request containing it was recently sent [1] to Linus, and has been merged into the mainline.

$ ../lazybox/git_helpers/find_change_from.py --subject "mm/damon/core: use nr_accesses_bp as a source of damos_before_apply tracepoint" linus/master
a72217ad596e ("mm/damon/core: use nr_accesses_bp as a source of damos_before_apply tracepoint")

I need to know the command sequence for testing.

You could use damo record with the --schemes_target_regions option, and check the results with damo show or damo report as usual. Of course, DAMON should be running with a scheme installed.

[1] https://lore.kernel.org/mm-commits/20231101145447.60320c9044e7db4dba2d93e3@linux-foundation.org/

Hi SeongJae,

Thanks for your comment.

I've recorded masim with --schemes_target_regions as follows.

$ ./damo record --schemes_target_regions "./masim/masim masim/configs/stairs_30secs.cfg"
Press Ctrl+C to stop
initial phase:                87,119 accesses/msec, 5001 msecs run
phase 0:                      94,896 accesses/msec, 2500 msecs run
phase 1:                      91,556 accesses/msec, 2501 msecs run
phase 2:                      94,948 accesses/msec, 2500 msecs run
phase 3:                      93,795 accesses/msec, 2500 msecs run
phase 4:                      61,970 accesses/msec, 2500 msecs run
phase 5:                      92,956 accesses/msec, 2500 msecs run
phase 6:                      93,795 accesses/msec, 2500 msecs run
phase 7:                      92,798 accesses/msec, 2500 msecs run
phase 8:                      93,480 accesses/msec, 2500 msecs run
phase 9:                      93,443 accesses/msec, 2501 msecs run
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.145 MB damon.data ]

But I don't know how to view the result with the report command. I read the help message of damo report and then tried each subcommand as follows.

$ ./damo report raw
no monitoring result in the file

$ ./damo report nr_regions
# <percentile> <# regions>

$ ./damo report wss
# <percentile> <wss>

$ ./damo report heats --heatmap stdout
Traceback (most recent call last):
  File "/home/root/damo/./damo", line 116, in <module>
    main()
  File "/home/root/damo/./damo", line 113, in main
    subcmd.execute(args)
  File "/home/root/damo/_damo_subcmds.py", line 31, in execute
    self.module.main(args)
  File "/home/root/damo/damo_report.py", line 38, in main
    subcmd.execute(args)
  File "/home/root/damo/_damo_subcmds.py", line 31, in execute
    self.module.main(args)
  File "/home/root/damo/damo_heats.py", line 314, in main
    set_missed_args(args, records)
  File "/home/root/damo/damo_heats.py", line 200, in set_missed_args
    guide = guides[0]
IndexError: list index out of range

Could you help me out by showing the exact command sequence and the output that I can expect with the feature? Thanks.

I used the following kernel version.

$ uname -r
6.6.0-14651-gd2f51b3516da
sjp38 commented

Hi Honggyu,

The command you used ($ ./damo record --schemes_target_regions "./masim/masim masim/configs/stairs_30secs.cfg") doesn't install any DAMOS scheme. Hence, there are no scheme target regions and nothing to record.

sj-aws commented

I confirmed that installing a scheme using --damos_* command line arguments, like below, works.

$ sudo ./damo record --damos_action pageout --damos_access_rate 0% 0% --damos_age 2s max --schemes_target_regions "../masim/masim ../masim/configs/stairs_30secs.cfg"
[...]
$ sudo ./damo report raw
base_time_absolute: 55 m 13.210 s

monitoring_start:                0 ns
monitoring_end:               3.722 s
monitoring_duration:          3.722 s
target_id: 0
nr_regions: 19
# start_addr     end_addr        length  nr_accesses   age
563226a71000-56322741e000 (   9.676 MiB)           0    20
56322741e000-5632279b9000 (   5.605 MiB)           0    20
7fffde5a9000-7fffde5c8000 ( 124.000 KiB)           0    20
7fffde5ca000-7fffde5f4000 ( 168.000 KiB)           0    20
56322718f000-5632279b9000 (   8.164 MiB)           0    20
56322659f000-5632271e2000 (  12.262 MiB)           0    20
7fffde5a9000-7fffde5c8000 ( 124.000 KiB)           0    20
[...]

Could you please test again like above?

Hi SeongJae,

Sorry for the late answer and also thanks for your help as always.

I've just tested what you suggested, and found that damo record works fine.

$ ./damo record --damos_action pageout --damos_access_rate 0% 0% --damos_age 2s max --schemes_target_regions "./masim/masim ./masim/configs/stairs_30secs.cfg"
Press Ctrl+C to stop
initial phase:                90,185 accesses/msec, 5001 msecs run
    ...
phase 9:                      93,008 accesses/msec, 2500 msecs run
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.215 MB damon.data (150 samples) ]

damo report raw also shows the results, as follows.

$ ./damo report raw
base_time_absolute: 2 h 30 m 48.782 s

monitoring_start:                0 ns
monitoring_end:               2.829 s
monitoring_duration:          2.829 s
target_id: 0
nr_regions: 13
# start_addr     end_addr        length  nr_accesses   age
55d6fd215000-55d6fde5c000 (  12.277 MiB)           0    20
7ffda9ecd000-7ffda9ee7000 ( 104.000 KiB)           0    20
55d6fde5c000-55d6feb50000 (  12.953 MiB)           0    20
55d6feb50000-55d6fefc1000 (   4.441 MiB)           0    20
7f8fea640000-7f8feab4d000 (   5.051 MiB)           0    20
55d6fd215000-55d6fde5c000 (  12.277 MiB)           0    20
55d6fde5c000-55d6feb50000 (  12.953 MiB)           0    20
7f8fea642000-7f8feab4d000 (   5.043 MiB)           0    20
55d6feb50000-55d6fefc1000 (   4.441 MiB)           0    20
7ffda9ecd000-7ffda9f25000 ( 352.000 KiB)           0    20
55d6fd215000-55d6fdee0000 (  12.793 MiB)           0    20
55d6fdee0000-55d6febe9000 (  13.035 MiB)           0    20
7f8fea642000-7f8feab4d000 (   5.043 MiB)           0    20
    ...

The only thing I would like to confirm is whether the list of regions above shows only the regions that match the DAMOS scheme's rule. Thanks.

sj-aws commented

whether the list of regions above shows only the regions that match the DAMOS scheme's rule.

It should be. If not, that's something we need to investigate. Please let us know if you find such a case.
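One way to spot-check this on the damo report raw output above is to parse the region rows and compare them with the scheme's access pattern. A minimal Python sketch, under stated assumptions: the row format is inferred from the output in this thread, the age column is assumed to count aggregation intervals (so with DAMON's default 100 ms aggregation interval, --damos_age 2s max means an age of at least 20), and --damos_access_rate 0% 0% means nr_accesses must be 0. parse_regions, violations and Region are hypothetical names, not damo APIs.

```python
import re
from dataclasses import dataclass

# Matches rows like "55d6fd215000-55d6fde5c000 (  12.277 MiB)           0    20"
ROW_RE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+\(\s*([\d.]+)\s+(B|KiB|MiB|GiB|TiB)\)\s+(\d+)\s+(\d+)$"
)
UNITS = {"B": 1, "KiB": 1 << 10, "MiB": 1 << 20, "GiB": 1 << 30, "TiB": 1 << 40}


@dataclass
class Region:
    start: int
    end: int
    nr_accesses: int
    age: int  # assumed to be in aggregation intervals


def parse_regions(report_text):
    """Return region rows of `damo report raw` output, skipping other lines."""
    regions = []
    for line in report_text.splitlines():
        m = ROW_RE.match(line.strip())
        if m is None:
            continue  # header, metadata, or "..." line
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        # Sanity check: the printed length is rounded to three decimals.
        unit = UNITS[m.group(4)]
        if abs((end - start) - float(m.group(3)) * unit) > 0.0005 * unit + 1:
            raise ValueError(f"length mismatch in row: {line!r}")
        regions.append(Region(start, end, int(m.group(5)), int(m.group(6))))
    return regions


def violations(regions, min_accesses, max_accesses, min_age):
    """Regions whose nr_accesses or age fall outside the scheme's pattern."""
    return [
        r for r in regions
        if not (min_accesses <= r.nr_accesses <= max_accesses and r.age >= min_age)
    ]
```

For the pageout scheme used here, violations(parse_regions(text), 0, 0, 20) returning an empty list would be consistent with every reported region matching the scheme; a different aggregation interval changes the min_age value.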

Thanks. I will use this feature and let you know if something looks incorrect. So far, it looks like it's working fine in my testing.