awslabs/damo

`--output_type` parameter not working

denisgabriel5 opened this issue · 9 comments

Environment

I am running damo on a virtual machine. Below are the details:
Host:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.04
Release: 23.04
Codename: lunar

$ uname -r
6.2.0-24-generic

Virtual machine:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.04
Release: 23.04
Codename: lunar

$ uname -r
6.1.6+

$ grep DAMON linux/.config
CONFIG_DAMON=y
CONFIG_DAMON_VADDR=y
CONFIG_DAMON_PADDR=y
CONFIG_DAMON_SYSFS=y
CONFIG_DAMON_DBGFS=y
# CONFIG_DAMON_RECLAIM is not set
# CONFIG_DAMON_LRU_SORT is not set

Issue

So, when I try to run damo as in the example from USAGE, I get the following error:

$ damo record "sleep 5"
Press Ctrl+C to stop
[ perf record: Woken up 1 times to write data ]
failed to mmap file
[ perf record: Captured and wrote 0.085 MB damon.data ]

WARNING: 'record' file type support will be deprecated by 2023-Q3.
use json_compressed type instead.
Please report your usecase to Github issues[1], sj@kernel.org,
damon@lists.linux.dev and/or linux-mm@kvack.org if you depend on those.

[1] https://github.com/awslabs/damo/issues

converting format from perf_data to json_compressed failed (snapshot reading failead: unpack requires a buffer of 8 bytes)

So I tried specifying the output type, but it still doesn't work:

$ damo record --output_type json "sleep 5"
Press Ctrl+C to stop
[ perf record: Woken up 1 times to write data ]
failed to mmap file
[ perf record: Captured and wrote 0.084 MB damon.data ]

WARNING: 'record' file type support will be deprecated by 2023-Q3.
use json_compressed type instead.
Please report your usecase to Github issues[1], sj@kernel.org,
damon@lists.linux.dev and/or linux-mm@kvack.org if you depend on those.

[1] https://github.com/awslabs/damo/issues

converting format from perf_data to json failed (snapshot reading failead: unpack requires a buffer of 8 bytes)

Am I doing something wrong?

sj-aws commented

Hi Denis,

Thank you so much for this nice issue report. The 'failed to mmap file' message in the middle of the output is not expected; it seems damo failed at creating the record file.

Also, judging from the prompt in your command line, I'm wondering whether you executed damo with sudo permission or not. Could you please confirm that? damo should check that on its own, though.
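
For reference, the check damo does on its own is essentially a root-euid test. A minimal sketch of that kind of check (illustrative only, not damo's verbatim code):

import os
import sys

def ensure_root_permission():
    # DAMON's debugfs/sysfs interfaces require root, so bail out early
    # if the effective user is not root.
    if os.geteuid() != 0:
        print('Run as root, please')
        sys.exit(1)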

Also, damo record uses perf internally, so I'm wondering if it could be a problem with perf. Could you please try the commands below?

$ # ensure sudo permission
$ sudo su
# # start perf to record DAMON events in background
# perf record -e damon:damon_aggregated &
# # start DAMON for sleep 5
# ./damo start "sleep 5"
# # wait for 5 seconds
# sleep 5
# # stop perf
# killall perf

Then, could you please also share a few lines of the perf script output?

denisgabriel5 commented

I can confirm that I executed damo with sudo permission, since the virtual machine I am using only has the root user.

Initially I had some issues with perf, but now it works fine.

By the way, I do not know if it matters, but I installed damo via pip.

I will post below the output from the commands you provided.

$ perf record -e damon:damon_aggregated &
[1] 242
$ damo start "sleep 5"
$ sleep 5
$ killall perf
[ perf record: Woken up 1 times to write data ]
failed to mmap file
[ perf record: Captured and wrote 0.085 MB perf.data ]

[1]+ Terminated perf record -e damon:damon_aggregated
$ perf script
File perf.data not owned by current user or root (use -f to override)
$ perf script -f
kdamond.0 248 [000] 130.449087: damon:damon_aggregated: target_id=0 nr_regions=10 94788519636992-94788521140224: 0 1
kdamond.0 248 [000] 130.452191: damon:damon_aggregated: target_id=0 nr_regions=10 94788521140224-94788522643456: 0 1
kdamond.0 248 [000] 130.452268: damon:damon_aggregated: target_id=0 nr_regions=10 94788522643456-94788524146688: 0 1
kdamond.0 248 [000] 130.452284: damon:damon_aggregated: target_id=0 nr_regions=10 94788524146688-94788525649920: 0 1
kdamond.0 248 [000] 130.452295: damon:damon_aggregated: target_id=0 nr_regions=10 94788525649920-94788527169536: 0 1
kdamond.0 248 [000] 130.452306: damon:damon_aggregated: target_id=0 nr_regions=10 140137289535488-140137290989568: 0 1
kdamond.0 248 [000] 130.452317: damon:damon_aggregated: target_id=0 nr_regions=10 140137290989568-140137292443648: 0 1
kdamond.0 248 [000] 130.452328: damon:damon_aggregated: target_id=0 nr_regions=10 140137292443648-140137293897728: 0 1
kdamond.0 248 [000] 130.452339: damon:damon_aggregated: target_id=0 nr_regions=10 140137293897728-140137295360000: 0 1
kdamond.0 248 [000] 130.452350: damon:damon_aggregated: target_id=0 nr_regions=10 140729077768192-140729078140928: 0 1
kdamond.0 248 [000] 130.546266: damon:damon_aggregated: target_id=0 nr_regions=17 94788519636992-94788520538112: 0 2
kdamond.0 248 [000] 130.546316: damon:damon_aggregated: target_id=0 nr_regions=17 94788520538112-94788521140224: 0 2
kdamond.0 248 [000] 130.546334: damon:damon_aggregated: target_id=0 nr_regions=17 94788521140224-94788522491904: 0 2
kdamond.0 248 [000] 130.546345: damon:damon_aggregated: target_id=0 nr_regions=17 94788522491904-94788523692032: 0 2
kdamond.0 248 [000] 130.546356: damon:damon_aggregated: target_id=0 nr_regions=17 94788523692032-94788524146688: 0 2
kdamond.0 248 [000] 130.546367: damon:damon_aggregated: target_id=0 nr_regions=17 94788524146688-94788525346816: 0 2
kdamond.0 248 [000] 130.546377: damon:damon_aggregated: target_id=0 nr_regions=17 94788525346816-94788526710784: 0 2
kdamond.0 248 [000] 130.546388: damon:damon_aggregated: target_id=0 nr_regions=17 94788526710784-94788527169536: 0 2
kdamond.0 248 [000] 130.546398: damon:damon_aggregated: target_id=0 nr_regions=17 140137289535488-140137290260480: 0 2
kdamond.0 248 [000] 130.546409: damon:damon_aggregated: target_id=0 nr_regions=17 140137290260480-140137290989568: 0 2
kdamond.0 248 [000] 130.546419: damon:damon_aggregated: target_id=0 nr_regions=17 140137290989568-140137291714560: 0 2
kdamond.0 248 [000] 130.546430: damon:damon_aggregated: target_id=0 nr_regions=17 140137291714560-140137292443648: 0 2
kdamond.0 248 [000] 130.546440: damon:damon_aggregated: target_id=0 nr_regions=17 140137292443648-140137293168640: 0 2
kdamond.0 248 [000] 130.546451: damon:damon_aggregated: target_id=0 nr_regions=17 140137293168640-140137293897728: 1 0
kdamond.0 248 [000] 130.546461: damon:damon_aggregated: target_id=0 nr_regions=17 140137293897728-140137294774272: 0 2
kdamond.0 248 [000] 130.546472: damon:damon_aggregated: target_id=0 nr_regions=17 140137294774272-140137295360000: 1 0
kdamond.0 248 [000] 130.546482: damon:damon_aggregated: target_id=0 nr_regions=17 140729077768192-140729078140928: 0 2
kdamond.0 248 [000] 130.645962: damon:damon_aggregated: target_id=0 nr_regions=14 94788519636992-94788520898560: 0 3
kdamond.0 248 [000] 130.646010: damon:damon_aggregated: target_id=0 nr_regions=14 94788520898560-94788521545728: 0 3
kdamond.0 248 [000] 130.646027: damon:damon_aggregated: target_id=0 nr_regions=14 94788521545728-94788522491904: 0 3
kdamond.0 248 [000] 130.646038: damon:damon_aggregated: target_id=0 nr_regions=14 94788522491904-94788523692032: 0 3
kdamond.0 248 [000] 130.646049: damon:damon_aggregated: target_id=0 nr_regions=14 94788523692032-94788524863488: 0 3
kdamond.0 248 [000] 130.646060: damon:damon_aggregated: target_id=0 nr_regions=14 94788524863488-94788525752320: 0 3
kdamond.0 248 [000] 130.646070: damon:damon_aggregated: target_id=0 nr_regions=14 94788525752320-94788526845952: 0 3
kdamond.0 248 [000] 130.646080: damon:damon_aggregated: target_id=0 nr_regions=14 94788526845952-94788527169536: 0 3
kdamond.0 248 [000] 130.646091: damon:damon_aggregated: target_id=0 nr_regions=14 140137289535488-140137290551296: 0 3
kdamond.0 248 [000] 130.646101: damon:damon_aggregated: target_id=0 nr_regions=14 140137290551296-140137291857920: 0 3
kdamond.0 248 [000] 130.646112: damon:damon_aggregated: target_id=0 nr_regions=14 140137291857920-140137293168640: 0 3
kdamond.0 248 [000] 130.646123: damon:damon_aggregated: target_id=0 nr_regions=14 140137293168640-140137294508032: 0 1

sj-aws commented

Thank you for kindly sharing the details and the command outputs.

It seems your perf setup has some problem saving the recorded data, which results in the output file having an incorrect owner. It apparently records the data successfully, though, as we can see from the perf script -f output you attached.

damo internally uses perf record -e damon:damon_aggregated to record DAMON monitoring results, and then parses the data using perf script. Since running perf script without the -f option fails on your system, damo assumed the output file is not perf output but a file in the binary record format, which is deprecated now. Hence it warned you about the deprecation and tried to parse the file as the deprecated binary record format. That parsing of course failed, resulting in the errors you saw.
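
In other words, the parsing flow is roughly like the sketch below (simplified from parse_records_file() in _damon_result.py; the two helper functions are hypothetical stand-ins, not damo's real helpers):

import os
import subprocess

PERF = 'perf'  # path to the perf binary

def records_from_perf_script(output):
    ...  # hypothetical: parse 'perf script' text output into records

def records_from_record_format(result_file):
    ...  # hypothetical: parse the deprecated binary record format

def parse_records_file(result_file):
    try:
        with open(os.devnull, 'w') as fnull:
            # First, assume the file is 'perf record' output.
            output = subprocess.check_output(
                    [PERF, 'script', '-i', result_file],
                    stderr=fnull).decode()
        return records_from_perf_script(output)
    except subprocess.CalledProcessError:
        # 'perf script' failed (on your system, because of the file
        # ownership check), so fall back to the deprecated binary record
        # format.  struct.unpack()-ing a perf data file as that format
        # is what raises 'unpack requires a buffer of 8 bytes'.
        return records_from_record_format(result_file)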

To confirm whether the theory is true, could you please try the command again with the below change?

--- a/_damon_result.py
+++ b/_damon_result.py
@@ -388,7 +388,7 @@ def parse_records_file(result_file, monitoring_intervals=None):
         try:
             with open(os.devnull, 'w') as fnull:
                 perf_script_output = subprocess.check_output(
-                        [PERF, 'script', '-i', result_file],
+                        [PERF, 'script', '-f', '-i', result_file],
                         stderr=fnull).decode()
         except:
             # Should be record format file

If the above theory is true, this change should work around your issue.

If the above change works for you, we could either report this to the perf community or add this workaround to damo. I feel the first option is the better way. What do you think?

denisgabriel5 commented

Okay, so I modified the code as you suggested, and I get this:

$ ./damo record "sleep 5"
Press Ctrl+C to stop
[ perf record: Woken up 1 times to write data ]
failed to mmap file
[ perf record: Captured and wrote 0.063 MB damon.data ]

But then I ran this:

$ ./damo report raw
base_time_absolute: 9 m 37.481 s

monitoring_start: 0 ns
monitoring_end: 101.512 ms
monitoring_duration: 101.512 ms
target_id: 0
nr_regions: 12
# start_addr end_addr length nr_accesses age
560030ca0000-560030eb0000 ( 2.062 MiB) 0 17
560030eb0000-5600310be000 ( 2.055 MiB) 0 17
5600310be000-5600312c1000 ( 2.012 MiB) 0 17
5600312c1000-5600314be000 ( 1.988 MiB) 0 17
5600314be000-560031690000 ( 1.820 MiB) 0 17
560031690000-560031890000 ( 2.000 MiB) 0 17
560031890000-560031a9d000 ( 2.051 MiB) 0 17
560031a9d000-560031b78000 ( 876.000 KiB) 0 17
7ff4f4b4b000-7ff4f4d61000 ( 2.086 MiB) 0 16
7ff4f4d61000-7ff4f4f6c000 ( 2.043 MiB) 0 14
7ff4f4f6c000-7ff4f50d9000 ( 1.426 MiB) 0 14
7ffe0556d000-7ffe055fe000 ( 580.000 KiB) 0 17

monitoring_start: 101.512 ms
monitoring_end: 204.789 ms
monitoring_duration: 103.277 ms
target_id: 0
nr_regions: 13
# start_addr end_addr length nr_accesses age
560030ca0000-560030eb0000 ( 2.062 MiB) 0 18
560030eb0000-5600310be000 ( 2.055 MiB) 0 18
5600310be000-5600312cb000 ( 2.051 MiB) 0 18
5600312cb000-5600314be000 ( 1.949 MiB) 0 18
5600314be000-560031690000 ( 1.820 MiB) 0 18
560031690000-560031890000 ( 2.000 MiB) 0 18
560031890000-560031a9d000 ( 2.051 MiB) 0 18
560031a9d000-560031b78000 ( 876.000 KiB) 0 18
7ff4f4b4b000-7ff4f4d61000 ( 2.086 MiB) 0 17
7ff4f4d61000-7ff4f4dee000 ( 564.000 KiB) 0 15
7ff4f4dee000-7ff4f4fb5000 ( 1.777 MiB) 1 0
7ff4f4fb5000-7ff4f50d9000 ( 1.141 MiB) 1 0
7ffe0556d000-7ffe055fe000 ( 580.000 KiB) 0 18

monitoring_start: 204.789 ms
monitoring_end: 306.645 ms
monitoring_duration: 101.856 ms
target_id: 0
nr_regions: 13
# start_addr end_addr length nr_accesses age
560030ca0000-560030eb0000 ( 2.062 MiB) 0 19
560030eb0000-5600310be000 ( 2.055 MiB) 0 19
5600310be000-5600312cb000 ( 2.051 MiB) 0 19
5600312cb000-5600314be000 ( 1.949 MiB) 0 19
5600314be000-560031690000 ( 1.820 MiB) 0 19
560031690000-560031890000 ( 2.000 MiB) 0 19
560031890000-560031a9d000 ( 2.051 MiB) 0 19
560031a9d000-560031b78000 ( 876.000 KiB) 0 19
7ff4f4b4b000-7ff4f4d61000 ( 2.086 MiB) 0 18
7ff4f4d61000-7ff4f4ea4000 ( 1.262 MiB) 0 6
7ff4f4ea4000-7ff4f509e000 ( 1.977 MiB) 0 0
7ff4f509e000-7ff4f50d9000 ( 236.000 KiB) 0 0
7ffe0556d000-7ffe055fe000 ( 580.000 KiB) 0 19

monitoring_start: 306.645 ms
monitoring_end: 404.766 ms
monitoring_duration: 98.121 ms
target_id: 0
nr_regions: 12
# start_addr end_addr length nr_accesses age
560030ca0000-560030eb0000 ( 2.062 MiB) 0 20
560030eb0000-5600310be000 ( 2.055 MiB) 0 20
5600310be000-5600312cb000 ( 2.051 MiB) 0 20
5600312cb000-5600314be000 ( 1.949 MiB) 0 20
5600314be000-56003169f000 ( 1.879 MiB) 0 20
56003169f000-560031890000 ( 1.941 MiB) 0 20
560031890000-560031a9d000 ( 2.051 MiB) 0 20
560031a9d000-560031b78000 ( 876.000 KiB) 0 20
7ff4f4b4b000-7ff4f4d61000 ( 2.086 MiB) 0 19
7ff4f4d61000-7ff4f4f09000 ( 1.656 MiB) 0 5
7ff4f4f09000-7ff4f50d9000 ( 1.812 MiB) 0 1
7ffe0556d000-7ffe055fe000 ( 580.000 KiB) 0 20

monitoring_start: 404.766 ms
monitoring_end: 506.484 ms
monitoring_duration: 101.718 ms
target_id: 0
nr_regions: 12
# start_addr end_addr length nr_accesses age
560030ca0000-560030eb0000 ( 2.062 MiB) 0 21
560030eb0000-5600310be000 ( 2.055 MiB) 0 21
5600310be000-5600312cb000 ( 2.051 MiB) 0 21
5600312cb000-5600314be000 ( 1.949 MiB) 0 21
5600314be000-56003169f000 ( 1.879 MiB) 0 21
56003169f000-560031890000 ( 1.941 MiB) 0 21
560031890000-560031a9d000 ( 2.051 MiB) 0 21
560031a9d000-560031b78000 ( 876.000 KiB) 0 21
7ff4f4b4b000-7ff4f4d61000 ( 2.086 MiB) 0 20
7ff4f4d61000-7ff4f4f09000 ( 1.656 MiB) 0 6
7ff4f4f09000-7ff4f50d9000 ( 1.812 MiB) 0 2
7ffe0556d000-7ffe055fe000 ( 580.000 KiB) 0 21

So, does that mean it works and that I should ignore the 'failed to mmap file' message?

Also, to answer your question, I think reporting this to the perf community is the more complete solution. In the meantime, I can use the workaround until a fix is available. Do you think it would be helpful to post a disclaimer/warning on your README/USAGE page until the fix is available?

One more thing! I tried to run this command: ./damo record --output_type json "sleep 5", and no json file was created. So maybe it doesn't fully work yet?

sj-aws commented

Hi Denis,

Thank you for sharing the results!

does that mean that it works and that I should ignore that failed to mmap file?

The damo report raw output means it works. I'm not entirely sure we can ignore 'failed to mmap file' completely, but at least the damo report raw output is what we expect from a successful operation.

Do you think it would be helpful to post a disclaimer/warning on your README/USAGE page until the fix will be available?

That makes sense. I think it would be good to add it to the FAQ section of the README. Please feel free to send a PR, or wait until I make the change.

./damo record --output_type json "sleep 5" and no json file was created.

It saves the output file to damon.data by default. You can change that using the --out option. For example:

$ sudo ./damo record --output_type json "sleep 5"
Press Ctrl+C to stop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.284 MB damon.data (578 samples) ]
$ sudo head damon.data
[
    {
        "kdamond_idx": null,
        "context_idx": null,
        "intervals": {
            "sample_us": "5000",
            "aggr_us": "100000",
            "ops_update_us": "1000000"
        },
        "scheme_idx": null,
$
$ sudo ./damo record --output_type json "sleep 5" --out for_denis.json
Press Ctrl+C to stop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.284 MB for_denis.json (580 samples) ]
$ sudo head for_denis.json
[
    {
        "kdamond_idx": null,
        "context_idx": null,
        "intervals": {
            "sample_us": "5000",
            "aggr_us": "100000",
            "ops_update_us": "1000000"
        },
        "scheme_idx": null,

denisgabriel5 commented

Okay, so at least we got something good out of this.

Do you think you could report the perf bug to their community? I think it would be better if you did it, since you have a better understanding of the issue. What do you think?

Regarding the disclaimer/warning I am in no rush, so you can do it in your own time, as you see fit.

And yes, now the json option works.

Thanks for all your support. Greatly appreciated!

sj-aws commented

Do you think you could report the perf bug to their community? I think it would be better if you did it, since you have a better understanding of the issue. What do you think?

Unfortunately, I also don't know much about the internals of perf ;) I could report it on your behalf, of course. Nevertheless, since I'm unable to reproduce the issue on my own, I think it would be better for you to report the bug.

Looks like all is good other than that. Thank you for reporting the issue and bearing with me :)

sj-aws commented

So, it's unclear whether the issue is due to your setup or an internal perf bug. Whatever the reason, there is no reason for damo to be that strict about perf's concerns. As long as we can, we want to just parse the file and use it. Hence, I made a commit[1] that adds the --force option by default to damo's perf script command.
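
The resulting change should be equivalent to the earlier test diff, just with the long-form flag; roughly (a sketch based on the commit description, not the verbatim commit):

-                        [PERF, 'script', '-i', result_file],
+                        [PERF, 'script', '--force', '-i', result_file],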

Closing this issue, as the remaining problem is not an issue of damo but of perf or your setup. Please feel free to reopen if you need any help from damo. Of course, you're welcome to reach out to me for help with further root-causing the perf issue. I'm not experienced with perf internals, though, so I'm not sure how much help I could provide.

[1] f5b12a9