cmu-db/ottertune

Cold Start Doesn't Work

arifiorino opened this issue · 10 comments

Running "fab loop" on the first try, without first running "fab lhs_samples" and "fab run_lhs", doesn't work. It sets the knobs to extreme values, which crashes Postgres.

Fixed in PR #210.

I encountered the same issue: after running "fab loop" once, OtterTune sets "max_wal_size" and "min_wal_size" to extremely high values (e.g. 28 PB). I tried running "fab run_lhs", but it doesn't fix the issue. I also tried editing the JSON file in the "fixtures" folder to change the min/max range settings, but the issue is still there. Do you have any suggestions?

Try running "fab lhs_samples", then "fab run_lhs", then "fab loop". This issue was fixed in PR #210 but it hasn’t been merged in yet.

Thank you for your reply. It's still not working, though: it keeps saying "no training data". I tried uploading the example training data you posted here: https://github.com/bohanjason/ottertune-example-data. The upload seems to succeed, but when I run "fab loop" it still says "no training data".

When you run "fab lhs_samples", it should create samples in the "config" folder, and then "fab run_lhs" should try each of those samples. If this doesn't work for some reason, try pulling from https://github.com/arifiorino/ottertune, where I changed the random generation to stay within range.
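To illustrate what "in range" sampling means here, below is a minimal sketch of Latin hypercube sampling over bounded knob ranges. It is not OtterTune's code: the function name `lhs_samples`, the knob ranges, and the units are hypothetical and chosen only for illustration.

```python
import random


def lhs_samples(knob_ranges, n_samples, seed=None):
    """Latin hypercube sampling over bounded knob ranges.

    Each knob's [low, high] range is split into n_samples equal strata,
    and every stratum is used exactly once per knob, so no sampled value
    can fall outside its range.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in knob_ranges.items():
        width = (high - low) / n_samples
        # One uniformly random point inside each stratum.
        points = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so strata are paired randomly across knobs.
        rng.shuffle(points)
        columns[name] = points
    return [
        {name: columns[name][i] for name in knob_ranges}
        for i in range(n_samples)
    ]


# Hypothetical ranges (in MB), for illustration only.
ranges = {"shared_buffers": (128, 4096), "max_wal_size": (256, 8192)}
samples = lhs_samples(ranges, n_samples=5, seed=0)
for sample in samples:
    print(sample)
```

Because every value is drawn inside a stratum of the configured range, a sample like a 28 PB "max_wal_size" cannot occur as long as the range itself is sensible.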

I tried your code, and yes, it sets "max_wal_size" to a reasonable value. Thank you! That part is fixed. However, it now sets "shared_buffers" to a very small number (e.g. 47 kB). Do you know how I can modify the range of the random generation?

When I run "sudo systemctl start postgresql@9.6-main.service", it gives me:

Jul 10 23:31:05 server-cl2 systemd[1]: Starting PostgreSQL Cluster 9.6-main...
Jul 10 23:31:06 server-cl2 postgresql@9.6-main[25146]: The PostgreSQL server failed to start. Please check the log output:
Jul 10 23:31:06 server-cl2 postgresql@9.6-main[25146]: 2019-07-10 23:31:06.144 UTC [25153] LOG: 5 is outside the valid range for parameter "shared_buffers" (16 .. 1073741823)
Jul 10 23:31:06 server-cl2 postgresql@9.6-main[25146]: 2019-07-10 23:31:06.144 UTC [25153] FATAL: configuration file "/etc/postgresql/9.6/main/postgresql.conf" contains errors
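The log above shows Postgres rejecting a value of 5 for "shared_buffers" because it falls outside the range 16 .. 1073741823. A range check before writing the config file would catch this earlier. The sketch below assumes nothing about OtterTune's internals; `check_knob` and `clamp_knob` are hypothetical helpers, and only the bounds are taken from the log.

```python
# Bounds copied from the Postgres log message above.
SHARED_BUFFERS_MIN = 16
SHARED_BUFFERS_MAX = 1073741823


def check_knob(name, value, low, high):
    """Return value unchanged, or raise if it is outside [low, high]."""
    if not low <= value <= high:
        raise ValueError(
            f'{value} is outside the valid range for parameter "{name}" '
            f"({low} .. {high})"
        )
    return value


def clamp_knob(value, low, high):
    """Pull an out-of-range value back to the nearest valid bound."""
    return max(low, min(value, high))


try:
    check_knob("shared_buffers", 5, SHARED_BUFFERS_MIN, SHARED_BUFFERS_MAX)
except ValueError as err:
    print(err)

safe_value = clamp_knob(5, SHARED_BUFFERS_MIN, SHARED_BUFFERS_MAX)
print(safe_value)
```

Clamping keeps the server bootable, but the minimum Postgres accepts is still far too small for a useful configuration, so narrowing the sampling range itself is the better fix.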

Yes, you can edit ottertune/server/website/website/fixtures/postgres-96_knobs.json and then reinstall the server. I would also reset Postgres to its defaults so that it works again.
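As a rough sketch of that kind of edit, the snippet below narrows the range of one knob in a list of fixture entries. The layout is an assumption: it supposes each entry has a "fields" object with "name", "minval", and "maxval" keys, and the sample entry (including the model name and knob name) is made up for illustration. Check the real postgres-96_knobs.json for the actual key names and value types before adapting it.

```python
import json


def set_knob_range(entries, knob_name, minval, maxval):
    """Set minval/maxval on every entry whose fields.name matches.

    Assumes each entry looks like {"fields": {"name": ..., "minval": ...,
    "maxval": ...}}; these key names are an assumption, not confirmed.
    Returns the number of entries changed.
    """
    updated = 0
    for entry in entries:
        fields = entry.get("fields", {})
        if fields.get("name") == knob_name:
            fields["minval"] = minval
            fields["maxval"] = maxval
            updated += 1
    return updated


# Made-up sample entry standing in for the real fixture file.
sample_text = """
[
  {"model": "example.knob",
   "fields": {"name": "shared_buffers", "minval": 16, "maxval": 1073741823}}
]
"""
entries = json.loads(sample_text)
changed = set_knob_range(entries, "shared_buffers", 16384, 1048576)
print(changed)
print(json.dumps(entries, indent=2))
```

To apply this to a real file, load it with `json.load`, call the function, and write the result back with `json.dump`. As noted above, the server has to be reinstalled afterwards for the change to take effect.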

I see. Last time, when I edited "postgres-96_knobs.json", I only rebooted the server; I think that is why it didn't work. Also, I saw someone say that I need to run at least 1000 loops to generate enough random data for training the model, and that only after that can I start tuning. Is that correct? (#190)
PS: As I understand it, the number of loops should depend on the number of knobs we select. Knobs are features, so more knobs mean higher dimensionality, and fitting a model in a higher-dimensional space needs more samples. However, too many knobs might result in overfitting.

Fixed in #210