kozistr/pytorch_optimizer

Add Kate

tfriedel opened this issue · 6 comments

https://github.com/nazya/KATE

note that you can't lift the code as-is: parameters are passed in through a "cfg" object, gradients aren't checked for being None, and the "device" value is read from config instead of from the parameters. There may be more issues than that.
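
For illustration, a minimal sketch of the kind of fixes I mean, assuming a standard `torch.optim.Optimizer` skeleton (this is not the official KATE code; the update rule below is just a placeholder):

```python
import torch

class PatchedOptimizer(torch.optim.Optimizer):
    def __init__(self, params, lr: float = 1e-3):
        # hyperparameters come in through the constructor, not a `cfg` object
        super().__init__(params, {'lr': lr})

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    # the guard the original code is missing
                    continue

                state = self.state[p]
                if len(state) == 0:
                    # allocate state on the parameter's own device/dtype
                    # instead of reading a `device` value from config
                    state['b'] = torch.zeros_like(p)

                p.add_(p.grad, alpha=-group['lr'])  # placeholder update rule
```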

@tfriedel thanks for the request! As you said, the official code deviates from the paper and has several implementation issues. I corrected the implementation and verified that the optimizer works.

@kozistr
Cool! Did you try it and get decent results? I had fixed the obvious issues I mentioned and tried it out, but the results were poor. However, it would need more testing to determine whether it's an implementation issue or not. I haven't checked your implementation yet.

actually, I didn't test it on the other benchmark datasets (just the toy example in the test cases).

it seems there are differences between the original code and the pseudocode in the paper, so I re-implemented it based on the paper.
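
for reference, here's roughly how I read the paper's per-coordinate update. this is only a sketch from the pseudocode, with my own variable names (`b`, `m`, `delta`); it may not match the merged code exactly:

```python
import torch

def kate_update(p, grad, b, m, lr=1e-3, delta=0.0, eps=1e-8):
    # one KATE-style step as I read the paper's pseudocode (sketch only);
    # b and m are per-coordinate accumulators, initialized with zeros_like(p)
    g2 = grad * grad
    b2 = b * b + g2                            # b_k^2 = b_{k-1}^2 + g_k^2
    m2 = m * m + delta * g2 + g2 / (b2 + eps)  # m_k^2 = m_{k-1}^2 + delta * g_k^2 + g_k^2 / b_k^2
    b.copy_(b2.sqrt())
    m.copy_(m2.sqrt())
    p.add_(m * grad / (b + eps), alpha=-lr)    # x_{k+1} = x_k - lr * m_k * g_k / b_k
```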

just added the visualizations here.

ran the visualization on the Rosenbrock and Rastrigin functions. here's the result of the Kate optimizer, and it looks fine (not diverging, probably working).
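
for anyone who hasn't seen these test functions, a minimal sketch of how such a trajectory can be traced (standard definitions of the two functions; a `torch.optim`-style interface is assumed for Kate):

```python
import torch

def rosenbrock(x, y):
    # standard Rosenbrock: narrow curved valley, global minimum at (1, 1)
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rastrigin(x, y, a=10.0):
    # standard 2-D Rastrigin: many local minima, global minimum at (0, 0)
    return 2 * a + (x ** 2 - a * torch.cos(2 * torch.pi * x)) \
                 + (y ** 2 - a * torch.cos(2 * torch.pi * y))

# trace an optimizer's trajectory on one of these surfaces
w = torch.tensor([-2.0, 2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=1e-3)  # swap in Kate here
for _ in range(500):
    opt.zero_grad()
    loss = rastrigin(w[0], w[1])
    loss.backward()
    opt.step()
```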

if I'm available, I'll add more tests on benchmark datasets like ImageNet and MNIST too.

Interesting! I didn't know about Rastrigin.
Having some more realistic benchmarks for all these optimizers would be nice. Do you know about https://github.com/mlcommons/algorithmic-efficiency? I know Prodigy and schedule-free were submitted for this benchmark.

Also, you may be interested in this Twitter thread about optimizers:
https://x.com/_clashluke/status/1808590060654108910

For example, there was a mention of a grafted Lion#Adam optimizer:
https://x.com/dvruette/status/1627663196839370755
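
If I understand grafting correctly, it keeps the step magnitude (norm) of one optimizer's update while following the direction of another's, applied per tensor. A minimal sketch (my own function, and I'm not certain which side of the `#` supplies which part):

```python
import torch

def graft(magnitude_update: torch.Tensor, direction_update: torch.Tensor,
          eps: float = 1e-12) -> torch.Tensor:
    # per-tensor grafting: keep the norm of one optimizer's update
    # while following the direction of the other's
    scale = magnitude_update.norm() / direction_update.norm().add(eps)
    return direction_update * scale
```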

thanks for the resources! I'll try some of them.