[IDAS ERROR] IDACalcIC Newton/Linesearch algorithm failed to converge.

Question



When running the reset() and get_next_step() functions, I very rarely get the error:

[IDAS ERROR] IDACalcIC Newton/Linesearch algorithm failed to converge.

This happens during reinforcement learning training. I call reset() once and then call get_next_step() until the kite crashes (around 285 times). That is one episode, and it is repeated many times.
The error first occurs after around 76 episodes (each calling reset() once and get_next_step() 285 times on average).

One episode pseudocode:

reset()
while not crashed:
    get_next_step(action)

Output.txt, containing the rollout of one episode and the first occurrence of the error:

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 285      |
|    ep_rew_mean     | -200     |
| time/              |          |
|    episodes        | 76       |
|    fps             | 456      |
|    time_elapsed    | 84       |
|    total_timesteps | 38448    |
| train/             |          |
|    actor_loss      | 4.3e+11  |
|    critic_loss     | 3.32e+23 |
|    ent_coef        | 2.06e+10 |
|    ent_coef_loss   | -437     |
|    learning_rate   | 0.0456   |
|    n_updates       | 768      |
---------------------------------

[IDAS ERROR]  IDACalcIC
  Newton/Linesearch algorithm failed to converge.

Environment.jl:

module Environment

using Timers; tic()
using KiteModels
using KiteUtils
# using PyCall #removed pycall!!
# using Plots


const Model = KPS4

set_data_path(joinpath(@__DIR__, "../../Simulator/data"))
kcu = KCU(se());
kps4 = Model(kcu);
dt = 1/se().sample_freq
steps = 1000
step = 0
logger = Logger(se().segments + 5, steps) 

GC.gc();
toc();

integrator = KiteModels.init_sim!(kps4, stiffness_factor=0.04);

function get_next_step(depower, steering)
    global step
    depower = Float32(depower)
    steering = Float32(steering)

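    v_ro = 0.0  # reel-out speed of the winch [m/s], kept at zero

    depower = max(depower, 0.22f0)  # enforce a minimum depower of 0.22 (stays Float32)
    set_depower_steering(kps4.kcu, depower, steering)

    # advance the simulation by one time step dt; anything printed to stdout
    # during the step is redirected to next_step_io.txt
    t_sim = 0.0
    open("next_step_io.txt", "w") do io
        redirect_stdout(io) do
            t_sim = @elapsed KiteModels.next_step!(kps4, integrator, v_ro=v_ro, dt=dt)
        end
    end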

    GC.gc(false)
    
    sys_state = SysState(kps4)
    step += 1

    return sys_state.orient[1], sys_state.orient[2], sys_state.orient[3], sys_state.orient[4], sys_state.force
end

function reset()
    global kcu
    global kps4
    global integrator
    global step
    global sys_state
    update_settings()
    save_log(logger)
    kcu = KCU(se());
    kps4 = Model(kcu);
    integrator = KiteModels.init_sim!(kps4, stiffness_factor=0.04)
    step = 0
    sys_state = SysState(kps4)
    GC.gc();
    return sys_state.orient[1], sys_state.orient[2], sys_state.orient[3], sys_state.orient[4], sys_state.force
end

function render()
    global sys_state, logger, step, steps
    if(step < steps)
        log!(logger, SysState(kps4))
    end
end


end

System:
I am running the code on the IDUN High Performance Computing cluster (https://www.hpc.ntnu.no/idun/)
inside an Apptainer Ubuntu container.
I built a system image with Environment as a precompiled package.

Answer 1 · 2024-03-05T14:27:27.000Z

There are many possible reasons why the solver can fail.

The first thing I would try is to change the solver settings:

solver:
    abs_tol: 0.0006        # absolute tolerance of the DAE solver [m, m/s]
    rel_tol: 0.001         # relative tolerance of the DAE solver [-]
    linear_solver: "GMRES" # can be GMRES or Dense
    max_order: 4           # maximal order, usually between 3 and 5
    max_iter:  200         # max number of iterations of the steady-state-solver

This can be changed globally in settings.yaml, but also on a case-by-case basis,
e.g. by doing:

se().abs_tol=0.000006
se().rel_tol=0.0000001
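For context on what abs_tol and rel_tol control: SUNDIALS IDA measures errors in a weighted root-mean-square (WRMS) norm in which component i gets the weight 1/(rel_tol·|y_i| + abs_tol), and its error tests compare such a norm against 1. The following is a minimal stand-alone Julia sketch of that definition only; it is not KiteModels or Sundials.jl code, and the function names and sample values are made up for illustration.

```julia
# Minimal sketch (not Sundials.jl code): SUNDIALS-style error weights and the
# weighted root-mean-square (WRMS) norm that abs_tol and rel_tol feed into.

# Weight of component i: w_i = 1 / (rel_tol * |y_i| + abs_tol)
error_weights(y, abs_tol, rel_tol) = 1 ./ (rel_tol .* abs.(y) .+ abs_tol)

# WRMS norm of an error estimate `e` for the current state `y`
function wrms_norm(e, y, abs_tol, rel_tol)
    w = error_weights(y, abs_tol, rel_tol)
    return sqrt(sum((e .* w) .^ 2) / length(e))
end

# Hypothetical acceptance test: the error counts as small enough when the norm is <= 1
passes(e, y, abs_tol, rel_tol) = wrms_norm(e, y, abs_tol, rel_tol) <= 1

# Made-up example: a position of 100 m and a velocity of 1 m/s with error estimates
y = [100.0, 1.0]
e = [0.05, 0.001]

println(passes(e, y, 0.0006, 0.001))        # settings.yaml values above; prints true
println(passes(e, y, 0.000006, 0.0000001))  # tightened per-case values; prints false
```

The same error estimate passes with the settings.yaml values but fails with the tightened ones, so with tighter tolerances the solver has to work harder (smaller steps or more iterations) to satisfy its tests.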

The second thing to try is to reduce the stiffness of the tether:

tether:
    c_spring

At the beginning of a simulation I always use a low stiffness and increase it to the nominal value once an equilibrium is reached.
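The "start soft, stiffen at equilibrium" idea can be sketched as a small schedule in plain Julia. This is a hypothetical illustration, not KiteModels code: the equilibrium criterion (all point velocities below v_tol), the rate and the nominal factor of 1.0 are assumptions, and it assumes, as its name suggests, that a stiffness factor scales the tether stiffness. How the factor is passed back into the model depends on the KiteModels version and is not shown.

```julia
using LinearAlgebra: norm

# Hypothetical equilibrium test: every point moves slower than v_tol [m/s].
# `velocities` holds one 3-element velocity vector per point of the model.
is_equilibrium(velocities; v_tol = 0.01) = all(v -> norm(v) < v_tol, velocities)

# Hypothetical schedule: keep the low stiffness factor until equilibrium is
# reached, then raise it by `rate` per call until it reaches `nominal`.
function next_stiffness_factor(current, at_equilibrium; nominal = 1.0, rate = 0.05)
    at_equilibrium || return current
    return min(current + rate, nominal)
end

# Usage with made-up velocities, starting from the factor passed to init_sim! in the question
k = 0.04
velocities = [[0.0, 0.0, 0.001], [0.002, 0.0, 0.0]]
for _ in 1:30
    global k = next_stiffness_factor(k, is_equilibrium(velocities))
end
println(k)  # prints 1.0
```

Gating the increase on an equilibrium check, rather than on elapsed time alone, keeps the stiffness low for as long as the system is still moving.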

Does this answer your question?

Answer 2 · 2024-03-05T14:30:18.000Z

Yes, thank you!


Answer 3 · 2024-04-03T14:34:37.000Z

I just added the option to use the DFBDF solver, which in general works much better: it is much more stable, on average 4 times faster, and uses half the memory. Please try it out and tell me if this fixes your problem.

Answer 4 · 2024-04-07T15:15:07.000Z

Thanks, it fixed the problem!
