Base knowledge: Bellman equation; RL_model; SARSA model (on-policy); basic q_learning model (depends on a Q-table, off-policy); deep Q-learning (DQN)
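The on-policy versus off-policy distinction noted above can be illustrated with a minimal tabular sketch. Everything here is an assumption for illustration and not taken from the original notes: the function names `sarsa_update` and `q_learning_update`, the learning rate `alpha`, the discount `gamma`, and the toy 2x2 Q-table. Both updates move `Q[s, a]` toward a Bellman-style target; they differ only in which next-state value they bootstrap from.

```python
import numpy as np


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r + gamma * q[s_next, a_next]
    q[s, a] += alpha * (target - q[s, a])
    return q


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy (max-value) next action."""
    target = r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])
    return q


# Hypothetical toy Q-table: 2 states x 2 actions.
q_sarsa = np.zeros((2, 2))
q_sarsa[1] = [1.0, 5.0]  # values for state 1; action 1 is the greedy one
q_ql = q_sarsa.copy()

# Same transition for both: state 0, action 0, reward 1.0, next state 1.
# SARSA assumes the behaviour policy picked the non-greedy action 0 next.
sarsa_update(q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0)
q_learning_update(q_ql, s=0, a=0, r=1.0, s_next=1)

print("SARSA      Q[0, 0]:", q_sarsa[0, 0])
print("Q-learning Q[0, 0]:", q_ql[0, 0])
```

With identical experience, SARSA's estimate follows the action the behaviour policy actually takes, while Q-learning's estimate follows the greedy action regardless of what was taken. DQN keeps the Q-learning target but replaces the table with a neural network that approximates Q.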