microsoft/AutonomousDrivingCookbook

TestModel notebook stuck after connecting with airsim

wonjoonSeol opened this issue · 17 comments

Problem description

After running the second cell, which prints "connection successful",
the next cell does not run. The following cells simply show a * mark.

Problem details


The subsequent cells do not run; the notebook is stuck.

I am running AirSim in a separate window using the following command:
`.\AD_Cookbook_Start_AirSim.ps1 landscape -windowed`

Model runs fine on DistributedLearningRL version.

What is the purpose of these lines?
`if ('../../PythonClient/' not in sys.path):
    sys.path.insert(0, '../../PythonClient/')`

I don't have this directory.

Experiment/Environment details

  • Tutorial used: AirSimE2EDeepLearning
  • Environment used: landscape, neighborhood
  • Versions of artifacts used (if applicable): Python 3.6.8, Keras 2.1.2, cuDNN 7.4

Just to let you know, the Python script that overrides the CarClient call in Share.scripts_downpour.app.airsim_client is the cause of the indefinite hang.

The notebook code runs fine when using the latest AirSim client API instead of this script.
Note to other people experiencing issues with the notebooks: the scripts here need updating in general (including the Keras version they support), so don't use the code here as is.

I tried this on two separate machines, and was unable to reproduce. Are you using the binaries linked in the tutorial, or the ones from the AirSim repo? They are not the same.

@mitchellspryn This tutorial is really awesome! Much better than the documentation from CARLA. I have the same issue with the notebook. I first open PowerShell and start the simulator (this works and the car appears). Then I run the Jupyter notebook for testing, and it tells me the connection was established, but the notebook gets stuck after running the third cell. I had no problems with the first and second notebooks after downgrading Keras to 2.1.2. (It would be nice to add this information to the main tutorial page https://github.com/Microsoft/AutonomousDrivingCookbook/tree/master/AirSimE2EDeepLearning alongside the Keras installation step.)

Is it necessary to pip install airsim as proposed in the main documentation?

Environment

  • Windows 10, Python 3.4.5, keras=2.1.2, tensorflow-gpu on Nvidia Quadro P2000
  • I did not use Anaconda but pure Python 3.4.5 with a virtual environment with all the necessary installations
  • Downloaded the binaries linked in the tutorial (https://airsimtutorialdataset.blob.core.windows.net/e2edl/AD_Cookbook_AirSim.7z) extracted it on my desktop
  • Downloaded the AirSim-master repo from github for the PythonClient stuff and replaced the first if statement by
if ('c:/Users/username/Desktop/AirSim/AirSim-master/PythonClient/' not in sys.path):
    sys.path.insert(0, 'c:/Users/username/Desktop/AirSim/AirSim-master/PythonClient/')
  • My current jupyter notebook

Cell 1

from keras.models import load_model
import sys
import numpy as np
import glob
import os
# import airsim

### Uncommenting the following lines does not change the problem
# if ('c:/Users/user/Desktop/AirSim/AirSim-master/PythonClient/' not in sys.path):
#     sys.path.insert(0, 'c:/Users/user/Desktop/AirSim/AirSim-master/PythonClient/')
from AirSimClient import *

# << Set this to the path of the model >>
# If None, then the model with the lowest validation loss from training will be used
MODEL_PATH = None

if MODEL_PATH is None:
    models = glob.glob('c:/airsim/model/models/*.h5') 
    best_model = max(models, key=os.path.getctime)
    MODEL_PATH = best_model
    
print('Using model {0} for testing.'.format(MODEL_PATH))

Cell 2 (prints connection established)

model = load_model(MODEL_PATH)

# Connecting to the AirSim simulator with the new API (import airsim) did not work:
# client = airsim.CarClient()
# client.confirmConnection()
# client.enableApiControl(True)
# car_controls = airsim.CarControls()

client = CarClient()
client.confirmConnection()
client.enableApiControl(True)
car_controls = CarControls()
print('Connection established!')

Cell 3 (gets stuck)

car_controls.steering = 0
car_controls.throttle = 0
car_controls.brake = 0

image_buf = np.zeros((1, 59, 255, 3))
state_buf = np.zeros((1,4))

@wonjoonSeol Could you post your notebook which was executable?

I didn't realize airsim is now pip installable. Don't install it.

Please use the AirSimClient.py in the repo. Don't use the one in the AirSim master repo.

Here's why: AirSim is rapidly changing. As you've noticed, many of the changes are not backwards compatible. Our goal here is to create a stable build that works everywhere. Hence, we have created a binary from a snapshot of the AirSim repo, and took a snapshot of their client libraries. You should have everything you need to run inside this repo.

@mitchellspryn Thank you a lot for your support! But what about this part?

if ('../../PythonClient/' not in sys.path):
    sys.path.insert(0, '../../PythonClient/')

Do I have to change something in these lines?

BTW I am very grateful to you guys for sharing AirSim and your experience with us! The tutorial is the best tutorial about self-driving cars that I have seen. Only the part with the data cooking seems to be magic for programming beginners like myself :D.

I'm not sure why that is there. It shouldn't do anything. Does it break something?

@mitchellspryn Hmm ... I only changed the path to the model; the remaining code is copied from your notebook. I then go into the AD_Cookbook_AirSim folder, open PowerShell there, and execute .\AD_Cookbook_Start_AirSim.ps1 landscape -windowed. The simulator opens with the car standing still (I can still control it with the keyboard). Then I run cell 1 and cell 2 of the notebook, and everything works without errors. The second cell generates this output:

WARNING:tensorflow:From c:\pyenvs\airsim\lib\site-packages\keras\backend\tensorflow_backend.py:1264: calling reduce_prod (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From c:\pyenvs\airsim\lib\site-packages\keras\backend\tensorflow_backend.py:1349: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Waiting for connection: 
Connection established!

But when running cell 3 the notebook gets stuck.

gsh89 commented

@New2Coding2019 Was this issue resolved for you? I am encountering the same problem. In my case, I was using the repo for this project and not the AirSim master, but I am still having issues running cell 3. Since @wonjoonSeol mentioned that some files in the Deep Reinforcement Learning folder were causing the hang, I tried removing that folder from my local copy of the repo. After that, cell 2 started having the same problem (getting stuck and never executing).

@mitchellspryn Could you please provide some guidance on this. I can upload a screenshot of my Jupyter notebook but the issue is exactly the same as @New2Coding2019 described. The only difference being that I was not using the master.

Apologies for replying late; I have been working on this at a leisurely pace.

I have managed to update the whole DistributedRL codebase for the latest AirSim (v1.2.1). This version is superior in my opinion because it has fewer bugs and supports clock speed, which I think is a pretty big deal for RL training.

I wasn't able to update AirSimE2EDeepLearning because the latest landscape map does not have any snow. We cannot use the same training images in the newer version.

In terms of sharing the code, I may not be able to merge my codebase for a few months because I am currently working on this for a university project. I will ask my supervisor whether I can share my personal project in a public repository before the deadline. But if you need to update the current code for the newest version before I upload my updates, I can give you some guidance.

Besides updating the existing codebase to the new APIs, you will need to update rewards.txt and roadlines.txt because the map's base coordinates changed. I have worked out the offset: [80.45703888, 122.9101944]. Add this offset for v1.2.1. Furthermore, the reward graph has an error: the coordinates of the bottom lane in the reward graph do not correspond to the centre points of the street. You need to further offset the reward coordinates belonging to the bottom lane by -2 on the y axis.
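As a rough illustration of the coordinate adjustment described above, here is a minimal Python sketch. It assumes the first offset component applies to x and the second to y, and that you already know which points belong to the bottom lane; the thread specifies neither. The function name is hypothetical, and parsing rewards.txt and roadlines.txt is left out because their format is not described here.

```python
# Offset reported in this thread for AirSim v1.2.1 map coordinates.
# Assumption: the components are (x, y) in that order.
MAP_OFFSET = (80.45703888, 122.9101944)

# Extra y correction reported for reward points on the bottom lane.
BOTTOM_LANE_Y_SHIFT = -2.0


def shift_point(x, y, bottom_lane=False):
    """Return (x, y) moved by the v1.2.1 offset (hypothetical helper)."""
    new_x = x + MAP_OFFSET[0]
    new_y = y + MAP_OFFSET[1]
    if bottom_lane:
        # Only reward points on the bottom lane get the additional shift.
        new_y += BOTTOM_LANE_Y_SHIFT
    return new_x, new_y
```

Road line points would use `shift_point(x, y)`, while bottom-lane reward points would use `shift_point(x, y, bottom_lane=True)`.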

Finally, you need to update all directory paths in distributed_agent.py and update the config for the PowerShell scripts.

For those who have problems with Jupyter getting stuck, I suggest copying the code to a separate Python file. Either 1) most of your problems will disappear, or 2) you will get proper error messages.

The code works fine for both AirSimE2EDeepLearning and DistributedRL local training, with the exception of TestModel for AirSimE2EDeepLearning.
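One possible way to move the notebook code into a separate Python file is sketched below. It relies only on the standard notebook JSON layout (a top-level `cells` list whose code cells carry a `source` field); the function name and the output filename are made up for illustration. IPython magics such as `%matplotlib inline` are copied as-is and would need to be removed by hand.

```python
import json


def notebook_to_script(notebook_path, script_path):
    """Write the code cells of a .ipynb file to a plain Python script.

    Returns the number of code cells written (hypothetical helper).
    """
    with open(notebook_path, encoding='utf-8') as f:
        notebook = json.load(f)

    chunks = []
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') != 'code':
            continue  # skip markdown and raw cells
        source = cell.get('source', '')
        # The source is stored either as one string or as a list of lines.
        if isinstance(source, list):
            source = ''.join(source)
        chunks.append(source)

    with open(script_path, 'w', encoding='utf-8') as f:
        f.write('\n\n'.join(chunks) + '\n')
    return len(chunks)


# Example (paths are placeholders):
# notebook_to_script('TestModel.ipynb', 'test_model_script.py')
```

Running the resulting file with `python test_model_script.py` from a terminal should surface any exception that the notebook was hiding.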

That one alone gives "cuDNN failed to initialize" error. Apparently, this error is raised when there is a compatibility issue between tensorflow-gpu, cudnn and cuda versions. However, I don't get the same error when I run distributed RL.

@mitchellspryn, you gave us the Keras version (2.1.2) in another issue.
Can you please also give us your tensorflow-gpu, cuDNN and CUDA versions for running AirSimE2EDeepLearning?

In case that also doesn't work, it would be very helpful if you could kindly upload a YouTube video showing the TestModel results.

For the provided codebase, I am currently using:

  • Tensorflow 1.13
  • Keras 2.1.2 (But in my updated code I am using Keras 2.2.4 now)
  • CUDA 10
  • Cuda toolkit 10.0

@New2Coding2019

if ('../../PythonClient/' not in sys.path):
    sys.path.insert(0, '../../PythonClient/')

You can remove these lines. The codebase uses the modified airsim_client.py in Share/scripts_downpour/app anyway.
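For readers wondering what those two lines do: they only prepend a directory to Python's module search path. The sketch below wraps them in a hypothetical helper and shows that a path which does not exist is simply skipped during import lookup, which is consistent with the lines being harmless when the PythonClient directory is missing.

```python
import importlib
import sys


def prepend_search_path(path):
    """Insert `path` at the front of sys.path if absent.

    Returns True if the path was inserted, False if it was already listed.
    """
    if path not in sys.path:
        sys.path.insert(0, path)
        return True
    return False


# The directory from the notebook; it does not need to exist.
added = prepend_search_path('../../PythonClient/')

# Imports still resolve from the remaining sys.path entries.
json_module = importlib.import_module('json')
print(added, json_module.__name__)
```

The flip side is that if the directory does exist and contains an AirSimClient.py, that copy would be found first, so the lines matter only when such a directory is present.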

@mitchellspryn
What is the purpose of D:\agent.agent logging in Distributed_agent.py?
It doesn't seem to do anything.

@wonjoonSeol @Gurtaj @mitchellspryn Did you manage to solve the issue with the system crashing after it says the connection is successful? I am using AirSimE2EDeepLearning.

What do you mean by system crashing? I am unable to connect to the E2EDeepLearning AirSim client using the TestModel notebook.

Copying the notebook code to a separate Python file provides additional information:


TransportError: Retry connection over the limit

This issue doesn't happen with the DistributedRL version.
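A retry-limit error like the one above suggests the client never reached the simulator's RPC server. As a quick diagnostic, the sketch below checks whether anything accepts TCP connections on a given port. The default of 41451 is the API port documented for recent AirSim releases and is an assumption here; the client snapshot bundled with this repo may use a different port, so check the port passed in your AirSimClient.py before relying on it. The function name is hypothetical.

```python
import socket

# Assumption: 41451 is the default API port in recent AirSim releases.
# The bundled client snapshot may use another port; verify in AirSimClient.py.
AIRSIM_RPC_PORT = 41451


def simulator_is_listening(host='127.0.0.1', port=AIRSIM_RPC_PORT, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example:
# print('RPC port reachable:', simulator_is_listening())
```

If this returns False while the simulator window is open, the client and the binary are probably not using the same port or are from mismatched AirSim versions.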

@wonjoonSeol I mean the same issue you experienced with AirSim being stuck after connecting it via the TestModel. Mine says 'connection established' then when running the cells after this, AirSim freezes.

@tfromb Yes, if you copy all the code and run it as a Python script instead, you will probably see the same information as above.

Is there a fix for this issue?

Did anyone solve this issue?

> @wonjoonSeol I mean the same issue you experienced with AirSim being stuck after connecting it via the TestModel. Mine says 'connection established' then when running the cells after this, AirSim freezes.

Did you solve the issue?