TempleRAIL/SOGMP

Take too much time

Closed this issue · 2 comments

I am running this program on a machine with an Nvidia GeForce RTX 3070 GPU and 32 GB of memory. It takes approximately 1.5-1.6 s to decode 10 frames. Most of that time is spent in the PyTorch prediction (the `for t in range(T):` loop).

It takes around 0.03 s for a single iteration of the `for t in range(T):` loop. So predicting 10 frames takes (1+2+3+...+9+10)*0.03≈1.65s.
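
The estimate above can be checked with a short sketch. It assumes roughly 0.03 s per model call and that outer iteration `j` runs the inner loop `j + 1` times, as in the code below; `PER_CALL_S` and `calls_nested` are names introduced only for this illustration.

```python
# Count the model forward passes made by the nested loops of the decode demo.
SEQ_LEN = 10         # number of future frames, as in the demo
PER_CALL_S = 0.03    # assumed time of one model call, in seconds

# Outer iteration j runs the inner loop T = j + 1 times.
calls_nested = sum(j + 1 for j in range(SEQ_LEN))

print(calls_nested)                         # 55
print(round(calls_nested * PER_CALL_S, 2))  # 1.65
```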

Is there any way to make the decode demo run faster? We want to deploy the program on mobile robots, so real-time performance is essential.

```python
start = time.time()
# multi-step prediction: 10 time steps:
for j in range(SEQ_LEN):
    print('----------------------Decode the %dth frame-----------------------------'%(j+1))
    # Create input grid maps:
    input_gridMap = copy.deepcopy(input_gridMap_bak)
    # current position and velocities:
    obs_pos_N = positions[:, SEQ_LEN-1]
    vel_N = velocities[:, SEQ_LEN-1]
    # Predict the future origin pose of the robot: t+n
    T = j+1 #int(t_pred)
    noise_std = [0, 0, 0]#[0.00111, 0.00112, 0.02319]
    pos_origin = input_gridMap.origin_pose_prediction(vel_N, obs_pos_N, T, noise_std)
    # robot positions:
    pos = positions[:,:SEQ_LEN]
    # Transform the robot past poses to the predicted reference frame.
    x_odom, y_odom, theta_odom = input_gridMap.robot_coordinate_transform(pos, pos_origin)
    # Lidar measurements:
    distances = scans[:,:SEQ_LEN]
    # the angles of lidar scan: -135 ~ 135 degree
    angles = torch.linspace(-(135*np.pi/180), 135*np.pi/180, distances.shape[-1]).to(device)
    # Lidar measurements in X-Y plane: transform to the predicted robot reference frame
    distances_x, distances_y = input_gridMap.lidar_scan_xy(distances, angles, x_odom, y_odom, theta_odom)
    # discretize to binary maps:
    input_binary_maps = input_gridMap.discretize(distances_x, distances_y)
    # binary occupancy maps:
    input_binary_maps = input_binary_maps.unsqueeze(2)
    # feed the batch to the network:
    num_samples = 32 #1
    inputs_samples = input_binary_maps.repeat(num_samples,1,1,1,1)
    timestamp8 = time.time()
    for t in range(T):
        prediction, kl_loss = model(inputs_samples)
        prediction = prediction.reshape(-1,1,1,IMG_SIZE,IMG_SIZE)
        inputs_samples = torch.cat([inputs_samples[:,1:], prediction], dim=1)
    timestamp9 = time.time()
    print('pytorch prediction takes %.4f' %(timestamp9 - timestamp8))
    predictions = prediction.squeeze(1)
    # mean and std:
    pred_mean = torch.mean(predictions, dim=0, keepdim=True)
    prediction_maps[j, 0] = pred_mean.squeeze()
end = time.time()
print('prediction 10 time steps takes:%.4f s'%(end - start))
```

I would appreciate any help.

zzuxzt commented

Hi, there are some general ways to increase inference speed:

  1. Use a smaller-sized map as input;
  2. Reduce the sample size: e.g. num_samples = 1 instead of 32.

However, the achievable speed-up depends on your specific application, because SOGMP predicts multiple timesteps by rollout: each prediction is fed back as input for the next step. Do you want prediction results for all 10 timesteps or just one specific timestep (such as the 5th or 10th timestep)?

  1. If you need all 10 future timesteps, you can transform the observed OGMs to a fixed coordinate frame, such as the global map frame or the robot's current local frame. In that case, you can remove the outer loop `for j in range(SEQ_LEN):`, and predicting all 10 future timesteps will run much faster. You can then speed it up further as described in 2a).

  2. If you only need one specific timestep, the outer loop `for j in range(SEQ_LEN):` can also be removed. There is one more step that further increases the speed:
    a) How far ahead does your application need to predict: 0.5 s or 1 s? If long-term predictions are required, the sampling period can be increased (e.g. from the current 0.1 s to 0.5 s), which reduces the number of prediction time steps. The inner loop `for t in range(T)` then becomes shorter and runs much faster.
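
As a rough illustration of the two suggestions above, here is a minimal sketch rather than the repository's code. It assumes the input maps are already expressed in one fixed frame, so a single rollout can serve every timestep. `ToyModel`, `rollout_all`, and `steps_needed` are hypothetical names, and plain Python lists stand in for the tensors; in the demo the window update is the `torch.cat` line inside `for t in range(T):`.

```python
def rollout_all(step_fn, window, horizon):
    """Roll the model forward `horizon` steps and keep every prediction.

    step_fn: callable mapping a window of frames to the next frame
             (a stand-in for `model(inputs_samples)`).
    window:  list of past frames, oldest first.
    """
    window = list(window)  # work on a copy of the caller's list
    predictions = []
    for _ in range(horizon):
        nxt = step_fn(window)
        predictions.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest frame, append the new one
    return predictions


def steps_needed(horizon_ms, period_ms):
    """Rollout steps required to cover `horizon_ms` at a given sampling period."""
    return -(-horizon_ms // period_ms)  # integer ceiling division


class ToyModel:
    """Hypothetical stand-in for the network: counts calls, returns last frame + 1."""

    def __init__(self):
        self.calls = 0

    def __call__(self, window):
        self.calls += 1
        return window[-1] + 1


toy = ToyModel()
preds = rollout_all(toy, window=list(range(10)), horizon=10)
print(toy.calls)                # 10 model calls instead of 55
print(len(preds))               # 10 predictions, one per future timestep
print(steps_needed(1000, 100))  # 10 steps for a 1 s horizon at 0.1 s
print(steps_needed(1000, 500))  # 2 steps for a 1 s horizon at 0.5 s
```

The sketch only counts steps. It does not check whether the predictions match the per-frame re-projection in the original demo, and a longer sampling period presumably requires the input history (and likely the trained model) to use the same period.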

I'll try that. Thanks a lot. @zzuxzt