stepankonev/waymo-motion-prediction-challenge-2022-multipath-plus-plus

(Again) How did you arrive at the normalization constants?

Opened this issue · 3 comments

First of all, thank you for making this implementation publicly available. I find your code really elegant.

I have some questions regarding how you arrived at the normalization constants. I already saw a very similar issue (#1), but it did not fully answer what I am wondering about.

I see that normalization is now performed in model.data.py, with these means and standard deviations:

elif features == ("xy", "yaw", "speed", "width", "length", "valid"):
    normalizarion_means = {
        "target/history/lstm_data": np.array([-2.9633283615112305,0.005309064872562885,-0.003220283193513751,6.059159278869629,1.9252972602844238,4.271720886230469,0.0,0.0,0.0,0.0,0.0,0.0,0.0], dtype=np.float32),
        "target/history/lstm_data_diff": np.array([0.5990215539932251,-0.0018718164646998048,0.0006288147415034473,0.0017819292843341827,0.0,0.0,0.0,0.0,0.0,0.0,0.0], dtype=np.float32),
        "other/history/lstm_data": np.array([5.601348876953125,1.4943491220474243,-0.013019951991736889,1.44475519657135,1.072572946548462,2.4158480167388916,0.0,0.0,0.0,0.0,0.0,0.0,0.0], dtype=np.float32),
        "other/history/lstm_data_diff": np.array([0.025991378352046013,-0.0008657555445097387,9.549396054353565e-05,0.001465122913941741,0.0,0.0,0.0,0.0,0.0,0.0,0.0], dtype=np.float32),
        "target/history/mcg_input_data": np.array([-2.9633283615112305,0.005309064872562885,-0.003220283193513751,6.059159278869629,1.9252972602844238,4.271720886230469,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0], dtype=np.float32),
        "other/history/mcg_input_data": np.array([5.601348876953125,1.4943491220474243,-0.013019951991736889,1.44475519657135,1.072572946548462,2.4158480167388916,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0], dtype=np.float32),
        "road_network_embeddings": np.array([77.35582733154297,0.12082172930240631,0.05486442521214485,0.004187341313809156,-0.0015162595082074404,2.011558771133423,0.9601883888244629,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0], dtype=np.float32)
    }
    normalizarion_stds = {
        "target/history/lstm_data": np.array([3.738459825515747,0.11283490061759949,0.10153655707836151,5.553133487701416,0.5482628345489502,1.6044323444366455,1.0,1.0,1.0,1.0,1.0,1.0,1.0], dtype=np.float32),
        "target/history/lstm_data_diff": np.array([0.5629324316978455,0.03495170176029205,0.04547161981463432,0.5762772560119629,1.0,1.0,1.0,1.0,1.0,1.0,1.0], dtype=np.float32),
        "other/history/lstm_data": np.array([33.899658203125,25.64937973022461,1.3623465299606323,3.8417460918426514,1.0777146816253662,2.4492409229278564,1.0,1.0,1.0,1.0,1.0,1.0,1.0], dtype=np.float32),
        "other/history/lstm_data_diff": np.array([0.36061710119247437,0.1885228455066681,0.08698483556509018,0.43648791313171387,1.0,1.0,1.0,1.0,1.0,1.0,1.0], dtype=np.float32),
        "target/history/mcg_input_data": np.array([3.738459825515747,0.11283490061759949,0.10153655707836151,5.553133487701416,0.5482628345489502,1.6044323444366455,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0], dtype=np.float32),
        "other/history/mcg_input_data": np.array([33.899658203125,25.64937973022461,1.3623465299606323,3.8417460918426514,1.0777146816253662,2.4492409229278564,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0], dtype=np.float32),
        "road_network_embeddings": np.array([36.71162414550781,0.761500358581543,0.6328969597816467,0.7438802719116211,0.6675100326538086,0.9678668975830078,1.1907216310501099,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0], dtype=np.float32)
    }
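To illustrate what I understand these dictionaries are for: a minimal sketch of how such mean/std constants would typically be applied elementwise (the helper name is mine, not the repo's exact code; padding features with mean 0 and std 1 pass through unchanged):

```python
import numpy as np

def normalize(array, key, means, stds):
    """Standardize a feature tensor with per-feature constants.

    Features whose mean is 0 and std is 1 (e.g. one-hot or padding
    dimensions) are left unchanged by this transform.
    """
    return (array - means[key]) / stds[key]
```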

The predicted coordinates are also normalized during training with these constants (if specified):

if config["train"]["normalize_output"]:
    # assert not (config["train"]["normalize_output"] and config["train"]["trainable_cov"])
    xy_future_gt = (data["target/future/xy"] - torch.Tensor([1.4715e+01, 4.3008e-03]).cuda()) / 10.
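For completeness, I assume the inverse transform is applied to the predicted coordinates at evaluation time. A minimal sketch in NumPy, reusing the constants from the snippet above (the function names are mine, not the repo's):

```python
import numpy as np

# Constants copied from the snippet above.
OUTPUT_MEAN = np.array([1.4715e+01, 4.3008e-03], dtype=np.float32)
OUTPUT_SCALE = 10.0

def normalize_xy(xy):
    """Shift by the per-axis mean and divide by the common scale."""
    return (xy - OUTPUT_MEAN) / OUTPUT_SCALE

def denormalize_xy(xy_norm):
    """Inverse transform: map normalized predictions back to metres."""
    return xy_norm * OUTPUT_SCALE + OUTPUT_MEAN
```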

For a different model I am developing, I tried to compute similar constants (mainly for various features of target/history and target/future), and I arrive at considerably different values. For some features the values are fairly close, but for others they are much higher (by a factor of 10-100, most noticeably in the coordinates).

My approach to computing these values has been to first prerender the dataset using MultiPathPPRenderer without normalization, filtering only interesting agents. I then traverse all the prerendered scenarios from the training split and compute the mean and standard deviation of each feature for the target agent. How come I am getting such different values? Could you elaborate on which part of the data you used to compute these constants? In particular: (1) did you use a subset of agents (e.g. only interesting, or fully observed)? (2) did you use a subset of the scenarios?
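For reference, a minimal sketch of the accumulation I do (assuming each prerendered scenario yields a (T, F) feature array plus a boolean validity mask over timestamps; the function name and input layout are mine):

```python
import numpy as np

def masked_mean_std(batches):
    """Accumulate per-feature mean and std over valid timestamps only.

    `batches` yields (values, valid) pairs, where `values` has shape
    (T, F) and `valid` is a mask of shape (T,). Running sums of x and
    x^2 avoid holding the whole dataset in memory.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for values, valid in batches:
        v = values[valid.astype(bool)]
        total = total + v.sum(axis=0)
        total_sq = total_sq + (v ** 2).sum(axis=0)
        count += len(v)
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std
```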

Thank you in advance!

Hi @manolotis, thanks for pointing this out! I have similar questions about this. Btw how do you calculate the constants for target/future/xy? I do not quite understand why we need this constant.

I calculate the constants for target/future/xy in the same fashion as for target/history/xy: basically aggregating the mean and std over the future coordinates (of course excluding the invalid timestamps).

Generally, it is beneficial to normalize the outputs if they have significantly different magnitudes. Here I suppose it helps given the difference between the resulting future x and y values (since the inputs are processed so that the target always faces the positive x direction at prediction time).

I experimented with not normalizing the inputs as suggested in #3 (comment), but I always ran into infinite values in the first epoch, and when changing the running_mean_mode as suggested in #3 (comment) the same issue appeared after a few epochs. After a little digging, the problem turns out to arise from torch.logdet of the covariances during the loss computation, since the determinant eventually becomes negative. I will continue with the normalized output, but I am still curious how the normalization constants were computed exactly.
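For what it's worth, a common workaround for the negative-determinant problem (not something this repo does, as far as I know) is to parameterize the covariance through its Cholesky factor, which keeps it positive definite by construction and gives the log-determinant in closed form:

```python
import numpy as np

def cov_from_cholesky_params(a, b, c):
    """Build a 2x2 covariance from unconstrained parameters (a, b, c)
    via the Cholesky factor L = [[exp(a), 0], [b, exp(c)]].

    cov = L @ L.T is positive definite for any real a, b, c, and
    logdet(cov) = 2 * (a + c), so no torch.logdet call can fail.
    """
    L = np.array([[np.exp(a), 0.0], [b, np.exp(c)]])
    cov = L @ L.T
    logdet = 2.0 * (a + c)
    return cov, logdet
```

In PyTorch the same idea amounts to predicting the three unconstrained numbers per mode and feeding the resulting lower-triangular factor to the likelihood instead of the raw covariance.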

Thanks for your explanation! I ran into the same issue when setting normalize_output to False.