AI4Finance-Foundation/ElegantRL

assert 0 <= indices.min()

Opened this issue · 5 comments

Syk-yr commented

When I use PER, the important_sampling() function often fails at `assert 0 <= indices.min()`. Could you explain why the indices go out of range?

I have also encountered this issue, which is why I added the assert statement here in the first place. The problem in my case was that the batch size was larger than the amount of data the PER buffer had collected so far.


def important_sampling(self, batch_size: int, beg: int, end: int, per_beta: float) -> Tuple[Tensor, Tensor]:
    # get random values for searching indices with proportional prioritization
    values = (torch.arange(batch_size) + torch.rand(batch_size)) * (self.tree[0] / batch_size)
    # get proportional prioritization
    leaf_ids, leaf_values = list(zip(*[self.get_leaf_id_and_value(v) for v in values]))
    leaf_ids = torch.tensor(leaf_ids, dtype=torch.long)
    leaf_values = torch.tensor(leaf_values, dtype=torch.float32)
    indices = leaf_ids - (self.buf_len - 1)
    assert 0 <= indices.min()
    assert indices.max() < self.buf_len
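
For reference, the `values` line above implements stratified sampling: the total priority mass `self.tree[0]` is split into `batch_size` equal segments and one point is drawn uniformly from each segment. A minimal sketch of just that step (the numbers are made up):

    import torch

    total, batch_size = 12.0, 4  # pretend the sum-tree root holds a priority mass of 12.0
    values = (torch.arange(batch_size) + torch.rand(batch_size)) * (total / batch_size)
    # values[i] always lands in its own segment [i * 3.0, (i + 1) * 3.0),
    # so every region of the priority mass is represented in the batch
    assert all(i * 3.0 <= v < (i + 1) * 3.0 for i, v in enumerate(values.tolist()))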

The solution is to store more data in the replay buffer before sampling. However, other problems could also cause this error, so please provide more code and we can analyze it together.

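If the cause really is a batch size larger than the collected data, a warm-up guard is a simple workaround. A minimal sketch, not ElegantRL's actual API (`cur_size` is an assumed attribute for the number of transitions currently stored; `sample_for_per` is the method from the stack trace below):

    def sample_when_ready(buffer, batch_size: int):
        """Only sample from PER once the buffer holds at least one full batch."""
        if buffer.cur_size < batch_size:  # `cur_size` is an assumed attribute
            return None  # signal the caller to keep collecting transitions instead
        return buffer.sample_for_per(batch_size)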

Syk-yr commented

Thanks for the explanation. When I train SAC, the error only appears after roughly 4.49e+04 steps. Strangely, out of three runs, two failed at 4.49e+04 steps and one at 4.39e+04 steps. Also, the depth loop that searches for leaf_id in the function below seems to be one level short, which looks slightly different from Morvan's (莫凡) code. Am I misunderstanding something?

def get_leaf_id_and_value(self, v) -> Tuple[int, float]:
    """Tree structure and array storage:
    Tree index:
          0            -> storing priority sum
        /   \
       1     2
      / \   / \
     3   4 5   6       -> storing priority for transitions
    Array type for storing: [0, 1, 2, 3, 4, 5, 6]
    """
    p_id = 0  # the search pointer starts at the root node

    for depth in range(self.depth - 2):  # descend through the tree
        l_id = min(2 * p_id + 1, self.max_len - 1)  # the left child
        r_id = l_id + 1  # the right child
        if v <= self.tree[l_id]:
            p_id = l_id
        else:
            v -= self.tree[l_id]
            p_id = r_id
    return p_id, self.tree[p_id]  # leaf_id and leaf_value
# Morvan (莫凡)'s reference implementation:
def get_leaf(self, v):
    parent_idx = 0
    while True:  # the while loop is faster than the method in the reference code
        cl_idx = 2 * parent_idx + 1  # this node's left and right children
        cr_idx = cl_idx + 1
        if cl_idx >= len(self.tree):  # reached the bottom, end the search
            leaf_idx = parent_idx
            break
        else:  # downward search, always search for a higher priority node
            if v <= self.tree[cl_idx]:
                parent_idx = cl_idx
            else:
                v -= self.tree[cl_idx]
                parent_idx = cr_idx

    data_idx = leaf_idx - self.capacity + 1
    return leaf_idx, self.tree[leaf_idx], self.data[data_idx]
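
To sanity-check the descent depth, here is a minimal self-contained sum tree (capacity 4, so the walk from root to leaf takes exactly log2(4) = 2 steps). All numbers are made up, and the loop mirrors the structure of Morvan's version rather than either repository's exact code:

    import math

    capacity = 4
    tree = [10.0, 6.0, 4.0, 3.0, 3.0, 1.0, 3.0]  # node 0 is the total; leaves are nodes 3..6

    def get_leaf(v: float) -> int:
        p = 0
        while 2 * p + 1 < len(tree):  # descend until p has no children, i.e. is a leaf
            l = 2 * p + 1
            if v <= tree[l]:
                p = l
            else:
                v -= tree[l]
                p = l + 1
        return p

    assert get_leaf(2.0) == 3        # v = 2.0 falls in the first leaf's segment [0, 3)
    assert get_leaf(9.5) == 6        # v = 9.5 falls in the last leaf's segment [7, 10)
    assert math.log2(capacity) == 2  # exactly two descent steps per lookup

If the for-loop in get_leaf_id_and_value ran one iteration too few, it would stop at an internal node; the later subtraction leaf_ids - (self.buf_len - 1) would then go negative, which matches the failing assert and is presumably what the question above is probing.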

OK, I will check it this weekend. Could you tell me the maximum capacity you used when creating the ReplayBuffer? Is it 2 ** 16?

It may be that the replay buffer is full, so the write pointer wraps from the end of the index range back to the front. If the pointer p happens to land on the last element, the wrap-around may go wrong.
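
A minimal illustration of the wrap-around described above (generic ring-buffer logic, not ElegantRL's actual pointer code):

    max_len = 4
    p = 0                          # write pointer
    slots = []
    for step in range(6):          # write 6 transitions into a buffer of capacity 4
        slots.append(p % max_len)  # the pointer wraps from the end back to the front
        p += 1
    print(slots)  # [0, 1, 2, 3, 0, 1] -- the two oldest entries were overwritten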

Maybe adding the logic below will solve the problem. Thanks for providing the code.

        if cl_idx >= len(self.tree):  # reached the bottom, end the search
            leaf_idx = parent_idx
            break
Syk-yr commented

Hello, I changed the maximum capacity of my ReplayBuffer to 10**6.

Has this bug been fixed yet? Running demo_PER_prioritized_experience_replay.py raises the same error:
Traceback (most recent call last):
  File "E:\ElegantRL\examples\demo_PER_prioritized_experience_replay.py", line 99, in <module>
    train_ddpg_td3_sac_for_lunar_lander_continuous()
  File "E:\ElegantRL\examples\demo_PER_prioritized_experience_replay.py", line 51, in train_ddpg_td3_sac_for_lunar_lander_continuous
    train_agent(args)
  File "E:\ElegantRL\elegantrl\train\run.py", line 89, in train_agent
    logging_tuple = agent.update_net(buffer)
  File "E:\ElegantRL\elegantrl\agents\AgentTD3.py", line 45, in update_net
    obj_critic, state = self.get_obj_critic(buffer, self.batch_size)
  File "E:\ElegantRL\elegantrl\agents\AgentTD3.py", line 71, in get_obj_critic_per
    states, actions, rewards, undones, next_ss, is_weights, is_indices = buffer.sample_for_per(batch_size)
  File "E:\ElegantRL\elegantrl\train\replay_buffer.py", line 134, in sample_for_per
    _is_indices, _is_weights = sum_tree.important_sampling(batch_size, beg, end, self.per_beta)
  File "E:\ElegantRL\elegantrl\train\replay_buffer.py", line 293, in important_sampling
    assert 0 <= indices.min()
AssertionError

I modified the get_leaf_id_and_value function as follows:
    for depth in range(self.depth - 2):  # descend through the tree
        l_id = min(2 * p_id + 1, self.max_len - 1)  # the left child
        r_id = l_id + 1  # the right child
        if l_id >= len(self.tree):  # reached the bottom, end the search
            p_id = 0
            break
        else:
            if v <= self.tree[l_id]:
                p_id = l_id
            else:
                v -= self.tree[l_id]
                p_id = r_id
    return p_id, self.tree[p_id]  # leaf_id and leaf_value
But the same error is still raised.
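
For anyone still hitting this, logging the offending values just before the assert fires should narrow things down. A hypothetical standalone helper (`leaf_ids`, `buf_len`, and the root sum are the quantities from the important_sampling snippet near the top of this thread; this is not code from the repository):

    import torch

    def check_indices(leaf_ids: torch.Tensor, buf_len: int, root_sum: float) -> None:
        """Hypothetical helper: report leaf ids that map to out-of-range buffer indices."""
        indices = leaf_ids - (buf_len - 1)
        bad = (indices < 0) | (indices >= buf_len)
        if bad.any():
            print(f"root sum = {root_sum:.4f}, buf_len = {buf_len}")
            print(f"bad leaf_ids = {leaf_ids[bad].tolist()} "
                  f"(valid leaf ids are [{buf_len - 1}, {2 * buf_len - 2}])")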