Episodic Memory

Graph-based memory used by the self-supervised robot pythia

1. Idea

The memory should be able to replace the classical "episodic buffer" commonly used in reinforcement learning settings.

A graph-based memory makes it possible to store sequences of (action, observation) tuples in such a way that a planning algorithm can be run on top of them (Search on the Replay Buffer). It can also be used as a goal-space memory (Learning Latent Plans from Play).

Another problem that a graph-based memory can address is the storage limit. The classical way of handling it is to remove the oldest tuples from the memory. This is a hard limitation, because such a system becomes subject to catastrophic forgetting. A graph-based memory can instead emulate a "natural decay" which reinforces useful and important memories and progressively discards the others.
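
As a very rough illustration of the difference (a hypothetical sketch, not this project's actual implementation), an oldest-first buffer simply drops the front of the queue, whereas a decay-based memory drops whatever has become least useful:

from collections import deque

# Oldest-first eviction: the classical fixed-size episodic buffer.
buffer = deque(maxlen=1000)   # once full, appending silently drops the oldest tuple
buffer.append(("obs", "action", "next_obs"))

# Decay-based eviction (hypothetical sketch): every memory carries a usefulness
# score that grows when the memory is reused and shrinks otherwise, so the
# entries that end up forgotten are the least useful ones, not simply the oldest.
scores = {}  # memory_id -> usefulness score

def decay_step(scores, reused_ids, decay=0.99, bonus=1.0, threshold=0.1):
    for mem_id in scores:
        scores[mem_id] *= decay
    for mem_id in reused_ids:
        scores[mem_id] = scores.get(mem_id, 0.0) + bonus
    # ids now weak enough to be forgotten when space is needed
    return [mem_id for mem_id, score in scores.items() if score < threshold]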

2. Features

  • Store high-dimensional vectors
  • Keep sequences of actions and observations as a multi-directed graph
  • Perform fast approximate nearest-neighbors search to find relevant memories
  • Implement a natural memory decay
  • Random sampling of sequences
  • Planning algorithm

3. How it works

The episodic memory is based on two sub-memories:

a) An index memory

This is an index of (high-dimensional) memory states that can retrieve the top-k nearest neighbors very quickly.
This is useful:
    - if we want to know whether we have already experienced a particular state
    - if we want to retrieve the states most similar to the current one (external or imagined)
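
As a toy sketch of what this lookup does (not the project's actual index, which presumably relies on a faster approximate nearest-neighbor structure), a brute-force version could look like:

import numpy as np

class StateIndex:
    """Toy stand-in for the index memory: stores state vectors and
    returns the top-k most similar ones (brute-force cosine similarity)."""

    def __init__(self, vector_dim):
        self.vectors = np.empty((0, vector_dim))

    def add(self, state):
        self.vectors = np.vstack([self.vectors, state[None, :]])

    def top_k(self, query, k=5):
        # cosine similarity between the query and every stored state
        norms = np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(query)
        sims = self.vectors @ query / np.maximum(norms, 1e-12)
        best = np.argsort(-sims)[:k]
        return best, sims[best]

In practice a dedicated approximate nearest-neighbor index would replace the brute-force scan once the number of stored states grows.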
    
    
b) A graph memory

This is a multi-directed graph that stores sequences of (action, state) tuples.
It uses a 'natural decay' to forget the least useful memories and free up space.
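
A rough sketch of what such a graph can look like, here using networkx as a stand-in for the project's internal representation (an assumption, not confirmed by the source):

import networkx as nx

# Hypothetical sketch: states are nodes, every observed action is a labelled,
# directed edge, and a MultiDiGraph allows several different actions to link
# the same pair of states.
graph = nx.MultiDiGraph()

# store a short sequence of (state, action, next_state) transitions,
# each with a strength that a decay rule could later shrink
transitions = [(0, "up", 1), (1, "right", 2), (1, "left", 0), (0, "down", 1)]
for state, action, next_state in transitions:
    graph.add_edge(state, next_state, action=action, strength=1.0)

# walking the outgoing edges of a state gives the remembered
# (action -> next_state) options a planner or trajectory sampler can follow
for _, next_state, data in graph.out_edges(0, data=True):
    print(f"from state 0, action {data['action']!r} led to state {next_state}")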

4. How to use

# Install requirements
pip install -r requirements.txt

import numpy as np
import random

from memory import EpisodicMemory

max_size = 10000
sim_threshold = 31
vector_dim = 200
stability_start = 1000
actions = ["up", "down", "left", "right"]

memory = EpisodicMemory(base_path='model_files',
                        max_size=max_size,
                        index_sim_threshold=sim_threshold,
                        vector_dim=vector_dim,
                        stability_start=stability_start)

# simulate some actions / perceptions
state_m1 = np.random.random((vector_dim,))
action_m1 = random.choice(actions)
for it in range(30):
    state = np.random.random((vector_dim,))
    memory.update(state_m1, action_m1, state)
    state_m1 = state
    action_m1 = random.choice(actions)
    print(f"states : {memory.n_states}\ttransitions : {memory.n_transitions}\tforgeted states : {memory.forgeted}")

# sample some trajectories
trajectories = memory.tree_memory.sample_trajectories(n=15, horizon=6)