Issues
Original DQN Example
#103 opened by ehknight - 5
Destination GpuArray is not contiguous
#96 opened by kashif - 1
attention tests
#99 opened by justheuristic - 3
policy_estimators param is weird
#93 opened by pshvechikov - 1
example: Learning to be kind
#48 opened by justheuristic - 3
deprecate preprocess_observation
#90 opened by justheuristic - 1
batch_size parameter is weird
#95 opened by pshvechikov - 1
BaseResolver returns int64
#91 opened by justheuristic - 0
Vectorized environment
#89 opened by justheuristic - 1
DPG refactor and demo
#86 opened by justheuristic - 1
grad dtypes mismatch in some rare cases
#83 opened by justheuristic - 0
Deprecation list
#68 opened by justheuristic - 0
better weights management for memory layers
#84 opened by justheuristic - 2
AgentNet recurrence won't compile if batch_size = 1 and unroll_scan=False and at least one input is a single-element vector.
#79 opened by justheuristic - 3
canonicalize LSTM
#80 opened by justheuristic - 3
Brief outline of modules
#77 opened by arogozhnikov - 0
Hierarchical MDP as a demo?
#71 opened by justheuristic - 0
Minimal initial example.
#53 opened by arogozhnikov - 3
Automated tests on convergence
#54 opened by arogozhnikov - 1
KSfinder experiment setup
#32 opened by justheuristic - 0
Dockerfile aka "makeitwork"
#69 opened by justheuristic - 1
TupleLayer refactor
#55 opened by justheuristic - 0
TODOs
#58 opened by justheuristic - 1
Continuous action space policy gradient
#50 opened by justheuristic - 0
Add py3 to container
#52 opened by justheuristic - 2
Continuous/ndimensional action support
#25 opened by justheuristic - 0
Environment interface with Lasagne layers
#46 opened by justheuristic - 1
Adversarial architecture
#35 opened by justheuristic - 1
Release preparations
#34 opened by justheuristic - 1
Forced category predictions
#40 opened by justheuristic - 5
Window memory
#44 opened by justheuristic - 2
Getting published
#33 opened by justheuristic - 1
Dialogs demo stand
#41 opened by justheuristic - 2
A3C a.k.a. Actor-Critic method
#36 opened by justheuristic - 1
Learning refactor
#39 opened by justheuristic - 4
K-step reinforcement learning
#27 opened by justheuristic - 2
Reinforcement Learning Comparison
#28 opened by justheuristic - 1
Session printing broken
#30 opened by justheuristic - 2
Implement SARSA and compare with Q-learning
#26 opened by justheuristic