Given a nrMDP and a MRM this framework will learn the MRM from the nrMDP and several experiments using the Angluin algorithm apply on mealy machine.
- map : a file with the map representing the model. (a description of this format is given below. An example can be found in examples/example.mp)
- expert value : the value given by the expert for the first check.
- default cost : the cost for every move that doesn't bring any observation. (default: 1)
- reset cost : the cost paid to reset the system. (default: 5)
- steps per episode : number of steps for each episode of exploration during the second checks. (default: 5000)
- number of episodes : number of episode of exploration during the second checks. (default: 5)
- getExperience mode : mode of the experience function (MIN or MAX). (default: MIN)
Learning_MRMs.py -i <input_file> -v <expert_value>
[-r <reset_cost>]
[-d <default_cost>]
[--steps_episode <integer>]
[--number_episodes <integer>]
[--getExperience_mode <MIN|MAX>]
Learning_MRMs -i examples/example.mp -r 5 -d 1 -a True -v 1.5 --getExperience_mode MAX
The code in pylstar is not from me, it is from https://github.com/gbossert/pylstar
The nrMDP and the MRM are described in a file structured as follow:
<description of the nrMDP>
OBSERVATIONS
<description of the observation function>
MRM DESCIPTION
<description of the MRM>
The first line list the initial states as follow:
initial <list of the names of the initial states separated by ",">
Example:
initial s0,s1
Then each line describe a transition as follow:
<strating state> <action> <probability> <destination>
Example:
s0 a 0.2 s1
The probability can also be written down by a fraction:
s0 a 1/5 s1
An observation function returns an observation for a pair state-action. Each line corresponds to an observation:
<state> <action> <observation>
If a pair isn't represented by any line the observation corresponding to this pair is 'null'. Example:
s0 a A
In this case the pair
The MRM is a mealy machine where the input alphabet are the obseravtions and the output alphabet some rewards. Each line corresponds to a transition in this mealy machine as follow:
<starting state> <observation> <reward> <destination state>
Example:
0 A 10 1
In this case if the current state of the MRM is 0, if the agent observes A by moving in the nrMDP then he gets a reward of 10 and the MRM changes its current state to 1.