Q-Behave

Q-Behave is machine learning library written in C++ compatible with Arduino and other embeded devices. The idea behind machine learning is to allow your device to acquire patterns based on user interactions. This is achieved without the need to program them beforehand.

To illustrate this concept take RGB diode with only one button as an example. After pressing the button the user expects to see diode flash with some color. Next user will have different color in mind when pressing the button. In a conventional way the system cannot be programmed to guess user intention on which color to show.

When we put Q-Behave as a processor between interaction (pressing a button) and state (flashing the diode) the device will choose one of possible states based on prior experience gained from user interaction. So when the user expects to see green light and instead the led flashes a different color, the user presses the button immediately again to finally get the desired color. This way the user informs the processor that he/she was expecting a different state of the diode. After several trials the device responds with a correct color right the first time because it learns what the user expects from it.

Utilizing this technique one program can behave differently and adapt to different users. In fact the same rule applies when user changes his mind thus forcing the device to learn the new behaviour. After some trials with the old color the program will adapt to new rules and satisfy user expectations without the need to reprogram it.

System design

To understand better what is this system take a look at block diagram of Q-Behave system.

Figure 1. Q-Behave block diagram

Inputs represents anything that can interact with the system like for example buttons or timers. It goes into the State Controller which checks the memory and ask Learning Processor to calculate the best action. As an outcome of this process an action changes the State of the machine. Reward or penalty is calculated either when triggered from external interaction or timer or as a result of repetition of the same action.

In Arduino code interaction is just an invocation of the function with the integer parameter representing the input type. States of the system are a bit more complicated, as each state have to be encalupsated as an object of class extending the abstract State class defined in the system. What is already provided can be represented by the folowing figure:

Figure 2. Q-Behave state types

The only thing that might need additional explanation is the testing environment and a special class MockState. The purpose of those two thing is to provide simple solution for testing the behaviour algorithm without the need of conneting any physical device.

Download and Install

To download this library using git:

git clone git@github.com:smigielski/q-behave.git

git clone https://github.com/smigielski/q-behave.git

To install, copy libraries/q-behave to your Arduino/libraries/ directory. No other tasks are required and you have library ready to be used. Check example projects for details.

Examples

RGB Diode control with one button only

First example that you can use to learn how this library works and to check how good user experience is when using learning devices. Except Arduino there are only few requiered components so I am quite sure you can build it with what you have.

Table 1. Part list for RGB Diode control with one button only

Part Type	Amount	Properties
1	Arduino Uno (Rev3)	type Arduino UNO (Rev3)
1	RGB LED (com. anode, rbg)	package 5 mm [THT]; pin order rbg;
1	Pushbutton	package [THT]
1	10k Ω Resistor	package THT; resistance 10kΩ;

It is best to first build it on the breadbord to see how it works:

Figure 3. Schematics of RGB Diode control with one button only

If you need we also provide you detail schematics file for a reference:

Figure 4. Breadboard of RGB Diode control with one button only

When you have hardware in place there is time to see software in more detail. First thing is organization of the states.

//Define states
RestState restState = RestState();
LedState green = LedState("green",9);
LedState blue = LedState("blue",10);
LedState red = LedState("red",11);

//Prepare connection map between states. Note that it is allow only to move
//from rest state to led states and off again.
Action restActions[] = { { &green, 0.0 }, { &blue, 0.0 },{ &red, 0.0 } };
Action greenActions[] = { { &restState, 0.0 }};
Action blueActions[] = { { &restState, 0.0 }};
Action redActions[] = { { &restState, 0.0 }};

StateActions states[] = {
		{&restState,3, restActions },
		{&green,1,greenActions},
		{&blue,1,blueActions},
		{&red,1,redActions}
};
StateMap stateMap = { 4, states };

As mentioned earlier RestState is the initial state with all leds turn off and each LedState represent different color. In action arrays there is definiton of posible action that Q-Behave can take from one state to the other. The last thing is that we have to gather it together in states array and than in state map, structure representing state system.

Next we can see how interaction and reward function is defined. In this case with have only one interaction button and as a reward there is timer configured.

//Interactions
SimpleButton button1 = SimpleButton(14);
SimpleTimer rewardTimer;
int timerId;


//Button configuration
void onReward(){
    Serial.println("onReward: ");
    brain.stop(10.0);
}

void onButton1(SimpleButton& b){
    Serial.print("onButton1: ");
    Serial.println(b.pin);
    brain.start(0);
    if (rewardTimer.isEnabled(timerId)){
      //if timer was already enable than restart it
      rewardTimer.restartTimer(timerId);
    } else {
      timerId = rewardTimer.setTimeout(5000, onReward);
    }
}


//Startup code
void setup() {
   button1.pressHandler(onButton1);
}

// the loop routine runs over and over again forever:
void loop() {
  button1.process();
  rewardTimer.run();
}

The last piece of information that is missing is memory and processing system.

//Simple memory definition and configuration. All memory is blocked in heap.
//Please note that memory is volatile right now and will not survive restart.
double prob1[6];
double* internalMemmory[]={prob1};
SimpleMemory memory = SimpleMemory(stateMap,internalMemmory);

//Main learning process.
QLearningMachine brain = QLearningMachine(&memory,&restState);

To see full source code of this example go to Q_Behave_Single_Button.ino

RGB Diode with three buttons and a special reward button

Second example, also quite simple, is mainly the same as previous one but this time there are three buttons that can be learned individually and a special reward button. To see how the learning process look like see this video.

The bread board and the schematics look almost the same. The only difference is that now it contains more buttons:

Figure 5. Breadboard of RGB Diode with three buttons and a special reward button

and schematics:

Figure 6. Schematics of RGB Diode with three buttons and a special reward button

If you want to be certain about what we have used this is the part list:

Table 2. Part list for RGB Diode with three buttons and a special reward button

Part Type	Amount	Properties
1	Arduino Uno (Rev3)	type Arduino UNO (Rev3)
1	RGB LED (com. anode, rbg)	package 5 mm [THT]; pin order rbg;
4	Pushbutton	package [THT]
4	10k Ω Resistor	package THT; resistance 10kΩ;

This time we need to define more buttons and reward function as a callback to the button press instead of the timer:

SimpleButton button1 = SimpleButton(14);
SimpleButton button2 = SimpleButton(15);
SimpleButton button3 = SimpleButton(16);

SimpleButton reward = SimpleButton(17);

and more interaction function:

void onReward(SimpleButton& b){
    Serial.print("onReward: ");
    Serial.println(b.pin);
    brain.stop(10.0);
}

void onButton1(SimpleButton& b){
    Serial.print("onButton1: ");
    Serial.println(b.pin);
    brain.start(0);
}

void onButton2(SimpleButton& b){
    Serial.print("onButton2: ");
    Serial.println(b.pin);
    brain.start(1);
}

void onButton3(SimpleButton& b){
    Serial.print("onButton3: ");
    Serial.println(b.pin);
    brain.start(2);
}

Setup and main loop is also a bit bigger now, but still quite clean:

//Startup code
void setup() {
   button1.pressHandler(onButton1);
   button2.pressHandler(onButton2);
   button3.pressHandler(onButton3);
   reward.pressHandler(onReward);
}

// the loop routine runs over and over again forever:
void loop() {
  reward.process();
  button1.process();
  button2.process();
  button3.process();
}

The last difference is that memory block have to accomodate more information, one prob array for each button.

//Simple memory definition and configuration. All memory is blocked in heap.
//Please note that memory is volatile right now and will not survive restart.
double prob1[6],prob2[6],prob3[6];
double* internalMemmory[]={prob1,prob2,prob3};
SimpleMemory memory = SimpleMemory(stateMap,internalMemmory);

State system is exactly the same as it was before:

//Define states
RestState restState = RestState();
LedState green = LedState("green",9);
LedState blue = LedState("blue",10);
LedState red = LedState("red",11);

//Prepare connection map between states. Note that it is allow only to move
//from rest state to led states and off again.
Action restActions[] = { { &green, 0.0 }, { &blue, 0.0 },{ &red, 0.0 } };
Action greenActions[] = { { &restState, 0.0 }};
Action blueActions[] = { { &restState, 0.0 }};
Action redActions[] = { { &restState, 0.0 }};

StateActions states[] = {
		{&restState,3, restActions },
		{&green,1,greenActions},
		{&blue,1,blueActions},
		{&red,1,redActions}
};
StateMap stateMap = { 4, states };

Full source code of this example can be found at Q_Behave_RGB_Led.ino

License

This work is licensed as GPL software. See LICENSE file for full text. If you need other license or would like to get support, please contact us.