maestro is an actor-model framework for creating intelligent autonomous actors.
maestro is currently in a prototype phase.
We are creating a "Markov compatible sensorimotor inference engine."
"Sensorimotor" refers to a program that gets input (sensory data) and produces output (motor commands). In other words, it's in a continuous loop with the environment (the environment being a puzzle or maze).
"Inference engine" means that it figures out how its motor commands affect the environment. It does this by paying close attention to how its sensory input changes based upon what its motor commands were.
Lastly, being "Markov compatible" simply means the environment is constrained to be simple: a static state-space that is fully observable, even though it may be very large. The proof of concept won't be able to handle anything more complex than that.
The proof of concept design can be understood as a network of nodes that all talk to each other and share information to understand the environment they're placed in.
Each node can see a portion of the environment, remembers certain things about the past, and can talk to any or all of the other nodes.
They basically propose and vote on what choices to make in order for the majority of them to make sense of the world, though each only sees a small portion of it.
The idea is that you can categorize environments based upon their features and complexity. Thus, if the AI can learn how to manipulate a certain type of environment, it can learn to manipulate every possible environment, every possible puzzle, that conforms to that type. Thus it is generalized.
We have identified the following key features of environments that can be combined to define their type:
- Environments can be large or small. If large, the environment is too big for one node to ever possibly understand; if small, it is not memory intensive and one node can memorize the whole thing.
- Environments can have symmetric/repeating sensory patterns, or the state space can be arranged somewhat randomly. This is a question of the environment's entropy.
- An environment's behaviors (what the AI can do) can have symmetric effects on the environment (such as going right undoing the effect of going left) or non-symmetric effects.
- Environments can have a static state-space that is fully observable, or a non-static one. A non-static state-space essentially means there are other actors changing things in the environment: the Maestro AI actor is not the only agent acting on it.
Taking the first three features as binary axes (the fourth is held fixed at "static and fully observable" for now; see below), this creates a matrix of 8 different types of environments ranging from simple to complex. For example, here's the simplest possible environment (a 2x2 Rubik's Cube):
- small state-space
- symmetric/repeating sensory patterns
- behaviors have symmetric effects
- static state-space, fully observable
Here is the environment type we hope to be able to manage with our proof of concept (a 3x3 Rubik's Cube):
- large or infinite state-space (memory intensive, multiple nodes required).
- symmetric/repeating sensory patterns
- behaviors have symmetric effects
- static state-space, fully observable
And here is a complex environment that we someday hope to have Maestro AI manage effectively:
- large or infinite state-space (memory intensive)
- non-symmetric and not repeating sensory patterns (sensory input is high in entropy)
- behaviors do not have symmetric effects (motor effects are high in entropy)
- static state-space, fully observable
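For concreteness, the matrix of types can be enumerated mechanically; a throwaway sketch, not part of maestro:

```python
from itertools import product

# The three binary axes of the environment taxonomy; the fourth axis
# (static vs. non-static state-space) is fixed to "static" for now.
axes = [
    ("small state-space", "large state-space"),
    ("repeating sensory patterns", "high-entropy sensory patterns"),
    ("symmetric behavior effects", "non-symmetric behavior effects"),
]

# 2 x 2 x 2 = 8 environment types
for i, combo in enumerate(product(*axes), start=1):
    print(f"type {i}: " + ", ".join(combo) + ", static state-space")
```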
We consider an environment that is not a fully observable static state-space to be the most difficult problem to solve, and it should be left out of consideration for the time being. Interestingly enough, however, we have noticed that this is the problem others have tried to solve first. Chess, for example, is an environment with another player; therefore the state-space is not static, and one state does not always lead to the same next state.
We feel that solving AGI for simple environments is possible and should be done first; that it is an 'early optimization' mistake not to. The world is full of static systems that an AGI, such as ours, could be in charge of managing according to the goals we provide it. We feel this is an obvious niche that has been overlooked.
We will know that our proof of concept for this design is a success if we give the AGI a variety of very simple puzzles (such as a Rubik's Cube, or Atari video games) and it automatically learns how to solve them without any instruction or help.
It is our belief that AGI is necessarily computation upon distributed memory (on a network).
Some evidence for this belief is that most sophisticated machine learning algorithms, such as neural nets, are merely simulations of a network architecture. Even simple machine learning algorithms such as decision trees are represented as a network of nodes in a particular hierarchical structure that produces a directed acyclic graph of computation.
Thus, Maestro AI is essentially just a network of nodes, simulated in some way such as the "actor model" of programming. Maestro AI is an attempt to produce a simple, generalized method of computation upon a network of nodes. This is to be done by working in two containing paradigms:
- Maestro is to be a sensorimotor engine, in constant communion with its environment. That is to say, it is not merely a model applied to data, but can be thought of as a living model, constantly changing through interaction with its environment; as an actor.
- Maestro is to be given, and made to understand, the simplest category of environments first. There are simple environments (essentially static state spaces) and complex environments (essentially environments that change over time regardless of Maestro's actions). The Maestro proof of concept should first learn how to manage the simple ones rather than the complex ones. This is because we, as its creators, want to learn step by step how to manage the communication of the network in order to most optimally achieve the appropriate distributed computation on distributed information for each type of environment. We need to learn the relationship between the complexity of Maestro's environment and the complexity of each node's memory and ability to communicate with the rest of the network (distributed memory).
Each actor has 4 things: behaviors (predefined functions), input data (ultimately derived from the external environment), contextual information (communication with fellow actors), and a specific goal in terms of what it should get its input data to look like.
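As a sketch of those four things (the names and types here are illustrative, not the actual maestro API):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Actor:
    # 1. behaviors: predefined functions (motor commands) the actor can take
    behaviors: Dict[str, Callable[[], None]]
    # 2. input data: the slice of the external environment this actor can see
    input_data: Tuple[int, ...]
    # 3. contextual information: messages received from fellow actors
    context: List[str] = field(default_factory=list)
    # 4. goal: what the actor should get its input data to look like
    goal: Optional[Tuple[int, ...]] = None
```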
Once you have a system defined with each actor having functionality and set to receive messages from the correct other actors, you train the actors by letting them explore their behaviors at random, attempting to achieve their goals in the most efficient way possible. This training is done in an isolated environment.
Once the actors are trained they are put to work in the real environment. The idea behind this is that they can maintain, or bring about any state of that environment by working together and coordinating their efforts in the way they learned during training.
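A minimal sketch of that training phase, assuming a hypothetical `env` object with an `observe()` method and behaviors that act on the isolated training environment:

```python
import random

def train(actor, env, steps=1000):
    """Random exploration in an isolated training environment: record
    every (observed state, behavior) -> resulting state triple so the
    actor learns how its behaviors affect what it sees."""
    memory = {}
    for _ in range(steps):
        state = env.observe()                  # must be hashable, e.g. a tuple
        name = random.choice(list(actor.behaviors))
        actor.behaviors[name]()                # act on the training environment
        memory[(state, name)] = env.observe()  # remember the effect
    return memory
```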
Maestro AI's distributed memory and computational infrastructure act as an efficient path-finding algorithm which can find a path from one state to any other state in the environment's state space.
How is Maestro useful? Many structures in the world today are essentially puzzles. They're static, fully observable state-spaces in which a Maestro AI bot can be placed to naturally (unsupervised) learn how to manipulate the environment to achieve any state of the environment it is given as a goal. Thus even a naive version of Maestro could prove very powerful, as specific programming would not need to be employed.
A perfect and extremely simple use case is a web server supervisor. Maestro would be given the goal of making sure the website is up and that no more than one instance of the webserver is running. That goal is a specific subset of all possible states of the environment. During training it can learn how its behaviors manipulate the environment: one function it can call kills all instances of the webserver while another starts one.
After it learns how to restart the server as soon as it crashes, it can be deployed to the real world. It could be trained to supervise more than one server and learn which functions restart which servers, what to do when there are complications, and when it is appropriate to get a human involved.
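A sketch of what those behaviors and the matching sensory input might look like; the commands and process names here are placeholders, not part of maestro:

```python
import subprocess

def kill_all_instances():
    """Behavior: kill every running instance of the webserver."""
    subprocess.run(["pkill", "-f", "webserver"], check=False)

def start_instance():
    """Behavior: start a single instance of the webserver."""
    subprocess.Popen(["./webserver"])

def sense():
    """Sensory input: how many webserver instances are running?"""
    out = subprocess.run(["pgrep", "-c", "-f", "webserver"],
                         capture_output=True, text=True).stdout.strip()
    return int(out or 0)

# The goal "not more than one instance running" is then simply
# the sensory state sense() == 1.
```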
Its functions can even be masks over sending messages to other actors (for instance, to inform them of some context). If you have a distant actor trained to monitor or use a service that relies on the website being up, for example, the supervisor can learn that when it detects the server going down it should notify that distant actor so the distant actor can take the appropriate behavior.
The important thing is that these connections are learned by the actors. With the advent of the internet and distributed systems, we expect the actor model to become more and more popular because of its added layer of abstraction. We expect computing in general to be seen more as a protocol on a network than as serial instructions to one machine.
We expect to see more microservices and more intelligence injected into our distributed systems to make them adaptable, autonomous, resilient and robust. We see, therefore, the role of the programmer changing from one of micromanager to one of steward and incentive designer, almost growing a system rather than engineering it. Less procedural and more declarative; putting things together before they exist.
We view intelligence, or at least general intelligence, as a network effect: an emergent property arising out of the interactions of nodes on a network. There is a feedback loop between the protocol, or language, of a network and the structures that the network generates, both in its internal connections and in the composition of each node. This feedback loop is the means of amplification of the inherent intelligence of the system. It is a strange loop: inherently homeostatic (for a time), yet inherently chaotic and unpredictable, because its causes recede to the edges of the environment in which it is placed and into the depths of the nuances of its internal dialogue.
Maestro represents an attempt to exemplify the above interpretation of what intelligence fundamentally is by creating a network of entirely naïve nodes that work together to achieve a certain goal. Their protocol does not evolve, nor do their connections or internal computational structure to any significant degree; but once these elements are statically created, the different ways in which they can affect one another can be explored.
maestro, being a naïve sensorimotor inference engine, has a very limited scope of environments which it can accurately model and command.
The most important constraint is that the environment have the Markov property: the state space of its various configurations must be a static structure. maestro will explore this state space; it need not see it in its entirety, but it must be static in order to be predictable. If maestro is in a location in the state space it has seen before and performs the exact behavior it performed last time at that location, it should always get the same result.
The second constraint is that the environment be fully observable; that it has no hidden variables. This is almost a restatement of the Markov property, but it is technically a different constraint: the environment must have the Markov property and it must be known to have the Markov property.
The third constraint is that the environment does not change state on its own. That is, maestro, when acting upon the system, should be the only actor upon the system. This, again, is almost a restatement of the previous constraints.
The last constraint upon the environment is that the actions maestro can perform upon it change only a specific subset of the environment's representation, and that said actions always have an effect on the same subset.
This may look like a very constrained environment; however, most logic puzzles fit into this category of environments: an isolated (from any external force), fully observable, static state-space, where all actions upon it alter only a subset of its informational representation.
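The first constraint, determinism, is easy to check mechanically while exploring. A sketch of such a check, assuming memory is a plain dict of observed transitions:

```python
def record_transition(memory, state, behavior, result):
    """Record a transition, flagging any violation of the Markov
    constraint: the same (state, behavior) pair must always lead
    to the same resulting state."""
    seen = memory.setdefault((state, behavior), result)
    if seen != result:
        raise ValueError(
            f"environment is not Markov: {state} + {behavior} "
            f"gave {result}, previously {seen}")
```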
The Rubik's Cube, being a quintessential example of such an environment, has been chosen as our primary testing environment: it is a vast state-space, somewhat complex, but uniform enough not to require a full exploration of the space; that is, its underlying relationships needn't be logically deduced.
The Maestro proof of concept is essentially a 2-layer hierarchy. One actor is initially created who is the "master" of all other actors. The master node has very different responsibilities than the network of nodes. Its role is to create the appropriate number of other nodes and to serve as a conduit between the network and the outside environment, passing information from the environment to the nodes and carrying out the will of the network upon the environment (performing behaviors). The master node is the mother of the network, the eyes and voice of the group.
The master node has 2 important roles:
- Create all other actors as required by the environment. One actor will be created for each combination of inputs from the environment (see the sketch after this list). If the environment is represented as 3 data points, A, B and C, maestro will make 6 subordinate worker actors: one each for A, B, C, AB, AC, and BC. No node sees the entire picture, thus no node is assigned ABC. If B never changes unless C changes, then the B and C nodes can be removed, leaving only nodes A, AB, AC, and BC.
- Interface with the human operator and relay goals to the actors, and interface with the environment to pass state representations to the actors. The master node is the maestro. It doesn't know how to do anything itself and keeps little memory. When a goal is given, it is given in the form of how the environment should look to the master node: 101. If the current state of the environment is 000, the master node is tasked with choosing the appropriate actions to move the input state from 000 -> 101. It does this by relaying the task to subordinate actors and waiting for them to work it out amongst themselves. When a consensus is reached about what action to take next, the master node implements the behavior; if the goal is reached it notifies the human controller, if not, the process repeats.
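The combinatorial assignment of views to workers can be sketched directly (the function name is illustrative):

```python
from itertools import combinations

def worker_views(inputs, max_size=None):
    """One worker per combination of environment inputs, excluding the
    full set so that no node ever sees the entire picture."""
    max_size = max_size or len(inputs) - 1
    return [c for r in range(1, max_size + 1)
              for c in combinations(inputs, r)]

print(worker_views(["A", "B", "C"]))
# [('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C')]
```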
The way in which actors coordinate is naïve and simple, as is their memory. Each actor memorizes every environment-state input it has ever seen, paired with the behavior taken and the resulting output. Actors that have complete consistency, meaning the same input always leads to the same output, are prime members of the society. Their input matters the most because, in theory, they're the only ones that need to communicate with each other in order to visit any desired state of the environment.
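A sketch of that naive memory and the consistency test that qualifies an actor as "prime" (names are illustrative):

```python
from collections import defaultdict

class Memory:
    """Naive actor memory: every (input state, behavior) pair maps to
    the set of resulting outputs ever observed."""
    def __init__(self):
        self.transitions = defaultdict(set)

    def record(self, state, behavior, result):
        self.transitions[(state, behavior)].add(result)

    def is_consistent(self):
        """A 'prime' actor's memory is completely consistent: the same
        input plus behavior always led to the same output."""
        return all(len(results) == 1
                   for results in self.transitions.values())
```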
Goals are sent to this group first, and only if they fail to achieve the goal are other actors consulted. This prime group of actors broadcasts the states closest to the goal state that they know they can reach with actions. Other prime actors listen to the broadcasts and see if any of the states correspond to an input state that, given a certain action, can lead to the goal state. They continue in this cacophony until either a hard limit of broadcasts is reached or the goal state is found.
If the goal state is found, a special broadcast is made and that thread gets passed up to the master node. The master node parses out the actions and performs the behaviors. If a goal state is not found, the state closest to the goal is selected and passed up to the master. The master then decides how many of the suggested actions to perform, from at least the first to all of them. Then it passes back down to the prime actors the new state of the environment and the goal.
If too many failures to find the goal state accrue, the master will start consulting the non-prime actors for their input. If they too fail to find a solution, the master will tell the human it has failed to achieve the desired goal.
Maestro has an internal network of nodes with different and overlapping views of the environment, so they have different and overlapping memories. When confronted with a task, the nodes know and care only about fixing what's wrong with their own view of the world. They analyse their memory structures and deduce how to manipulate their view of the world completely, regardless of what happens to the other parts of the world.
So, when given a current state of the world and asked to turn it into a goal state, they first come up with the optimal way to fix their partial view of the world to look as much like the goal state as possible. They then list a series of macro moves - that is, moves that might affect others - along with the final state of the system as far as they can see. Every node makes such a proposal of its preferred set of moves.
Then every node begins to fill in the missing parts of everyone else's proposals. They say, "if we did this move, as you have suggested, then from my point of view this part of the environment, which you don't know about, would end up looking like this." They do this for all proposals according to what seems most promising to them and what is nearly complete (and therefore most valuable to the group).
Eventually one proposed path will be completely filled out and the final state of all the actions will be known. That path will be chosen, not to execute immediately, but as a new starting state from which all the nodes will then look for paths to the goal, repeating the process above.
Of course, this same process occurs in parallel from the goal state back to the initial state. This means that if any state reached working backwards from the goal matches a state reached working forwards from the current initial state, then we have found a path to the goal. Nodes also check those states for partial matches - that is, where some of the state is known, has been filled out, and all of what is known matches a state on the other side; those will be looked into further, and attention will be directed toward those kinds of paths.
Eventually a path from the initial state to the goal state will be found. At that time the completed actions (the path) will be passed up to the master node, which will execute the behaviors, always verifying the prediction as it goes. In the event that a prediction is violated, the mistaken node is shown the mistake and its memory is fixed. Then the process starts over again from the current state.
This is the simplest multi-agent, distributed-memory, collaborative, single-set-of-collective-behaviors, generalized path-finding algorithm we could come up with.
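Stripped of the distribution across nodes, the underlying search is a meet-in-the-middle path search. A centralized, single-process sketch of what the network does collectively; `neighbors(state)` yielding (behavior, next_state) pairs is an assumed interface, and expanding backwards from the goal with the same neighbors is valid when behaviors have symmetric effects (every move has an inverse), as in the PoC's target environments:

```python
from collections import deque

def bidirectional_search(start, goal, neighbors):
    """Expand forwards from the current state and backwards from the
    goal until the two frontiers share a state. Returns (meeting state,
    behaviors from start, behaviors from goal); the goal-side behaviors
    still have to be inverted before execution."""
    if start == goal:
        return start, [], []
    fwd, bwd = {start: []}, {goal: []}   # state -> behaviors to reach it
    qf, qb = deque([start]), deque([goal])
    while qf or qb:
        # alternate between expanding the two frontiers
        for seen, other, queue in ((fwd, bwd, qf), (bwd, fwd, qb)):
            if not queue:
                continue
            state = queue.popleft()
            for behavior, nxt in neighbors(state):
                if nxt in seen:
                    continue
                seen[nxt] = seen[state] + [behavior]
                if nxt in other:         # the two searches have met
                    return nxt, fwd[nxt], bwd[nxt]
                queue.append(nxt)
    return None                          # goal unreachable
```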
maestro> python setup.py develop
maestro> maestro # maestro is instantiated with one node by default
maestro> tune # explore the environment with one node learning how many nodes need to be instantiated
maestro> stop # stop all behaviors and activity, instantiate the right number of worker nodes
maestro> tune # explore the environment with all nodes learning how the environment works
maestro> stop # stop all behaviors and activity
maestro> info # display information about the network's current state
maestro> debug print(self.musicians)
{(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20): <maestro.core.musician.MusicianNode object at 0x00000239216BC940>,
(29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48): <maestro.core.musician.MusicianNode object at 0x000002392345F668>,
(1, 7, 8, 9, 10, 18, 19, 20, 21, 22, 27, 28, 29, 30, 38, 39, 40, 41, 47, 48): <maestro.core.musician.MusicianNode object at 0x000002392345F908>,
(5, 6, 7, 15, 16, 17, 18, 19, 25, 26, 27, 28, 35, 36, 37, 38, 39, 45, 46, 47): <maestro.core.musician.MusicianNode object at 0x000002392345FC18>,
(1, 2, 3, 9, 10, 11, 12, 13, 21, 22, 23, 24, 29, 30, 31, 32, 33, 41, 42, 43): <maestro.core.musician.MusicianNode object at 0x000002392345FF28>,
(3, 4, 5, 12, 13, 14, 15, 16, 23, 24, 25, 26, 32, 33, 34, 35, 36, 43, 44, 45): <maestro.core.musician.MusicianNode object at 0x00000239234C8278>}
maestro> debug print(self.musicians[(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)].structure)
input ... result
1 2 3 4 5 6 7 8 9 10 11 12 13 ... 8 9 10 11 12 13 14 15 16 17 18 19 20
0 left left under back back left top top back under under back right ... None None None None None None None None None None None None None
1 under back back left top top left left back right left left top ... left back right left left top top back top front right top under
2 under back back left right under front left back right left left top ... left back right left left top top under right back top right under
3 under back back left front under back left back right left left top ... left back right left front under right under left back top right under
4 under back right back back under back left back right left front under ... left back right left back under under left under back top right under
...
31 top top back top top front right back right back right left top ... back right back right left top back under back under front top left
32 top top back top right right left back right back right left top ... None None None None None None None None None None None None None
[33 rows x 41 columns]
None
maestro>
maestro comes with a directory structure for the importation of external modules, a package structure for the creation of internal modules, a notebooks and playground area for data exploration and visualization, a models folder for finished models, a tests folder for the creation of a test suite, documentation folders and sphinx setup, and the skeleton of a flask web app front end.
maestro
│
├── maestro          <-- Source code for use in this project.
│   ├── bin          <-- Module: command line entry point into project.
│   ├── config       <-- Module: configuration or settings files.
│   ├── lib          <-- Module: common functions used by other modules.
│   ├── simulations  <-- Module: environment simulations.
│   └── core         <-- Module: core functionality.
│
├── database         <-- Holds state of the actors upon shutdown.
│
├── docs             <-- Holds all documentation (see sphinx-doc.org).
│
├── notebooks        <-- Jupyter notebooks area for data exploration and
│                        manual model creation.
│
├── playground       <-- Exploration and experimentation area
│                        (is ignored by git).
│
├── tests            <-- Tests for source code.
│
├── web              <-- Flask app web front end. Possible uses: make
│                        documentation available or call workflow remotely.
│
├── README.md        <-- README.md for developers using this project.
├── make.bat         <-- Skeleton Sphinx make file. `/maestro> make html`
├── setup.py         <-- Skeleton `/maestro> python setup.py develop`
└── Dockerfile       <-- Skeleton Dockerfile for creating portable image.
To remake the documentation:
- delete the contents of the docs folder
- run `sphinx-quickstart`
- specify the docs folder, or move the files into the docs folder and modify the make files afterwards
- run `make html`