I'd always imagined that I would hook up a small robot with an embedded neural network, giving myself a remote control with a button like this:
The robot would rove around, and whenever it did something "bad" (e.g. ran into a wall that it should have registered on its sensors) I'd press the button and it would train itself on that "bad" input->output pairing - e.g. that "move forward" when the front sonar sensor is registering an obstruction is "bad". I could also have a "good" button to reinforce correct behaviours, for instance if it turned just before reaching a wall.
This appealed to me as it was also very similar to how I (attempt to) train our cat...
Yes, that is our cat. No, that was not a training session...
The algorithm works in the following way (there's a rough Python sketch after the list):

1. Read the "sensors"
2. Apply the sensor readings to a learning tool (a neural network) and get an output
3. Try out the output "in the real world"
4. If the result of trying out the output is "bad":
    - Slightly mutate the output
    - Go back to step 3
5. Train the network with the resultant (original or mutated) output
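Here's a minimal sketch of that loop, just to make the idea concrete. The tiny numpy network, the sensor and motor dimensions, and the `read_sensors()` / `result_was_bad()` functions are all hypothetical stand-ins of my own choosing - on a real robot they'd be replaced by actual sonar reads and the human pressing the "bad" button.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny single-hidden-layer network: 3 sensor inputs -> 2 motor outputs.
W1 = rng.normal(scale=0.5, size=(3, 8))
W2 = rng.normal(scale=0.5, size=(8, 2))

def forward(sensors):
    hidden = np.tanh(sensors @ W1)
    return np.tanh(hidden @ W2), hidden

def train(sensors, target, lr=0.1):
    """One step of plain backpropagation towards the accepted output."""
    global W1, W2
    output, hidden = forward(sensors)
    err = output - target
    grad_out = err * (1 - output ** 2)                    # output layer gradient
    grad_hidden = (grad_out @ W2.T) * (1 - hidden ** 2)   # hidden layer gradient
    W2 -= lr * np.outer(hidden, grad_out)
    W1 -= lr * np.outer(sensors, grad_hidden)

def read_sensors():
    # Placeholder: pretend we get three sonar readings in [0, 1].
    return rng.random(3)

def result_was_bad(sensors, output):
    # Placeholder for the "bad" button, e.g. driving forward (output[0] > 0)
    # while the front sonar reads an obstruction.
    return sensors[0] > 0.8 and output[0] > 0

for step in range(100):
    sensors = read_sensors()                  # 1. read the "sensors"
    output, _ = forward(sensors)              # 2. ask the network for an output
    while result_was_bad(sensors, output):    # 3-4. try it out; if it's "bad"...
        output += rng.normal(scale=0.2, size=output.shape)   # ...mutate slightly
        output = np.clip(output, -1, 1)       # and go back to step 3
    train(sensors, output)                    # 5. train on the accepted output
```

Note that the network is only ever trained on outputs that were not flagged as "bad", which is the whole point: the button supplies the judgement, the network just learns whatever survives it.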
This approach is similar to the "bucket brigade" rule-reinforcement technique that can be used to train expert systems. It is also not dissimilar to reinforcement learning principles, except that the observation-action-reward mechanism is implicit rather than explicit - the action is the output generated from the observation and the network's weights, and the reward (or penalty) is externally sourced and applied to the network only when needed.*
I am looking forward to trying this out on a real mobile robot as soon as I can order my Pi, and I will keep you up-to-date on how it turns out.
* Oh, and just to be clear, I am not a robotics or AI PhD student and this is not part of a proper academic research paper. It is very likely that what I am doing here has been done before, so I make no claim to extraordinary originality or breakthrough genius - just consider this some musings and a pinch of free Python code :)