Wednesday 22 October 2014

Goal-less intelligence

This is the 2nd post about "Emotional Intelligence". Please read Introducing Emotional Intelligence before you go on, for a proper introduction to the subject.

Goal-less intelligences

In my first post I already commented on the internals of the simplest entropic intelligence possible, one that scores all the futures as 1. If you haven't read it and want to know this case in more detail, you can visit that link first. In any case, I will try to summarize the basic workings of this model again.



We have a kart as our system, and a simple physics simulation able to answer the question "where will you be in 0.1 seconds if you are here now?".

We also have a degree of freedom as part of our system: we can move the steering wheel left or right at any moment by "pushing" an imaginary joystick left or right.

We will call these "our options". In this simplest case we only have two of them: left and right. The kart engine is always on and there is no way to brake or slow down, so each option consists of a single number -the force you have to apply to the imaginary joystick- but in general an option will consist of a vector of N numbers, where N is the number of degrees of freedom -or joysticks, in my previous example- so think of an option as a vector containing the "push" you need to apply to the N joysticks to make the agent evolve "intelligently".
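To make this concrete, here is a minimal Python sketch of those two pieces. The names (simulate_step, OPTIONS) and the toy physics are mine, invented just for this post; the real application uses its own simulation code.

```python
import math

DT = 0.1  # simulation step, in seconds

# Two options for the single steering degree of freedom: push left or push right.
# In general, each option is a vector with one force per joystick.
OPTIONS = [[-5.0], [+5.0]]

def simulate_step(state, option, dt=DT):
    """Toy kart physics: answers "where will I be in dt seconds if I am at
    `state` and push the joysticks with the forces in `option`?"."""
    x, y, heading, speed = state
    heading += option[0] * 0.05 * dt     # steering force turns the kart a little
    x += speed * math.cos(heading) * dt  # engine is always on: constant speed
    y += speed * math.sin(heading) * dt
    return (x, y, heading, speed)
```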

With just that information, this algorithm is able to tell you how strongly you should push this joystick at any moment to get an "intelligent behaviour" from the kart. What that will mean in each case is out of your control; the intelligence is "goal-less".

I call it a "common sense" or "goal-less" intelligence.

Watch this video to see this implementation in action:


Some technical details: the video shows a simple kart simulation with two degrees of freedom, the accelerator and the left-right control. For each one I use four options, which in the case of the left-right control could be -15, -5, +5 and +15, as it worked much better than the simplified two-option counterpart.
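For reference, one possible way to build that option list in the sketch above (the post only gives the four steering forces; combining the two degrees of freedom as a cartesian product is my assumption, and the toy simulate_step above only models the steering component):

```python
import itertools

FORCES = [-15.0, -5.0, +5.0, +15.0]
# One force per degree of freedom (accelerator, steering) in each option.
OPTIONS_2DOF = [list(o) for o in itertools.product(FORCES, repeat=2)]  # 16 options
```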

The algorithm in detail

I will detail the algorithm in its basic form using this image, captured from the real application (V1.1), as my test case:



1) Take the first available option, in this case "push the joystick left". We are going to "score" this option so we can later compare it with the second option.

Everything related to the first option, "go left", is painted in blue in the image above. For the "go right" option, red is used. This way you can clearly separate the futures each option uses.

2) Take your actual position (or "state") and use the simulation to know where you will be after 0.1 s if you push the joystick as this option dictates. It gives you the state you will be in after taking this decision.

In the image above, this is the origin of all the blue lines, actually under the kart body. For the second option, going right, it is the origin of all the red lines.

3) From this "option initial position", imagine a 5 second long random future by iterating steps of 0.1 seconds. In each of those 0.1 s steps the degree of freedom takes a random value in a given range, in this case from -5 to +5 units of force, so the joysticks are randomly pushed left or right in every new step, until you reach your time horizon: how many seconds into the future you want to "think". In my first tests it was 5 seconds, so at 0.1 second steps it takes 50 steps to arrive at the future's end point.

This step would draw a blue or red line, depending on the option being scored.
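Continuing the sketch from above, one imagined future could look like this (imagine_future and the force range are names and assumptions of mine, for illustration only):

```python
import random

def imagine_future(state, option, horizon=5.0, dt=DT, force_range=5.0):
    """Step 3): take the option's first step, then push the joystick at random
    for `horizon` seconds and return the end point coordinates."""
    s = simulate_step(state, option, dt)   # step 2): the option's own first move
    for _ in range(int(horizon / dt)):     # 5 s at 0.1 s steps = 50 random steps
        random_push = [random.uniform(-force_range, +force_range)]
        s = simulate_step(s, random_push, dt)
    return s[0], s[1]                      # only where the future ends matters here
```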

4) Take the end point coordinates, round them to a given precision (in my case I initially used 10 pixel units), and add the result to the option's list of different futures found (a list of vectors containing the end points of all the different futures found so far). If this end point is already in the list, just discard it.

In the image above, these correspond to the blue circles for the left option and the red circles for the right option. The radius doesn't mean anything, as they all score as one in this method.
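In the sketch, the rounding and the "discard if already found" part can be as simple as this (a Python set takes care of the duplicates):

```python
GRID = 10.0  # rounding precision, in pixels

def grid_cell(x, y, grid=GRID):
    """Step 4): collapse an end point to its grid tile, so futures ending
    close to each other count as the same future."""
    return (round(x / grid), round(y / grid))
```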

5) Repeat the process from 3) to imagine a new possible future, until you have tried a fixed number of futures. In my case I used 100 futures, but a bigger number like 500 works better.

This makes all 100 blue dots appear one by one for the first option, and all the red ones for the second option.

6) Score the option with the number of different futures you found, N.

If you imagine the grid size used to round the final future positions as representing "tiles" on the track, as in a grid, then this N is actually measuring the area of the surface "touched" by the blue circles.

The tile area itself -the grid size squared- is not counted, as we only need to compare the blue area with the red area.

It also means we are scoring each future with a plain 1, and using N as the option score because it is the sum of all those ones. Options always score by summing all their futures' scores.
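Putting steps 3) to 6) together, scoring one option could look like this in the sketch:

```python
def score_option(state, option, n_futures=100):
    """Steps 3) to 6): imagine n_futures random futures for this option and
    count how many *different* grid tiles they end in. That count is N."""
    different_futures = set()
    for _ in range(n_futures):
        x, y = imagine_future(state, option)
        different_futures.add(grid_cell(x, y))  # duplicates are discarded for free
    return len(different_futures)               # N = the "area" touched by this option
```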

7) Repeat with the next available option (in this case "turn right") from point 2), until all options have been scored.

Now you have the blue and red areas measured.

8) Now we normalize all the option scores so they sum to 1: start by locating the smallest score and subtracting it from all the option scores, so the smallest one becomes exactly zero. Then divide them all by their sum so they add up to 1. You have converted the option scores into weights you can use to average.

In the image, it means you first find the smaller area -the blue one is smaller this time- and subtract it from all the areas, as if they were shrinking until one dries up completely. In the end, one area is zero and the others are >= 0.

Note: this step can seem too dramatic a change. It is in this case, but when you play with 8 or more options, doing this makes the intelligence decide faster among similar options. Imagine the kart has a Y-shaped junction in front of it. Going left or right scores almost equally, but if you do nothing and go straight, you will crash. The sooner you commit to one path, the better, so in general this step makes the intelligence "sharper". If we were not dealing with such an "organic" agent, disconnecting it might make the algorithm more "neutral".
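In the sketch, step 8) is a small helper (the equal-scores fallback is my addition, just to avoid dividing by zero):

```python
def normalise(scores):
    """Step 8): shift so the smallest score becomes 0, then scale so they sum to 1."""
    smallest = min(scores)
    shifted = [s - smallest for s in scores]
    total = sum(shifted)
    if total == 0:                            # all options scored exactly the same
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in shifted]
```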

9) Then the intelligent decision is the averaged value of your options -remember they were just vectors with one component per free parameter- weighted with the scores we normalized in 8).

You are graphically comparing the blue and red areas in the initial example image. If you see more blue dots, you should go left.
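So in the sketch, the whole decision -steps 1) to 9)- reduces to a weighted average of the option vectors:

```python
def decide(state, options=OPTIONS):
    """Steps 1) to 9): score every option and average them, using the
    normalized scores as weights."""
    weights = normalise([score_option(state, o) for o in options])
    n_dof = len(options[0])
    return [sum(w * o[i] for w, o in zip(weights, options)) for i in range(n_dof)]
```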

Now you have your intelligent decision. It is a vector, and each component is the force you need to apply to one of the joysticks, so you now finish by simulating this decision on screen:

10) We are back to reality instead of imagining a future. In my case, I switched between two possible states an agent can have in my implementation: a "real state" showing where it actually is on screen, and a secondary "imaginary state" I use in the simulation while the agent is imagining a future (there is an internal flag for that in the agent's code).

11) Push the joysticks with the forces contained in the "intelligent_decision" vector. Pushing the joysticks changes the state of the agent (now the "real state", we are not imagining a future) -the joystick positions changed- so when you ask the simulation "where will I be in 0.1 s from this initial state?", the answer will be the next real position of the agent in your simulation, the one shown next on screen.

12) Here you refresh the screen with the new positions -or states- of the agents, using the "real states".

We could also draw the future lines from step 3 to see "what this agent is thinking before deciding". In the exe you can switch this on or off, and in the video above it was "on", so you could see those blue and red lines being created in real time as the agent ponders its options.

This was the exact moment the image above was taken!

13) You have finished a frame of the film. The agents are now in different positions. Take those positions as the initial positions and go back to step 1. You have a video of an agent moving around "intelligently".

At this point you have produced a video like the one I showed at the beginning. Watch it again and try to follow the kart's logic when it decides to take one turn or the other just based on the number of blue and red dots.
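The outer loop of the sketch, steps 10) to 13), is then just this (the starting state and frame count are made up):

```python
state = (0.0, 0.0, 0.0, 30.0)                # x, y, heading, speed: the "real state"
for frame in range(300):
    intelligent_decision = decide(state)     # steps 1) to 9): imagine and score
    state = simulate_step(state, intelligent_decision)  # steps 10) and 11): back to reality
    # step 12) would refresh the screen here with the new real state
    # step 13): loop, using the new state as the next initial position
```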

Remarks


Please take these remarks into consideration if you plan to actually code it:

-Using N (the number of different futures of each option) as the option score is a simplistic way to approximate entropy. It doesn't matter too much, because we normalize the scores, so the scale is not important. Basically, you are assigning a score of one to all of the futures.

-Instead of N you should try to calculate the actual probability of each of the different futures by counting the "hits" its ending point received -how many futures ended on this rounded position- and normalizing those hits by dividing by 100 (as you imagined 100 futures, the sum of the hits is 100).

OptionScore = -Sum over all different futures of (p*log(p))
With p = probability of this future = hits/100.
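As a sketch of that remark (this is my reading of it, reusing the helpers from the sketches above), the entropy-based score would be:

```python
import math
from collections import Counter

def entropy_score(state, option, n_futures=100):
    """Score one option with the entropy of its end-point distribution,
    -sum(p*log(p)), instead of just counting the different tiles."""
    hits = Counter(grid_cell(*imagine_future(state, option))
                   for _ in range(n_futures))
    return -sum((h / n_futures) * math.log(h / n_futures) for h in hits.values())
```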

-Preselecting the grid size proved to be tricky. In some situations it is better to use a small grid size (when you are in a difficult spot), but other times a bigger size works better, depending on how "closed" or "open" the space you are moving in is. Using some kind of heuristic is better than using a fixed grid size.

Final notes


Along with the algorithm itself, you should consider these three points before continuing:

1) The step in which we discarded the futures with similar ending points was the moment the algorithm became "entropic", as it means we are using some kind of entropy. It is key for the algorithm to work, so it is left untouched in every version of the algorithm.

2) The moment we decided to use N as the option score, we were really saying "all the futures score the same for me". This made the algorithm "goal-less" and the resulting psychology was "no psychology". Changing this score from one into something more meaningful is the way to radically improve the resulting AI.

3) If you use -Sum(p*log(p)) instead of N, as commented in the remarks, you will be using the best goal-less entropic intelligence available. But all the futures still score "one hit", so the resulting intelligence remains goal-less and is psychologically flat.

Please also note that being "goal-less" doesn't mean not being capable of incredible things. If you watch the video once again, you will notice how well the kart does on quite a slippery track!

If you were to let this AI drive your real RC helicopter, it would do it correctly on the first try and keep it up and running safely in a changing environment with gusting wind. It is the ideal autopilot for any gadget or real system, as long as you can give it a rough approximation of a simulator. Quite remarkable, in my opinion.

From this point on, everything we will do is replace this score of 1 with different functions and discuss the consequences.

The final goal of this series of version 2 posts is to replace that plain one with a nice working function based on toy models of real feelings that we suppose the agent has.
