Monday, 31 March 2014

Dancing with the danger

As I mentioned in the last post, negative scoring was going to be mandatory in the AI, so level 7 brought the possibility of putting those negative scores to good use.

Here we have the first example of how useful negative scoring can be: a new tendency not to leave the track, even when a big drop sits right at its edge, luring you to disaster.

Before the explanations, here is a first video with 3 karts:

The white one is fearless, like all the karts were before, and readily leaves the track if that means getting a few more drops.

The orange one has the now-standard "fear" level: it avoids leaving the track with the same strength with which it tends to run farther (more on this after the video).

The grey kart is very scared of leaving the track, twice as much as it likes running, so this fear sometimes makes it look stunned, as if paralyzed by fear.

How does it work? I tried quite a few approaches, basically trying to set a number of seconds the kart was meant to stay alive (I called it "time to die"), but it didn't work very well, nor was it very stable: sometimes it got it right, other times the player got stuck with fear, and others it just went off the track.

Finally, the solution was quite simple, the simplest one I tried: if racing on the track scores the "distance raced", then, when you leave the track, the meters raced outside score as minus the distance raced. That simple. Need more fear? Use double negative scoring... but only "fear" levels between 0.5 and 1.5 seem to work reasonably well, with 1 being the "magic number" that makes everything run smoothly.
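A minimal sketch of this per-tick scoring in Python (the actual software is Delphi 7). The names `step_score` and `future_score` and the `fear` parameter are hypothetical; the fear values used below (0 for the fearless white kart, 1 for the standard orange one, 2 for the grey one) are illustrative, with 0 reproducing the old behaviour where driving outside the track simply scored nothing:

```python
def step_score(distance: float, on_track: bool, fear: float = 1.0) -> float:
    """Micro-score for one simulation tick (hypothetical helper).

    On the track the raced distance scores positive; outside it the same
    distance scores negative, scaled by the "fear" level (1.0 = standard).
    """
    if on_track:
        return distance
    return -fear * distance


def future_score(ticks, fear: float = 1.0) -> float:
    """Sum the micro-scores of a future given as (distance, on_track) ticks."""
    return sum(step_score(d, on, fear) for d, on in ticks)


# A future that races 3 m on the track, then coasts 2 m outside it.
ticks = [(1.0, True), (1.0, True), (1.0, True), (1.5, False), (0.5, False)]
print(future_score(ticks, fear=0.0))  # fearless: 3.0
print(future_score(ticks, fear=1.0))  # standard: 1.0
print(future_score(ticks, fear=2.0))  # scared: -1.0
```

With fear = 1, every meter coasted outside cancels one meter raced inside, which is why leaving the track stops being attractive even when a drop waits at the edge.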

I combined this with the "engines cut off" that happens when you leave the track, so the distance raced outside the track depends solely on the "inertia" you had when you left it. This inertia is the energy of the "impact" you would have if you crashed into a fence at the track limits, so it makes a lot of sense... once you know it works like a charm.

How good is this approach? Superb: the stability of a player with this modified "raced distance" goal is incredible! I prepared another video with more karts and rockets flying around a really narrow track, trying to avoid crashing (again, white players have no fear, yellow to grey have increasing fear, and the orange players are the "standard" ones).

There is quite a lot of movement in this one, but the AI's stability in keeping all the players inside (except the white ones) is really remarkable.

By the way, the karts can now drive backwards at 50% speed, which is really nice for getting away from scary, tight situations when needed.

OK, the base AI is (almost surely) finished with level 7, and the "distance raced" goal is now refined and capable of keeping the player inside the track even in the worst conditions, so we are ready to move on to new goals: the next video should be about feeling "hunger", or variable-strength goals.

Tuesday, 25 March 2014

The new intelligence level 7

For the 3rd or 4th time, I think the new improvement in the base intelligence algorithm may be the final one, so let's be cautious here: maybe the new AI level 7 is the perfect one.

But let's start at the origin: I really need negative scoring for futures. I previously thought it was an abomination, and from a philosophical point of view it still is, but it is mandatory if you want a real AI, one that can compete to accomplish a goal, beat an opponent or get as high a score as possible: an intelligence that is useful for optimization, game theory, etc.

Why negative scoring became mandatory will be covered in a future post; for now I will just introduce level 7, starting with a visual comparison against some previous intelligence levels on my good old "test circuit".

For this occasion, I rescued 3 levels from the old days to compare with level 7:

The white kart uses level 3, so it scores an option with N different futures as k*Log(N). That would be the real entropy of the option if all futures were equally probable, or dissipated the same amount of energy, but that is not the case, so you will notice the white kart is a little irresponsible. At one point it gets trapped in a hard situation in which all options have approximately the same number of futures (of different lengths, but length doesn't matter at level 3), so it stops completely. Pathetic.

The yellow kart uses venerable level 5, meaning it uses the square of the raced distance to score each of the N different futures found for an option, and the option's score is just the plain sum of the N futures' scores. It fits the entropy definitions 100%, but the theory offers no firm candidate for the "most intelligent decision based on its options' scores", so, as I explained in the post about level 6 AI, there may be room to enhance the AI a little more.

The orange kart uses the ugly level 6; it is not a real entropy formula but a strange mixture. It was able to make faster decisions at Y-shaped bifurcations at the cost of being less efficient in all other cases. I have completely disowned this level 6 invention, but I rescued it here because level 7 tries to fix the same problem that level 6 was supposed to fix: slow reactions of the intelligence in some not-so-clear situations. As you can see, it doesn't match level 5, so discarding it from all future tests is no great loss.

Finally, the red kart sports the new level 7 intelligence. Compared to level 5, level 7 changes just one small but important thing: once you have scored all the options of one free param, and before normalizing them, take the minimum score and subtract it from all the options' scores, then normalize. This means an option's raw score is allowed to be negative: subtracting the minimum shifts all the scores so that the lowest one becomes exactly zero. Internally I call it "ZeroedScores", as it ensures the lowest option's score will always be zero, while their sum is still 1, as in level 5.
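A minimal Python sketch of the "ZeroedScores" step as described, assuming the option scores arrive as a plain list. The function names are hypothetical, and the fallback for the case where every option scores the same (a total of zero after shifting) is an assumption made for this sketch, since the post does not cover it:

```python
def zeroed_scores(scores):
    """Level 7 normalization sketch ("ZeroedScores"): shift, then normalize.

    Subtract the minimum score from every option so the worst option scores
    exactly 0 (raw scores may be negative), then divide by the sum so the
    results add up to 1, as in level 5.
    """
    lowest = min(scores)
    shifted = [s - lowest for s in scores]
    total = sum(shifted)
    if total == 0:  # assumption: all options equal, so treat them as equally good
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in shifted]


def decide(values, scores):
    """Intelligent decision = Sum over options of value * normalized score."""
    return sum(v * s for v, s in zip(values, zeroed_scores(scores)))


values = [-15, -5, 5, 15]
print(zeroed_scores([80, 20, 20, 80]))    # [0.5, 0.0, 0.0, 0.5]
print(zeroed_scores([50, -10, 10, -10]))  # [0.75, 0.0, 0.25, 0.0]
print(decide(values, [50, -10, 10, -10]))  # -10.0
```

With the Y-shaped bifurcation scores from the level 6 post (80, 20, 20, 80), level 5 normalization gives 0.4, 0.1, 0.1, 0.4, while the zeroed version gives 0.5, 0, 0, 0.5: the weakest options drop to zero and the rest grow proportionally.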

All four run without colliding with each other, so you can compare each one's behaviour, but after the little white kart got stuck in a difficult part, I decided to switch collisions on to see how the other 3 karts managed to get through, and I have to say the red one really impressed me: it decides to crash into the stopped kart to make room! The yellow and orange ones are unable to decide anything and stop along with the white one. After this, there is no doubt the red one is really better!

Level 7 really works better than any other, and even though it was specifically crafted to be useful with negative future scores, it does way better than level 5 even in this case, where all futures score positive. The side effect of lowering the lowest score to zero is that the rest of the scores grow proportionally once they are normalized, so you get a quicker intelligence that doesn't leave hard decisions for the last moment, and this time without breaking any entropy law!

So in all the following videos the default AI level will be 7, and if I am right this time, no more levels will be necessary... but that doesn't mean we can't get much more out of the AI... we have goals to play with!

Friday, 21 March 2014

No more suicides

The last video was quite impressive, a rocket flying inside a cave at full speed, but for me it was really disappointing: it clearly revealed a suicidal tendency in the AI.

As you can see, the rocket leaves the track 2 or 3 times because it goes too fast to stop before crashing, but if you look closely, the crash was NOT impossible to avoid, not at all: somehow the rocket throws in the towel, stops fighting, gives up and lets itself crash without even trying to escape. Why?

As I commented in the last post, I considered using a negative scoring tendency to "keep the player alive": if you die within 2 seconds while imagining a future, give it a negative score so the player will actively avoid it. It was a really desperate attempt, as the score is internally an "entropy gain", and allowing negative values is like allowing entropy to decrease with time... it is a physical abomination, and I am really happy it didn't work out in my tests (in the end I didn't use negative scoring, but something just as ugly).

That was not the problem, nor was it necessary or even healthy for the AI to use negative scoring, as the real problem was... a severe case of poor imagination: too narrow an imagination gives the intelligence a clear tendency to commit suicide... again, surely there is a deep psychological lesson in this!

When I start imagining a new future, the first thing is deciding which free param (degree of freedom) I will change, then choosing a value from a set of possible decisions. In the case of accelerating/braking a kart, for instance, I choose one value in the set (-25, -5, +5, +25), but later, in the rest of the future's steps, when the accelerator has to be moved randomly, I decided (shame on me) to narrow the choices down to a random value from -5 to +5: I didn't allow the AI to imagine "rude" futures, just smooth ones.

So now you have a kart running at full speed; it suddenly detects a wall in front of it and starts thinking about its options. When it analyzes the "full brake" option (accelerator -25), in all the futures it imagines after this first move the AI is not allowed to keep "full braking" at -25, only at a -5 rate, which is not enough to stop the kart in time. As it can't imagine keeping its foot on the brake until it fully stops, the kart is unable to find a solution; nothing it can imagine will save its life... so it does nothing, it just waits for the unavoidable.

Solution: use the full spectrum of possible decisions, from -25 to +25, instead of -5 to +5. When it comes to taking this random decision, use a Gaussian (normal) probability function with a mean of zero and a standard deviation of 5 to get the random value. It will range from -25 to +25 because it is hard-limited, but most values will fall in the -5 to +5 range.
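A sketch of the new random decision in Python, using the standard library's `random.Random.gauss`; `random_decision` and its parameters are hypothetical names, and clipping to the range -25 to +25 stands in for the "hard limit":

```python
import random


def random_decision(rng: random.Random, sigma: float = 5.0, limit: float = 25.0) -> float:
    """Random accelerator change for the imagined future steps.

    Draws from a normal distribution (mean 0, standard deviation `sigma`)
    and hard-limits the result to [-limit, +limit], so "rude" values up to
    the full range are possible while small ones remain the most common.
    """
    value = rng.gauss(0.0, sigma)
    return max(-limit, min(limit, value))


rng = random.Random(42)  # fixed seed so the run is reproducible
samples = [random_decision(rng) for _ in range(10_000)]
inside = sum(1 for s in samples if -5 <= s <= 5) / len(samples)
print(f"min={min(samples):.1f} max={max(samples):.1f} within +/-5: {inside:.0%}")
```

With a standard deviation of 5, roughly 68% of the samples should fall between -5 and +5, while the occasional large value lets an imagined future keep braking hard until the kart stops.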

After this change, the "spectrum" of red and blue futures that a player draws in front of it visibly spreads and gets quite a bit wider, as you can see in the video: the player is considering somewhat more "extreme" driving than before.

Here you have a set of rockets playing around with this little change switched on. Now they never give up and let themselves crash, showing a much more solid behaviour: I would now dare to lend my own brand-new rocket to one of these AIs for a ride without fear... well, only if the fuel were free, as in the simulation!

The funny part is that it was already present in the code as an option, but switched off. I made some tests with this idea far back at the beginning, but maybe the AI was not ready for it yet and it worked really badly, so it was just switched off.

Maybe (only maybe) the base intelligence is finished... if that proves true and it works out as stable and optimized as it seems, I could move on to more interesting uses for the AI (team thinking is on my list).

Wednesday, 19 March 2014

Driving a rocket inside a cave!

Version 0.7 of the software came with a great generalization of all the parts that make up this AI (at the cost of a noticeable drop in performance), so it is now possible to use a base class, "Player2D", to create a brand-new kind of vehicle quite easily, with a minimum amount of code: just define the vehicle params, its simulation code and the drawing stuff, and that's all.

It was then time to try a brand-new creature and compare it with the well-known kart. I decided to code a classic rocket that travels the circuit as if it were a vertical cave, with a downward gravity force that only applies to rockets, and see how the AI deals with it.
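The actual `Player2D` is Delphi code. What follows is only a rough Python analogue of the idea, assuming a vehicle just has to provide its params and a one-tick `simulate()` (drawing is left out). `State2D`, the method names, the control keys and all the physics (thrust 20, gravity 9.8, semi-implicit Euler integration) are hypothetical:

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class State2D:
    """Hypothetical vehicle state: position, velocity and heading (radians)."""
    x: float = 0.0
    y: float = 0.0
    vx: float = 0.0
    vy: float = 0.0
    angle: float = 0.0


class Player2D(ABC):
    """Base class sketch: a vehicle only supplies its params and one-tick physics."""

    def __init__(self, state: State2D):
        self.state = state

    @abstractmethod
    def simulate(self, controls: dict, dt: float) -> None:
        """Advance the state by dt seconds for the given control values."""


class Kart(Player2D):
    """Toy kart: steers and accelerates along its heading, no gravity."""

    def simulate(self, controls: dict, dt: float) -> None:
        s = self.state
        s.angle += controls.get("turn", 0.0) * dt
        speed = max(0.0, math.hypot(s.vx, s.vy) + controls.get("accel", 0.0) * dt)
        s.vx, s.vy = speed * math.cos(s.angle), speed * math.sin(s.angle)
        s.x += s.vx * dt
        s.y += s.vy * dt


class Rocket(Player2D):
    """Toy rocket: thrust along its heading plus a downward gravity force."""

    def __init__(self, state: State2D, max_thrust: float = 20.0, gravity: float = 9.8):
        super().__init__(state)
        self.max_thrust = max_thrust
        self.gravity = gravity  # only rockets feel gravity

    def simulate(self, controls: dict, dt: float) -> None:
        s = self.state
        s.angle += controls.get("turn", 0.0) * dt
        thrust = self.max_thrust * controls.get("throttle", 0.0)
        # Semi-implicit Euler: update the velocity first, then the position.
        s.vx += thrust * math.cos(s.angle) * dt
        s.vy += (thrust * math.sin(s.angle) - self.gravity) * dt
        s.x += s.vx * dt
        s.y += s.vy * dt


rocket = Rocket(State2D(angle=math.pi / 2))  # nose pointing straight up
for _ in range(10):
    rocket.simulate({"throttle": 1.0}, dt=0.1)
print(round(rocket.state.y, 2))
```

The AI side would only ever call `simulate()` on copies of the state, which is what lets the same intelligence drive a kart and a rocket without knowing which one it is.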

I have to admit I made it way too powerful, and given that the AI will try to maximize entropy, it is only natural that it drives the rocket as fast as it can. Notice how much it likes to spin as a way to hover over a place before it decides where to go: the faster it rotates, the more future options it has, as it can leave the spin in any needed direction in a short time. That is why spinning is a nice option in its eyes.

So here you have a rocket and a kart trying to get sweet drops from the circuit. Look carefully at the rocket as it enters the narrowest parts of the circuit: it is amazing how well the AI handles such a difficult-to-master kind of ship:

Having two kinds of vehicles so different from each other traveling the same circuit gives you a better view of the AI: you find some strange behaviours that, on a kart, for instance, don't show up so clearly.

Watching the rocket fly out of the track because it went after a drop so blindly that it didn't notice that, after picking up the drop, the crash into the circuit fences was inevitable, I wonder whether there is a magical set of goals a creature needs in order to behave consistently.

In this example, both players have the same set of goals: you get scored for racing more meters and for getting sweet drops. That would sound reasonable if crashing were not so bad, but crashing the vehicle actually scores zero: it is not scared of dying at all, and maybe it should be.

If I could add a new goal to both, a goal about surviving at any cost, I think the rocket would sometimes be quite a bit more conservative and avoid some of the suicidal rides that usually end outside the track. But how?

In the current implementation, negative scoring is not allowed. The raced distance is always positive, and so is the drops-collected score, and so on. Negative scoring is implemented with coefficients in the range 0-1 that are multiplied together to get a score reduction coefficient, which is applied to the sum of the positive scores (raced distance, drops collected, etc.).

When you crash into another player, for instance, you can code it so that both players' scores are reduced by a coefficient that depends on the energy of the crash, so bigger crashes decrease that future's score quite a bit, making the option less appealing to the player.
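A sketch of this multiplicative scheme, assuming each future keeps a list of positive goal scores and a list of 0-1 reduction coefficients. The crash formula 1/(1 + sensitivity*energy) is a hypothetical choice that merely has the right shape (1 for no impact, falling towards 0 for big crashes); the post does not say which function is used:

```python
import math


def crash_coefficient(impact_energy: float, sensitivity: float = 0.01) -> float:
    """Hypothetical crash penalty in (0, 1]: 1 for no impact, lower for bigger crashes."""
    return 1.0 / (1.0 + sensitivity * impact_energy)


def future_score(positive_scores, reduction_coefficients) -> float:
    """Sum the positive goal scores, then apply the product of all 0-1 coefficients."""
    return sum(positive_scores) * math.prod(reduction_coefficients)


raced, drops = 10.0, 5.0
print(future_score([raced, drops], []))                          # no crash: 15.0
print(future_score([raced, drops], [crash_coefficient(100.0)]))  # 15 * 0.5 = 7.5
```

Because every factor stays between 0 and 1, the final score can shrink towards zero but never goes below it, which is exactly the limitation discussed next.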

But being afraid of breaking the toy must somehow count as a real negative score: if going left makes you die in a second, then scoring the left option with a zero is not enough. In emergency cases it should score negative, meaning that going right will increase its own score by this left negative score converted to positive.

So that is my next goal: being able to deal with negative scoring and then adding a goal "try not to die in the next N seconds", which will score negative any future that involves dying in less than N seconds.

This new "goal" will trigger when, at some step of the future, the "dead" property of the player changes from "false" to "true". At this "dying" moment, the "goal" will retrieve the future's "time raced", or time length (the elapsed time from the future's start until now), and compare it with the N seconds you want to stay alive.

If T = "time raced" is smaller than N, I will use T/N as a parameter (ranging from 0 to 1) and convert it into a negative score in such a way that for T=N (T/N=1) the score is 0, but as T drops to 0 the score tends to minus infinity or, to be conservative, to some floor score like -5000 or -10000.

Log(x) is a good candidate, as Log(1)=0, as we wanted, and Log(0.0001) is a big negative number; we just need to avoid asking for Log(0), so my best candidate so far is:

Score = Log((0.001+T)/N)
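The same formula as a small Python function, assuming Log is the natural logarithm (as Ln is used elsewhere on this blog) and that futures surviving N seconds or more simply score 0; `survival_score` is a hypothetical name:

```python
import math


def survival_score(time_raced: float, keep_alive: float) -> float:
    """Negative score for a future that dies before `keep_alive` seconds.

    Implements Score = Log((0.001 + T) / N) with the natural logarithm;
    futures that survive at least N seconds score 0.
    """
    if time_raced >= keep_alive:
        return 0.0
    return math.log((0.001 + time_raced) / keep_alive)


for t in (0.0, 0.5, 1.0, 2.0):
    print(t, round(survival_score(t, keep_alive=2.0), 2))
```

With this formula the floor at T = 0 is Log(0.001/N), about -7.6 for N = 2, far shallower than the -5000 to -10000 floor mentioned above; reaching that depth would need an extra scale factor.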

Maybe tomorrow I can show you how good or bad this idea was in a new video, and if it works as expected, maybe it will be added to all the players as part of their "standard AI": a goal for raced distance, and a goal for trying to stay alive for at least 2 or 3 seconds.

Monday, 17 March 2014

Mixing goals

The next video shows 36 karts driving around with 3 goals each:

1) Move fast.
2) Pick up the drops.
3) Store the drops.

The internals are easy:

1) When you drive N meters, you score N.
2) When you drive over a drop of size 5, if you have internal storage left (a kart has a capacity of 100), you "upload" the drop and get 5 points added to your score.
3) When you drive over a "storage" rectangle, if it still has free space, your internal storage is "downloaded" into it and you receive score for it.
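A sketch of goals 2 and 3 in Python. The kart capacity of 100 and the drop size of 5 come from the rules above; the storage size, the rule that a drop which doesn't fit is left on the track, and the 1-point-per-unit download score are assumptions, and all names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Kart:
    carried: float = 0.0     # drops currently held in the kart's internal storage
    capacity: float = 100.0  # a kart can hold up to 100


@dataclass
class Storage:
    stored: float = 0.0
    capacity: float = 500.0  # hypothetical size of a "storage" rectangle


def pick_up_drop(kart: Kart, drop_size: float) -> float:
    """Goal 2: "upload" a drop if it fits; returns the score earned."""
    if kart.carried + drop_size > kart.capacity:
        return 0.0  # assumption: a drop that does not fit is left on the track
    kart.carried += drop_size
    return drop_size


def download(kart: Kart, storage: Storage) -> float:
    """Goal 3: unload as much as the storage can take; returns the score earned."""
    amount = min(kart.carried, storage.capacity - storage.stored)
    kart.carried -= amount
    storage.stored += amount
    return amount  # assumption: 1 point per unit stored


kart, depot = Kart(), Storage(capacity=12.0)
score = 10.0  # goal 1: say the kart raced 10 m in this future
score += pick_up_drop(kart, 5.0) + pick_up_drop(kart, 5.0)
score += download(kart, depot)
print(score, kart.carried, depot.stored)  # 30.0 0.0 10.0
```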

The result is something like a group of ants collecting food and piling it up at a few spots:

Basically, we have modified step 1.3.3) in this pseudo-code:

1) For each option I can take (going left "value = -5" or going right "value = +5")
1.2) Imagine you take this option and simulate so you calculate the ending point after 0.1s.
1.3) From this point on, imagine 100 futures of 5 seconds by iterating:
1.3.1) Take a random decision (value = -5 + random*(5-(-5)), i.e. uniformly between -5 and +5)
1.3.2) Simulate so you get your new position after 0.1s more.

1.3.3) Calculate the micro-scoring of this movement by adding all goal's scoring (raced distance, drops collected, downloads, etc) and accumulate this scoring, so, when you reach the end of this future, you know its "scoring".

1.3.4) Stop when you have simulated 5 seconds in the future.
1.4) Now round the final points (x,y) of all 100 futures to a precision of 5: x=5*round(x/5)
1.5) Discard futures with the same end points so you end up with only different futures.
1.6) Score this option with Score=Sum(future.score ^2) for all the different futures found for this option.
2) Normalize the option's scores by:
2.1) Calculate AllScores = the sum. of the scores of all the options
2.2) Divide all option's scores by AllScores.
3) Intelligent decision = Sum. for all options( value * score )
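A Python sketch of the modified step 1.3.3 together with steps 1.4 to 1.6, assuming a future is stored as its list of imagined (x, y) positions and each goal is a function of two consecutive states. The function names are hypothetical, and keeping the first future found for each rounded end point is an assumption, as the pseudo-code only says duplicates are discarded:

```python
import math


def score_future(states, goals):
    """Step 1.3.3: add up every goal's micro-score, tick by tick.

    `states` holds the imagined positions, one per 0.1 s tick, and each goal
    is a function Score(A, B) for the move from state A to state B.
    """
    total = 0.0
    for a, b in zip(states, states[1:]):
        total += sum(goal(a, b) for goal in goals)
    return total


def score_option(futures, goals, precision=5.0):
    """Steps 1.4 to 1.6: one future per rounded end point, Score = Sum(score^2)."""
    distinct = {}
    for states in futures:
        x, y = states[-1]
        key = (precision * round(x / precision), precision * round(y / precision))
        # Assumption: the first future found for each end point is the one kept.
        if key not in distinct:
            distinct[key] = score_future(states, goals)
    return sum(s ** 2 for s in distinct.values())


def raced_distance(a, b):
    """Goal 1: the distance driven in this tick."""
    return math.dist(a, b)


futures = [
    [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)],  # raced 10, rounds to end point (5, 10)
    [(0.0, 0.0), (3.0, 4.0), (6.0, 9.0)],  # same rounded end point: discarded
    [(0.0, 0.0), (0.0, 2.0)],              # raced 2, rounds to end point (0, 0)
]
print(score_option(futures, [raced_distance]))  # 10^2 + 2^2 = 104.0
```

Adding a drops or storage goal only means appending another Score(A, B) function to the `goals` list; the rest of the algorithm does not change.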

Wednesday, 12 March 2014

Garbage collectors

In the last post I showed you "motivations"; afterwards I renamed them, in the code and in my mind, to "goals", which is much shorter and more general!

But in that previous video there were only "personal goals".

Each player (or kart) had its own set of motivations, and that is why only the orange kart was able to eat the orange sweet drops on the track: only this kart was able to "see" them.

This time I have added "team goals", so a goal is shared by all the players in the team. Now, if I add a "sweet drops" goal, all the karts will fight to get the sweet drops on the track:

Along with this video, I am also adding V0.8 of the software, with source code (Delphi 7), to the "Downloads" page (at the top of the blog), so you can dig into the "TGoal" class and see how it was designed.

Goals are still under heavy revision, so expect them to change in the next version, but the "general AI" is finished: level 6 is the last in the list, so that part is done.

So what is next?

1) Add other kinds of "goals", negative ones, like trying not to run out of health (staying alive), so crashing into another player can be "dangerous" and actively avoided.

2) Design sets of predefined goals so I can have some karts act as "sweet drop collectors" that accumulate sweet drops in a central storage, etc.

3) With all this, I would try to build a scenario where two teams of karts compete to collect more drops. Maybe adding more kinds of karts (warriors, sweet defenders, etc.) would finally give me a terrarium simulator with "ants" in it fighting for survival... or similar.

4) Karts (oops! ants) would get old and die, and could pair up and have a baby kart with mixed params from its two parents... I could make the ants evolve into better forms of "drop collectors". An evolutionary algorithm.

I don't know if I will even reach point 3 of the list without being distracted by some other new possibility; there are many ideas I want to try out.

Tuesday, 11 March 2014

Introducing "motivations"

The intelligence at level 5 is quite nice, almost perfect, or maybe perfect, but there is something we can't control: the goals.

We never know what the AI will decide to do to solve the puzzle we present to it, nor do we know which goal, if any, it will follow.

It is quite nice to have something like this, since it can react to unexpected scenarios as if it were used to them, but it would also be lovely to be able to "drive" the course of action towards some more mundane goals: to domesticate the intelligence and make it follow our likings.

Redefining goals will make this AI suitable for optimizing (in any sense) any system you can define and simulate, in an "intelligent way". Anything. Amazing.

And how do we get that?

Back when we defined the level 5 intelligence in pseudo-code, there was a line 1.3.3) where I asked you to keep a running sum of all the small distances raced in every tick, so that once you reach the end of a future, you have its "raced distance" calculated.

This "raced distance", when introduced into the intelligence back in the post about entropy and energy, gave the kart a great boost in velocity: by comparison, it was like having speed in its veins!

This method of scoring futures with the squared distance raced was neutral, as it appears in the entropy definition, but using "distance raced" as the measure of everything was not really in the formulas, not in this way.

What I mean is that, in general, after a short simulation step, your kart has passed from a situation or state "A" to a new state "B", and all we need to make the AI work is to provide it with a formula that measures how much we have earned by moving from "A" to "B".

Using the Euclidean distance between position "A" and position "B" was a natural choice, given I was thinking of racing karts, but any other function Score(A, B) would have done, as long as the resulting score is always a non-negative number (this is important: no negative thinking is allowed), as happens with a distance.

So this is the key point in the algorithm we can twist to make the kart, or whatever you simulate, follow our orders, orders that need to take the form of a function Score(A, B) as mentioned.

Let's first see what can be achieved with this method:

In this video you see 3 karts. The white one is the usual one: its only "goal" in life is to run as far as possible, and it uses score = raced distance, as in AI level 5 or 6.

New things start with the yellow kart. This one has two "goals" in life: running far, like the white one, plus another one, represented with a green line, that scores how many degrees the kart has traveled counter-clockwise and adds this to the future's score. Remember, always positive: if the kart drives backwards, clockwise, it will not score negative, it just will not score at all.
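A sketch of both goals as Score(A, B) functions in Python, with positions as (x, y) tuples. The names and the choice of measuring the angle around a fixed `center` point are assumptions for illustration; as in the post, degrees and meters are simply added together:

```python
import math


def raced_distance(a, b):
    """White kart's only goal: Euclidean distance from position A to B."""
    return math.dist(a, b)


def make_ccw_goal(center):
    """Yellow kart's extra goal: degrees turned counter-clockwise around `center`.

    Clockwise motion scores 0 rather than a negative value, so Score(A, B) >= 0.
    """
    cx, cy = center

    def ccw_degrees(a, b):
        before = math.atan2(a[1] - cy, a[0] - cx)
        after = math.atan2(b[1] - cy, b[0] - cx)
        # Wrap the change into [-180, 180) degrees, then drop clockwise turns.
        delta = (math.degrees(after - before) + 180.0) % 360.0 - 180.0
        return max(0.0, delta)

    return ccw_degrees


goals = [raced_distance, make_ccw_goal((0.0, 0.0))]
a, b = (1.0, 0.0), (0.0, 1.0)  # a quarter turn counter-clockwise around (0, 0)
print(sum(goal(a, b) for goal in goals))  # about 1.414 m + 90 degrees
```

The wrap-around keeps a small step across the -180/+180 boundary from being mistaken for a full turn in the other direction.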

The effect is that the yellow kart has a natural tendency to rotate around a central point, so it tends to drive in a spiral towards it, as if it were pulled into a whirlpool. This tendency is mixed with the racing one, so it tends to run faster and around the central point.

And the orange one, this one is fantastic! I added 100 drops of "score", so when the orange kart passes over a drop, all its points are added to this future's score. Just this. And look how it brakes before a drop so as not to pass it by, takes it, then goes for another: it is like a bee or an ant looking for food.

In future tests, I would like to simulate more behaviours by introducing new kinds of "goals"; for instance, the orange kart could get extra score for taking the collected drops to a central storage!

And at last, the blog has caught up with what I am working on now: goals! The video above is from today's version of the software, so it is hot.

I leave for another post all the physics behind this way of manipulating the AI: we are still using plain entropy, and no matter how adapted to our needs it is, there are restrictions that remind us it is all about entropy.

So now that you are up to date, I can tell you a little about what I am planning to play with in the coming days, and please, any ideas about how to continue from here will be very welcome!

So, these new "goals" are on my mind now. First, I need to make a "stack" of goals, so I can mix a bunch of them in a single intelligence without much effort. I have already been thinking about the kinds of goals I will need (maybe there are 3 or 4 main types), and the stacking rules have been clear for months, so I need to code it and think of a scenario to test it properly.

The next thing I am working on is "team intelligence": making all the karts think as one and behave in the best way for the group as a whole. Maybe this will end up as "team goals" that accumulate in some way with each player's own goals... some of this is in the current code, but it is not usable so far.

It would be nice to emulate a terrarium with different creatures and make them fight for survival... add a little evolutionary algorithm and you have a way to incubate more capable intelligences. Use a real scenario and the algorithm could find a better tool, and clever ways to use it in that scenario, starting from the tool currently used for this purpose.

But all this is just the future, so I will now focus on stacking goals together and playing with them (hey, it must be the decision that opens up the most possible futures for the evolution of the AI, mustn't it?)!

Beyond entropy

Level 5 of intelligence seems to reflect the actual definition of entropy in the original paper, so before going any further, we will write it in pseudo-code, with the example of the kart seen in the video 1 entry embedded in it:

1) For each option I can take (going left "value = -5" or going right "value = +5")
1.2) Imagine you take this option and simulate so you calculate the ending point after 0.1s.
1.3) From this point on, imagine 100 futures of 5 seconds by iterating:
1.3.1) Take a random decision (value = -5 + random*(5-(-5)), i.e. uniformly between -5 and +5)
1.3.2) Simulate so you get your new position after 0.1s more.
1.3.3) Accumulate the small raced distance so you know the "raced" distance at the end.
1.3.4) Stop when you have simulated 5 seconds in the future.
1.4) Now round the final points (x,y) of all 100 futures to a precision of 5: x=5*round(x/5)
1.5) Discard futures with the same end points so you end up with only different futures.
1.6) Score this option with Score=Sum(raced ^2) on all different futures found
2) Normalize the option's scores by:
2.1) Calculate AllScores = the sum. of the scores of all the options
2.2) Divide all option's scores by AllScores.
3) Intelligent decision = Sum. for all options( value * score )
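To make the steps concrete, here is a self-contained Python sketch of this level 5 algorithm on a toy problem. Everything about the world is an assumption made only for this example: a kart moving at a constant 5 m/s, a steering value interpreted as degrees of heading change per tick (negative turning left, as in the pseudo-code), and a made-up L-shaped corridor as the circuit. Futures end early when the kart leaves the corridor:

```python
import math
import random

DT = 0.1                 # seconds per simulation tick
STEPS = round(5.0 / DT)  # 5 seconds of imagined future = 50 ticks
N_FUTURES = 100
PRECISION = 5.0          # end points are rounded to a 5 m grid
SPEED = 5.0              # hypothetical constant kart speed, m/s


def on_track(x, y):
    """Hypothetical toy circuit: a 6 m wide corridor that turns left at x = 17..23."""
    return (abs(y) <= 3.0 and x <= 23.0) or (17.0 <= x <= 23.0 and y >= -3.0)


def step(state, value):
    """Simulate one tick; value is a steering change in degrees, negative = left."""
    x, y, heading = state
    heading -= math.radians(value)
    return (x + SPEED * DT * math.cos(heading), y + SPEED * DT * math.sin(heading), heading)


def imagine_future(state, rng):
    """Steps 1.3.1 to 1.3.4: one random future; returns (end point, raced distance)."""
    raced = 0.0
    for _ in range(STEPS):
        new = step(state, -5.0 + rng.random() * 10.0)  # uniform in [-5, +5)
        if not on_track(new[0], new[1]):
            break  # crashed: this future ends here
        raced += math.dist(state[:2], new[:2])
        state = new
    return state[:2], raced


def score_option(state, value, rng):
    """Steps 1.2 to 1.6 for one option."""
    first = step(state, value)  # 1.2: take the option for the first 0.1 s
    if not on_track(first[0], first[1]):
        return 0.0
    distinct = {}
    for _ in range(N_FUTURES):  # 1.3: imagine 100 futures
        (x, y), raced = imagine_future(first, rng)
        key = (PRECISION * round(x / PRECISION), PRECISION * round(y / PRECISION))
        distinct.setdefault(key, raced)  # 1.4-1.5: keep different end points only
    return sum(r ** 2 for r in distinct.values())  # 1.6: Score = Sum(raced^2)


def decide(state, options=(-5.0, 5.0), seed=0):
    """Steps 2 and 3: normalize the option scores, then average the values."""
    rng = random.Random(seed)
    scores = [score_option(state, v, rng) for v in options]
    total = sum(scores)
    if total == 0:
        return 0.0  # every option crashes at once: no preference
    return sum(v * s / total for v, s in zip(options, scores))


# Near the top wall, steering right (+30) should clearly beat steering left (-30).
print(decide((0.0, 2.5, 0.0), options=(-30.0, 30.0)))
```

The fixed seed only makes runs reproducible; in the simulation itself, a new decision would be taken every 0.1 s with fresh random futures.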

The algorithm looks pretty simple, applies the most accurate entropy I could get, and decides on the averaged options as in the paper... it is so general and compact that one would think nothing more can be done to make it better.

But there are still some grey areas in the algorithm, for instance: why, once you have all the options scored, do you use a weighted average of them to get the final decision (pseudo-code point 3)?

It is a simple way to do it, and the paper uses it... but on some occasions, it is far from optimal!

Imagine you are the kart, and you have a Y-shaped bifurcation in front of you. Your options are -15 (left), -5 (little left), +5 (little right) and +15 (right).

If you go left or right, you drive into one of the two bifurcation's arms, so you get a nice 0.4 score on those two options. Say you found 5 different futures for each of those two options, and each future had a length of 4, so score = 5*4^2 = 5*16 = 80.

But if you make a small turn, left or right, you crash into the corner, so your score is lower: you find at most as many different futures (let's say 5 again) and they are shorter (let's say 2 meters), so score = 5 * 2^2 = 20.

So you have scores of (80, 20, 20, 80) for options (-15, -5, +5, +15); if you normalize the scores by dividing by 200, you get (option.value, option.score) pairs like (-15, 0.4)(-5, 0.1)(+5, 0.1)(+15, 0.4).

If we plot these four points and use a cubic spline to interpolate the scores of options not considered (other than -15, -5, 5 and 15), this is what you get:

And here you can see the problem: the averaged value, or intelligent decision, is Averaged_Decision = -15*0.4 - 5*0.1 + 5*0.1 + 15*0.4 = 0. Yes, ZERO, the worst possible one!

In the graph you can see that 0, the averaged decision, has an interpolated score of about 0.07, while -15 or +15 have interpolated scores of 0.4, so why do we choose 0?
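The numbers of this example can be checked in a few lines of Python (variable names are illustrative):

```python
values = [-15, -5, 5, 15]
raw = [5 * 4**2, 5 * 2**2, 5 * 2**2, 5 * 4**2]  # 80, 20, 20, 80
total = sum(raw)                                  # 200
scores = [s / total for s in raw]                 # normalized scores
averaged = sum(v * s for v, s in zip(values, scores))
best = values[scores.index(max(scores))]          # first of the two tied maxima
print(scores, averaged, best)  # [0.4, 0.1, 0.1, 0.4] 0.0 -15
```

The weighted average lands exactly between the two good arms, on the one value that leads straight into the corner.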

In a real kart, it means that, as it approaches a Y-shaped bifurcation, since turning fully left or fully right is equally good for it... it does nothing and continues straight into the corner ahead. Usually, on getting too close to the corner, some small difference between left and right will finally make the kart decide to go left, but too late to drive smoothly: it needs to brake, then turn left. Not optimal.

I tried using the max. value to decide, but it was way too aggressive to be usable; then I tried mixing the averaged and maximum values 50%-50%, or 80%-20%... eventually it started to work out somehow, but I needed a way to activate this "mix it with the maximum decision value" only in Y-shaped situations, as in all other cases the averaged decision was smoother and better.

By comparing the averaged decision's score with the maximum score on the graph, I managed to build a smart averaging that I now call "AI Level 6" (don't worry, I stop at level 6!).

So now you have the averaged decision value (AvgValue = 0 in this example) and its score (AvgScore = 0.07 in the example), and the maximum decision value (MaxValue = -15 or +15; let's choose -15 for the example) and its score (MaxScore = 0.4 in the example), so you build the level 6 AI "refined decision" by mixing both values:

AvgCoef:= Min(1, 1 - ((MaxScore-3*AvgScore)/Max(0.001, 3*AvgScore)));
Level 6 decision = MaxValue * (1-AvgCoef) + AvgValue * AvgCoef
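A direct Python transcription of the two lines above, useful for checking the example (the function name is hypothetical):

```python
def level6_decision(avg_value, avg_score, max_value, max_score):
    """Level 6 "refined decision": blend the averaged and the maximum decisions."""
    # AvgCoef:= Min(1, 1 - ((MaxScore-3*AvgScore)/Max(0.001, 3*AvgScore)))
    avg_coef = min(1.0, 1.0 - (max_score - 3 * avg_score) / max(0.001, 3 * avg_score))
    # Level 6 decision = MaxValue * (1-AvgCoef) + AvgValue * AvgCoef
    return max_value * (1 - avg_coef) + avg_value * avg_coef


# Y-shaped bifurcation example: AvgValue = 0, AvgScore = 0.07, MaxValue = -15, MaxScore = 0.4
print(round(level6_decision(0.0, 0.07, -15.0, 0.4), 2))  # -13.57
```

With the example numbers AvgCoef comes out at about 0.095, so the decision lands near -13.57, close to the left arm instead of straight into the corner. As written, whenever 3*AvgScore is above 0.001 the formula simplifies to AvgCoef = 2 - MaxScore/(3*AvgScore); it is capped at 1 but has no lower bound, so if MaxScore exceeds 6*AvgScore, AvgCoef turns negative and the decision overshoots beyond MaxValue.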

Let's see how it behaves: in the next video, the yellow kart uses level 5, while the white one uses level 6. Notice that the white one is more "nervous" than usual:

This formula is not THE formula; it is just one that works reasonably well and that I chose to apply for Level 6 AI. It is not always better than level 5: on smooth circuits with slow karts, using just entropy with "AI level 5" is marginally better, but when things get tough, the karts have more power to use, or the track is not so smooth, having level 6 on makes the kart react to hard situations somewhat earlier than standard.

So I recommend you stick to "AI level 5", and just keep in mind that, in some cases, it may be better to help the AI make fast decisions by adding some spice like the level 6 I have tried here.

Friday, 7 March 2014

AI level 4 and 5: Entropy and energy

In this video, the yellow kart is using "level 3" intelligence, so it scores an option with N different futures as Ln(N); that would be OK if all futures were equiprobable, but they aren't.

When not all futures are equiprobable, you have to switch to another, more complex, way of calculating entropy.

When you have N microstates, each with a different probability of happening, call it P(i), the instantaneous or "classical" entropy is:

S = -Sum. on all possible microstates(P(i)*Ln(P(i)))
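In this Gibbs form the minus sign keeps S non-negative, since each Ln(P(i)) is at most 0. A short Python check, with k = 1, shows that it reduces to Ln(N) when all N microstates are equally probable and is smaller otherwise; `gibbs_entropy` is a hypothetical name:

```python
import math


def gibbs_entropy(probabilities):
    """S = -Sum(P(i) * Ln(P(i))) over microstates with P(i) > 0 (k = 1)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


n = 8
print(gibbs_entropy([1 / n] * n), math.log(n))  # equal up to rounding: Ln(8)
print(gibbs_entropy([0.7, 0.1, 0.1, 0.1]))      # about 0.94, below Ln(4)
```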

If it were about gas molecules, the Ln(P(i)) part would correspond to the momentum of the molecules, so, translating it to our case, it corresponds to m*v, the momentum of the kart. As m, the mass, is a constant, we can forget about it, and only v remains: the kart velocity.

But we are talking about futures, so we need to integrate v (the kart velocity) over the path the kart followed, and as we use constant time deltas to build the futures, on each step v is proportional to the raced distance, call it x.

Integrating x over a path gives you r^2/2, with r = length of the path, so using r^2 seems to me like the perfect way to mimic the real entropy in the AI model (it may be wrong; I am not sure I really understand all the inner workings of entropy at this level).

So let's compare and see!

Yellow kart: uses level 3 with score = Ln(N).
Grey kart: uses level 4 with score = Sum.(r) (r = raced distance)
Black kart: uses level 5 with score = Sum.(r^2)
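A toy comparison of the three scores in Python, given the raced distances of the different futures found for two made-up options (the distances are invented for illustration, not measured from the video, and the function names are hypothetical):

```python
import math


def level3(raced):
    """Ln(N): only the number of different futures matters."""
    return math.log(len(raced))


def level4(raced):
    """Sum(r): longer futures weigh more."""
    return sum(raced)


def level5(raced):
    """Sum(r^2): the squared raced distance of each future."""
    return sum(r * r for r in raced)


safe = [4.0, 4.0, 4.0]        # 3 different futures that keep running
risky = [1.0, 1.0, 1.0, 1.5]  # 4 different futures that crash early
for name, f in (("level 3", level3), ("level 4", level4), ("level 5", level5)):
    print(name, round(f(safe), 2), round(f(risky), 2))
```

Level 3 prefers the risky option only because it has more futures; levels 4 and 5 prefer the safe one, and level 5 by a much wider margin (48 vs 5.25, against 12 vs 4.5).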

In my modest opinion, level 5 on the black kart properly reflects the original paper's formulas for future entropy, and does a really great job as a driver.

Can we do any better? Is it a perfect AI? No. It is a nice and general algorithm, but there are still some aspects of it that can be tweaked for the better; not many, but some.

AI level 3: Using real entropy

Intelligence "Level 3"
Looking back at levels 1 and 2 of our AI, you can notice that in both cases we are scoring each option using just a count of the different futures we were able to find.

But this is not a real entropy definition! If a macrostate has N possible and equiprobable microstates, then its entropy is not just N, it is:

S = k*Ln(N)

As k is constant, we can forget about it, and so instead of using N to score each option as in the first videos, we now use Ln(N) for the orange kart:

Here we have three karts, each one using a different entropy formula:

The white kart uses "level 1", the count of futures not ending in a crash. This is not a real entropy approximation, but it is kept for personal nostalgic reasons. As you can see, the kart sometimes freezes; this is because many futures have been discarded and too few were left to really make sense of the situation.

The yellow kart has "level 2" AI, so it is using N, the number of different futures, to score its options. Again, not really an entropy formula, but quite near, and it works OK.

The orange kart is on "level 3", and it is the one doing it nearly right: it uses Log(N) to score each option and, if all futures were really equally probable (they are not: longer runs are more difficult to get than short ones), then we would be using THE REAL ENTROPY to drive the kart.
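For reference, here is a minimal Python sketch of the three scores described above, assuming each future is stored as an (end point, crashed) pair and that two end points are "different" when they fall into different cells of a grid. CELL, the function names and the sample futures are hypothetical.

```python
import math

CELL = 1.0  # hypothetical precision: end points in the same grid cell count as one


def cell_of(point):
    """Snap an (x, y) end point to a grid cell so nearby end points count as one."""
    x, y = point
    return (math.floor(x / CELL), math.floor(y / CELL))


def level1(futures):
    """Count different end points, discarding futures that ended in a crash."""
    return len({cell_of(p) for p, crashed in futures if not crashed})


def level2(futures):
    """Count different end points, keeping every future, crashed or not."""
    return len({cell_of(p) for p, _ in futures})


def level3(futures):
    """Ln(N) of the level 2 count: the real entropy if futures were equiprobable."""
    n = level2(futures)
    return math.log(n) if n > 0 else float("-inf")


# Hypothetical futures for one option: (end point, crashed?)
futures = [((3.2, 1.1), False), ((3.7, 1.4), False),  # same cell: one end point
           ((8.0, 2.5), False),
           ((1.0, 0.2), True), ((0.5, 4.9), True)]     # two crashes

print(level1(futures), level2(futures), level3(futures))
```

Here level 1 sees only 2 different futures, level 2 sees 4, and level 3 scores the option as Ln(4).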

Only by facing the fact that the futures are not equally probable will we be able to make this AI better, much better, but that will be in another entry.

Sunday, 2 March 2014

Intelligence Level 2

Video 4: Intelligence level 2

In video 1 we commented on the simplest way to implement entropic intelligence in a kart: count how many different end points you have in the futures that start by choosing "+5", compare it with the number you get for choice "-5", then average and take this as your decision. We will call this "intelligence level 1".
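Here is a minimal Python sketch of level 1 under loudly stated assumptions: the track is a hypothetical straight corridor, the kart moves at a constant speed, futures are random steering sequences of +5/-5 degrees, two end points count as "different" when they round to different unit cells, and "average" is read as the mean of the two choices weighted by their future counts. None of these names or numbers come from the original kart code.

```python
import math
import random

HALF_WIDTH = 5.0        # hypothetical straight track: the kart crashes when |y| > HALF_WIDTH
SPEED = 1.0             # distance covered per time step (no accelerator at level 1)
STEPS = 20              # length of each simulated future, in time steps
FUTURES = 200           # random futures simulated for each option
CHOICES = (+5.0, -5.0)  # steering options, in degrees


def simulate(y, heading, first_turn, rng):
    """Take first_turn, then steer at random; return (end point, crashed?)."""
    x = 0.0
    heading += first_turn
    for step in range(STEPS):
        if step > 0:
            heading += rng.choice(CHOICES)
        x += SPEED * math.cos(math.radians(heading))
        y += SPEED * math.sin(math.radians(heading))
        if abs(y) > HALF_WIDTH:
            return (x, y), True
    return (x, y), False


def count_futures(y, heading, first_turn, rng):
    """Level 1: different end points among the futures that did not crash."""
    cells = set()
    for _ in range(FUTURES):
        (ex, ey), crashed = simulate(y, heading, first_turn, rng)
        if not crashed:
            cells.add((round(ex), round(ey)))
    return len(cells)


def decide(y, heading, seed=0):
    """Average the choices, each weighted by how many futures start with it."""
    rng = random.Random(seed)
    counts = [count_futures(y, heading, c, rng) for c in CHOICES]
    total = sum(counts)
    if total == 0:
        return 0.0  # every future crashed: nothing to base a decision on
    return sum(c * n for c, n in zip(CHOICES, counts)) / total


# Kart near the upper edge and drifting towards it: expected to lean towards -5.
print(decide(y=4.0, heading=10.0))
```

Sampling is random, so the exact value depends on the seed and on FUTURES; a fixed seed keeps the sketch reproducible.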

But as simple as it seems, almost every aspect of the AI explained in video 1 can be redefined so that the AI's driving "looks" more natural, as if the driver were a real driver doing his best.

By the way, choosing a kart simulation as a test-bed for the algorithm has proven to be a really good choice, as it is very easy to just watch two karts driving side by side, each one with a different version of the AI, and tell which one of them is doing a better job. It wouldn't have been that easy with another simulation.

So, skipping videos 2 and 3, which just show intelligence level 1 solving different circuits (you can watch them on the "YouTube" link above), we jump to video 4, the first one to really raise the intelligence to level 2:


In the first video (level 1), I chose to only count as good those futures that didn't end up in a crash with the circuit limits, and I commented that it was a really bad decision, so now we come back to this point.

Discarding those futures was a bad decision for a couple of reasons. The most evident one is that the more seconds you simulate to get a future, the more futures end up crashing, as futures are found by simulating a random drive, and long random drives easily end in a crash. So using 20 seconds was a silly way to discard all the futures and leave the kart unable to take any decision at all. This was bad.
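The first reason can be checked with a toy model. This Python sketch does not use the kart physics: it assumes a hypothetical random lateral drift of one unit left or right per time step, on a track where the kart crashes once it is more than five units away from the centre line, and measures what fraction of those random futures has crashed within different horizons.

```python
import random

HALF_WIDTH = 5   # hypothetical track half-width, in lateral steps
TRIALS = 2000    # number of random drives simulated


def first_crash_step(rng, max_steps):
    """Random lateral drift of +/-1 per step; return the first crash step, or None."""
    y = 0
    for step in range(1, max_steps + 1):
        y += rng.choice((-1, 1))
        if abs(y) > HALF_WIDTH:
            return step
    return None


def crash_fraction_by_horizon(horizons, seed=0):
    """Fraction of random futures that have crashed within each horizon."""
    rng = random.Random(seed)
    longest = max(horizons)
    crashes = [first_crash_step(rng, longest) for _ in range(TRIALS)]
    return {h: sum(1 for c in crashes if c is not None and c <= h) / TRIALS
            for h in horizons}


for horizon, frac in crash_fraction_by_horizon([5, 10, 20, 40, 80]).items():
    print(f"{horizon:>3} steps: {frac:.0%} of futures crashed")
```

Because each drive is simulated once up to the longest horizon, the crash fraction can only grow with the horizon; with long horizons nearly every random future crashes, so throwing crashed futures away leaves almost nothing to count.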

But the second reason is more general: when calculating an entropy you have to count all possible cases; you cannot discard microstates just because they look ugly to you. By doing so, you add some human stupidity to the AI, and you get stupid decisions.

In video 4 above, all futures are considered valid, and crashes near the initial position count as much as any other future in which you manage to run all 10 seconds without crashing. It is still not fair, but as you can see, the kart now drives much more agilely.

I used to joke with my wife (she is a psychologist) about how I had become some kind of weird "kart psychologist", as I spent a lot of time thinking about why the kart drove one way instead of doing it right.

In this case, removing the futures where the kart crashed is equivalent to a human being afraid of dying. Well, kind of. And counting all futures without judging them made the kart drive touching the circuit limits instead of avoiding them at any cost.

Surely there is some deep lesson in this!

Also, in this case the AI has two "degrees of freedom", or joysticks, to move: the AI was able to turn left or right as in video 1, but also to accelerate or brake at will.

One thing that stunned me at first, when I introduced the accelerator to the AI, was the kart initially deciding to accelerate fast and start running. Why? I didn't tell it anything about running being a goal! The AI decided to do it by itself.

It is clear when you think with entropy in mind: if the kart stays still, it only has one possible future, being in the same place. But if you push the accelerator now, in 10 seconds you can be at many different end points. So if you do the numbers, accelerating will have a much bigger weight than braking.
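A deliberately tiny Python sketch of this argument, under simplifying assumptions: the kart either keeps speed 0 or moves at speed 1, and every possible left/right steering sequence over a short horizon is enumerated instead of sampled at random. STEPS, TURN and CELL are hypothetical values, not the ones used in the videos.

```python
import itertools
import math

STEPS = 8     # hypothetical horizon, in time steps
TURN = 10.0   # degrees turned left or right at each step
CELL = 0.5    # precision used to decide when two end points are "different"


def end_point(speed, turns):
    """Drive one future: apply each turn, then move `speed` units along the heading."""
    x = y = heading = 0.0
    for turn in turns:
        heading += turn
        x += speed * math.cos(math.radians(heading))
        y += speed * math.sin(math.radians(heading))
    return x, y


def different_futures(speed):
    """Count different end points over every left/right steering sequence."""
    cells = set()
    for turns in itertools.product((-TURN, TURN), repeat=STEPS):
        x, y = end_point(speed, turns)
        cells.add((math.floor(x / CELL), math.floor(y / CELL)))
    return len(cells)


print("stay still:", different_futures(0.0))   # one future: the starting point
print("accelerate:", different_futures(1.0))   # many different end points
```

At speed 0 every steering sequence ends in the same place, so there is a single future; at speed 1 the end points spread out, and that spread is the extra weight the accelerator gets.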

So, level 2 of the intelligence is like level 1, except that you cannot discard any of the futures found, and the resulting AI is, visually, much closer to driving right... but still far from optimal.

Saturday, 1 March 2014

The "Entropy"

Before going any further on the algorithm itself, we will stop for a moment on the real meaning of those "causal entropic forces" the algorithm is based on.

This is a little technical (but quite interesting) and I will try my best to be easy to follow, but feel free to skip this and focus on the algorithmic-only articles if you want. You will get as much knowledge of the AI as you need to apply it, but be warned: when it comes to defining your own system, adjusting the params of the AI and polishing the way you measure how much better a new situation is compared to a previous one, understanding the underlying physics will give you extra insight into the process and will help you pinpoint the weak points in your implementation.

Disclaimer: I am not a physicist, just a rusty mathematician and a programmer who loves reading about the subject, so please be benevolent when commenting! I just intended for anyone to get a clear picture of the concept itself and of the extreme power behind the nice-sounding word "entropy".


Entropy is a very powerful physical concept; it is behind almost all the laws of classical physics. It is quite simple in its definition, but almost impossible to use directly in any real-world calculation.


An informal definition of entropy could go like this:

Imagine you have a transparent ball of some radius, and you fill it with a hive of 100 flying bees. In the jargon, this is our "system", and, as in this case, it is supposed to be "isolated" from the rest of the universe.

From the outside, you just see a ball filled with a lot of bees. This is the "macro state" of the system, what you can see of it in a given moment.

But if you place a fixed camera near the ball and take a series of consecutive images, when you later compare them you will find that most of them are quite different from the others: the bees are in different positions each time you freeze time and take a shot. Each one of these "different images" is called a "micro state" of the system. It is like a finely detailed version of the macro state.

If you use N for the maximum number of different "images" (or micro states) you could get (regardless of the megapixel count of your camera and of how you define "different"; it is assumed you use some given precision in your readings), this N is, basically, what physicists call the "entropy" of the system.

If we now reduce the radius of this transparent ball, the bees will get closer to each other, and the number N of different images you could get will drastically decrease. When the radius reaches a minimum, the bees get so packed they just can't move. In this situation, you will only have one single possible picture, or micro state. They will be in maximum order and minimum disorder, or entropy.

The actual formula for the entropy of a macro state with N possible micro states is S = kLog(N), but there is an alternative formulation and, well, it is not so important in this story; just remember this: the more different "images" you can take of your "ball with bees", the more disordered it is and the higher its entropy will be.
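The shrinking ball can be put into numbers with a small Python sketch. It assumes, purely for illustration, that the ball is divided into a given number of equal cells, that the bees are indistinguishable and that no two bees share a cell, so N is the number of ways to choose which cells are occupied (math.comb, Python 3.8 or later) and S = Ln(N) with k dropped.

```python
import math

BEES = 100  # bees in the ball, as in the example above


def microstates(cells, bees=BEES):
    """Different 'pictures': ways to place identical bees, one per cell, in `cells` cells.

    Illustrative assumption: the ball is split into `cells` equal cells, bees are
    indistinguishable, and two bees never share a cell.
    """
    return math.comb(cells, bees)


def entropy(cells, bees=BEES):
    """S = k*Ln(N) with the constant k dropped."""
    return math.log(microstates(cells, bees))


for cells in (10_000, 1_000, 200, 100):   # shrinking the ball
    print(cells, entropy(cells))
```

As the number of cells drops towards the number of bees, N collapses, and with exactly 100 cells for 100 bees there is a single picture and S = Ln(1) = 0.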

Entropy and physic laws

So far we have seen just the mathematical definition of entropy; now I will try to show you how important this concept is, starting with physics.

One of the most important laws in thermodynamics, the second law, talks about entropy: "the entropy of an isolated system always grows". What really matters here is that it will never decrease: any real physical system will evolve, no matter which laws it has to obey, in a way that its entropy (or disorder) always increases with time.

This is powerful, but there is a much more refined version: a system always evolves in exactly the way that makes its entropy grow as much as possible at every moment.

As an example, if you have an empty tank connected to another one filled with a gas, and then you open the connection, the gas will flow from one tank to the other in a way you could calculate using known flow theories, and you will get the approximate evolution of the gas.

But if you could calculate, from all the possible small changes that could happen to the gas at any given moment, exactly the one that would give you the highest possible entropy, then you would know exactly how the gas will evolve, as any system evolves exactly by choosing the path that maximizes entropy creation at each little step. And the most interesting thing is that this is true for all kinds of physical systems.
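A toy Python sketch of this "greedy" reading, under strong simplifying assumptions: the macro state is just how many of TOTAL identical molecules sit in the left tank, its entropy is Ln of the number of ways to pick those molecules, and at each step the only small changes allowed are one molecule moving either way or nothing happening. It illustrates the idea only; it is not a gas flow simulation.

```python
import math


def entropy(in_left, total):
    """S = Ln(N), N = number of ways to choose which molecules sit in the left tank."""
    return math.log(math.comb(total, in_left))


def greedy_step(in_left, total):
    """Among the small changes available (one molecule moves either way, or
    nothing happens), pick the one giving the highest entropy."""
    candidates = [k for k in (in_left - 1, in_left, in_left + 1) if 0 <= k <= total]
    return max(candidates, key=lambda k: entropy(k, total))


TOTAL = 20
state = TOTAL          # all molecules start in the left tank, the right one is empty
history = [state]
for _ in range(15):
    state = greedy_step(state, TOTAL)
    history.append(state)
print(history)
```

The gas drains into the empty tank one molecule per step and then stays at an even split, the macro state with the most micro states.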

Note: technically it applies only to "probabilistic" systems, meaning they are formed by a myriad of small parts (macroscopic systems, in short). It does NOT apply to small nano systems with a few particles, such as the ones in some quantum studies where entropy can sometimes decrease.

So it is said that nature tends to make entropy grow as much as possible on all occasions, and this tendency, in some way, is what makes the classical laws of physics emerge when you look at the system from a distance.

It is very important to note that, when we talk here about tending to the state of higher entropy, we are talking about immediate growth. In no way does nature "know" which entropy the system will have in a few seconds. It is a short-term tendency.

In this story, what we called a "tendency" can be understood as a kind of new "force" that emerges from nowhere and physically pushes the system toward the state that maximizes the entropy in the short term.

This magical force is technically called an "entropic force", as opposed to the "physical forces" we all know.

Calculating physical forces in a system is easy with today's computers, but computing the entropy of a real-world physical system is just a dream, let alone calculating the path along which the entropy grows fastest, which is needed to get the "entropic forces". But if we knew how to calculate it, then using either set of forces, physical or entropic, should give the same results.

Entropy and life

So nature always tends to make entropy grow, OK, but somehow nature did manage to fool itself with a curious invention called "life".

Life is not easy to define, but my favourite definition is about entropy, again, and can be expressed this way:

When a "subsystem" (an amoeba, for instance) that is inside a bigger system (the amoeba and its environment) is capable of consistently lowering its own entropy, at the logical expense of "pushing" that entropy out into the environment, this subsystem is said to be alive.

It is not a complete definition, and maybe your phone can do this too, but all the things we consider living creatures follow this rule, while "dead" systems do not. There are always things in between, like a virus or a robot with solar cells, but a cell definitely keeps its interior clean and tidy while polluting its nearby environment with heat (which increases the entropy of the medium) and detritus.

Humans just do this with the whole planet in a more efficient way. We are, in a planetary scale, alive.

Entropy and intelligence

So systems gain entropy with time, while living things somehow manage to lower their internal entropy, but what makes a living being that acts intelligently different from a dumb one?

Some studies, such as the paper that inspired me, have suggested that deciding solely on the basis of having more different reachable futures is quite a nice strategy to generate "intelligent behaviour" in a system.

The idea is simple:

Suppose you have some possible bright futures ahead, and you have to decide at each moment among several options, let's say only two to keep it simple. Your decisions will make you reach one future or another, so the question here is how to decide between doing "A" and doing "B".

So if having more accessible futures is a good strategy, then I could count how many of those bright futures that I can imagine start by choosing "A", and how many start by choosing "B". The option with more futures is your best bet and, as long as you didn't hide some not-so-bright futures in the process, the decision will basically be right.
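The counting rule can be sketched in Python on a hypothetical toy world: an agent in a corridor of cells that, at each step, moves one cell left, one cell right, or stays. Instead of imagining random futures, this sketch enumerates every reachable end cell exactly; the corridor, the horizon and the names reachable and choose are assumptions for illustration.

```python
def reachable(start, steps, width):
    """Set of different cells reachable in exactly `steps` moves of -1, 0 or +1
    on a corridor of cells 0 .. width-1 (moves into the walls are not allowed)."""
    frontier = {start}
    for _ in range(steps):
        frontier = {p + d for p in frontier for d in (-1, 0, 1) if 0 <= p + d < width}
    return frontier


def choose(position, options, horizon, width):
    """Pick the option whose first move leaves the most different futures reachable."""
    counts = {name: len(reachable(position + move, horizon - 1, width))
              for name, move in options.items()}
    return max(counts, key=counts.get), counts


best, counts = choose(position=1, options={"A": -1, "B": +1}, horizon=4, width=10)
print(best, counts)   # B {'A': 4, 'B': 6}
```

Starting one cell from the left wall, option "A" moves into the corner and leaves fewer different futures than option "B", so B is chosen.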

This is almost exactly what the kart AI uses in video 1, so you can have a look at this simple idea applied to a kart, if you haven't done so yet.

Surprisingly, this sounds quite similar to the "usual" entropy definition!

We can actually define this kind of intelligent behaviour as a natural tendency to always choose what maximizes the number of different reachable futures.

So if we unofficially call the number of different futures you can reach at a given moment from your current position "the entropy of your futures" (well, technically we have swapped the different "microstates" found in the present time in the classical entropy definition for different "macrostates" found in the future), then, if you were an intelligent agent, you would tend to change your position in such a way that the "entropy of your futures" grows as fast as possible.

This tendency that any intelligent agent has to increase its number of reachable futures can, as in the previous physics example, be seen as an entropic force that somehow pushes the agent to do what it does. When this behaviour is seen as a force, as this algorithm does, the force is called a "causal entropic force".
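A minimal Python sketch of that idea on a toy corridor world: the "entropy of your futures" at a position is taken as Ln of the number of different cells reachable within a short horizon, and the force is approximated as a finite difference of that entropy between neighbouring positions. WIDTH, HORIZON and the scaling constant T are hypothetical, and this is a simplified reading, not the paper's exact formula.

```python
import math

WIDTH = 21     # hypothetical corridor of cells 0 .. 20
HORIZON = 5    # how many steps ahead the agent looks
T = 1.0        # hypothetical constant scaling the force


def futures(position):
    """Number of different cells reachable in HORIZON moves of -1, 0 or +1."""
    frontier = {position}
    for _ in range(HORIZON):
        frontier = {p + d for p in frontier for d in (-1, 0, 1) if 0 <= p + d < WIDTH}
    return len(frontier)


def entropic_force(position):
    """Finite-difference slope of S = Ln(futures), scaled by T: F = T * dS/dx."""
    left, right = max(position - 1, 0), min(position + 1, WIDTH - 1)
    return T * (math.log(futures(right)) - math.log(futures(left))) / (right - left)


for x in (1, 2, 10, 18, 19):
    print(x, round(entropic_force(x), 3))
```

Near either wall the force points towards the centre of the corridor, where it vanishes, so an agent following it keeps its options open without ever being told to avoid the walls.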

Please note this is not exactly what the paper suggests, nor is it a real "entropy"; it is just an oversimplified version that surprisingly works quite well. In the same way as we used kLog(N) as the entropy value in the "usual" entropy definition, here the entropy of the reachable futures is a little more complicated than just "N". In the paper it takes the form of a double integral over a path of an expression using the probabilities of the different futures. Quite intimidating, but we will come back to this in a later article.

What is really remarkable about this idea is that, in the same way the growth of entropy could explain all the laws of nature, if you code it right, this algorithm would decide intelligently about anything you place in front of it, whatever that means for each particular case. This would be truly remarkable if it proves to be true (just showing intelligent behaviour in some tests doesn't prove anything so general).

That was all about entropy. A lot of important details were intentionally left out, as I warned you, but I hope you now have a reasonably complete view of what this algorithm is based on, and how powerful it could be if we manage to generalize it and apply it to very different systems.