Thursday, 27 February 2014

Video 1 - The basics.

Before this blog I used to publish videos on a Youtube playlist about how the AI was better on this or that, it gave me a way to get comments on the subject.

Now that the blog is up and working, I would like to start by reviewing all those "old videos" and give an explanation on how the algorithm was working by this time, how good it was and witch things needed to be changed.

So today we will comment on the first video, it is the most important one to understand in order to get the whole idea of the algorithm, so read carefully and comment on any aspect you feel is not cleared in this post, I will try to help my best.

So start by watching the video:

As you can see, we have a simple case: a track with a kart on it. The kart is also quite simple, it is always accelerating (it has no brakes) and you just have a joystick for left/right turning (it is said that this system only has one degree of freedom). The AI must decide where will it "push" the joystick on every frame, that's our goal.

1) Options = Possible decisions

First thing you need in order to make the AI, is to have a list of witch "option" or "decisions" you are going to ponder. In this simple case, we will consider only two possible options/decisions: Push the driving wheel left by adding +5º, or push it right by substracting -5º. In the video, each one is represented by red or blue lines.

2) Create random futures for each decision

Second thing to do is to "imagine" a bunch of, let say, 100 futures for each possible initial decision.

For instance, you consider the decision "pushing +5º ", then you simulate a frame (in my case, a frame is 0.1 second time) with this "push" working, so simulation code will tell you where the kart will be after 0.1s of pushing driving wheel +5º. It will be a little ahead and left form actual position, and surely a little rotated counter clock wise.

From this "lefty" possition, you continue simulating until you get to time+10s (those 10 second is a parameter, how far in the future you want to simulate), but on all subsecuent frames, instead of "pushing +5º", you will chose the "push" randomly in the range +5 to -5. This makes the kart to drive randomly, and it correspond to the blue lines on the video.

You repeat it 100 times, so you end up with 100 blue lines, representing 100 possible futures that start by turning +5º left the kart. Those 100 blue lines form a "blue flame" in front of the kart as you can see in the video.

Now you repeat with the second initial decision, pushing -5º to the right. In the same way, we get 100 more futures, all them painted on red on the video.

3) Counting different futures

With all those futures found for one of the two possible decisions, now we need to discard the similar ones in order to get the list of the "different" futures that taking this decision could bring to us.

For doing this, you need to know witch of the kart's parameters are going to be considered "important" for comparing ending position. I decided to only use the position of the kart (PosX, PosY), and rejected to use the angle. For considering "similar" two futures, their positions, rounded to a given precision, must be equal.

So, if future #1 ended up at position (234.4, 187.0) -here I use just pixel coordinates- and I am using a precision of 5, it means this future "roughly" ends at (235, 190), and this rounded position is the one to compare with other futures in order to get a list of different ones.

So, after discarded duplicated futures, may be you end up having 35 different futures for decision 1 (turning left 5º) and 15 for the other decision (turning right -5º).

4) Deciding

Well, everything is ready, the AI only needs to decide by averaging the 2 possible decisions (+5 and -5) weightened with the number of different futures each decision had, compared with all the different futures found so weigths sum 1.

Decision = +5 * (35/(35+15)) -5 * (15/(35+15)) = 5*35/50 - 5*15/50 = 5*0.7-5*0.3 = 5*0.4 = 2

So, the "intelligent" decision, in this case, is turn +2º to the left!

5) Loop on it

Well, the work is done, now you just simulate the kart after applying the decision and the kart moves on screen. You are again ready to go to 2) and start over again from this new starting position.

Thats all the algorithm is doing: counting red and blue different dots and heading left if there are more blue dots than red ones.

Highlights

-A future that crash with the fences was supposed to be a "bad" future, and its final point is not even draw, only non-crashing futures are considered as valid ones. It was a really bad decision from me, we will come back to this in future posts.

-Each different future found count as 1. A longer future (one that takes the kart far away from its intial position) count as much as another future where the kart crash after running break and only race for 1 meter. It is simple and work, but it is far from optimum. Scoring the futures will be one of the biggest improvements in next versions.

1. Thanks for that great introduction. Very interesting. One question: when you simulate the "random path" form step 2 onwards, do you assume changing by exactly +5 or --5 at each step, or do you assume some number between -5 and +5. I.e can we randomly select -3, or +4 etc?

1. Only in the first step you need to chose +5 or -5, in the following random steps, you take a random value between -5 and +5.

The idea is that this first step need to be in a short list of possible "options", so later you can count on how many different futures each option had and use this as its weight when finally averaging all the options to get the "intelligent decision" (that will be between -5 and +5 again).

2. By the way, this random value doesn't need to be rounded, it can be -2.12, +4.00432, etc.

In the actual implementation, I have two ranges, 5 and 15 (for instance): In the initial decision I choose among -15, -5 +5 or +15, and in the following steps I take a random value between -5 and +5.

It just proved to work much better than the simplier case on stressing situations, but may be usign a Gaussian distribution of mean 0 and std.dev. of 5 (or 15) could make the same effect.

2. Hi!
I'm missing something in the explanation (reading again all entryes), because if in the first step you choose between 2 options, and in next frame (for example) for each of those 2 initial options, you choose a random (only one) option, finally you will only have 1 future and not 100 for each option, right? I think I miss something here :(
Well, what my mind says to me is that in each step (frame), and for each previous decision, you should again try to search for different futures, for example always open new 4 futures (-5, -2.5, 0, 2.5, 5) for each option, so you will be exploring much more futures and you'll have the probability to search a better one. Of course at cost of more computation.
But probably you will say to me that I miss something in the way... :)

1. Yes, you are missing somethnig!

Think of that this other way: Starting with your actual position, trace 100 random futures, all different, with all of it small steps being random EXCEPT for the first sterp of every future.

This first sted always use instead of a random decision over each degree of freedom, it choose only one of the degree of freedom, and then one of the 4 options I use for each degree of freedom. This way, if you have 2 degrees of freedom with four options each, you end having 100 futures withh teir first steps always being one of the 4+4=8 possible "options" for the system.

In average you will have 100/8 futures starting with a given option, but I don't force this, as in the long term it cancels, so I just take care of doing the first step of every new future I start to be one of the eight options considered.

Then, at the end, I group all the futures in the FutureList by its first step, and I call them "options" in the code (in UListOfFutures.pas). This way, each "option" will have its own sublist of futures, the ones that started by taking this exact option as the "decision", score all those futures, and sum them into the option's score.

Finally, sum the 8 option's scores so you can nomralize them by dividing each option's score by the sum.

The sum. of option*normalized_score gives you a vector (option is a vector, but with only one coordinate being non zero, the one that correspond with the degree of freedom you chosed to change in the fist step of all those futures) with the correct decision for the 2 degrees of freedom.

Using 2 degrees of freedon just makes you think on an option as a vector instead of a single value, thats all.

Hope it did clarify the core of the idea for you.

2. By the way, you can clearly see that difference on the first step of a future and the rest of steps by looking at the code: Go to UState.pas and look for TakeFirstStep and TakeRandomStep functions, you will have one for the State (the set of all StateParams) where I choose one of the available FreeParams and then call the real code inside this FreeParam coded in TStateParam.TakeFirstDecision

In that code, I chosse a value for it. For the available 4 possible values, I use the MaxValue set for that free param as a base, then decrease it by a factor called "Sensitivity" (only for the level 8 of the algorithm) and the four values are the value you found, its negative, and a third of they two: If MaxValue is 30 and sensitivity is 1, I use 30, -30, 10 and -10 as possible values for that FreeParam.

If it is not the first step, then TakeRandomStep do the work: It takes a random step over all the free params, and each param choose a value not with a simple random, but by using a normal distribution of probavility of normal(0, 10), being this 10 again 1/3 of the MaxValue of the free param multiplied by its "sensitivity", as the small value in the first step.

Thanks god delphi has a funtion for that already coded, so it is just a simple line of code in TStateParam.TakeRandomStep funtion.

3. In my first response I forgot an important step!

When I wrote:

Then, at the end, I group all the futures in the FutureList by its first step, and I call them "options" in the code (in UListOfFutures.pas). This way, each "option" will have its own sublist of futures, the ones that started by taking this exact option as the "decision", score all those futures, and sum them into the option's score.

I should had wrote in this way (notice the new parragraph inserted in between):

Then, at the end, I group all the futures in the FutureList by its first step, and I call them "options" in the code (in UListOfFutures.pas).

Then I round all final position of all the futures by iterating over all the PositionalParams (PosX and PosY in my case, but sometimes I try with Angle beging a psitional param too, you can switch in on in Player2D.DefineState function, you will see a commented version of the create of the Angle2 param as positional instead of computed param) and discard the futures with the same rounded postional params: the ones that ends inside the same grid little square and only sum one of them, the one with higher score by the way, as it gives the algorithm a little more "robustness" (I can use a lower number of futures and still get good intelligence out of it).

This way, each "option" will have its own sublist of futures, the ones that started by taking this exact option as the "decision", score all those futures, and sum them into the option's score.

4. Hi! Ok thank you about explanations, clearer now. Some doubts:
1) why is it so critical in the first step not to choose random options in the range of possibilities?
2) also, not sure to understand why in next steps to first one, you choose random options. Not sure to understand why random calculated futures are better than for example choose logical decissions that may find more futures than a random decission. For example my mind says to me that we would find more futures if: a) you start with every options (-5, -2.5, 0, 2.5, 5). So until now 5 futures b) next step, for each previous option you try all the options, so at this momment we have 5 x 5 = 25 futures. If you try many steps, many futures will be calculated but probably many more futures will be find because we try many options with a pattern that gives more possibilities to find futures...of course at a bigger computation cost. Let me know what you think about it

5. 1) Because otherway you will have 200 options with 1 future each, instead of 4 options with 50 futures each. In the first case, you can not "discard repeated futures" that is a key part of the method.

2) For 5 seconds you have 50 steps, so If you can manage 2^50 futures, it will work ok.

Anyway, each future is calculated form start to end, no tree or "splitting future" is done. Things are simplier than you are imagining I think.