Wednesday, 29 October 2014

Gain feelings

This is the 6th -and last- post about "Emotional Intelligence". For a proper introduction to the subject, please visit Introducing Emotional Intelligence, goal-less intelligence, enjoy feelings, fear feelings and mood feelings before reading on.

Introduction

To finish the enumeration of the three basic feelings associated with a goal, we need to deal with gains and losses, and with how to represent them as feelings.

Gain or loss feelings

When something you had in the initial state, like your health level, is higher or lower in the final state representing the end point of the future, it means you have gained or lost something.

In the earlier example of the "speeding" enjoy feeling, if you started completely stopped, and at the end of the future you are stopped again, then you did not make any real gain. Speeding is something you enjoy experiencing, but once it stops, nothing is left.

But with energy or health this is not true. If you started with 100% health, but in the middle of the future you crash and lose 40% of it, then, at the end of the future, this loss persists. You really have lost 40% of something valuable to the agent.

Conversely, if you touch a green energy drop and gain energy, maybe you started with 35% energy and ended the future with 55%. You had a real gain here: even if you now come to a full stop, the agent retains it, it still has 55% energy.

When something "real" is gained or lost during a future, be it health, energy or some other valuable thing, you need to modulate the final score of the future, the "FutureEnjoy" we formed by adding all the "StepEnjoys" in the previous post. We will modulate it, as always, by multiplying it by the future's "FinalCoef".

The "GainCoef"

In the first example, as we traced the future we accumulated some sum of "StepEnjoys" into the "FutureEnjoy", but we also started with 100% health and ended with 40%. The "GainCoef" is then calculated as (final value / initial value) = 40/100 = 0.4.

In this example, as we ended with only 40% of the initial quantity, the "GainCoef" is 0.4, lower than 1 because it represents a loss and not a gain.

But we cannot just use this 0.4 directly as the "FinalCoef" we need; it would not work (this was at the root of the problem the previous goal model had, being unable to simulate more than a few seconds) because there is a factor we forgot to take into consideration: time.

The "TimeCoef"

If the doctor tells you "If you don't stop doing this or that, in 80 years you may die because of it", would you panic? Nope. 80 years is far too long for me to care about a possible loss that far in the future. We would dismiss the alarm and go on with our lives.

But if the doctor says "... then you will probably die in two months", all the alarms fire in your mind, and you promise yourself to avoid "this or that" as if it were the only important thing in life.

In both cases the "GainCoef" is zero, as your health drops to 0 and you die, but how far in the future it occurs makes it more or less important to you. We need a "TimeCoef" to make the "GainCoef" fade away in importance as it happens further and further into the future.

To construct this "TimeCoef" I defined two parameters: the "ReactionTime" in seconds, which controls how near in time the loss has to happen to start being important to me, and an "Urgency" factor, which controls how fast the alarm then grows as the loss happens closer to the present.

The following graphic shows a "TimeCoef" plot for a "ReactionTime" of 5 seconds and "Urgency" factors of 1 (blue line) and 2 (purple line), while the green line represents the "FinalCoef" for Urgency=2 applied to a loss with GainCoef=0.4:



Let's start with the blue line. It represents how the "TimeCoef" varies, for a "ReactionTime" of 5 seconds and an "Urgency" of 1, as the loss occurs at different points in time (time is on the X axis, where 0 is the initial point where the future started).

"TimeCoef" is 1 for times longer than the "ReactionTime": if the loss occurs more than 5 seconds into the future, I will not care about it, so the coefficient needs to be 1 in order not to change the enjoy feeling we would otherwise have.

For times smaller than 5 s the "TimeCoef" drops linearly to 0, meaning the alarm grows linearly as the loss happens closer to the present time.

So in this case ("Urgency"=1) the "TimeCoef" varies with time t as:

TimeCoef = Min(1, t/ReactionTime)


If "Urgency" were 2, then the lineal growth would convert into a X² growth. As number are smaller than 1, we are lowering down the line, meanig the alarm will initialy grow faster, as you see on the purple line.

If "Urgency" were set to 1/2 the line would had been above the blue line, so we would get the opposite efect: the alarm will initially grow slowly, and only when it happends really near to the present, it gets really scaring.


So for any "Urgency" value, the complete "TimeCoef" formula is:

TimeCoef = Power( Min(1, t/ReactionTime), Urgency)

Mixing them into the "FinalCoef"

Finally, we mix the "GainCoef" and the "TimeCoef" together, giving the green line:

FinalCoef = GainCoef  * TimeCoef + (1 - GainCoef)

Where (1 - GainCoef) = 0.6 is the part of the enjoy score that is kept even if the loss happens right now.
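
To make this concrete, here is a minimal sketch of the three coefficients in Python, following the formulas above. The function names are mine and purely illustrative, not the blog's actual code:

def gain_coef(initial_value, final_value):
    # Ratio between what you end the future with and what you started with (0.4 in the example).
    return final_value / initial_value

def time_coef(t, reaction_time=5.0, urgency=1.0):
    # 1 for losses further away than ReactionTime, dropping to 0 as the loss nears the present.
    return min(1.0, t / reaction_time) ** urgency

def final_coef(gain, t, reaction_time=5.0, urgency=1.0):
    # Fades the effect of the gain/loss with time, as in the green line of the plot.
    return gain * time_coef(t, reaction_time, urgency) + (1.0 - gain)

# Example: health going from 100% to 40%, with the loss happening 2 seconds into the future.
print(final_coef(gain_coef(100.0, 40.0), t=2.0, urgency=2.0))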

The final "emotional" formula

We now have all the ingredients, and the way to mix them together, to score a future in a completely "emotional" way:

If we have 2 goals G1 and G2, and each goal has 3 parameters for the 3 basic feelings, called G.Enjoy, G.StepCoef and G.FinalCoef, then the future is scored like this:

StepEnjoy = Sqrt( G1.Enjoy² + G2.Enjoy² ) * (G1.StepCoef * G2.StepCoef)
Future.Enjoy = Sum for all steps( StepEnjoy ) 

Future.Score = Future.Enjoy * (G1.FinalCoef * G2.FinalCoef)


That is all. This gives you an "emotional" way to score futures, so if you have 2 options, you can now trace 100 futures for each, discard the repeated ones, and score each option with:

Option.Score = Sum for all different futures( Future.Score )

And the AI decision will be:

Decision = Sum for all option( Option.Vector * Option.Score )

Where Option.Vector contains the changes applied to all the degrees of freedom, or joysticks.
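
As a minimal sketch, this scoring pipeline could look like the Python below. The names (step_enjoy, and the goal attributes enjoy, step_coef and final_coef standing for the three per-goal parameters) are my illustrative stand-ins, not the real code:

import math

def step_enjoy(g1, g2):
    # StepEnjoy = Sqrt(G1.Enjoy² + G2.Enjoy²) * (G1.StepCoef * G2.StepCoef)
    return math.sqrt(g1.enjoy ** 2 + g2.enjoy ** 2) * (g1.step_coef * g2.step_coef)

def future_score(step_enjoys, g1, g2):
    # Future.Enjoy = sum of all StepEnjoys; Future.Score = Future.Enjoy * (G1.FinalCoef * G2.FinalCoef)
    return sum(step_enjoys) * (g1.final_coef * g2.final_coef)

def option_score(future_scores):
    # Option.Score = sum of the scores of all the option's different (non-repeated) futures.
    return sum(future_scores)

def decision(options):
    # options: list of (option_vector, option_score) pairs.
    # Decision = Sum(Option.Vector * Option.Score), one component per joystick.
    dims = len(options[0][0])
    return [sum(vec[i] * score for vec, score in options) for i in range(dims)]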

All done, then?

Apart from a post I still need to add about how to model the joystick itself, yes, we are done with the emotional intelligence: we have fully defined a complete layer of it.

But there is still a big step awaiting: multilayered emotional intelligence.

But that will be some weeks from now; I need to take a little time to prepare a V2.0 of the demo app, as my current version needs to be cleaned up a little first.

Emotions full power

In this post I just show you a couple of new videos using the full "emotional" model for the goals, plus a new system to auto-adjust the joystick sensitivity (I will comment on this in a future post; it is far more important than it seems).

Asteroid field

The first video shows 6 rockets being stressed by an asteroid field (50 of them) randomly falling down, and how the current intelligence can deal with it without getting nervous at all (thanks to the new joystick model).

I created this simulation because I needed a visual way to judge how competent the agents are in hard/delicate/stressful situations. It was the third or fourth video of a series: it was almost impossible to get a rocket hit by a rock using 10, 20 or 30 asteroids at the same time, so finally I tried with 50, and even then only one rocket got hit!

We need the algorithm to be rock solid and stable, so this kind of test is of great interest to me.


It is really the most remarkable video I have produced so far.

Natural killers

The second video uses the same algorithm, but this time we have two "ninja rockets" that can use their thruster flames to burn away the other's energy. If one rocket burns 50% of the other's energy, half of that, 25%, is added to its own energy, as if the laser beam could bring some energy in.

Burning the other scores quite high for both (I use a "gain feeling" associated with the energy burned out of the other player), but the red one likes blood twice as much as the white one, so, as expected, the red one finally wins at 2:30.

But then it notices that the broken white rocket still has some energy left, as it was able to take some from an energy drop (the green circles) before crashing, so it comes back to finish the job.

This final attack (at 3:00) is really interesting, as the red one uses the walls to rotate as if it were in an action film.

After that, life goes on as usual; nothing else interesting happens.


The videos were recorded using V1.5 of the software, but I will not publish it until I reach V2.0 and all the code is cleaned up and ready for the next big step: multilayered intelligence.

Monday, 27 October 2014

Mood feelings

This is the 5th post about "Emotional Intelligence". For a proper introduction to the subject, please visit Introducing Emotional Intelligence, goal-less intelligence, enjoy feelings and fear feelings before reading on.

Introduction

After discussing the simplest feeling associated with a goal, the enjoy feeling, its counterpart, the fear feeling, and the way they are added to calculate the global "step enjoy" feeling after an agent's change of state -or step- we are now going to start dealing with the enjoy modulators.

We will start with the "mood feelings", the simplest and most evident form of enjoy modifier, and then turn to the strangest ones, the gain and loss modulators.

Mood feelings


Imagine you are the agent, walking in the forest at a given speed, experiencing the enjoy feeling of "I enjoy speeding" represented by the distance walked. But today your energy is quite low, and moving on consumes the very last bits of energy you have. Obviously, you are not enjoying it very much.

Now imagine you are full of energy, but you have a health problem with your knees, or a small pebble in your shoe, and this makes the walk much less enjoyable.

By considering that your health indicator, from 1 down to 0, makes all the enjoy feelings less enjoyable by that same factor as it drops, we can use it in our formula as a coefficient that multiplies the global step enjoy feeling, leaving it untouched when you are ok and lowering it a little when you are feeling sick.

If you add the energy as another factor that contributes to making the walk less enjoyable, we end up with a formula, based on the global enjoy feeling or "StepEnjoy" discussed in the last post, like this:

StepScore = StepEnjoy * (product of all mood factors)

Which, in the case of a kart like the ones in the videos, means:

StepEnjoy = Sqrt(Raced² + (Energy*dt)² + (Health*dt)²)

And then:

StepScore = Sqrt(Raced² + (Energy*dt)² + (Health*dt)²) * (Energy * Health)
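
As a minimal Python sketch of this step score, with raced, energy, health and dt already known for the step (the variable names are mine, not taken from the actual code):

import math

def kart_step_score(raced, energy, health, dt):
    # StepEnjoy: mix of the enjoy feelings (distance raced, energy*dt and health*dt).
    step_enjoy = math.sqrt(raced ** 2 + (energy * dt) ** 2 + (health * dt) ** 2)
    # Mood factors: energy and health (both from 1 down to 0) dampen the enjoy when they are low.
    return step_enjoy * (energy * health)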


High feelings

In the two examples above, both moods had factors lower than 1, meaning they represented something that can make your walk worse than usual. But, as always, there is another side: a mood with a factor higher than 1 means you are modelling a "high feeling".

Imagine you want two agents to try to walk together. The most efficient way to do it is to add a mood feeling that uses the distance between them to compute a factor that is higher than 1 when they are near and lower than 1 when they are apart.

TogetherFactor = 10 / (1+distance)

It scores higher than 1 when the agents are closer than 9 units, making walking near each other more enjoyable than not, so they will naturally tend to walk together. Adding the 1 is just to avoid division by zero, dirty minds!

So you can add a new goal, the "together" goal, model it as having an enjoy feeling of zero and a mood factor of 10/(1+distance), and make the agents love walking together.

Using a goal with zero enjoy feeling is not the best option; here we used it just to show how to build a goal that only affects the mood. In my implementation, the enjoy feeling is distance*dt and the mood factor is 10/(1+distance).
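
A tiny sketch of that "together" goal in Python (illustrative names only):

def together_mood_factor(distance):
    # Higher than 1 when the agents are closer than 9 units, lower than 1 beyond that.
    # The +1 just avoids a division by zero when they touch.
    return 10.0 / (1.0 + distance)

def together_goal(distance, dt):
    # Per the post, the implementation pairs the mood factor above with an enjoy feeling of distance*dt.
    return distance * dt, together_mood_factor(distance)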

Example

This kind of feeling is new to the emotional engine, so there are no previous posts or videos about it, but I have filmed one that shows mood feelings in action. You will notice how subtle some of the movements are compared to the examples using goal-less intelligence, or even the video showing a single enjoy feeling (the distance raced example).


The agents in the video also use some of the newer "loss feelings" to avoid near crashes, but most of the time the enjoy feelings and the mood factors do all the work.

Fear feelings

This is the 4th post about "Emotional Intelligence". For a proper introduction to the subject, please visit Introducing Emotional Intelligence, goal-less intelligence and enjoy feelings before reading on.

Introduction

As commented in the Introducing Emotional Intelligence post, goals, when defined using "feelings" toy models, have only three scoring parameters, three kinds of emotional "outputs".

The first of them corresponds to things you "enjoy" experiencing, like the speeding example where you enjoyed velocity. Enjoy feelings are basically added together into a general "enjoy feeling" score after each movement the agent makes.

But each of the three components of a goal, each kind of basic feeling (enjoy, moods and gains), has a reverse, a negative counterpart you need to know about and properly manage in your algorithm.

Fear feelings


The dark side of the enjoy feelings are the "fear feelings" or "hate feelings" (they are basically the same); mathematically they correspond to goals with a negative fixed value for the "enjoy feeling". You can read more about them in this old post about negative goal experiments.

Watch this video of agents with stronger or lighter negative scorings (crashing and losing any amount of health is simulated by stopping the agent after a crash -so it doesn't change its ending point- and modeled as a negative enjoy feeling in the current "emotional" language):


Please note that karts are agents that find it hard to "commit suicide". The same setup, when applied to rockets, can make them decide to actively crash into the ground in stressful situations, such as when energy is running out and that is modeled with negative feelings.

Technically, the first thing to note about negative enjoys or "fears" is that they totally break the logic of the algorithm in a couple of basic senses:

1) If we are measuring some kind of entropy gain, having a negative value means your entropy can sometimes decrease. Either it is not the entropy of a system, or your system is quantum, and this method cannot be applied to quantum systems because the second law of thermodynamics cannot be applied there either.

2) We have totally messed up the metric we had on the state space. Negative distances are not allowed anywhere.


The results are worse than you might foresee. As in real psychology, two effects can easily be detected:

A) Being driven by fears means that, in some cases, the fear will make the agent stop, unable to decide what to do.

In the algorithm, the agent can find that moving anywhere is just too "scary". The fear feelings give big negative scores that, when added to whatever positive ones exist, still give a negative global enjoy feeling. So all options score negatively.

Since staying still, away from the scary futures you can envision, is not as bad (negative) as moving, the option "do nothing" ends up scoring highest and becomes predominant.

B) If the previous case happens when you have a way to commit suicide, you will do it.

When an agent is surrounded by things that score negatively, turning around and crashing into the ground is by far the best option, as it always scores zero (you die), so the intelligence will happily commit suicide.

Even small negative feelings must be avoided. When facing a sufficiently desperate situation, this negativity will be almost all the agent has, which means that in limit situations the fear will win and it will commit a nice, "intelligently planned" suicide.

Many videos showing silly ways to crash were produced and then discarded as erroneous changes in the code, but they were really just fears being introduced into the scoring feelings. I don't use negative enjoy feelings any more, except for "educational purposes", as they produce neurotic and suicidal intelligences.

How to code them


If you still want to use them, you need to deal with negatives. I do it like this in my code (but I don't really use it, so it may not be the right solution).

Remember that before we used, as the global enjoy feeling, the square root of the sum of the squared enjoys, as in the Euclidean distance d = sqrt(dx²+dy²); but now dx² can be negative.

Mathematically there is a very suggestive way to think of it (call it a joke if you want): if the enjoy feeling dx² is negative, then the quantity being measured, dx, must be an imaginary quantity!

Even if it is just a mathematical joke, it accurately reflects the situation, both entropically and psychologically: you are imagining something with a negative growth of entropy, which is only possible in your imagination, not in real life. And psychologically, you are irrationally running away from a danger that doesn't really exist; it is purely imaginary. Take it as nothing more than a naming curiosity.

So we need to treat those negative enjoys as pure imaginary numbers, so that when squared they turn into negative numbers.

It means that (dx²+dy²), in the distance example, can be negative, and square roots of negatives are not possible. Well, they are, if you consider this sum as being, again, a pure imaginary number instead of a negative real one.

In code, assuming sqrt() is the square root, abs() the absolute value and sign() returns -1 for negatives and 1 for positives, you should first sum over all goals:

Sum = Sum for all goals( sign(enjoy) * enjoy² )

And finally define the global enjoy feeling again as:

GlobalEnjoy = sign(Sum) * sqrt( abs(Sum) )
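
The same idea as a small Python sketch (one enjoy value per goal; the names are illustrative):

import math

def global_enjoy(enjoys):
    # Treat negative enjoys as imaginary: their squares contribute with a negative sign.
    signed_sum = sum(math.copysign(e * e, e) for e in enjoys)
    # Keep the sign of the sum and take the square root of its absolute value.
    return math.copysign(math.sqrt(abs(signed_sum)), signed_sum)

# Example: one positive enjoy plus a stronger "fear" (negative enjoy) gives a negative global enjoy.
print(global_enjoy([3.0, -4.0]))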

Why do negative feelings even exist, then?


I suppose they were necessary in the natural evolution of life, when the intelligence the agents had was too limited. If you cannot foresee what will happen when you do something, then a simple strategy is to add a fear feeling associated with unknown things.

Think of cockroaches: they can't harm you, but you desperately hate having one near. The "fear" is profound, and you can't escape it, because it is an ancient instinct: avoid them because they run fast and randomly, and you cannot know whether one will be on your leg in half a second. Fear of uncertainty is the simplest solution.

So maybe they were necessary once, and maybe they still save lives, but for an artificial intelligence it is much better to discard this early beta of feelings and focus on the ones that work nicely.

Saturday, 25 October 2014

Enjoy feelings

This is the 3rd post about "Emotional Intelligence". For a proper introduction to the subject, please visit Introducing Emotional Intelligence and goal-less intelligence before reading on.

Enjoy feelings

Once I had the simulation with the goal-less algorithm working, I wanted to go further. The kart was really driving quite nicely, but it clearly was not optimal.

Why? The idea was so simple and powerful that the problem was not clear at first glance.

Trying to improve


I tried using longer and longer futures, with bad results. I also greatly increased the number of futures calculated, but that only showed a marginal gain. This was not the root of the problem.

The real problem was the scoring of the futures. Always scoring them as one was not fair. In some futures the kart crashed quite near the starting point, while in others it was able to safely go far away. You can't say both futures are worth the same to you; it had to be an oversimplification.

I decided to try the most obvious candidate to me: the distance raced in each future would be the score. So we no longer use N = "number of different futures" to score an option; instead we use the sum of the distances raced in each future.

With this new option-scoring scheme, the intelligence received a great boost, as you can see in the following video, where the old goal-less intelligence is clearly outperformed by two of the new models (one using the new "distance raced" as the score, and the other the squared distance raced).


Something incredible happened: now the agents seem to simply like speeding, so they behave and drive more aggressively. The one with the squared distance was even more aggressive and finally wins; it races more distance in general, but it also had a tendency to be a little too imprudent at times.

In retrospect, this test was a perfect success and I could not do it any better today. I chose the correct formula -distance raced, without the square- for the task (though first I tried hundreds of others, I confess) for several technical reasons:

1) Distance raced is a real way to measure the entropy of a moving particle, which is what the kart is.

The entropy a particle has, when you consider it as part of a gas, can be approximated by its linear momentum v*m, so a path integral of this momentum over the future's path (a red or blue line in the videos showing futures), the path integral of v*m*dt, is a perfect candidate for assigning an entropy to the path of a future.

But m is constant in all my futures, so I can safely discard it (we will normalise the scores anyway, so it makes no difference), and v*dt = distance raced, so we are integrating the distances raced at each time step. That is why the distance raced is the correct way to give a moving particle some form of entropy-gain approximation.

Depending on how you do the path integral, integrating over dt or over dx (the length of the path delta at each step), you end up with the distance raced or with its squared version. Both are similar ways to compute a real entropy; you only change the physical model for which you calculate the classic entropy.

2) Using a real distance to score the future is equivalent to having a real metric on the state space of the system. This also applies to the squared distance raced of the third, winning kart.

If you define the distance from state A to state B as the minimum distance raced by a future starting in A and ending in B, you have a real metric on the space of all possible states of the system.

Enjoy feelings

The fact that the score we calculate at each step has the form v*dt is quite important for understanding how we were quietly introducing "feelings" into the mix.

We wanted the agent to love speeding, and we ended up using as the score the "speed" you are experiencing at each moment, multiplied by the amount of time you enjoy it, "dt".

Enjoy feelings represent anything the system, in some relaxed sense, enjoys experiencing: something positive that accumulates with time and that you can't lose, like the distance raced.

You will need enough "enjoy feelings" in your simulations, as the emotional intelligence desperately needs them to even work. Having an enjoy feeling of zero means the agent will stop deciding and freeze forever. It is dead. From then on it will only follow the physics laws of the simulation.

Other examples

Luckily, every other goal you might need to add to your intelligence will always have an "enjoy" feeling associated. It is a must, and you, as the "designer" of this intelligence, have to find the positive in it, the "bright side".

So the golden rule here is: never add a goal without an associated enjoy feeling.

For instance, a goal created to avoid damage to the rocket, the "take care of health" goal, will have an enjoy feeling associated with your current health (from 1 down to 0), as if being healthy, per se, were a way of enjoying as valid as speeding was for a kart pilot.

In this case, just by having 50% health at a given point in time, you add to this step of the future an enjoy score of 0.5*dt, meaning you not only enjoyed racing at 200 km/h for some dt seconds with v*dt, you also enjoyed the health you have, again multiplied by the time you enjoyed it.

I have always ended up identifying one "thing" you enjoy associated with the goal or motivation I needed to model, and then assuming the agent enjoys it for a delta of time.

The goal "take care of your energy" is quite similar. In this case, the thing you enjoy is "having energy", and the energy level (from 1 to 0) so your enjoy is energy*dt.

Another "only enjoy" feeling I use is "get drops" and "store drops". When the rockets take energy from drops to the storages, they are really enjoying it as much as speeding. In this case enjoy = energy transmited = speed of the energy transmision * dt. You enjoy the "speed" of the transmision, not the energy transmited, as you need to use *dt some how in your formula.

Note: The scale of each feeling have to be manually adjusted in this actual implementation. I used as the unit how much you enjoy racing the size of you body. With this in mind I judged it was fair to use h and e from 0 to 1 as its enjoy feeling value. As you later can set a "strength" of the feeling in the imlementation, this scale is not fixed and can be adjusted in real time. This emotional intelligence is not able to auto adjust the feeling scales -or stregths- to get a better mix, I manually adjust it before every simulation. This will be elegantly addressed in the next version of the model, the "layered" model I am currently working on.

Could exists enjoy feeligs without "*dt" at the end? No. If you try it, may be you will be able to adjust it to something usefull. But if now you switch the delta time from 0.1 s. into a finer 0.01 s., the effect is that your usefull goal now weigths x100 compared to all the other goals that used dt in the formulation. Being so higly dependent on small changes in the delta of time makes it a bad idea to add it to the mix.

Scoring a future with several enjoy feelings

A kart pilot with only one enjoy feeling was a simplistic case. In general, we need to deal with agents that have many of them, so we need to know how to actually combine them into a single score.

The answer was in the speeding goal we already used.

Remember we used v*dt as the enjoy feeling. But v is composed of two vector components, vx and vy. We could have had two goals instead of one and still got the same intelligence, so both ways have to be equivalent.

As the value of v is sqrt(vx²+vy²) -where sqrt() is the square root- then, if we add the enjoy feeling for health (h) and the enjoy feeling for energy (e), the total enjoy feeling should be:

Enjoy = sqrt(vx² + vy² + h² + e²) * dt = sqrt(v² + h² + e²) * dt

This is the way I add all my enjoy feelings, one from each goal (as they always have some positive enjoy feeling associated), to get the enjoy feeling (named "Points" in the code) corresponding to each step of the future, computing it after each state change.

By the way, this makes the mix of enjoy feelings always define a real metric over the state space, as we mentioned earlier, no matter how many enjoys you add.
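
As a minimal Python sketch of this mixing over a whole future (v, h and e being the speed, health and energy at each step; the names are mine, not the real code):

import math

def step_points(v, h, e, dt):
    # Mix all the enjoy feelings of one step as a Euclidean norm, times the time dt you enjoy them.
    return math.sqrt(v ** 2 + h ** 2 + e ** 2) * dt

def future_enjoy(steps, dt=0.1):
    # steps: list of (v, h, e) tuples along the imagined future; the future's enjoy is their sum.
    return sum(step_points(v, h, e, dt) for v, h, e in steps)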

Will it need more feelings?


Enjoy feelings account for all the good things you can detect while you are imagining a future. They would be enough in a world with no dangers, no hunger, no way to harm yourself and die, no enemies... but that is not the kind of world we need to simulate; we need the intelligence to cope with dangers, with batteries that drain, with bodies that can be broken, with other agents that will compete.

We are going to need more than just enjoy feelings once we are out of candy land.

In the next three posts of this series we will deal with different ways to cope with danger, and with how to mix them to get a realistic model of how we feel about and react to a danger, such as the possibility of losing something -health, energy... or money- and how it modulates the enjoy feelings.

Wednesday, 22 October 2014

Goal-less intelligence

This is the 2nd post about "Emotional Intelligence". For a proper introduction to the subject, please visit Introducing Emotional Intelligence before reading on.

Goal-less intelligences

In my first post I already commented on the internals of the simplest entropic intelligence possible, one that scores all futures as 1. If you haven't read it and want to know this case in more detail, you can visit the link first. Anyhow, I will try to summarize the basic workings of this model again.

We have a kart as our system, and a simple physics simulation able to answer the question "where will you be in 0.1 seconds if you are here now?".

We also have a degree of freedom as part of our system: we can move the steering wheel left or right at any moment by "pushing" an imaginary joystick left or right.

We will call these "our options", so in this simplest case we only have two of them: left and right. The kart engine is always on and there is no way to brake or slow down, so each option consists of a single number -the force you have to apply to the imaginary joystick- but in general it will consist of a vector of N numbers, where N is the number of degrees of freedom -or joysticks, in my previous example- so think of an option as a vector containing the "push" you need to apply to the N joysticks to make the agent evolve "intelligently".

With just that information, this algorithm is able to tell you how hard you should push the joystick at any moment to get "intelligent behaviour" from the kart. What that will mean in each case is out of your control; the intelligence is "goal-less".

I call it a "common sense" or "goal-less" intelligence.

Watch this video to see this implementation in action:


Some technical details: the video shows a simple kart simulation with two degrees of freedom, the accelerator and the left-right control. For each one I use four options which, in the case of the left-right control, could be -15, -5, +5 and +15, as this worked much better than the simplified two-option counterpart.

The algorithm in detail

I will detail the algorithm in this basic form using this image, captured from the real application (V1.1), as my test case:



1) Take the first available option, in this case "push the joystick left". We are going to "score" this option so we can later compare it with the second option.

Everything related to the first option, "go left", is painted in blue in the image above. For the "go right" option, red is used. This way you can clearly separate which futures belong to each option.

2) Take your current position (or "state") and use the simulation to find out where you will be after 0.1s if you push the joystick as this option dictates. This gives you the state you will be in after taking this decision.

In the image above, this is the origin of all the blue lines, actually under the kart body. For the second option, going right, it is the origin of all the red lines.

3) From this "option initial position", imagine a 5-second-long random future by iterating steps of 0.1 seconds. In each of those 0.1s steps, the degree of freedom takes random values in a given range, in this case from -5 to +5 units of force, so the joysticks are randomly pushed left or right at every new step until you reach your time horizon, the number of seconds into the future you want to "think". In my first tests it was 5 seconds, so with 0.1-second steps it takes 50 steps to arrive at the future's end point.

This step draws a blue or red line, depending on the option you are scoring.

4) Take the end point coordinates, round them to a given precision (in my case I initially used 10 pixel units), and add them to the option's list of different futures found (a list of vectors containing the ending points of all the different futures found so far). If this ending point is already in the list, just discard it.

In the image above, these correspond to the blue circles for the left option and the red circles for the right option. The radius doesn't mean anything, as they all score as one in this method.

5) Repeat the process from 3), imagining a new possible future each time, until you have tried a fixed number of futures. In my case I used 100 futures, but a bigger number like 500 works better.

This makes all 100 blue dots appear one by one for the first option, and all the red ones for the second option.

6) Score the option with the number of different futures you found, N.

If you imagine the grid size used to round the final future positions as representing "tiles" on the track, then this N is actually measuring the area of the surface "touched" by the blue circles.

The tile area itself -the grid size squared- is not counted, as we only need to compare the blue area with the red area.

It also means we are scoring each future with a simple 1, and using N to score the option because it is the sum of all those ones. Options are always scored by summing all their futures' scores.

7) Repeat from point 2) with the next available option (in this case "turn right") until all options have been scored.

Now you have the blue and red areas measured.

8) Now we normalize all the option scores so they sum to 1: start by locating the smallest score and subtracting it from all the option scores, so the smallest one is now exactly zero. Then divide them by their sum so they sum to 1. You have converted the option scores into weights you can use to average.

In the image, this means you first find the smallest area -the blue one is smaller this time- and subtract it from all the areas, as if they were shrinking until one dries up completely. Finally, one area is zero and the others are >= 0.

Note: this step can seem too dramatic a change. It is in this case, but when you play with 8 or more options, doing it makes the intelligence decide faster among similar options. Imagine the kart has a Y-shaped junction ahead. Going left or right scores almost equally, but if you do nothing and go straight, you will crash. The sooner you commit to one path, the better, so in general it makes the intelligence "sharper". If we were not dealing with such an "organic" agent, disabling this might make the algorithm more "neutral".

9) Then the intelligent decision is the average of your options -remember they are just vectors with one component per degree of freedom- weighted with the scores we normalized in 8).

You are graphically comparing the blue and red areas in the initial example image. If you see more blue dots, you should go left.

Now you have your intelligent decision. It is a vector, and each component is a force you need to apply to a joystick at this step, so you finish by simulating this decision on screen:

10) We are back in reality instead of imagining a future. In my case, I switch between the two possible states an agent can have in my implementation: a "real state" showing where you actually are on screen, and a secondary "imaginary state" used in the simulation while the agent is imagining a future (there is an internal flag for that in the agent's code).

11) Push the joysticks with the forces contained in the "intelligent_decision" vector. Pushing the joysticks changes the state of the agent (now the "real state", we are not imagining a future) -the joystick positions have changed- so when you ask the simulation "where will I be in 0.1s from this initial state", the answer will be the next real position of the agent in your simulation, the one shown next on screen.

12) Here you refresh the screen with the new positions -or states- of the agents, using the "real states".

We could also draw the future lines of step 3 to see "what the agent is thinking before deciding". In the exe you can switch this on or off, and in the video above it was "on", so you can see those blue and red lines being created in real time as the agent ponders its options.

This was the exact moment the image above was taken!

13) You have finished one frame of the film. The agents are now in a different position. Take those positions as the new initial positions and go back to step 1. You have a video of an agent moving around "intelligently".

At this point you have produced a video like the one I showed at the beginning. Watch it again and try to follow the kart's logic when it decides to make one turn or the other based only on the number of blue and red dots. A compact sketch of the whole decision loop follows below.
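
This is not the blog's actual code, just a minimal sketch of the loop in Python: simulate(state, pushes, dt) stands in for the physics simulation, state.position for the on-screen coordinates, and each option is a vector of joystick pushes; everything else follows the numbered steps above.

import random

def score_option(state, option, simulate, dt=0.1, horizon=5.0,
                 n_futures=100, grid=10.0, push_range=5.0):
    # Step 2: apply the option's push once to get the "option initial position".
    start = simulate(state, option, dt)
    endings = set()
    steps = int(horizon / dt)
    for _ in range(n_futures):
        # Steps 3 and 5: imagine a random future by pushing the joysticks randomly at each step.
        s = start
        for _ in range(steps):
            pushes = [random.uniform(-push_range, push_range) for _ in option]
            s = simulate(s, pushes, dt)
        # Step 4: round the end point to the grid; the set keeps only the different endings.
        endings.add(tuple(round(x / grid) for x in s.position))
    # Step 6: N = number of different futures found.
    return len(endings)

def decide(state, options, simulate):
    # Step 7: score every option.
    scores = [score_option(state, opt, simulate) for opt in options]
    # Step 8: normalize -> subtract the minimum, then divide by the sum, giving weights.
    low = min(scores)
    shifted = [s - low for s in scores]
    total = sum(shifted) or 1.0
    weights = [s / total for s in shifted]
    # Step 9: the intelligent decision is the weighted average of the option vectors.
    return [sum(w * opt[i] for w, opt in zip(weights, options))
            for i in range(len(options[0]))]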

Remarks


Please take these remarks into consideration if you plan to actually code it:

- Using N (the number of different futures of each option) as the option score is a simplistic way to approximate entropy. It doesn't matter too much, because we will normalize the scores, so the scale is not important. Basically, you are assigning a score of one to every future.

- Instead of N, you should try to calculate the actual probability of each of the different futures by counting the "hits" each ending point received -how many futures ended at that rounded position- and normalizing those hits by dividing by 100 (as you imagined 100 futures, the sum of the hits is 100). A small sketch of this variant is shown after these remarks.

OptionScore = - Sum for all different futures of ( p*Log(p) )
With p = probability of this future = hits/100.

- Preselecting the grid size proved to be tricky. In some situations it is better to use a small grid size (when you are in a difficult spot), but at other times a bigger size works better, depending on how "closed" or "open" the space you are moving in is. Using some kind of heuristic is better than using a fixed grid size.
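
A minimal sketch of that probability-based scoring in Python (endings would be the rounded end points of all the imagined futures, repeats included; the names are mine):

import math
from collections import Counter

def entropy_score(endings):
    # Count how many futures ended on each rounded position ("hits") and turn them into probabilities.
    hits = Counter(endings)
    total = len(endings)
    # Shannon-style entropy of the ending distribution: higher means more different reachable futures.
    return -sum((h / total) * math.log(h / total) for h in hits.values())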

Forewords


Along with the algorithm itself, you should consider the following concepts before continuing:

1) The step in which we deleted futures with similar ending points was the moment the algorithm became "entropic", as it means we are using some kind of entropy. It is key for the algorithm to work, so it remains untouched in every version of the algorithm.

2) The moment we decided to use N as the option score, we were really saying "all futures score the same for me". This made the algorithm "goal-less" and the resulting psychology was "no psychology". Changing this score from one into something more meaningful is the way to radically improve the resulting AI.

3) If you use -Sum(p*Log(p)) instead of N, as commented in the remarks, you will be using the best goal-less entropic intelligence available. But all the futures still score "one hit", so the resulting intelligence remains goal-less and is psychologically flat.

Please also note that being "goal-less" doesn't mean not being capable of incredible things. If you watch the video once again, you will notice how well the kart does on quite a slippery track!

If you were to let this AI drive a real RC helicopter, it would do it correctly on the first try and keep it up and safely flying in a changing environment with wind changes. It is the ideal autopilot for any gadget or real system, as long as you can give it a rough approximation of a simulator. Quite remarkable, in my opinion.

From this point on, all we will do is replace this score of 1 with different functions and discuss the consequences.

The final goal of this version 2 series of posts is to replace that one with a nicely working function based on toy models of the real feelings we suppose the agent has.

Tuesday, 21 October 2014

First "emotional" video

My code is not yet "fully emotional"; some cases are still not used and others need more testing, but I am ready to produce my first video where goals are handled in this new "emotional" way.

The video just shows the old test case of a set of agents moving around -karts in the original- where they must collect drops and then deposit them in the square containers to get a big reward, but this time they are rockets inside a cavern and they follow the goals in a fully "emotional" way.

The changes are not totally evident in this case -the task is too simple to make a great difference, and surely I need to find more challenging scenarios for the next videos- but you will still notice the big step in the small details: how actively they pursue their goals and how efficiently they do it.


You will notice the rockets have changed. Before this, there were a couple of gauges showing the energy and health levels, but they were quite distracting. Now the energy and health levels are represented as triangles painted on the left and right sides of the rocket bodies.

Once the three energy containers have been filled, the main goal is no longer active and they switch to a more boring strategy of just hovering around and landing to refill energy. I could have avoided this by raising the strength of the "get drops" and "love speeding" goals, or lowering the "care about energy" one, but my main goal was to show a set of agents with a strong tendency to do something -collect energy- while still being capable of keeping healthy by correctly using their "don't lose your health" warning feeling.

I will post new videos as soon as I can produce nice test cases for all possible combinations of "feelings", but at the moment I still need to retune some old parts of the code that are not working with this new model (in particular, there are a couple of mechanisms to auto-adjust internal parameters of the AI -the grid size for detecting similar futures and the sensitivity of the joysticks- that I miss a lot, as they improve intelligence and stability at no cost).

Monday, 20 October 2014

Introducing Emotional Intelligence

In the current entropic intelligence algorithm, the scores you assign to the different futures you imagine are the golden key, as they determine how the options you are considering compare to each other, ultimately defining the resulting "psychology" of the agent, the thing that makes it behave one way or another.

These future scores are made up by adding the effects of a set of different "motivations" or goals the agent has, like "I love speeding", "I care about energy" or "I care about health", measured over the future's path, step by step, like a path integral approximation.

Being able to define the right set of motivations for an agent, along with a proper way to calculate the different effects those motivations can have on every step the agent takes, and to mix them together to get the correct future score, is ultimately what I am looking for and the leitmotiv of this blog.

I have used quite a large number of simple goal schemes to simulate some interesting intelligent behaviours, like the ones previously presented on this blog, but I am far from happy with them.

Basically, they failed to show me the truly optimal behaviour I was expecting from them. Some had weird problems in limit situations, like running out of fuel and out of energy at the same time. But there was also an ugly limitation on the length of the futures they were able to handle that really made them not so generally usable.

In the trial-and-error process, I found some interesting patterns in the goal-scoring schemes that worked better: they always avoided the possibility of negative scores, and most of them defined a real metric on the space of states of the system (when you assign a score to a future connecting the initial and final states of the system, you are inducing a metric on the state space, one that tells you how interesting a future looks to your agent).

At some point in the process, I felt that the key idea for going from a goal-less intelligence, based on the pure physical principle of "future entropy production maximization" (as described in my first posts or in Alexander Wissner-Gross' paper), to a stable and rich human-like intelligence was to try some realistic modelling of the "feelings" themselves and of how they affect our own internal scoring systems, and then to base everything else on them.

Please note that, when I name the different parts of the equations involved as if they were actual feelings, they represent perfectly defined mathematical functions, not any kind of pseudo-scientific concepts. It just seemed to me that the parts I needed to use in the mathematical mix were incredibly similar (in form and in their effects on the resulting behaviours) to concepts usually related to human psychology. Over time I have naturally taken to naming them "enjoy", "fear" or "moods". Take this as a pure mnemonic trick or as a clue to some deeper connection; either way, it will help you to better visualize the algorithm.

The introduction of those "feeling" models means a boost in the number of different "motivations" I am now able to simulate directly. It is now quite easy to model all the basic goals I was using before, and they work much better, but it has also allowed me to model new kinds of interesting and useful motivations.

Before going on, here you have a video showing the full potential of the emotional intelligence in stressful situations, like 50 asteroids falling on your head:




In the following posts I will recapitulate the workings of this new "emotional" version 2.0 of the entropic intelligence algorithm in full detail. I will not mention things that didn't work in the past (they have been deleted from the V2.0 code to make it clearer) and will follow the straight line from my first, simpler model to the current "emotional" one.

The topics of these posts are:

Common sense
We will examine the initial form of the AI, the one corresponding to Alex Wissner-Gross' paper. The psychology it represents is no psychology at all, just pure "common sense" in action, and it corresponds to the case where all futures score just one.

Enjoy feelings
We will jump to a much better version of the AI where the distance raced in a future is the key. We will define it as an "enjoy" feeling and discuss the correct form in which it should be calculated so that it makes "entropic sense". I will then comment on some other possible examples of "enjoy feelings" and on how to mix several "enjoy feelings" into the future's final score.

Fear/Hate feelings
They correspond to negative enjoy feelings and represent irrational fears, as opposed to fears based on real risks. It is generally a bad idea to add them into the mix, as the resulting psychology will include hidden suicidal tendencies and, in some situations, the AI will panic and freeze. They also negatively affect the quality of the resulting metric on the state space of the system, so I have actually banned them from my implementation, even though they are handled correctly if you define such a goal.

The mood feelings
Mood feelings change the "enjoy" score you calculated from the existing mix of "enjoy feelings" by multiplying it by a factor. If it is lower than one, it corresponds to "negative reinforcement", like when you are walking quite fast but can't fully enjoy the speed because you have a pebble in one shoe or you are really exhausted. On the other hand, when it is bigger than one, it models "positive reinforcement", like when you are an intrepid explorer walking into the unknown, or you are in love and prefer walking along with your beloved.

Loss feelings
Losing your life can be scary and must somehow be avoided, but if it will happen 100 years into your future you don't really have to worry about it. Loss feelings are weird: their effects fade out into the future and apparently break some laws about entropy and metrics (all fears do, after all), but they are really needed -for now- if you are seriously concerned about the agent's IQ.

Gain feelings
They are the opposite of the loss feelings and correspond to great things that could happen to you at some point in the future. As with the loss feelings, the importance of the gain tends to zero as it happens at a more distant point in the future. They can simulate the effect of landing a damaged rocket to have it repaired and its health refilled to 100%, or model a rewarding feeling when you avoid another player's death, for instance.

This will close my presentation of version 2 of the entropic intelligence, the emotional entropic intelligence. At some point I will release a version 2.0 (note from the future: I did it on 1st December 2014) of the app and its source code, internally revamped to reflect this emotional model and the new naming, with examples of motivations based on all those new feelings.

There will be a future version 3 in which the three parameters of each motivation (the "strength" of the feeling, for instance; I will discuss them in the "loss feelings" post) will be automatically adjusted to the optimal values for each situation in real time, boosting performance in a completely new way (at a computational cost) if they behave as I expect.