Thursday, 20 November 2014

A Talk on Emotional Intelligence

Yesterday I presented the "Entropic Emotional Intelligence" model in a long (2h!) talk at the Computer Science Faculty of Murcia University, as a way to add intelligence to video game players:


In about a week there will be an official video (in Spanish) on the university TV site, which I will upload to YouTube so I can subtitle it in English.

The slides, in Spanish and English, are available on the blog's download page.

In the meantime, the paper I am working on is slowly approaching the finish line. I can't really publish it in a journal, because journals only accept articles of 30 pages or less, not 100+ pages, so I will publish it directly on arXiv.

Tuesday, 4 November 2014

Is it the Terminator AI?

Most of the people I talk to about this magic algorithm tend to bring up the Terminator: the powerful artificial intelligence that decided to destroy humanity. Scary.

It is something that has genuinely worried me during this process; really, I don't want to help create a perfect weapon any more than you would!

Is it actually feasible?

Actually, the system is quite close to making it possible to build a truly intelligent robot: just connect the AI to some sensors driven by a perceptual AI of the kind that already exists, one that detects patterns, shapes and 3D forms, predicts the future state of the system, and stores patterns in memory so it can learn along the way by applying deep learning techniques. All the pieces exist.

If you think of this entropic emotional intelligence as a "motherboard" with an "intelligent CPU" on it capable of running this algorithm, and connect it to the aforementioned intelligent sensors, they can feed the intelligence with what would play the role of our "simulation".

The resulting machine could then be placed inside a real robot, letting the "black box" drive the robot's joysticks. All of this could almost be done today, in my opinion, so I must say the alarm is justified.

But I have finally found the cornerstone of the algorithm that prevents it from being "bad" in any sense. There is a natural filter in the intelligence for that, although it is not natural in the sense that animals have it; no, we don't have this "feature", it was deactivated by natural selection to make us more aggressive in what was surely a very harsh environment.

At that time it may have been a good heuristic, but only because those intelligences were not fully developed.

So what determines being a good boy or a bad boy in the algorithm?


The exact point is whether negative enjoy feelings are allowed into the thinking process. The intelligence formulas do not admit negative enjoys at all: every enjoy is squared before being used by the intelligence, so if negative enjoy feelings are allowed by our "emotional system" and enter the thinking process, they are converted into positives as a result of the squaring.

It is the same thing that happens when you calculate the length of a vector: you square the difference in each dimension, and by doing so you accept that, in a distance, the sign of the difference does not matter, as negatives become positives inside the distance formula.

In the intelligence formula, the enjoys of all the emotions are squared, summed, and then the square root is taken, exactly as in the Euclidean distance formula. So our intelligence is "Euclidean" when calculating the "length" of its combined feelings, and as a consequence, negative ones are treated as positives inside the intelligence.

What does this mean for the agent? Anything that scares it, like dying, also attracts it, because in its mind fear is as attractive as a real enjoy. It will enjoy the fear and run toward scary situations.

This is why natural selection used it. If you feel attraction to what produces fear, you will fear your enemies and still find the courage to attack them, and the more dangerous it gets, the more you enjoy it. You are a natural killer, a T-Rex. So being attracted by fear is a great advantage, but it leads the intelligence toward anger, and bad behaviour, in a general sense, emerges.

The current form of the algorithm doesn't allow negative enjoys to flow into the intelligence; they are discarded. With this measure alone, the resulting intelligence will never be violent, will always decide based on positive things, and that makes it a "nice guy" in every sense you can imagine.
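Just to make the filter concrete, here is a minimal Python sketch (the goal values are invented; the real app is not written like this, it only shows where the filter sits):

import math

def step_enjoy(enjoys, filter_negatives=True):
    # Euclidean mix of the per-goal enjoy feelings for one step.
    # With the filter on, negative enjoys (fears) are simply discarded,
    # so the squaring can never turn them into attractive positives.
    if filter_negatives:
        enjoys = [e for e in enjoys if e > 0]
    return math.sqrt(sum(e * e for e in enjoys))

print(step_enjoy([0.5, -0.8, 0.2]))         # fear ignored: the "nice guy" mix
print(step_enjoy([0.5, -0.8, 0.2], False))  # fear counted as if it were enjoyable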

Can this filter be circumvented?

I am afraid it can, I have played with it countless times and it is not that difficult; but the good news is that the more intelligent you try to make your "filter off" agent, the more it will tend to simply commit suicide. Before that, the intelligence will start pondering the consequences of being attracted to negative enjoys and will determine that it is bad; this is not avoidable, it will notice.

Once this is detected, the algorithm will, if allowed, decrease the strength of the negative enjoy feeling to zero, and if you don't allow that in the code, then the intelligent decision will be suicide.

It is unavoidable: a multilayered intelligence as described in the paper cannot be bad and really intelligent at the same time. It is not possible.

So yes, a robot like a Terminator could be built tomorrow with this algorithm on board, but if the coders decide to break the rule of not admitting negative enjoys, the resulting AI will be... a brute.

So tonight I will sleep well!

...I think I am the first artificial intelligence psychologist!

Busy writing

I consider the "Entropic Emotional Intelligence" algorithm almost fully complete, not on this blog, but in my mind. I will still need some months to put all the ideas into code and test it in its finished form. I have great expectations, but I think it will take a lot of CPU!

While building the general version of the algorithm, I am also writing a complete academic paper detailing it with a more technical approach than I can follow on this blog.

It will keep me busy for some months, but after that I promise to upload it to arXiv immediately.

Ah! I also found the idea for my next algorithm: it is about consciousness...

Uncertainty

If the algorithm is to be generally usable, it must be able to deal gracefully with uncertainty.

When a rocket detects a falling asteroid in the video shown in the last post, it imagines a future where the exact position of the asteroid is known, with no error. This is only possible because the rocket uses a deterministic simulation to calculate the future positions of the asteroid, and as you can see in the video, the rockets can avoid the risk with remarkably cold blood.

But that is not realistic. In real systems there are uncertainties: you know the asteroid will be roughly in that position in one second, but with a standard error you cannot control.

In the simulation this corresponds to random noise added to the asteroid's velocity at each step, but only while a future is being imagined; in the real simulation, the one you see on screen, the asteroid strictly follows the laws of physics using its real position and velocity.
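A minimal sketch of where that noise goes, in Python (1D toy physics and an invented noise level, just to show the difference between the real step and the imagined one):

import random

def real_step(pos, vel, dt=0.1):
    # The on-screen simulation: strictly deterministic physics.
    return pos + vel * dt, vel

def imagined_step(pos, vel, dt=0.1, noise=0.5):
    # While imagining a future, the velocity gets a small random kick at
    # every step, so repeated futures fan out like a waterfall.
    vel = vel + random.uniform(-noise, noise)
    return real_step(pos, vel, dt)

print(imagined_step(0.0, 10.0))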

In the following video, the traces of the futures the rockets imagine for each asteroid are drawn. As you would expect, the traces look like a waterfall: as the future goes on, more and more small uncertainties accumulate in the asteroid velocities, so after some time the traces obtained for each future diverge.

The effect on the rockets is that now they don't have to avoid a single falling trace per asteroid; they have to hide from a shower of possible asteroids falling in different directions. The rockets sometimes panic, and if they find no way to escape an asteroid, they just wait for it.


In this example the rockets don't have any uncertainty about their own positions or velocities. It could be added, but it rapidly increases the number of futures you need to simulate, and my CPU is already smelling strange.

No real rockets were harmed while producing this video footage! 

Wednesday, 29 October 2014

Gain feelings

This is the 6th -and last- post about "Emotional Intelligence"; please visit Introducing Emotional Intelligence, goal-less intelligence, enjoy feelings, fear feelings and mood feelings for a proper introduction to the subject before reading on.

Introduction

To finish the enumeration of the three basic feelings associated with a goal, we need to deal with gains and losses, and their proper representation as feelings.

Gain or loss feelings

When something you had in the initial state, like your health level, is higher or lower in the final state representing the end point of the future, it means you have gained or lost something.

In the earlier example of the "speeding" enjoy feeling, if you started completely stopped, and at the end of the future you are stopped again, then you did not get any real gain. Speeding is something you enjoy experiencing, but when it stops, nothing is left.

But with energy or health that is not true. If you started with 100% health, but in the middle of the future you crash and lose 40% of it, then at the end of the future that loss is still there. You really have 40% less of something valuable to the agent.

Conversely, if you touch a green energy drop and gain energy, then maybe you started with 35% energy and ended the future with 55%. You had a real gain here: even if you now come to a full stop, the agent retains this gain, it still has 55% energy.

When something "real" is gained or lost during a future, be it health, energy or some other valuable thing, you need to modulate the final score of the future, the "FutureEnjoy" we formed by adding all the "StepEnjoys" in the previous post. We will modulate it, as always, by multiplying it by the future's "FinalCoef".

The "GainCoef"

In the first example, while tracing the future we accumulated some "FutureEnjoy", but we also started with 100% health and, after the crash, ended with 60%. The "GainCoef" is then calculated as (final value / initial value) = 60/100 = 0.6.

In this example, as we lost 40% of the initial quantity, the "GainCoef" is 0.6, lower than 1 because it represents a loss and not a gain.

But we cannot just use this 0.6 directly as the "FinalCoef" we need; it would not work (this was at the root of the problem the previous goal model had, not being able to simulate more than a few seconds), because there is a factor we forgot to take into account: time.

The "TimeCoef"

If the doctor tells you "If you don't stop doing this or that, maybe in 80 years you will die because of it", would you panic? Nope. 80 years is way too far away for me to care about a possible loss. We would dismiss the alarm and go on with our lives.

But if the doctor says "... then you will probably die in two months", all the alarms fire in your mind, and you promise yourself to avoid "this or that" as if it were the only important thing in life.

In both cases the "GainCoef" is zero, as your health drops to 0 and you die, but how far in the future it occurs makes it more or less important to you. We need a "TimeCoef" to make the "GainCoef" fade away in importance the further into the future it happens.

To construct this "TimeCoef" I defined two parameters: the "ReactionTime", in seconds, which controls how near in time the loss has to be before it starts being important to me, and a second "Urgency" factor, which controls how fast the alarm then grows as the loss gets nearer to the present.

The following graphic shows a "TimeCoef" plot for a "ReactionTime" of 5 seconds and "Urgency" factors of 1 (blue line) and 2 (purple line), while the green line represents the "FinalCoef" for Urgency=2 applied to a loss of 40% (GainCoef = 0.6):



Let's start with the blue line. It represents how the "TimeCoef" varies, for a "ReactionTime" of 5 seconds and an "Urgency" of 1, as the loss occurs at different points in time (time is shown on the X axis, where 0 is the initial point where the future started).

"TimeCoef" is 1 for times beyond the "ReactionTime", because if the loss occurs more than 5 seconds into the future I will not care about it, so the coefficient needs to be 1 in order not to change whatever enjoy feeling we had.

For times smaller than 5 s the "TimeCoef" drops linearly to 0, meaning the alarm grows linearly as the loss happens nearer to the present time.

So in this case ("Urgency"=1) the "TimeCoef" varies with time t as:

TimeCoef = Min(1, t/ReactionTime)


If "Urgency" were 2, then the lineal growth would convert into a X² growth. As number are smaller than 1, we are lowering down the line, meanig the alarm will initialy grow faster, as you see on the purple line.

If "Urgency" were set to 1/2 the line would had been above the blue line, so we would get the opposite efect: the alarm will initially grow slowly, and only when it happends really near to the present, it gets really scaring.


So for any "Urgency" value, the complete "TimeCoef" formula is:

TimeCoef = Power( Min(1, t/ReactionTime), Urgency)

Mixing them into the "FinalCoef"

Finally, we mix the "GainCoef" and the "TimeCoef" to get the green line:

FinalCoef = GainCoef + (1 - GainCoef) * TimeCoef

So when the loss happens right now (TimeCoef = 0) the future's score is scaled by the GainCoef alone - the 60% of health you kept - and when it happens beyond the "ReactionTime" (TimeCoef = 1) the score is left untouched.
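Putting the three coefficients together in Python (a sketch of how I read these formulas; the function names and the exact interpolation are my reconstruction from the definitions above, not the app's code):

def time_coef(t, reaction_time=5.0, urgency=1.0):
    # 1 beyond the ReactionTime (we don't care), dropping to 0 as t -> 0.
    return min(1.0, t / reaction_time) ** urgency

def final_coef(initial, final, t, reaction_time=5.0, urgency=1.0):
    gain_coef = final / initial                 # <1 is a loss, >1 a gain, 0 is death
    tc = time_coef(t, reaction_time, urgency)
    return gain_coef + (1.0 - gain_coef) * tc   # full effect now, none far away

# Losing 40% of your health right now vs. 4 seconds into the future:
print(final_coef(100, 60, t=0.0, urgency=2))    # 0.6   -> strong penalty
print(final_coef(100, 60, t=4.0, urgency=2))    # ~0.86 -> milder penalty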

The final "emotional" formula

We now have all the ingredients, and the way to mix them together, to score a future in a completely "emotional" way.

If we have two goals G1 and G2, and each goal has three parameters for the three basic feelings, called G.Enjoy, G.StepCoef and G.FinalCoef, then the future is scored like this:

StepEnjoy = Sqrt( G1.Enjoy² + G2.Enjoy² ) * (G1.StepCoef * G2.StepCoef)
Future.Enjoy = Sum for all steps( StepEnjoy ) 

Future.Score = Future.Enjoy * (G1.FinalCoef * G2.FinalCoef)


That is all. This gives you an "emotional" way to score futures, so if you have two options, you can now trace 100 futures for each, discard the repeated ones, and score each option with:

Option.Score = Sum for all different futures( Future.Score )

And the AI decision will be:

Decision = Sum for all option( Option.Vector * Option.Score )

Where Option.Vector contains the changes applied to all the degrees of freedom, or joysticks.
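To fix ideas, here is a compact Python sketch of this scoring pipeline. The data layout (one list of (Enjoy, StepCoef) pairs per step, one FinalCoef per goal) is my own illustration of the formulas above, not the actual app code:

import math

def score_future(step_feelings, final_coefs):
    # step_feelings: one list per step, holding an (enjoy, step_coef) pair per goal.
    # final_coefs:   one FinalCoef per goal for this future.
    future_enjoy = 0.0
    for feelings in step_feelings:
        step_enjoy = math.sqrt(sum(enjoy ** 2 for enjoy, _ in feelings))
        step_coef = 1.0
        for _, coef in feelings:
            step_coef *= coef
        future_enjoy += step_enjoy * step_coef
    final = 1.0
    for coef in final_coefs:
        final *= coef
    return future_enjoy * final

def score_option(futures):
    # futures: (step_feelings, final_coefs) for every *different* future found.
    return sum(score_future(sf, fc) for sf, fc in futures)

# One option with a single two-step future and two goals:
future = ([[(2.0, 1.0), (0.5, 0.9)], [(1.5, 1.0), (0.5, 0.9)]], [1.0, 0.8])
print(score_option([future]))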

All is done then?

Apart from a post I still need to add about how to model the joystick itself, yes, we are done: we have fully defined a complete layer of emotional intelligence.

But there is still a big step awaiting: multilayered emotional intelligence.

But that will come some weeks from now; I need to take a little time to prepare a V2.0 of the demo app, as my current version needs to be cleaned up a little first.

Emotions full power

In this post I just show a couple of new videos using the full "emotional" model for the goals, plus a new system to auto-adjust the joystick sensitivity (I will comment on this in a future post; it is far more important than it seems).

Asteroid field

The first video shows 6 rockets being stressed by an asteroid field (50 of them) randomly falling down, and how the current intelligence deals with it without getting nervous at all (thanks to the new joystick model).

I created this simulation because I needed some visual way to judge how competent the agents are in hard/delicate/stressful situations. This was the third or fourth video of a series: it was almost impossible to get a rocket hit by a rock using 10, 20 or 30 asteroids at the same time, so I finally tried with 50, and even then only one rocket got hit!

We need the algorithm to be rock solid and stable, so this kind of test is of great interest to me.


It is really the most remarkable video I have produced so far.

Natural killers

The second video uses the same algorithm, but this time we have two "ninja rockets" that can use their thruster flames to burn the other's energy away. If one rocket burns 50% of the other's energy, one half of that, 25%, is added to its own energy, as if the beam could bring some energy back.

Burning the other scores quite high for both (I use a "gain feeling" associated with the energy burned off the other player), but the red one likes blood twice as much as the white one, so, as expected, the red one finally wins at 2:30.

But then it notices that the broken white rocket still has some energy left, as it was able to take some from an energy drop (the green circles) before crashing, so it comes back to finish the job.

This final attack (at 3:00) is really interesting, as the red one uses the walls to rotate as if it were in an action film.

After that, life goes on as usual; nothing else interesting happens.


The videos were recorded using V1.5 of the software, but I will not publish it until I reach V2.0 and all the code is cleaned up and ready for the next big step: multilayered intelligence.

Monday, 27 October 2014

Mood feelings

This is the 5th post about "Emotional Intelligence"; please visit Introducing Emotional Intelligence, goal-less intelligence, enjoy feelings and fear feelings for a proper introduction to the subject before reading on.

Introduction

After discussing the simplest feeling associated with a goal, the enjoy feeling, its counterpart, the fear feeling, and the way they are added together to calculate the global "step enjoy" feeling after an agent's change of state -or step- we are now going to start dealing with the enjoy modulators.

We will start with the "mood feelings", the simplest and most evident form of enjoy changers, and then turn to the strangest ones, the gain and loss enjoy modulators.

Mood feelings


Imagine you are the agent, walking through the forest at a given speed, experiencing the enjoy feeling of "I enjoy speeding", represented by the distance walked. But today your energy is quite low, and moving on consumes the very last scraps of energy you have. Obviously, you are not enjoying it very much.

Now suppose you are full of energy, but you have a problem with your knees, or a small pebble in your shoe, and this makes the walk much less enjoyable.

If we consider that your health indicator, from 1 down to 0, makes all the enjoy feelings less enjoyable by that same factor as it gets lower, we can use it in our formula as a coefficient that multiplies the global step enjoy feeling, making it higher when you are OK and a little lower when you are feeling sick.

If you add the energy as another factor that contributes to making the walk less enjoyable, we end up with a formula, based on the global enjoy feeling or "StepEnjoy" discussed in the last post, like this:

StepScore = StepEnjoy * (product of all mood factors)

Which, in the case of a kart like the ones in the videos, is:

StepEnjoy = Sqrt(Raced² + (Energy*dt)² + (Health*dt)²)

And then:

StepScore = Sqrt(Raced² + (Energy*dt)² + (Health*dt)²) * (Energy * Health)
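In Python, that per-step calculation is just (values invented for the example):

import math

def step_score(raced, energy, health, dt=0.1):
    # Enjoy part: Euclidean mix of the per-goal enjoys for this step.
    step_enjoy = math.sqrt(raced ** 2 + (energy * dt) ** 2 + (health * dt) ** 2)
    # Mood part: low energy or low health make the whole step less enjoyable.
    return step_enjoy * (energy * health)

print(step_score(raced=2.0, energy=1.0, health=1.0))   # feeling great
print(step_score(raced=2.0, energy=0.3, health=0.8))   # tired and bruised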


High feelings

In the two examples above, both moods had factors lower than 1, meaning they represented something that can make your walk worse than usual, but as always, there is another side: if you have a mood with a factor higher than 1, it means you are modelling a "high feeling".

Imagine you want two agents to try to walk together. The most efficient way to do it is to add a mood feeling that uses the distance between them to compute a factor that is higher than 1 when they are near, and lower than 1 when they are apart.

TogetherFactor = 10 / (1 + distance)

It will be higher than 1 when they are closer than 9 units, making walking near each other more enjoyable than not, so they will naturally tend to walk together. Adding the 1 was just to avoid division by zero, dirty minds!

So you can add a new goal, the "together" goal, model it as having an enjoy feeling of zero and a mood factor of 10/(1+distance), and make the agents love walking together.

Using a goal with a zero enjoy feeling is not the best option; here we used it just to show how to build a goal that only affects the mood, but in my implementation the enjoy feeling is distance*dt and the mood factor is 10/(1+distance).

Example

This kind of feeling is new to the emotional engine, so there are no previous posts or videos about it, but I have recorded one that shows mood feelings in action. You will notice how subtle some of the movements are when compared to the examples using goal-less intelligence, or even the video showing a single enjoy feeling (the distance raced example).


The agents in the video also use some of the newer "loss feelings" to avoid near crashes, but most of the time the enjoy feelings and the mood factors do all the work.

Fear feelings

This is the 4th post about "Emotional Intelligence"; please visit Introducing Emotional Intelligence, goal-less intelligence and enjoy feelings for a proper introduction to the subject before reading on.

Introduction

As commented in the Introducing Emotional Intelligence post, goals, when defined using "feelings" toy models, have only three scoring parameters, three kinds of emotional "outputs".

The first of them corresponds to things you "enjoy" experiencing, like the speeding goal where you enjoyed velocity. Enjoy feelings are basically added together into a general "enjoy feeling" score after each movement the agent makes.

But each of the three components of a goal, each kind of basic feeling (enjoy, mood and gain), has a reverse, a negative counterpart you need to know about and manage properly in your algorithm.

Fear feelings


The dark side of the enjoy feelings are the "fear feelings" or "hate feelings" (they are basically the same); mathematically they correspond to goals with a fixed negative value for the "enjoy feeling". You can read more about them in this old post about negative goal experiments.

Watch this video of agents with stronger or weaker negative scorings (crashing and losing any amount of health is simulated by stopping the agent after a crash -so it doesn't change its ending point- and modelled as a negative enjoy feeling in the current "emotional" language):


Please note that karts are difficult agents to drive to "suicide". The same setup, applied to rockets, can make them decide to actively crash into the ground in stressful situations, for instance when energy is running out and it is modelled with negative feelings.

Technically, the first thing to note about negative enjoys or "fears" is that they totally break the logic of the algorithm in a couple of basic ways:

1) If we are measuring some kind of entropy gain, having a negative value means your entropy can sometimes decrease. Then it is not the entropy of a system, or your system is quantum, and this method cannot be applied to quantum systems because the second law of thermodynamics cannot be applied there either.

2) We have totally messed up the metric we had on the state space. Negative distances are not allowed anywhere.


The results are worse than you might foresee. As in real psychology, two effects can easily be detected:

A) Being driven by fears means that, in some cases, the fear will make the agent stop, unable to decide what to do.

In the algorithm, the agent can find that moving anywhere is just too "scary". The fear feelings give big negative scores that, when added to whatever positive ones exist, still yield a negative global enjoy feeling. So all options score negatively.

Since staying still, away from the scary futures you can envision, is not as bad (as negative) as moving, the option "do nothing" ends up scoring highest and predominates.

B) If the previous case happens when you have a way to commit suicide, you will.

When an agent is surrounded by things that score negatively, turning around and crashing into the ground is by far the best option, as it always scores zero (you die), so the intelligence will happily commit suicide.

Even small negative feelings must be avoided. In a sufficiently desperate situation, this negativity will be almost all the agent has, which means that in limit situations the fear will win and it will commit a nice, "intelligently planned" suicide.

Many videos showing silly ways to crash were produced and then discarded as erroneous changes in the code, when in fact they were just fears being introduced into the scoring feelings. I don't use negative enjoy feelings any more, except for "educational purposes", as they produce neurotic and suicidal intelligences.

How to code them


If you still want to use them, you need to deal with the negatives. I do it like this in my code (but I don't actually use it, so maybe it is not the right solution).

Remember that before, we used as the global enjoy feeling the square root of the sum of the squared enjoys, as in the Euclidean distance d = sqrt(dx²+dy²), but now dx² can be negative.

Mathematically there is a very suggestive way to think of it (call it a joke if you want): if the squared enjoy feeling dx² is negative, then the quantity being measured, dx, must be an imaginary quantity!

Even if it is just a mathematical joke, entropically and psychologically it accurately reflects the situation: you are imagining something with a negative growth of entropy, so it is only possible in your imagination, not in real life. And psychologically, you are irrationally running away from a danger that doesn't really exist, it is purely imaginary. Take it as merely a naming curiosity.

So we need to treat those negative enjoys as pure imaginary numbers, so that when squared they turn into negative numbers.

It means that (dx²+dy²), in the distance example, can be negative, and square roots of negatives are not possible. Well, they are, but only if you consider this sum as being, again, a pure imaginary number instead of a negative real one.

In code, assuming sqrt() is the square root, abs() the absolute value and sign() is -1 for negatives and 1 for positives, you should first sum over all goals:

Sum = Sum for all goals( sign(enjoy) * sqr(abs(enjoy)) )

And finally define the global enjoy feeling again as:

GlobalEnjoy = sign(Sum) * sqrt(abs(Sum))
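The same signed-square trick as a tiny Python sketch (again, this is the variant I don't actually use, and the names are mine):

import math

def global_enjoy(enjoys):
    # Square each enjoy but keep its sign, as if negatives were imaginary.
    total = sum(math.copysign(e * e, e) for e in enjoys)
    # Undo the square the same way: signed square root of the signed sum.
    return math.copysign(math.sqrt(abs(total)), total)

print(global_enjoy([3.0, 4.0]))     # 5.0, the usual Euclidean mix
print(global_enjoy([3.0, -4.0]))    # about -2.65, the fear dominating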

Why do negative feelings even exist, then?


I suppose they were necessary in the natural evolution of life, when the intelligences the agents had were too limited. If you cannot foresee what will happen if you do something, a simple strategy is to add a fear feeling associated with unknown things.

Think of cockroaches: they can't harm you, but you desperately hate having one near. The "fear" is profound, you can't escape it, because it is an ancient instinct: avoid them because they run fast and randomly, and you cannot know whether one will be on your leg in half a second. Fear of uncertainty is the simplest solution.

So maybe they were necessary once, and maybe they still save lives, but for an artificial intelligence it is much better if we discard this early beta of feelings and focus on the ones that work nicely.

Saturday, 25 October 2014

Enjoy feelings

This is the 3rd post about "Emotional Intelligence"; please visit Introducing Emotional Intelligence and goal-less intelligence for a proper introduction to the subject before reading on.

Enjoy feelings

Once I had the simulation with the goal-less algorithm working, I wanted to go further. The kart was really driving quite nicely, but it clearly was not optimal.

Why? The idea was so simple and powerful that the problem was not clear at first glance.

Trying to improve


I tried using longer and longer futures, with bad effects. I also greatly increased the number of futures calculated, but that only showed a marginal gain. This was not the root of the problem.

The real problem was the scoring of the futures. Always scoring one was not fair. In some futures the kart crashed quite near the starting point, while in others it was able to safely go far away. You can't say both futures are worth the same to you; it had to be an oversimplification.

I decided to try the most evident candidate for me: the distance raced in each future would be the score, so we no longer use N = "number of different futures" to score an option; instead we use the sum of the distances raced in each future.

With this new option-scoring scheme, the intelligence received a great boost, as you can see in the following video, where the old goal-less intelligence is clearly outperformed by the two new models (one using the new "distance raced" as the score, and the other one the squared distance raced).


Something incredible happened: now the agents seem to just like speeding, so they behave and drive more aggressively. The one with the squared distance was even more aggressive and finally wins; it covers more distance in general, but it also had a tendency to be a little too imprudent at times.

In retrospect, this test was a perfect success and I could not do any better today. I chose the correct formula -distance raced without the square- for the task (though first I tried hundreds of others, I confess) for several technical reasons:

1) Distance raced is a real way to measure the entropy of a moving particle, which is what the kart is.

The entropy a particle has, when you consider a gas, can be approximated by its linear momentum v*m, so a path integral of this momentum over the future's path (a red or blue line in the videos showing futures), the integral of v*m*dt, is a perfect candidate for assigning an entropy to the path of a future.

But m is constant in all my futures, so I can safely discard it (we will normalize afterwards anyway, so it makes no difference), and v*dt = distance raced, so we are integrating the distances raced at each time step. That is why the distance raced is the correct way to give a moving particle some form of entropy-gain approximation.

Depending on how you do the path integral, integrating over dt or over dx (the length of the path delta at each step), you end up with the distance raced or with its squared version (a small code sketch of both follows point 2). Both are similar ways to compute a real entropy; you only change the physical model for which you calculate the classic entropy.

2) Using a real distance to score the future is equivalent to having a real metric on the state space of the system. This also applies to the squared distance raced of the third, winning kart.

If you define the distance from state A to state B as the minimum distance raced by a future starting at A and ending at B, you have a real metric on the space of all possible states of the system.
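Going back to point 1), here is a small Python sketch of the two path-integral approximations over one imagined future; the step speeds are made-up numbers, and reading the "over dx" sum as the squared flavour is my interpretation of the remark above:

dt = 0.1
speeds = [1.0, 2.5, 3.0, 2.0]      # imagined speed of the kart at each future step

over_dt = sum(v * dt for v in speeds)        # momentum integrated over dt -> distance raced
over_dx = sum(v * (v * dt) for v in speeds)  # momentum integrated over dx = v*dt

print(over_dt, over_dx)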

Enjoy feelings

The fact that the score we calculate at each step has the form v*dt is quite important for understanding how we were quietly introducing "feelings" into the mix.

We wanted the agent to love speeding, and we ended up using as the score the "speed" you are experiencing at each moment, multiplied by the amount of time you enjoy it, "dt".

Enjoy feelings represent anything the system, in some relaxed sense, enjoys experiencing: something positive that accumulates with time and that you can't lose, like the distance raced.

You will need enough "enjoy feelings" in your simulations, as the emotional intelligence desperately needs them to work at all. Having an enjoy feeling of zero means the agent will stop deciding and freeze forever. It is dead. From then on it will only follow the laws of physics in the simulation.

Other examples

Luckily, every other goal you might need to add to your intelligence can always have an "enjoy" feeling associated with it. It is a must, and you, as the "designer" of this intelligence, have to find the positive in it, the "bright side".

So the golden rule here is: never add a goal without an associated enjoy feeling.

For instance, a goal created to avoid damage to the rocket, the "take care of health" goal, will have an enjoy feeling associated with your current health (from 1 down to 0), as if being healthy, per se, were a way of enjoying as valid as speeding was for a kart pilot.

In this case, just by having 50% health at a given point in time, you add to this step of the future an enjoy score of 0.5*dt, meaning you not only enjoyed racing at 200 km/h for dt seconds with v*dt, you also enjoyed the health you have, again multiplied by the time you enjoyed it.

I have always ended up identifying one "thing" you enjoy associated with the goal or motivation I needed to model, and then assuming the agent enjoys it for a delta of time.

The goal "take care of your energy" is quite similar. In this case, the thing you enjoy is "having energy", measured by the energy level (from 1 down to 0), so your enjoy is energy*dt.

Other "enjoy only" feelings I use are "get drops" and "store drops". When the rockets carry energy from drops to the storage containers, they really enjoy it as much as speeding. In this case enjoy = energy transmitted = speed of the energy transmission * dt. You enjoy the "speed" of the transmission, not the energy transmitted, as you need to use *dt somewhere in your formula.

Note: the scale of each feeling has to be manually adjusted in this implementation. As the unit, I used how much you enjoy racing the length of your own body. With this in mind, I judged it fair to use h and e from 0 to 1 as their enjoy feeling values. As you can later set a "strength" for each feeling in the implementation, this scale is not fixed and can be adjusted in real time. This emotional intelligence is not able to auto-adjust the feeling scales -or strengths- to get a better mix; I adjust them manually before every simulation. This will be elegantly addressed in the next version of the model, the "layered" model I am currently working on.

Could enjoy feelings exist without "*dt" at the end? No. If you try it, maybe you will manage to tune it into something useful. But if you then switch the time delta from 0.1 s to a finer 0.01 s, the effect is that your useful goal now weighs 100x more compared to all the other goals that used dt in their formulation. Being so highly dependent on small changes in the time delta makes it a bad idea to add such a goal to the mix.

Scoring a future with several enjoy feelings

A kart pilot with only one enjoy feeling was a simplistic case. In general we need to deal with agents that have a large number of them, so we need to know how to combine them into a single score.

The answer was already in the speeding goal we used.

Remember we used v*dt as the enjoy feeling. But v is the composition of two vector components, vx and vy. We could have had two goals instead of one and still wanted the same intelligence, so both ways have to be equivalent.

As the value of v is sqrt(vx²+vy²) -where sqrt() is the square root- then, if we add the enjoy feeling for health (h) and the enjoy feeling for energy (e), the total enjoy feeling should be:

Enjoy = sqrt(vx² + vy² + h² + e²) * dt = sqrt(v² + h² + e²) * dt

This is the way I add up all my enjoy feelings, one from each goal (as they always have some positive enjoy feeling associated), to get the enjoy feeling (named "Points" in the code) corresponding to each step of the future, computing it after each state change.
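In code, the per-step mix is a one-liner (a Python sketch with invented values; in the app this quantity is the step's "Points"):

import math

def step_points(vx, vy, health, energy, dt=0.1):
    # Euclidean mix of all the enjoy feelings enjoyed during one dt-long step.
    return math.sqrt(vx ** 2 + vy ** 2 + health ** 2 + energy ** 2) * dt

print(step_points(vx=3.0, vy=4.0, health=1.0, energy=0.5))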

By the way, this makes the mix of enjoy feelings always a real metric over the state space, as mentioned earlier, no matter how many enjoys you add.

Will it need more feelings?


Enjoy feelings account for all the good things you can detect while imagining a future. They would be enough in a world with no dangers, no hunger, no way to harm yourself and die, no enemies... but that is not the kind of world we need to simulate; we need the intelligence to cope with dangers, with batteries that drain, with bodies that can be broken, with other agents that will compete.

We are going to need more than just enjoy feelings once we are out of candy land.

In the next three posts of this series we will deal with different ways to cope with danger, and how to mix them to get a realistic model of how we feel and react to a danger -such as the possibility of losing something: health, energy... or money- and how it modulates the enjoy feelings.

Wednesday, 22 October 2014

Goal-less intelligence

This is the 2nd post about "Emotional Intelligence"; please visit Introducing Emotional Intelligence for a proper introduction to the subject before reading on.

Goal-less intelligences

In my first post I already commented on the internals of the simplest entropic intelligence possible, one that scores all futures as 1. If you haven't read it and want to know this case in more detail, you can visit the link first. Anyhow, I will try to summarize the basic workings of this model again.

We have a kart as our system, and a simple physics simulation able to answer the question "where will you be in 0.1 seconds if you are here now".

We also have a degree of freedom as part of our system: we can move the steering wheel left or right at any moment by "pushing" an imaginary joystick left or right.

We will call these "our options", so in this simplest case we only have two of them: left and right. The kart's engine is always on and there is no way to brake or slow down, so here an option consists of a single number -the force you apply to the imaginary joystick- but in general it will consist of a vector of N numbers, where N is the number of degrees of freedom -or joysticks, in my previous example- so think of an option as a vector containing the "push" you need to apply to the N joysticks to make the agent evolve "intelligently".

With just that information, this algorithm is able to tell you how hard you should push the joystick at any moment to get "intelligent behaviour" from the kart. What that means in each case is out of your control; the intelligence is "goal-less".

I call it a "common sense" or "goal-less" intelligence.

Watch this video to see this implementation in action:


Some technical details: the video shows a simple kart simulation with two degrees of freedom, the accelerator and the left-right control. For each one I use four options, which in the case of the left-right control could be -15, -5, +5 and +15, as this worked much better than the simplified two-option counterpart.

The algorithm in detail

I will detail the algorithm in this basic form using this image, captured from the real application (V1.1), as my test case:



1) Take the first available option, in this case "push the joystick left". We are going to "score" this option so we can later compare it with the second option.

Everything related to the first option, "go left", is painted blue in the image above. For the "go right" option, red is used. This way you can clearly separate which futures each option uses.

2) Take your current position (or "state") and use the simulation to find out where you will be after 0.1 s if you push the joystick as this option dictates. This gives you the state you will be in after taking this decision.

In the image above, this is the origin of all the blue lines, actually under the kart's body. For the second option, going right, it is the origin of all the red lines.

3) From this "option initial position", imagine a 5-second-long random future by iterating steps of 0.1 seconds. In each of those 0.1 s steps, the degree of freedom takes random values in a given range, in this case from -5 to +5 units of force, so the joysticks are randomly pushed left or right at every new step until you reach your time horizon: how many seconds into the future you want to "think". In my first tests it was 5 seconds, so with 0.1-second steps it takes 50 steps to arrive at the future's end point.

This step draws a blue or red line, depending on the option you are scoring.

4) Take the end point's coordinates, round them to a given precision (I initially used 10-pixel units), and add them to the option's list of different futures found (a list of vectors containing the ending points of all the different futures found so far). If this ending point is already in the list, just discard it.

In the image above, these correspond to the blue circles for the left option and the red circles for the right option. The radius doesn't mean anything, as they all score one in this method.

5) Repeat the process from 3) to imagine a new possible future, until you have tried a fixed number of futures. In my case I used 100 futures, but a bigger number like 500 works better.

This makes all 100 blue dots appear one by one for the first option, and then all the red ones for the second.

6) Score the option with the number of different futures you found, N.

If you imagine the grid size used to round the final future positions as defining "tiles" on the track, then this N is actually measuring the area of the surface "touched" by the blue circles.

The tile area itself -the grid size squared- is not counted, as we only need to compare the blue area with the red area.

It also means we are scoring each future with a plain 1, and using N to score the option because it is the sum of all those ones. Options are always scored by summing all their futures' scores.

7) Repeat from point 2) with the next available option (in this case "turn right") until all options have been scored.

Now you have the blue and red areas measured.

8) Now we normalize all the option scores so they sum to 1: start by locating the smallest score and subtracting it from all the option scores, so the smallest one is now exactly zero. Then divide them by their sum so they add up to 1. You have converted the option scores into weights you can use to average.

In the image, this means you first find the smallest area -the blue one is smaller this time- and subtract it from all the areas, as if they were shrinking until one dries up completely. In the end, one area is zero and the others are >= 0.

Note: this step can seem too dramatic a change. It is in this case, but when you play with 8 or more options, doing this makes the intelligence decide faster among similar options. Imagine the kart has a Y-shaped junction in front. Going left or right scores almost equally, but if you do nothing and go straight, you will crash. The sooner you commit to one path the better, so in general this makes the intelligence "sharper". If we were not dealing with such an "organic" agent, disabling it might make the algorithm more "neutral".

9) The intelligent decision is then the average of your options -remember they were just vectors with one component per free parameter- weighted by the scores we normalized in 8).

Graphically, you are comparing the blue and red areas in the initial example image. If you see more blue dots, you should go left.

Now you have your intelligent decision; it is a vector, and each component is a force you need to apply to a joystick, so you finish by simulating this decision on screen:

10) We are back in reality instead of imagining a future. In my implementation I switch between two possible states an agent can have: a "real state" showing where it actually is on screen, and a secondary "imaginary state" used while the agent is imagining a future (there is an internal flag for that in the agent's code).

11) Push the joysticks with the forces contained in the "intelligent_decision" vector. Pushing the joysticks changes the state of the agent (now the "real state", as we are not imagining a future) -the joystick position has changed- so when you ask the simulation "where will I be in 0.1 s from this initial state", the answer will be the next real position of the agent in your simulation, the one shown next on screen.

12) Here you refresh the screen with the new positions -or states- of the agents, using the "real states".

We can also draw the future lines from step 3 to see "what this agent is thinking before deciding"; in the exe you can switch this on or off, and in the video above it was "on", so you can see the blue and red lines being created in real time as the agent ponders its options.

This was the exact moment the image above was taken!

13) You have finished one frame of the film. The agents are now in different positions. Take those positions as the initial positions and go back to step 1. You have a video of an agent moving around "intelligently".

At this point you have produced a video like the one I showed at the beginning. Watch it again and try to follow the kart's logic when it decides to take one turn or the other based only on the number of blue and red dots.
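For readers who prefer code, here is a compact Python sketch of steps 1 to 9 for a single decision. The physics step and the option values are toy placeholders; only the structure -imagine random futures, round their end points, count the distinct ones, and average the options by their normalized scores- follows the steps above:

import math, random

DT, HORIZON_STEPS, N_FUTURES, GRID = 0.1, 50, 100, 10.0

def physics_step(state, push, dt=DT):
    # Toy placeholder physics: state = (x, y, heading, speed); push steers the heading.
    x, y, heading, speed = state
    heading += push * 0.1 * dt
    return (x + speed * math.cos(heading) * dt,
            y + speed * math.sin(heading) * dt,
            heading, speed)

def count_futures(start):
    # Steps 3-6: imagine N_FUTURES random futures, round their end points to the
    # grid and count how many different tiles were reached.
    endings = set()
    for _ in range(N_FUTURES):
        s = start
        for _ in range(HORIZON_STEPS):
            s = physics_step(s, random.uniform(-5, 5))
        endings.add((round(s[0] / GRID), round(s[1] / GRID)))
    return len(endings)

def decide(state, options=(-15.0, +15.0)):
    # Steps 1-2 and 7-9: score each option, normalize the scores and use them
    # as weights to average the joystick pushes.
    scores = [count_futures(physics_step(state, o)) for o in options]
    scores = [s - min(scores) for s in scores]
    total = sum(scores) or 1.0
    return sum(o * s / total for o, s in zip(options, scores))

print(decide((0.0, 0.0, 0.0, 30.0)))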

Remarks


Please take those remarks into consideration if you plan to really code it:

-Using N (number of different futures of each option) as the option score is a simplistic way to approximate entropy. It doesn't matter too much because we will normalize the scores so the scale is not important. Basically, you are assigning a score of one to all of the futures.

-Instead of N, you should try to calculate the actual probability of each of the different futures by counting the "hits" each ending point received -how many futures ended on this rounded position- and normalizing those hits by dividing by 100 (as you imagined 100 futures, the sum of the hits is 100). A small code sketch of this follows these remarks.

OptionScore = -Sum for all different futures of (p*Log(p))
With p = probability of this future = hits/100.

-Preselecting the grid size proved to be tricky. In some situations it is better to use a small grid size (when you are in a tight spot), but at other times a bigger size works better, depending on how "closed" or "open" the space you are moving in is. Using some kind of heuristic is better than a fixed grid size.
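The probability-based score from the second remark, sketched in Python (reading it as the Shannon entropy of the end-point distribution, which is my sign convention; the end tiles are toy values):

import math
from collections import Counter

def option_score(end_tiles):
    # end_tiles: the rounded end point of every imagined future (repeats kept).
    n = len(end_tiles)
    return -sum((h / n) * math.log(h / n) for h in Counter(end_tiles).values())

print(option_score([(0, 1)] * 50 + [(0, 2)] * 25 + [(1, 2)] * 25))  # concentrated futures
print(option_score([(i, 0) for i in range(100)]))                   # spread out: higher score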

Final words


Along with the algorithm itself, you should consider these concepts before continuing:

1) The step in which we deleted the futures with similar ending points was the moment the algorithm became "entropic", as it means we are using some kind of entropy. It is key for the algorithm to work, so it stays untouched in every version of the algorithm.

2) The moment we decided to use N as the option score, we were really saying "all futures score the same for me". This made the algorithm "goal-less" and the resulting psychology was "no psychology". Changing this score from one into something more meaningful is the way to radically improve the resulting AI.

3) If you use -Sum(p*Log(p)) instead of N, as commented in the remarks, you will be using the best goal-less entropic intelligence available. But all futures still score "one hit" each, so the resulting intelligence remains goal-less and psychologically flat.

Please also note that being "goal-less" doesn't mean not being capable of incredible things. If you watch the video once again, you will notice how well the kart does on quite a slippery track!

If you were to let this AI drive your real RC helicopter, it would do it correctly on the first try and keep it up and safely flying in a changing environment with wind shifts. It is the ideal autopilot for any gadget or real system, as long as you can give it a rough approximation of a simulator. Quite remarkable, in my opinion.

From this point on, all we will do is replace this score of 1 with different functions and discuss the consequences.

The final goal of this series of version 2 posts is to replace that 1 with a nicely working function based on toy models of real feelings that we suppose the agent has.

Tuesday, 21 October 2014

First "emotional" video

My code is still not "fully emotional" at this point -some cases are not used yet and others need more testing- but I am ready to produce my first video where goals are handled in this new "emotional" way.

The video just shows the old test case of a set of agents moving around -karts in the original- where they must collect drops and then deposit them in the square containers to get a big reward, but this time they are rockets inside a cavern, and they follow the goals in a fully "emotional" way.

The changes are not totally evident in this case; the task is too simple to make a great difference, and surely I need to find more challenging scenarios for the next videos. But you will still notice the big step in the small details: how actively they pursue their goals and how efficiently they do it.


You will notice the rockets have changed. Before this, there were a couple of gauges showing the energy and health levels, but they were quite distracting. Now the energy and health levels are represented as triangles painted on the left and right sides of the rocket bodies.

Once the three energy containers have been filled, the main goal is no longer active and they switch to a more boring strategy of just hovering around and landing to refill energy. I could have avoided this by raising the strength of the "get drops" and "love speeding" goals, or lowering the "care about energy" one, but my main aim was to show a set of agents with a strong tendency to do something -collect energy- while still being able to stay healthy by correctly using their "don't lose your health" warning feeling.

I will post new videos as soon as I can produce nice test cases for all the possible "feeling" combinations, but at the moment I still need to retune some old parts of the code that are not working with this new model (in particular, there are a couple of mechanisms to auto-adjust internal parameters of the AI -the grid size for detecting similar futures and the sensitivity of the joysticks- that I miss a lot, as they improve intelligence and stability at no cost).

Monday, 20 October 2014

Introducing Emotional Intelligence

In the current entropic intelligence algorithm, the score you assign to each different future you imagine is the golden key, as those scores determine how the options you are considering compare to each other, ultimately defining the resulting "psychology" of the agent, the one that makes it behave one way or another.

These future scores are built by adding up the effects of a set of different "motivations" or goals the agent has, such as "I love speeding", "I care about energy" or "I care about health", measured over the future's path, step by step, like a path integral approximation.

Being able to define the right set of motivations for an agent, along with a proper way to calculate the different effects those motivations have on every step the agent takes, and to mix them together to get the correct future score, is ultimately what I am looking for, and the leitmotif of this blog.

I have used quite a number of simple goal schemes to simulate some interesting intelligent behaviours, like the ones previously presented on this blog, but I am far from happy with them.

Basically, they failed to show me the real optimum behaviour I was expecting from them. Some had weird problems in limit situations, for instance when running out of fuel and out of energy at the same time. But there was also an ugly limitation on the length of the futures they were able to handle that really made them not so generally usable.

In the trial-and-error process, I found some interesting patterns in the goal-scoring schemes that worked better: they always avoided the possibility of negative scores, and most of them defined a real metric on the space of states of the system (when you assign a score to a future connecting the initial and final states of the system, you are inducing a metric on the state space, one that tells you how interesting a future looks to your agent).

At some point in the process, I felt that the key idea for going from a goal-less intelligence, based on the pure physical principle of "future entropy production maximization" (as described in my first posts or in Alexander Wissner-Gross' paper), to a stable and rich human-like intelligence was to try some realistic modelling of the "feelings" themselves and of how they affect our own internal scoring systems, and then base everything else on them.

Please note that when I name different parts of the equations involved as if they were actual feelings, they represent perfectly defined mathematical functions, not any kind of pseudo-scientific concept. It just seemed to me that the parts I needed in the mathematical mix were incredibly similar (in form and in their effects on the resulting behaviours) to concepts usually related to human psychology. Over time I naturally ended up naming them "enjoy", "fear" or "mood". Take this as a pure mnemonic trick or as a clue to some deeper connection; either way it will help you visualize the algorithm better.

The introduction of those "feeling" models means a boost in the number of different "motivations" I am now able to simulate directly. It is now quite easy to model all the basic goals I was using before, and they work much better, but it has also allowed me to model new kinds of interesting and useful motivations.

Before going on, here is a video showing the full potential of the emotional intelligence in stressful situations, like 50 asteroids falling on your head:




In the following posts I will go over the actual workings of this new "emotional" version 2.0 of the entropic intelligence algorithm in full detail. I will not mention things that didn't work in the past (they have been deleted from the V2.0 code to make it clearer) and will follow the straight line from my first, simpler model to the current "emotional" one.

The topics of those posts are:

Common sense
We will examine the initial form of the AI, which corresponds to Alex Wissner-Gross' paper. The psychology it represents is no psychology at all, just pure "common sense" in action, and it corresponds to the case where all futures score just one.

Enjoy feelings
We will jump to a much better version of the AI where the distance raced in a future is the key. We will define it as an "enjoy" feeling and discuss the correct form in which it should be calculated to have an "entropic sense". I will then comment on some other possible examples of "enjoy feelings" and on how to mix several of them into the future's final score.

Fear/Hate feelings
They correspond to negative enjoy feelings and represent irrational fears, as opposed to those based on real risks. It is generally a bad idea to add them to the mix, as the resulting psychology will include hidden suicidal tendencies and, in some situations, the AI will panic and freeze. They also negatively affect the quality of the resulting metric on the state space of the system, so I have actually banned them from my implementation, even though they are handled correctly if you do define such a goal.

The mood feelings
Mood feelings change the "enjoy" score you calculated from the existing mix of "enjoy feelings" by multiplying it by a factor. If it is lower than one, it corresponds to "negative reinforcement", like when you are walking quite fast but can't fully enjoy the speed because you have a pebble in one shoe or you are really exhausted. On the other hand, when it is bigger than one, it models "positive reinforcement", like when you are an intrepid explorer walking into the unknown, or you are in love and prefer to walk along with your beloved.

Loss feelings
Losing your life can be scary and must be avoided somehow, but if it will happen 100 years into your future you don't really have to worry about it. Loss feelings are weird: their effects fade out into the future and apparently break some laws about entropy and metrics (all fears do, after all), but they are really needed -for now- if you are seriously concerned about the agents' IQs.

Gain feelings
They are the opposite of the loss feelings and correspond to great things that could happen to you at some point in the future. As with loss feelings, the importance of the gain tends to zero the further into the future it happens. They can simulate the effect of landing a damaged rocket to have it repaired and its health refilled to 100%, or model a rewarding feeling when you prevent another player's death, for instance.

This will close my presentation of version 2 of the entropic intelligence, the emotional entropic intelligence. At some point I will release a version 2.0 of the app and its source code (note from the future: I did it on 1 December 2014), internally revamped to reflect this emotional model and the new naming, with examples of motivations based on all those new feelings.

There will be a future version 3 in which the three parameters of each motivation (the "strength" of the feeling, for instance; I will discuss them in the "loss feelings" post) will be automatically adjusted to the optimum values for each situation in real time, boosting performance in a completely new way (at a computational cost) if they behave as I expect.

Friday, 19 September 2014

Adding evolution to the mix

My second experiment was about using a simple evolutionary algorithm to fine-tune the goal strengths, in order to get the best possible set of strengths for the environment you place the players in.

I added a "magic" check box to the app so you can switch on this evolution, then add new players and goals if desired, and let the population grow while the best adapted are selected over time.

A set of changes takes place in the simulation when you turn this on:

-Players that reach 100% energy automatically duplicate themselves, dividing the available energy into halves so each one has 50%. The newborn copies its progenitor's goal strengths and then changes those strengths by a given random amount, let's say in the range -10%/+10% (a minimal sketch of this rule follows the list below).

-Rockets are not allowed to recharge their energy by landing. They can land, but their energy will not fill up anymore. They need to catch drops to survive, as intended.

-Karts are the hunters. They can't take energy drops; instead they can hunt the rockets to drain their energy into their own batteries. If rockets run out of energy they die after crashing somewhere, then get smaller and smaller until they shrink away, leaving room for newcomers. Karts can still suck energy from the dead rockets, so karts will jump on dead rockets like zombies.
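This is roughly what the duplication/mutation rule amounts to (a sketch with placeholder attribute names, assuming the mutation is a relative +/-10% change and energy runs from 0.0 to 1.0; it is not the app's actual code):

import random

def maybe_duplicate(player, population, mutation=0.10):
    # A player at 100% energy splits in two, each keeping 50% of the energy;
    # the newborn copies the parent's goal strengths and nudges each one
    # by a random relative amount in the -10%/+10% range.
    if player.energy >= 1.0:
        player.energy = 0.5
        child_strengths = {
            goal: s * (1.0 + random.uniform(-mutation, mutation))
            for goal, s in player.goal_strengths.items()
        }
        population.append(player.clone(energy=0.5, goal_strengths=child_strengths))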

That's all it took to make things evolve over time, but it is difficult to see the evolution happening just by watching the game on the screen, so I also added a couple of graphical changes to make it clearer:

-Players get their colours from the first goals' strengths (the most important ones: running, energy and health), so green ones are different from red ones, while different tones of green mean quite similar players. Just by looking at the screen you can see the predominant colour of the population and how it changes over time.

-A new graph in the upper left shows the "DNA" of the population: each player draws a horizontal line in its own colour showing the strengths of all its goals. This looks like a mountain range, with peaks at the stronger goals and valleys at the weaker ones.

That said, the video itself is quite long and depressingly boring; I suggest you skip to different points on the video's timeline to see how the new graph shows different colonies of similar players evolving, disappearing or jumping to another colour.

Surely the rules were not fair to the rockets, as eventually the last one gets hunted down by a myriad of hungry karts. Well, it made the video shorter anyway!



What could this approach be good for?

Basically it is a way to fine-tune your goal strengths before going into "production". You could think of it as a way to choose the right setup for the pilot instead of the kart's physical setup, but it could be widened to tune other params, like the kart's size, power, etc., to get the best kart configuration for a given track.

In my opinion it is just a strange and slow way to do things. I wanted to give it a try, but this kind of param fine-tuning can be done far more efficiently, and in real time instead of with precalculated params, by using a second layer (as I described in my last post, Follow my orders!).

There is one scenario in which this evolutionary approach could be quite powerful: evolving the player design itself.

To do that we would need to make decisions like "add a new thruster to the rocket at this position and angle, with this power". As you can see, there are yes/no decisions involved, like adding a new thruster or not, and this is not something a second layer can do easily, but an evolutionary phase could.

I am not that interested in those possibilities, but maybe I will come back to it later and try to make a spider evolve into some other better-adapted insect by adding or removing legs, wings, etc. I can't promise I will ever do it.


Follow my orders!

After some months without working on this algorithm, I am back with some new ideas to code, but before that, I want to show you a couple of experiments I made before the summer break.

The first one, shown at an early stage during the talk at Miguel Hernandez University, is just a concept test: could this intelligence be used to drive a vehicle effortlessly and safely while following your directions in real time?

Imagine a real car using this algorithm to drive you anywhere: you can let it drive for you as in a Google car, but with the added ability of a "semi-autonomous mode".

In this mode, you could let the car know where you want to go just by pointing at a map, or by moving a fake steering wheel inside the car. The method is not important, but the resulting behaviour is potentially interesting: your 5-year-old kid could take this car, point with his finger in the direction he wants to go, and the car would drive him safely around the city.

How could it work? Imagine your kid is approaching a crossroads full of traffic; he points with his small finger to the road on his left, as he knows grandma's home is over there. The car will try to go that way while avoiding collisions, red lights, pedestrians and so on. It will, for instance, stop, wait for a clear moment, and carefully turn left as the kid pointed, safely and following all the "rules".

As a proof of concept I added to the software -download V1.2 if you want to play with it- a new menu option, "Follow the mouse", and another one, "Avoid the mouse", useful for telling the car where not to go (the candy shop, for instance). As a result, all the players on the track will do their best to follow your mouse orders.

As you can see at some point in the video, the players will even go the long way around if necessary, meaning the kid could point to a forbidden road and the car would still understand it, go around the block, and reach the point you defined driving in the right direction.


I also added this "Follow the mouse" feature as an early mock test of a possible "second layer" of intelligence. The mouse position X and Y could be goal parameters instead of an ad-hoc menu option.

I could define a new kind of goal called "FollowMyFinger" with X and Y as parameters. The second layer of intelligence is clearly seen in action when applied to those two parameters.

In the same way I used my brain to decide to move the mouse here or there, this second layer could test what would happen if I moved this "attractor" point a little to the left and then let the game run for 60 seconds. At each delta time in the simulation, the "attractor" mouse point is moved again and again randomly, so think of the mouse as moving randomly while those 60 seconds run.

After those 60 seconds of simulation, you have one future associated with the first move being to the left. Repeat another 99 times and you have 100 ending points for the "second layer" option "left", so you can again, as in the level 1 I have always used, discard similar ending points, count the remaining ones, and use this N as an approximation of the score of the "second layer" option "go left".

Repeat for right, up and down and you have all you need to decide where you should move the mouse at this instant.
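The whole second-layer step could look roughly like this (a sketch only; simulate_60s and states_differ are placeholder helpers, with level 1 driving the kart inside simulate_60s while the mouse keeps jittering randomly):

MOUSE_MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

def second_layer_choice(state, mouse_xy, simulate_60s, states_differ,
                        n_futures=100, step=10):
    best_move, best_n = None, -1
    for name, (dx, dy) in MOUSE_MOVES.items():
        # First move the attractor a little in this direction...
        start_mouse = (mouse_xy[0] + dx * step, mouse_xy[1] + dy * step)
        # ...then roll out 60-second futures with level 1 in charge of the kart.
        endings = [simulate_60s(state, start_mouse, random_mouse=True)
                   for _ in range(n_futures)]
        # Count distinct endings, exactly as in level 1.
        distinct = []
        for end in endings:
            if all(states_differ(end, kept) for kept in distinct):
                distinct.append(end)
        if len(distinct) > best_n:
            best_move, best_n = name, len(distinct)
    return best_move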

This would be the second layer of intelligence dynamically changing params used by the first layer, if I had already coded it, which I haven't.

Once the mouse has been moved by the second layer, the algorithm could continue with the first layer as it does now: consider the mouse static (its coordinates are fixed params again at this first level, whereas at level two they were free params) and decide your move by simulating 5 seconds into the future, as always.

Note the differences: the first layer decides on kart movements using 5-second futures and treating the mouse position as fixed params. The second layer runs beforehand and decides on mouse movements, so the mouse position is a pair of free params controlled by the AI instead of fixed params, and the futures are calculated over 60 seconds using level 1 as part of the simulation itself.

This last point is crucial: in the first layer the kart was driven as if by mad monkeys, so simulating more than 5 seconds ahead was too optimistic; now you can think 60 seconds ahead while deciding where to move the mouse.

A more realistic example would be controlling one of the first-layer goal strengths with this second layer. The second layer could then change the kart's behaviour from a conservative setup to a more aggressive one if that is better at some point in the race.

Let's say you need to pass two karts running side by side. This second layer would project each of the two options (get more conservative, get more aggressive) 60 seconds into the future and find that being a little more aggressive gains more entropy (there are more distinct endings at 60 seconds: in some of them you don't pass them, just as when being conservative, but in other futures you do get past them and new possibilities open up).

The power a second layer adds without any additional work from you, just by using it to dynamically adjust the players' already existing goals, is its first great advantage, but a second one could be even better: you can give this second layer a different goal set.

This could open up new possibilities, like having a manager that cares about being able to run the next race too, not only about winning or losing this one.

Tuesday, 1 July 2014

Why do reductive goals exist?

Ten minutes ago I discovered what exactly "reductive" goals represent, or so I think.

As you know (otherwise, read the old posts first), this entropic intelligence needs a simulation of a system to be able to work, but if you expect it to do some hard work for you, you also need a set of "goals" that represent how much you earn when the system travels from point A to point B.

Those goals, which I have already talked about, can be categorised as "positive" goals, like earning points for the meters you race or the energy you pick up. Then we also needed "reductive" goals to make it all work properly.

At first they were a value from 0 to 1 representing the "health" of the kart, so if I multiplied the score (meters raced, for instance) by the health coefficient, I got a value that made more sense: if you get 5 energy but crash and lose all your health, it scores 5x0, nothing.

This single coefficient, together with an improvement in the simulation engine that made it able to properly calculate bounces and impact energies, made it possible for karts to learn to avoid hard crashes, since those take a lot of health from you, lowering the scores of all those ugly futures.

So it is clear they work pretty well, but if the positive goals represented "profits" of some kind, were those coefficients... the instincts?

Well, it made some sense: somehow your reptilian brain tells you that crashing into a wall is a bad idea even if it earns you 5 energy. So I adopted this explanation, leaving it excluded from the "entropic" part of the algorithm.

But wait: entropy gain can be anything from zero to... not infinity, but unbounded, and living beings have a kind of entropy they never want to see grow, at any cost: their internal entropy must be kept as low as possible in order to keep living!

You have a really low internal entropy if you compare yourself with a pile of food of the same weight, for instance. Your internal temperature is constant, so this factor doesn't add much entropy; neither can pH levels change much, nor cardiac rhythm, glucose levels, or any other level you could think of.

Your body's interior is a piece of flesh that is remarkably well ordered and controlled. Everything is so well placed that a doctor can open you up and fix it, knowing where everything will be.

So internal entropy is something the AI has to keep really low in the game, and using the inverse of this internal entropy as a filter on all the good things that could happen (the positive goals that represent the entropy you want to maximize) makes a lot of sense to me now.

The idea of using a positive goal multiplied by a factor from 0 to 1 representing health (or energy level) corresponds to the function to be maximized being:

External entropy production / Internal entropy value

The health coefficient corresponds to h = 1 / Internal entropy value.

So both parts of the equation correspond to entropy; it's just that the reductive part is about keeping the internal entropy value low, while the positive part is about making the external entropy high.

Note that I use the internal entropy "value" and not the "production", as the production can be zero, and I only want to panic when the "value" of my health is getting much lower... and dividing by zero is deadly, you know, your whole universe collapses.
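In code form, the scoring of one future then boils down to something like this (my reading of the post, not code from the app):

def future_score(external_entropy_gain, internal_entropy_value):
    # Positive goals approximate the external entropy produced along a
    # future; the reductive "health" coefficient is the inverse of the
    # internal entropy value, which must stay low (and never reach zero).
    h = 1.0 / internal_entropy_value     # health-like coefficient
    return external_entropy_gain * h     # = external / internal

# A future where you pick up 5 energy but end up badly "disordered"
# (high internal entropy) scores far less than a safe one:
# future_score(5.0, 10.0) -> 0.5    vs    future_score(5.0, 1.0) -> 5.0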

Finally, I am glad I could remove this ugly part of a general intelligence representing "instincts". I didn't want to create "hooligans" of any kind, so knowing it was just the inverse of a perfectly well-defined entropy makes me feel better.

Sunday, 29 June 2014

I want to share a recent article about the video of the seminar I gave, written by Francis Villatoro (@emulenews), one of the most important science bloggers in Spanish.

The original article is in plain Spanish, but Google does a nice job this time, so you can read it translated into English if you wish.

I couldn't be happier today!

Friday, 27 June 2014

Video: Seminar about entropic intelligence

Last May I gave a little seminar (90 mins.) at Miguel Hernandez University in Elche, Spain (UMH), about this concept of "entropic intelligence" and how it can be used in optimization and cooperative games.

It was a talk in Spanish, and YouTube doesn't allow me to edit any subtitles, so don't trust even the automatic Spanish subtitles; I had a look at them and, well, they were a big joke!


It is by far the best way to "catch up" with all the concepts presented on this blog!

Friday, 16 May 2014

Cooperating... or not.

Cooperating is quite easy in this framework: if you get 10 points in your score for a given future, then the other players also get those 10 extra points. That simple.

So, if we are all cooperating on a goal, then we all share a common score for that goal (namely the sum of all the players' scores for this goal), no matter who exactly got each point.

In the case of a reductive goal it is the same: the reductive coefficient a single kart gets reduces every player's score, so again there is a single shared reductive coefficient (multiply the reductive coefficients of all the players to get it).

This last point is not free of trouble: if a player dies, its reductive goal for health drops to zero, so my own scores all become... zero! So I lose the joy of living and let the rocket fall to the ground and break... uh! Not so good to cooperate under those conditions!
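Put as a sketch (placeholder names, not the app's code), the shared scoring for one future would be:

def cooperative_score(goal_scores, reductive_coefs):
    # goal_scores: each player's positive score for the shared goal.
    # reductive_coefs: each player's reductive coefficient in [0, 1].
    shared_positive = sum(goal_scores)      # everyone gets the total
    shared_reductive = 1.0
    for c in reductive_coefs:
        shared_reductive *= c               # one dead player zeroes it
    return shared_positive * shared_reductive

# The trouble described above: a single player with coefficient 0
# collapses everybody's score.
# cooperative_score([10, 4, 6], [1.0, 0.8, 0.0]) -> 0.0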

The following video shows a group of players cooperating on all their goals. The effect is not very evident simply because a single layer of intelligence only simulates five seconds or so, and that is not long enough to really appreciate the difference. I hope the use of a second layer of AI (not implemented yet) will make it much more visible.


And what about competing for a goal? Well, almost as easy, but not quite the same thing: when competing, you subtract the others' scores from your own, so you can end up with some futures having a negative score, and negative scores tend to make players a little too prone to suicide.
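The competitive variant of the sketch above simply flips the sign of the others' contribution:

def competitive_score(my_score, other_scores):
    # Subtract the others' scores from your own; this can easily
    # go negative for some futures, hence the suicidal tendencies.
    return my_score - sum(other_scores)

# competitive_score(10, [4, 6]) -> 0
# competitive_score(3, [4, 6])  -> -7  (negative: suicide-prone!)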

In the next video, just to make the "competition" a little more "visible", I added a nasty new goal: suck energy from the others by firing your main thruster at them!



I am not quite happy with either of the two videos; they do contain the idea of cooperating or competing, but somehow the effect is not what I expected. I assume adding a second layer of AI would make them realize how bad it is to push another player to death while cooperating (cooperating with the energy-sucking goal "on" didn't avoid some occasional deaths, not as different from the "war" video as it should have been).

By the way, all this is available in the latest V1.1 of the application, which I uploaded some days ago.

Wednesday, 14 May 2014

A seminar on optimization using entropic intelligence.

This past Monday I gave a small seminar at the Miguel Hernandez University (UMH) of Elche about optimization using this entropy-based AI. Soon there will be a nice video of it, but in Spanish, my friends (I will try to subtitle it into English if I have the rights to do so on the video, and the patience).

The announcement of the conference can be found here (again, the little abstract is in Spanish, and Google Translate didn't work for this URL, at least for me):

http://cio.umh.es/2014/05/07/conferencia-de-d-sergio-hernandez-cerezo.html

A Google translation of the abstract, not bad at all... once I fixed some odd wordings:

Abstract:

Entropy is a key concept in physics, with an amazing potential and a relatively simple definition, but it is so difficult to calculate in practice that, apart from being a great help in theoretical discussions, not much real usage is possible.

Intelligence, on the other hand, is extremely difficult to define, at least in a way that is general enough and, at the same time, specific enough to allow a direct conversion into an artificial intelligence algorithm.


A new approach to what defines "intelligence" or "intelligent behavior" points directly to the concept of entropy as ultimately responsible for it, and although the design and development of this idea at the theoretical level is somewhat complex, its application in artificial intelligence algorithms turns out to be incredibly simple, making it a very promising approach even at this early stage of the idea.


The resulting intelligence is able to handle, in a "smart" way, any kind of "system" that you can simulate, without the need to define any specific goals.

We also show a way to manipulate the formulation of entropy itself in order to "implant" in the resulting intelligence a tendency to maximize any objective function of our choice, obtaining truly useful algorithms in the field of process optimization.

As an example, we will apply the above idea to create a simple algorithm capable of driving a simulated kart around unknown circuits of all types, showing a very "close to optimal" behavior.


I will edit this post (and add a new one) once the video is available.

Finally the video is out, but I can't find a way to subtitle it; presumably only the owner can.