How AI can explain how to change #46

Dec 08, 2023

Today I will explain how an AI system learns, and use that to explain the process of change.

Disclaimer - I will explain this in terms which hopefully everyone can follow.

Let’s begin…

One of the coolest projects I worked on was during my MSc. My final dissertation project explored the use of a type of machine learning called Reinforcement Learning.

The idea behind reinforcement learning is simple - Step 1) do a random behaviour, Step 2) find out if that was good or bad, Step 3) based on that feedback, decide whether to do more of that behaviour or less.
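For the curious, those three steps can be sketched in a few lines of Python. This is a toy illustration only, not the system from my dissertation: the function name `learn_by_trial`, the three behaviours and their feedback values are all made up for the example.

```python
import random

def learn_by_trial(rewards, rounds=2000, step=0.1, explore=0.1, seed=0):
    """Toy version of the three steps: act, get feedback, adjust."""
    rng = random.Random(seed)
    # How much the learner currently likes each behaviour (starts neutral).
    preference = {behaviour: 0.0 for behaviour in rewards}
    for _ in range(rounds):
        # Step 1: sometimes try a random behaviour, otherwise repeat the favourite.
        if rng.random() < explore:
            behaviour = rng.choice(list(rewards))
        else:
            behaviour = max(preference, key=preference.get)
        # Step 2: find out if that was good or bad.
        feedback = rewards[behaviour]
        # Step 3: nudge the preference towards the feedback received.
        preference[behaviour] += step * (feedback - preference[behaviour])
    return preference

# Hypothetical feedback values, invented for illustration.
prefs = learn_by_trial({"left": -1.0, "right": 1.0, "fire": 0.0})
favourite = max(prefs, key=prefs.get)
print(favourite, prefs)
```

Behaviours that earn good feedback end up with a high preference and get repeated, while the ones that earn bad feedback fade away.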

You can see this in the video below, where an AI system learns to play the Atari 2600 game called Breakout.

Here the AI system learns to play Breakout after 240 minutes of playing the game.

The interesting thing is, it didn’t know anything about the game before playing it.

It learns how to play the game through the process described above - Step 1) do a random behaviour, Step 2) find out if that was good or bad, Step 3) based on that feedback, decide whether to do more of that behaviour or less.

Now the absolute beauty of the system is this…

The components of the model don’t change for different games and there is no tailoring of the system to each Atari game.

This is a step towards what is referred to as ‘General Artificial Intelligence’ or ‘Artificial General Intelligence’.

It is like the holy grail of AI, or one of them at least.

Now in terms of components, it has three key parts…

Something for handling the vision/images (eyes)
Something for pattern matching (brain)
Something for doing an action (in this case a controller)

This can be seen below (the top image just shows the machine learning diagram for it, we’ll focus on the bottom image).

So let's use the bottom, more relatable, diagram as an example of how you learn to play Breakout.

You see the screen using your eyes, your brain processes that and you do something with the controller.

So, if you’ve never played the game before you’ll start by doing some random controls (up/down/left/right/fire).

Now if something good happens - you’ll remember that. You’ll remember - when the screen was like X then doing behaviour Y was good.

If something bad happens - you’ll remember that too. You’ll remember - when the screen was like X then doing behaviour Y was bad.

As time goes on, your brain will remember more and more of those patterns between screens X and behaviours Y.

You learn to play the game.

The AI system does the same, and in both cases, you and the AI system learn by the reward.
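The "when the screen was like X, doing behaviour Y was good" idea can be sketched as a simple lookup table. This is a simplification under my own assumptions: the real system learns from raw pixels with a neural network rather than a table, and the screen descriptions, the `remember` and `best_behaviour` helpers and the reward values are all hypothetical.

```python
from collections import defaultdict

# Memory of "when the screen was like X, doing Y was good or bad".
# Keys are (screen, behaviour) pairs; values start at 0.0 (no opinion yet).
memory = defaultdict(float)

def remember(screen, behaviour, reward, step=0.5):
    """Move the remembered value for this pairing towards the reward."""
    key = (screen, behaviour)
    memory[key] += step * (reward - memory[key])

def best_behaviour(screen, behaviours):
    """Pick the behaviour with the best remembered value for this screen."""
    return max(behaviours, key=lambda b: memory[(screen, b)])

behaviours = ["left", "right", "fire"]

# A few hypothetical experiences: +1 is good, -1 is bad.
remember("ball is on the left", "left", +1)
remember("ball is on the left", "right", -1)
remember("ball is on the right", "right", +1)

choice_left = best_behaviour("ball is on the left", behaviours)
choice_right = best_behaviour("ball is on the right", behaviours)
print(choice_left, choice_right)
```

The more experiences go into the table, the more screens it has a sensible answer for.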

A simple AI brain

To understand what happens when ‘learning’ and how the brain stores those ‘learnings’ we can look at a machine learning/AI brain.

The image above shows a thing called an Artificial Neural Network (left) for our AI system and a representation of a real neural network (right), like in our brain.

In the Artificial Neural Network (left), the circles are the neurons and the lines are the connections to other neurons.

In a real brain/neuron, these connections would be similar to the dendrite and axon (see inset).

In both examples, the neurons fire when the combined signal of the connections is strong enough.

And it is at these connections where the learning takes place. Strong connections help the neuron to fire, and weak connections do not.

So when playing Breakout - if we get a reward then the connections contributing to that reward become stronger, and when it’s a negative reward the connections contributing to that negative reward become weaker.

In other words - we learn to do more of the good things and less of the bad things.
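Here is a minimal sketch of a single artificial neuron, assuming a simple threshold and a simple "strengthen what was active" rule. Real systems adjust their connections with more sophisticated maths, so treat the `fires` and `adjust` functions and all the numbers as illustrative assumptions.

```python
def fires(signals, weights, threshold=1.0):
    """The neuron fires when the combined, weighted signal reaches the threshold."""
    return sum(s * w for s, w in zip(signals, weights)) >= threshold

def adjust(signals, weights, reward, step=0.2):
    """Strengthen (reward > 0) or weaken (reward < 0) the connections that were active."""
    return [w + step * reward * s for s, w in zip(signals, weights)]

signals = [1.0, 1.0, 0.0]   # the first two inputs are active, the third is silent
weights = [0.3, 0.3, 0.3]   # connection strengths, all fairly weak to begin with

before = fires(signals, weights)      # combined signal is too weak to fire

# Two rounds of positive reward strengthen the active connections.
for _ in range(2):
    weights = adjust(signals, weights, reward=+1)

after = fires(signals, weights)       # combined signal is now strong enough

# A negative reward would weaken the same connections again.
weakened = adjust(signals, weights, reward=-1)
print(before, after, weights, weakened)
```

Note that the silent third connection is left alone - only the connections that contributed to the outcome get stronger or weaker.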

So, as shown in the original video, after 240 minutes of playing the game the AI system has learned the patterns between what it sees and which behaviours result in the best rewards. But what actually happened was - some of the connections got stronger and some got weaker.

I want to learn something new

When the AI brain was created for the first time those connections were randomised. Some strong, some weak and some in the middle.

It played a game for 240 minutes and got good at it. It learned the patterns and made some connections between the neurons strong and others weak.

Now let’s say we gave it a new game.

Video Olympics for the Atari 2600 (review on GameFAQs)

This time it isn’t starting randomised, it’s really good at playing Breakout but it has never seen this new game before.

Those connections need retraining.

It needs to learn the new game.

If the new game is similar, it might not take too long to retrain. However, if the new game is totally different, it might take a long time to retrain.
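A rough way to picture this is a single connection that has to move from its old strength to the strength the new game needs. The function `rounds_to_retrain` and the starting values below are invented for illustration, and real retraining involves many connections at once.

```python
def rounds_to_retrain(start, target, step=0.1, tolerance=0.05):
    """Count how many small nudges it takes to move a connection to its new strength."""
    strength, rounds = start, 0
    while abs(target - strength) > tolerance:
        strength += step * (target - strength)
        rounds += 1
    return rounds

# Similar game: the old connection is already close to what the new game needs.
similar = rounds_to_retrain(start=0.8, target=1.0)

# Totally different game: the old connection points the opposite way.
different = rounds_to_retrain(start=-1.0, target=1.0)

print(similar, different)
```

The further the old learning is from what the new game needs, the more rounds of practice it takes.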

Back to the humans

Now to compare.

A human brain has around 86,000,000,000 neurons.

A simple AI system to play Atari games can get away with fewer than 1,000.

We are significantly more complex than the game-playing AI. But the learning process is similar: we do more of the things which we get some kind of reward from, and we do less of the things we think are bad (admittedly only sometimes).

And whereas the game-playing AI was trained over 240 minutes, our brains were trained over many years, even decades.

The impact

Now if you remember a while back I wrote a newsletter (this one) explaining that even though I got made redundant and I could do anything I wanted, I still ended up sitting at my desk.

Well, guess what, I’m still sitting at my desk today.

Why?

Because that’s what my brain knows, it’s what I do, it’s how I’ve maximised my rewards over time.

A section of my 86,000,000,000 neurons has learned that behaviour over time, it’s what I know and what I’m comfortable with.

And just as I default to sitting at my desk, I also default to my eating habits, exercise habits (or lack thereof), way of thinking, etc. Because they’ve worked so far.

And just as with the AI system learning to play a new game, it takes a while to learn those new habits and behaviours.

Compare yourself to others

Generally, it’s a rule to not compare yourself to others. But in this case, it is a good thing.

First, look at Alastair Humphreys. At the age of 24 he cycled around the world on a 4-year journey. He is now an author, a public speaker and works from his shed.

Next, Sam Holmes, currently sailing around the Mediterranean after sailing across the Atlantic. (And randomly this is from 17 years ago [link])

And even Casey Neistat, regarded as the OG of YouTube, vlogged daily for about 800 days.

All of these examples are very different from me, because they did different things and ultimately learned to play a different game.

I doubt you’ll find any of those folks sat in a corporate office wearing a suit for 40 hours per week for the next 20 years.

That is just as strange to them as it is strange to me (and us) to live the lives they do.

But that brings me to an old work colleague and an example of change…

A chap called Lyndon Poskit. Lyndon played the corporate life game, but 10 years ago had a near-fatal training incident. Perhaps in the spirit of last week’s newsletter, he decided to make a significant change - he got on his rather large motorbike and travelled around the world, a 245,000km trip through 74 countries, racing in 11 competitions along the way (check out the journey he captured).

Lyndon is one example of how we are trained in one game, but we can learn a new game too.

The point

And this is ultimately the point of the newsletter.

As with the AI system, we have developed behaviours which have served us well. They have helped us play the ‘game’ successfully based on the things we perceive as rewards.

Now if we want to play a new game, do something different, or get into a new routine - know that we are capable of making that change. We just need to understand that there is a learning process that needs to take place until those new behaviours and those connections become strong enough to be the new normal.

I hope this helps explain why some of your behaviours are there and why they persist, and gives an insight into the idea that we can all learn new things - it just takes time for those connections to change too.

And while there is a difference between behaviours and habits, according to a 2009 study it takes an average of 66 days, and anywhere between 18 and 254 days, for a person to form a new habit. So give yourself time, be kind to yourself along the way and know it is a learning process.

Fin.

Happy Friday everyone.

John

Weekly newsletter of John Stamford PhD
