computer intelligence sports betting resource

(c) 2003 - 2008 All Rights Reserved
Machine intelligence predictions benefit from being trained on large data sets of past events. For such predictions to be reliable, though, the circumstances surrounding those historical events should remain as stable as possible.

In a nutshell: as much data as we can get, but only where the general parameters and circumstances surrounding the events and results remain as constant as can reasonably be expected. This rule makes predicting sports events problematic, because the factors affecting the results can be in a constant state of flux.

Football league teams in the UK play only around 40 games per season, hardly enough for a student of form to be certain of each team’s potential. At the end of the season some teams are relegated and others are promoted, so every new campaign throws unknown quantities into the ring. Add to this the players transferred in and out, both during the summer break and in the mid-winter transfer ‘window’. Then there’s the problem of the team in poor form, which often results in a change of manager, and that alone can have a dramatic effect on team morale, performances, or both.

As you may have gathered by now, I do not advocate training individual neural networks on specific teams; there are far too many changes over time for Mancaster Athletic’s results from a couple of years ago to be reliably useful in predicting tomorrow’s game. So we need to be a little less specific and assess football in ‘general’ terms.

Rather than asking ‘Blackport vs Chingfield: what will be the likely outcome?’, we need to be asking ‘second-top of the league at home to a mid-table club: what will be the most probable result?’

At the extremes of this approach we certainly do not need machine intelligence to sort things out. Top of the league playing bottom of the league . . . do we really need a neural network to help us out? But with two mid-table teams . . . machine intelligence, armed with the right information, might be able to tell us which one, if either, has the edge in this game . . . and, just as importantly, by how much.

Let’s walk through this example stage by stage. It is not necessarily the way a seasoned neural netter would attack the problem, but it may well illustrate both the capabilities and the limitations of the genre.

We’ve already established that if the top of the league is playing the bottom club, machine learning is unlikely to be of help. So what can a neural network establish if we feed in just two inputs: the points-to-date of the home side and the points-to-date of the away side? The output will be the goal difference from the home team’s viewpoint (2-0 would be 2, 1-1 would be 0, 1-2 would be -1, and so on).
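To make the input/output layout concrete, here is a minimal sketch of turning one match into a training example for this two-input net. The `Match` record and `to_example` function are hypothetical; the article does not describe how its data was stored, only that the inputs are the two points-to-date figures and the target is the home-viewpoint goal difference.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical record layout (not the article's actual data format).
    home_points: int  # home side's league points before this match
    away_points: int  # away side's league points before this match
    home_goals: int   # full-time goals scored by the home side
    away_goals: int   # full-time goals scored by the away side


def to_example(m: Match) -> tuple[list[float], float]:
    """Return (inputs, target) for the two-input network.

    Inputs: points-to-date of the home and away sides.
    Target: goal difference from the home team's viewpoint.
    """
    inputs = [float(m.home_points), float(m.away_points)]
    target = float(m.home_goals - m.away_goals)  # 2-0 -> 2, 1-1 -> 0, 1-2 -> -1
    return inputs, target


print(to_example(Match(home_points=30, away_points=22, home_goals=2, away_goals=0)))
# → ([30.0, 22.0], 2.0)
```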

I’m taking training data from the English seasons 2003-04 and 2004-05, and we’ll test the trained nets on season 2005-06. Tiberius is the software package chosen for this example, and to allow ‘form’ to settle a little, output lines will only be generated once both teams have played 8 or more games in the season concerned.
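The season split and the 8-game rule amount to a simple filter. A sketch, assuming a hypothetical `Fixture` record whose `home_played` and `away_played` fields count games played *before* the fixture (the article does not say whether the current match is included in the count):

```python
from dataclasses import dataclass


@dataclass
class Fixture:
    # Hypothetical fields; games played are assumed to be counted before this fixture.
    season: str       # e.g. "2003-04"
    home_played: int
    away_played: int


TRAIN_SEASONS = {"2003-04", "2004-05"}
TEST_SEASONS = {"2005-06"}
MIN_GAMES = 8  # both teams must have played at least this many games


def settled(f: Fixture) -> bool:
    """True once both teams have played MIN_GAMES or more this season."""
    return f.home_played >= MIN_GAMES and f.away_played >= MIN_GAMES


def split(fixtures: list[Fixture]) -> tuple[list[Fixture], list[Fixture]]:
    """Separate training fixtures from test fixtures, dropping early-season games."""
    train = [f for f in fixtures if f.season in TRAIN_SEASONS and settled(f)]
    test = [f for f in fixtures if f.season in TEST_SEASONS and settled(f)]
    return train, test
```

Keeping the test season strictly separate is what makes the later 2005-06 results a fair check on unseen matches.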

Most machine learning packages report training progress in terms of whichever error measure the training has been set to aim at. In this example Tiberius was set to RMSE (root mean squared error, the default setting).
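RMSE is the square root of the average squared difference between the net’s outputs and the actual results, so it is expressed in the same units as the target. A minimal sketch of the calculation (the `rmse` name is ours, not a Tiberius function):

```python
import math


def rmse(predictions: list[float], targets: list[float]) -> float:
    """Root mean squared error: sqrt of the mean of the squared residuals."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(predictions))


# Predicting +0.5 for a 2-0 home win and for a 0-1 home defeat:
print(rmse([0.5, 0.5], [2, -1]))  # → 1.5
```

Because the errors are squared before averaging, a few badly missed results (a surprise 5-0) weigh more heavily than many small misses.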

The network was trained on a total of 3,328 examples, reaching a ‘best’ RMSE of 1.5432; this takes only a few minutes with Tiberius. But the proof of the pudding is in the eating, so the newly trained network was then tested against the relevant games from the 2005-06 season (i.e. matches the neural net had not seen before). How would it perform?
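Tiberius does the training internally, and its algorithm is not described here. As a stand-in for readers without it, the sketch below trains the simplest possible ‘network’, a single linear neuron with no hidden layer, by batch gradient descent on squared error. It is a swapped-in technique, not Tiberius’s method, and it assumes the inputs have already been scaled to roughly unit range (raw points totals would need a much smaller learning rate). `train_linear_neuron` and `predict` are hypothetical names.

```python
import random


def train_linear_neuron(xs, ys, lr=0.01, epochs=500, seed=0):
    """Fit y ≈ w·x + b by batch gradient descent on mean squared error.

    A single linear neuron: a deliberately simple stand-in, not Tiberius.
    xs: list of input vectors (assumed scaled to roughly [-1, 1])
    ys: list of targets (home-viewpoint goal difference)
    """
    rng = random.Random(seed)
    n_in = len(xs[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_in)]  # small random start
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * n_in
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(n_in):
                grad_w[i] += 2 * err * x[i] / n  # d(MSE)/dw_i
            grad_b += 2 * err / n                 # d(MSE)/db
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Network output for one input vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```

A real neural network adds one or more hidden layers of non-linear units, which lets it capture curved relationships a single linear neuron cannot.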

The approach I adopted for evaluating the net was as follows. The 05-06 season produced 1,662 games to check against, and the net’s output for each one indicates the predicted size of the home team’s advantage (in goals). My process was simply to rank the games by network output and check the highest and lowest outputs against the actual results. What percentage of those games would turn out to be home wins?
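The evaluation can be sketched as: sort the test games by network output, take the top and bottom N, and count how many were actually home wins. The function name and argument layout below are illustrative assumptions:

```python
def home_win_rate_at_extremes(outputs, results, n):
    """Rank games by network output and report the home-win share at each end.

    outputs: predicted home goal advantage, one per game
    results: actual goal difference (home minus away), same order
    n:       how many games to take from the top and bottom of the ranking
    Returns (top_wins, top_pct, bottom_wins, bottom_pct).
    """
    if not 0 < n <= len(outputs):
        raise ValueError("n must be between 1 and the number of games")
    ranked = sorted(zip(outputs, results), key=lambda pair: pair[0], reverse=True)
    top, bottom = ranked[:n], ranked[-n:]
    top_wins = sum(1 for _, gd in top if gd > 0)        # home win = positive goal difference
    bottom_wins = sum(1 for _, gd in bottom if gd > 0)
    return top_wins, 100.0 * top_wins / n, bottom_wins, 100.0 * bottom_wins / n


print(home_win_rate_at_extremes([2.0, 1.5, 0.1, -0.3, -1.2], [1, 2, 0, -1, 0], 2))
# → (2, 100.0, 0, 0.0)
```

Run with n = 50, 100 and 200 over the 1,662 test games, this produces figures of the form listed below (e.g. 33 home wins in the top 50 is 66%).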

For this first and very simplest of neural networks, the most extreme predictions produced:

Highest 50 outputs, 33 home wins (66%)
Highest 100, 62 home wins (62%)
Highest 200, 120 home wins (60%)

. . . at the opposite end of network predictions . . .

Lowest 50 outputs, 10 home wins (20%)
Lowest 100, 25 home wins (25%)
Lowest 200, 54 home wins (27%)

Is that good? Or bad? Or any better than you could have done without the software? I don’t know; you tell me.

For the next stage, let’s see if supplying the network with a little extra information can improve its performance . . .

We’ll add another four inputs to the original two. As well as hometeampoints and awayteampoints, we’ll add home team goals scored, home team goals conceded, away team goals scored and away team goals conceded.
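The six-input version simply widens the input vector. A sketch, assuming a hypothetical `TeamToDate` record holding each side’s season-to-date totals before the match:

```python
from dataclasses import dataclass


@dataclass
class TeamToDate:
    # Hypothetical season-to-date totals for one team, before the match.
    points: int
    goals_for: int
    goals_against: int


def six_inputs(home: TeamToDate, away: TeamToDate) -> list[float]:
    """Inputs for the second net: the two points totals plus four goal totals."""
    return [
        float(home.points),         # hometeampoints
        float(away.points),         # awayteampoints
        float(home.goals_for),      # home team goals scored
        float(home.goals_against),  # home team goals conceded
        float(away.goals_for),      # away team goals scored
        float(away.goals_against),  # away team goals conceded
    ]
```

The target (home-viewpoint goal difference) is unchanged; only the information available to the net grows.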

A few minutes later another network is trained, and this time the RMSE was 1.5397 (lower is better, so this is a slight improvement on our first attempt). But will it perform better when tested on 2005-06? The results were . . .

Highest 50 outputs, 34 home wins (68%)
Highest 100, 64 home wins (64%)
Highest 200, 126 home wins (63%)

. . . at the opposite end of network predictions . . .

Lowest 50 outputs, 12 home wins (24%)
Lowest 100, 27 home wins (27%)
Lowest 200, 59 home wins (29.5%)

A small improvement in picking the likely home wins, but a drop in performance when it comes to avoiding ’em!

Painting a Better Picture

How are our inputs shaping up, though? Points-to-date is a rather limited statistic: one team may have played several more games than the other, and 20 points from 15 games is not as impressive as 18 points from 7 games, is it? Yet the computer sees only 20 pts vs 18 pts.

Also, knowing what we know, we can build the comparison of one team’s stats against the other’s into the inputs ourselves.

Okay, I’m aware the software may be able to sort that out for itself, but it would DEFINITELY use this information if we presented the figures in a more appropriate format.

This time, therefore, we’ll use inputs similar to those in example 2 above, but ‘engineered’ to make better sense to the computer. Firstly, we’ll divide points gained by games played to produce a points-per-game figure, which evens out the problem of teams having played different numbers of games. We’ll do the same with goals scored and conceded for both teams.
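Applied to the earlier example, dividing by games played shows why raw totals mislead. A small sketch (`per_game` is an illustrative helper, not from the article):

```python
def per_game(total: float, games_played: int) -> float:
    """Convert a season-to-date total into a per-game rate."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return total / games_played


team_a = per_game(20, 15)  # 20 points from 15 games
team_b = per_game(18, 7)   # 18 points from 7 games
print(round(team_a, 2), round(team_b, 2))  # → 1.33 2.57
```

On raw points the first team looks stronger (20 vs 18); per game, the second team is collecting almost twice as many points.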

One further enhancement: rather than hoping the computer will make the correct comparisons, we’ll compute them beforehand. The inputs for our third example are therefore reduced to just three:

[Home-team-points-per-game] minus [away-team-points-per-game]
[HTeam goals scored per game] minus [ATeam goals scored per game]
[HTeam goals conceded per game] minus [ATeam goals conceded per game]
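The three pre-computed inputs for net Mk3 can be sketched as follows, again assuming a hypothetical `TeamToDate` record of season-to-date totals (including games played) taken before the match:

```python
from dataclasses import dataclass


@dataclass
class TeamToDate:
    # Hypothetical season-to-date totals for one team, before the match.
    played: int
    points: int
    goals_for: int
    goals_against: int

    def ppg(self) -> float:
        return self.points / self.played         # points per game

    def scored_pg(self) -> float:
        return self.goals_for / self.played      # goals scored per game

    def conceded_pg(self) -> float:
        return self.goals_against / self.played  # goals conceded per game


def mk3_inputs(home: TeamToDate, away: TeamToDate) -> list[float]:
    """Three comparisons, each a home per-game rate minus the away per-game rate."""
    return [
        home.ppg() - away.ppg(),
        home.scored_pg() - away.scored_pg(),
        home.conceded_pg() - away.conceded_pg(),  # positive = home side leakier
    ]
```

Note the sign convention of the third input: a positive value means the home side concedes more per game, which should count against it.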

Would this allow net Mk3 to continue the improvement?

The RMSE of this net was a little better once again, at 1.5372, and the results against 05-06 were . . .

Highest 50 outputs, 39 home wins (78%)
Highest 100, 69 home wins (69%)
Highest 200, 127 home wins (63.5%)

. . . at the opposite end of network predictions . . .

Lowest 50 outputs, 10 home wins (20%)
Lowest 100, 25 home wins (25%)
Lowest 200, 62 home wins (31%)

Ah! Our best net yet. I can’t hold your hand forever and be with you through every situation you may meet, but I hope the above examples will give you food for thought and encourage you to experiment . . . it is the only way to learn and perfect your own expertise with machine learning.

One final test example to leave you with . . . on the next page . . .