computer intelligence sports betting resource

(c) 2003 - 2007 All Rights Reserved
Creating a Training File from scratch

All supervised artificial intelligence (AI) software attempts to perform pretty much the same task: calculating the most likely outcome (result) from a set of prior circumstances, or conditions, that were in place before the result was known.

In machine intelligence parlance, pre-event circumstances are referred to as INPUTS. The outcome, or result, relevant to a specific set of inputs is known as the OUTPUT. All routines will attempt to model a likely outcome of future events by learning from the examples in a training file supplied by the user.

Because AI programs are essentially no more than number crunching or calculation engines, in almost all cases both inputs & outputs need to be represented numerically. In many cases this precondition presents no problem.

Let's show an example of these principles in action, using the query "what will be the likely result of the Premier League football fixture, Manchester United vs Bogroll Town?"

Before we can predict an output using AI, however, we need a model already in place that has been pre-trained on our specific problem. So, let's look at a typical thought process in constructing a training file for this problem.

Given that AI software will learn from the input & output combinations we provide, it will come as no surprise that the overriding priority is that the parameters we select must be relevant to the task in hand. Even if we discovered that teams with red shirts (which includes Manchester United) win 66% of their matches, common sense should tell us that such a statistic is the result of coincidence rather than cause & effect. Using shirt colour as one of our proposed model inputs would, I'm sure most would agree, not be an effective choice.

We could look back at how the two teams have performed in previous meetings. On the surface this seems to be a relevant statistic, but the approach has flaws:
a. Teams only meet each other once at each ground each season, so getting a meaningful sample size would involve going back many years.
b. Players are drafted in and/or out of a team on a regular basis, so a team of only 5 years ago may bear only passing resemblance to today's squad.
c. Last but not least, with promotion/relegation any two teams may not have had regular league meetings over the years.

We are forced therefore to take a more general overview, but which parameters should we use? My recommendation would always be to examine the problem from a human perspective, then dissect this into its constituent parts. So . . .

Man Utd are recent league and European Champions. Bogroll were fortunate winners of last season's play-offs and gained promotion. Man Utd, big club, stuffed with talented players. Bogroll have a cut-price squad and are in trouble if a key player is injured. Your brain, your knowledge & experience shout that Man Utd have a BIG advantage in this game - but how to quantify how much of an advantage?

The league tables should be of great help. Much of what we know about the two teams should be reflected in the relevant up-to-date league standings. Sure enough, Man Utd have played 12 games, having won 10 of these; two were drawn and they have yet to lose a game this season. Bogroll Town have played 14 games, and have won, drawn, lost figures of 2, 2, 10. Even someone with little or no knowledge of the individual teams involved in a game could get an idea of a likely outcome if they were shown that the home team's Played, Won, Drawn, Lost record to date was 12, 10, 2, 0 and their opponent's similar record was 14, 2, 2, 10.

Such comparisons fit the bill surprisingly well in qualifying as inputs for a training file, and should also be a readily available source of data by using results from previous seasons. The output can be whatever aspect we're looking to train on: home win, draw or away win - degree of home team advantage - number of goals scored in the game, etc.
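As a rough illustration of the idea, the two league records can be joined into a single row of eight inputs. This is only a sketch: the names Record and make_inputs are my own inventions for the example, not part of any particular AI package, and the check that won + drawn + lost equals played is simply a guard against typing errors.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """A team's league record to date (hypothetical helper type)."""
    played: int
    won: int
    drawn: int
    lost: int


def make_inputs(home: Record, away: Record) -> List[int]:
    """Join home and away records into one row of 8 numeric inputs."""
    for rec in (home, away):
        # Guard against data-entry slips: the three results must add up.
        if rec.won + rec.drawn + rec.lost != rec.played:
            raise ValueError(f"inconsistent record: {rec}")
    return [*home, *away]


man_utd = Record(played=12, won=10, drawn=2, lost=0)
bogroll = Record(played=14, won=2, drawn=2, lost=10)

inputs = make_inputs(man_utd, bogroll)
print(inputs)  # [12, 10, 2, 0, 14, 2, 2, 10]
```

The home team's figures always come first, so that the position of each number means the same thing on every line of the training file.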

Remembering we need to use numeric values for outputs too, the last of those three options - number of goals scored - is in the correct format without further work. Degree of home advantage could easily be measured in goals: a 3-0 win = +3, a draw = 0, or a 0-2 loss = -2. If however we are looking to model home win, draw or away win it will be necessary to represent these conditions by using a number, and such conversions do need a little forethought.
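The two outputs that are already numeric can be worked out directly from the final score. A minimal sketch follows, with function names of my own choosing (total_goals and home_advantage are illustrative, not taken from any software discussed here):

```python
def total_goals(home_goals: int, away_goals: int) -> int:
    """Number of goals scored in the game by both sides."""
    return home_goals + away_goals


def home_advantage(home_goals: int, away_goals: int) -> int:
    """Degree of home advantage, measured in goals (negative = away win)."""
    return home_goals - away_goals


print(home_advantage(3, 0))  # 3, a 3-0 home win
print(home_advantage(1, 1))  # 0, a draw
print(home_advantage(0, 2))  # -2, a 0-2 home loss
```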

DUMB COMPUTER

No matter how sophisticated the software, you would do well to remember that we are still dealing with a dumb computer of no intelligence. It will see the data you present as numbers, pure & simple, and, as a calculating machine, it will deal with them as such. If the values chosen for the three result options were 0, 1 & 2, the computer would treat them exactly that way, i.e. 0 is less than both other options, 2 is the largest number, and 1 is mid-way between the other two states. We therefore need to relate this numeric logic, as near as is possible, to home win, away win and draw. Clearly (at least in my view) a home win is one extreme, an away win is the other extreme, and a draw sits neatly between these two. Other than that, order isn't particularly important (so long as we stick to it once we've chosen).

So Home=0, Draw=1, Away=2 will work just as well as Home=2, Draw=1, Away=0.

Other values should be equally valid: H=10, D=8, A=6, etc. But using H=10, D=5, A=4 suggests numerically that a draw is not the exact mid-point between home win & away win. Maybe it isn't, but I'd need some proof from somewhere, and I've not seen it yet. Just proceed with caution when converting to numbers: make sure they make sense and remain within their original overall concept!
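To make the point concrete, here is a small sketch of one way to turn a score into a result code, plus a check that the chosen code for a draw really does sit mid-way between the other two. The names RESULT_CODES, encode_result and draw_is_midpoint are hypothetical, made up for this example only.

```python
from typing import Dict

# One possible coding; any evenly spaced choice should serve equally well.
RESULT_CODES: Dict[str, int] = {"H": 0, "D": 1, "A": 2}


def encode_result(home_goals: int, away_goals: int,
                  codes: Dict[str, int] = RESULT_CODES) -> int:
    """Convert a final score into the numeric code for H, D or A."""
    if home_goals > away_goals:
        return codes["H"]
    if home_goals < away_goals:
        return codes["A"]
    return codes["D"]


def draw_is_midpoint(codes: Dict[str, int]) -> bool:
    """True if the draw code lies exactly half-way between home and away."""
    return 2 * codes["D"] == codes["H"] + codes["A"]


print(encode_result(3, 0))                               # 0 (home win)
print(draw_is_midpoint({"H": 10, "D": 8, "A": 6}))       # True
print(draw_is_midpoint({"H": 10, "D": 5, "A": 4}))       # False
```

Whichever coding is chosen, the same dictionary should be used for every line of the training file and again when reading the model's predictions back.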

Our training file here would therefore be constructed from past results and be presented, one record per line, as:
Hteamplayed, W, D, L, Ateamplayed, W, D, L . . . followed by the result (so, 8 inputs, 1 output), and might begin like this:

Line 01 . . . 12, 10, 2, 0, 14, 2, 2, 10, +2
Line 02 . . . 13, 6, 5, 2, 13, 7, 4, 2, +1
Line 03 . . . 12, 2, 5, 5, 13, 5, 5, 3, -1

The first three lines of our example file, with outputs (the final number on each line), indicate a home win by 2 goals, a home win by 1 goal and an away win by 1 goal respectively. Training files would normally be as many lines as you can possibly muster. Not always possible I know, but experience suggests you should never trust a result from anything trained with fewer than 100 samples, and, whenever you can, go for training files of 1,000 samples or more.
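Putting a file like this together by hand soon becomes tedious, so here is a hedged sketch of how the lines might be produced and read back. The helper names (format_record, parse_record, has_enough_samples) and the MIN_SAMPLES constant are assumptions of mine for illustration; the exact layout your chosen software expects may well differ, so check its documentation.

```python
from typing import List, Tuple

MIN_SAMPLES = 100  # rule of thumb from the text, not a hard limit


def format_record(inputs: List[int], output: int) -> str:
    """Render 8 inputs and 1 signed output as one comma-separated line."""
    return ", ".join(str(v) for v in inputs) + f", {output:+d}"


def parse_record(line: str) -> Tuple[List[int], int]:
    """Split a line back into its inputs and its final output value."""
    values = [int(v) for v in line.split(",")]
    return values[:-1], values[-1]


def has_enough_samples(count: int, minimum: int = MIN_SAMPLES) -> bool:
    """True if the file meets the suggested minimum number of records."""
    return count >= minimum


# The three example records shown above (output = home goal difference).
rows = [
    ([12, 10, 2, 0, 14, 2, 2, 10], 2),
    ([13, 6, 5, 2, 13, 7, 4, 2], 1),
    ([12, 2, 5, 5, 13, 5, 5, 3], -1),
]

lines = [format_record(inputs, output) for inputs, output in rows]
for line in lines:
    print(line)

if not has_enough_samples(len(lines)):
    print(f"Only {len(lines)} samples: too few to trust any result.")
```

Reading a line back with parse_record and comparing it with the original row is a cheap way to confirm nothing was lost in the conversion.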

In keeping with our aim of taking one step at a time to keep things simple, the example here is still in a fairly crude state. Further refinements are not only possible, they are probably to be recommended. We’ll discuss further options, refine, and hopefully improve the training file presentation on the next page.