computer intelligence sports betting resource

(c) 2003 - 2010 All Rights Reserved

Evaluating machine learning software, one package against another, is not as simple a task as it might seem. Certain products have strengths in particular areas, or have numerous settings which can be adjusted to suit a task.

 

Software vendors in this area frequently use industry 'standard' data sets to illustrate the prowess of their particular product. But let's face it, if they are trying to convince the potential purchaser that theirs is the one to go with, they're unlikely to select a problem which does not show their package in its best light. Also, the often-used 'problem' data sets are not problems at all. Asking a machine to sort out what's going on in an XOR function is more like an exhibition AI obstacle course; it's not really going to help anyone understand anything. Another favourite is Iris classification (the flower) from various bits of specific information. Again, the rules are so simple that it is not, in my opinion, representative of a real-world problem.

 

Another shortcoming of the Iris and XOR examples is that they have definite, specific solutions: if x, y AND z, then the iris species is definitely Virginica. In my area of interest, as I suspect in many others, there are very few, if any, definite answers to my queries; I'm looking for trends. For instance, a specific set of pre-game circumstances in a sports contest may produce result x. But, the very next day, the exact same set of circumstances produces result y.

 

So here I'll present a comparison of some commercial packages that have been trained and tested on a couple of sports-related data sets of my own compilation.

 

A couple of caveats:

The data configurations used make no claim to be the best solution to the given task, but at least they are the same for every program.

So far as possible, each program was used in its 'default' state. (Some allow a host of configurations and settings, letting a user possibly 'drill down' to a better solution.)

 

 

Software used:

neural net - Ward Systems Predictor

neural net - Tiberius data mining

genetic programming - Discipulus

genetic programming - GeneXProTools

regression splines - Salford Systems MARS

 

Sample Task 1: (football)

 

10 Inputs per record. League games won, drawn, lost, goals scored and goals conceded for each of home and away teams. Each data item is divided by the number of games played, producing per game figures. e.g. won 9 (but from 20 games) = 0.45

 

One output, an integer representing the home team advantage in goals.

e.g. score = 3-1, output = +2. score = 0-1, output = -1.
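The input and output encoding described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the `TeamRecord` structure, the function names, the order of the ten inputs and the sample figures are all hypothetical, as the source only states which quantities are used and that each is divided by games played.

```python
from dataclasses import dataclass


@dataclass
class TeamRecord:
    """Season-to-date league record for one team (hypothetical structure)."""
    played: int
    won: int
    drawn: int
    lost: int
    goals_for: int
    goals_against: int


def per_game(team: TeamRecord) -> list[float]:
    """Divide each of the five counts by the number of games played."""
    if team.played == 0:
        raise ValueError("team has not played any games yet")
    counts = (team.won, team.drawn, team.lost, team.goals_for, team.goals_against)
    return [count / team.played for count in counts]


def make_inputs(home: TeamRecord, away: TeamRecord) -> list[float]:
    """Ten inputs: five per-game figures for the home team, then five for the away team.
    The ordering is an assumption made for this sketch."""
    return per_game(home) + per_game(away)


def home_advantage(home_goals: int, away_goals: int) -> int:
    """Output value: home goals minus away goals (3-1 gives +2, 0-1 gives -1)."""
    return home_goals - away_goals


# Made-up figures, for illustration only.
home = TeamRecord(played=20, won=9, drawn=6, lost=5, goals_for=30, goals_against=22)
away = TeamRecord(played=20, won=7, drawn=5, lost=8, goals_for=24, goals_against=27)
inputs = make_inputs(home, away)
```

Here `inputs[0]` is the home team's wins per game (9 from 20 games), matching the 0.45 example above.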

 

Trained upon data sampled from two seasons (1,422 records), tested against a third season (2,121 records).

Fitness measurements compared:

R-squared. Standard statistical measure of fit, predicted versus actual (1 = perfect match)

Sum of actual errors *

Sum of raw errors *
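For readers who want to reproduce an R-squared figure on their own predictions, here is a minimal sketch. It assumes the coefficient-of-determination definition (1 minus residual sum of squares over total sum of squares). The packages tested may instead report the squared correlation between predicted and actual values, which is not documented here, so treat the function as an approximation of what each program prints.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean of the actual values.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are all identical; R-squared is undefined")
    return 1.0 - ss_res / ss_tot
```

Under this definition a perfect match scores 1, and a model that always predicts the average outcome scores 0.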

R-squared

 

0.04525 Tiberius

0.04484 WARD neural mode

0.04102 MARS

0.03620 GeneXProTools

0.03944 Discipulus best 'team'

0.03186 WARD genetic mode

0.03180 Discipulus best program

Sum of actual errors

 

155.49 GeneXProTools

226.94 WARD genetic mode

227.70 Tiberius

255.15 WARD neural mode

286.02 MARS

306.17 Discipulus best program

320.41 Discipulus best 'team'

Sum of raw errors

 

2719.89 Tiberius

2721.66 Discipulus best 'team'

2729.83 WARD neural mode

2730.41 MARS

2767.60 GeneXProTools

2780.54 Discipulus best program

2818.13 WARD genetic mode

Within half-a-goal

 

560 Discipulus best 'team'

554 Tiberius

553 WARD neural mode

553 MARS

553 GeneXProTools

530 WARD genetic mode

529 Discipulus best program

Sample Task 2: (horse racing favourites spreads)

 

3 Inputs per record: whether the race is a handicap or not, number of runners, odds of the favourite.

 

One output, an integer representing the spreads value for a favourite's performance where: winning = 25pts, coming 2nd = 10pts, finishing 3rd = 5pts, otherwise zero points.
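As a rough sketch, one training record for this task could be built like this. The function names, the 0/1 coding of the handicap flag and the use of a single numeric odds value are my assumptions for illustration; only the points scale comes from the description above.

```python
# Points scale from the task description: win = 25, 2nd = 10, 3rd = 5, otherwise 0.
FAVOURITE_POINTS = {1: 25, 2: 10, 3: 5}


def spread_points(finishing_position):
    """Spreads value for the favourite's finishing position (anything else scores 0)."""
    return FAVOURITE_POINTS.get(finishing_position, 0)


def make_record(is_handicap, runners, favourite_odds, finishing_position):
    """Return (inputs, output) for one race.
    Coding the handicap flag as 1.0 / 0.0 is an assumption of this sketch."""
    inputs = [1.0 if is_handicap else 0.0, float(runners), float(favourite_odds)]
    return inputs, spread_points(finishing_position)


# Made-up race: a 12-runner handicap where the favourite finished second.
example_inputs, example_output = make_record(True, 12, 2.5, 2)
```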

 

Trained upon a data sample of 1,000 records, tested against an out-of-sample set of 859 records.

 

Fitness measurements taken:

R-squared. Standard statistical measure of fit, predicted versus actual (1 = perfect match)

Sum of actual errors

Sum of raw errors

 

Software is shown in ranking order, best score nearest the top.

* Suppose a package's predictions for 4 cases were 5 too high twice and 5 too low twice. The sum of actual errors = 5+5-5-5 = 0, whereas the sum of raw errors = 5+5+5+5 = 20.

If another package predicted all 4 cases 2 too high, actual = 8 and raw = 8. So the Actual figure gives an overview of the distribution of errors: the nearer to zero, the better its focus. The Raw figure gives the accumulated error over the whole data set (lower is better).
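The two error sums can be written out in a few lines. This sketch takes each error as predicted minus actual, so 'too high' is positive, which matches the worked figures above. The function names are mine.

```python
def sum_actual_errors(actual, predicted):
    """Signed errors added together: over- and under-predictions cancel out."""
    return sum(p - a for a, p in zip(actual, predicted))


def sum_raw_errors(actual, predicted):
    """Absolute errors added together: the accumulated size of all misses."""
    return sum(abs(p - a) for a, p in zip(actual, predicted))


actual = [0, 0, 0, 0]
first_package = [5, 5, -5, -5]   # 2 x 5 too high, 2 x 5 too low
second_package = [2, 2, 2, 2]    # all 4 cases 2 too high
```

With these lists the first package gives actual = 0 and raw = 20, and the second gives actual = 8 and raw = 8, as in the footnote.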

Within half-a-goal. Count of all cases where the prediction was less than 0.5 away from the actual home team goal superiority.
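The half-a-goal count is simple to reproduce. In this sketch I read 'closer than 0.5' as a strict comparison, so a prediction exactly 0.5 out does not count. That reading, and the function name, are my assumptions.

```python
def within_half_goal(actual, predicted, tolerance=0.5):
    """Count predictions less than `tolerance` away from the actual goal superiority."""
    return sum(1 for a, p in zip(actual, predicted) if abs(p - a) < tolerance)


# Made-up figures: two predictions are close enough, two are not.
actual_superiority = [2, -1, 0, 1]
predicted_superiority = [1.6, -0.2, 0.5, 1.1]
hits = within_half_goal(actual_superiority, predicted_superiority)
```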

 

By far & away the two most significant statistical measures are R-squared and Sum of Raw Errors. Software is shown in ranking order, best score nearest the top.
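The ranking rule differs by measure: R-squared ranks highest first, raw errors lowest first, and actual errors by closeness to zero regardless of sign. A small sketch using the Task 2 R-squared and sum of actual errors figures reported on this page (the helper function is my own):

```python
task2_r_squared = {
    "MARS": 0.09133,
    "Tiberius": 0.09108,
    "WARD genetic mode": 0.09018,
    "GeneXProTools": 0.08953,
    "Discipulus best 'team'": 0.08654,
    "Discipulus best program": 0.08076,
    "WARD neural mode": 0.07090,
}

task2_actual_errors = {
    "WARD genetic mode": 20.79,
    "Discipulus best program": -21.13,
    "Discipulus best 'team'": 22.85,
    "Tiberius": 109.20,
    "GeneXProTools": 132.71,
    "MARS": 173.85,
    "WARD neural mode": 201.80,
}


def rank(scores, key=lambda value: value, reverse=False):
    """Return package names ordered by key(score), best first."""
    ordered = sorted(scores.items(), key=lambda item: key(item[1]), reverse=reverse)
    return [name for name, _ in ordered]


by_r_squared = rank(task2_r_squared, reverse=True)   # higher is better
by_actual = rank(task2_actual_errors, key=abs)       # nearer to zero is better
```

Sorting on the absolute value is what places the -21.13 result between 20.79 and 22.85.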

Approximate training times (both exercises);

================================

MARS < 1 minute

WARD genetic 1 hour

WARD neural < 1 minute

GeneXproTools 1 hour

Discipulus 1 hour (both individual & team are trained simultaneously)

Tiberius  <10 minutes

 

Software Prices

===========

WARD Predictor   US$550.00

GeneXproTools Advanced £650

Discipulus Professional US$495.00

Tiberius  US$265.00 (3-year license)

MARS: Salford Systems quoted me for the least expensive option, which was $4,995.00 for a single-user license with a further $1,998.00 annual renewal charge. If it makes any difference, the MARS price does include tech support, maintenance, all upgrades to future versions and internet training for a single user. Seats at any upcoming Salford Systems MARS training will be discounted by 55%.

 

Testing was performed without bias, either in selection of tasks or otherwise. The results are, in my experience, quite typical and perhaps underline why Tiberius is not only my package of choice, but the one against which I now judge all others.

 

The software chosen for this comparison is, in my experience, the cream of the current (2008) commercial machine learning software. A package performing poorly in this company does not imply that the software is not up to scratch. I have tried & tested many products, though obviously not all! Of those tried, many were rejected for this exercise for the reasons stated below. I rank all of the following products below the capabilities of every package used in the above tests:

 

Attrasoft

BrainCom

Crespin

Emergent

ExcelNeural

FANN

Joone

Membrain

Neurosolutions

Pythia

QNet

RapidMiner

RockEye

Tanagra

Trajan (which is also the Neural Network add-in incorporated into Statistica)

XLPert

 

My rejection of these was for a variety of factors. It is not my intention to review these products individually, and of course my reasons for rejection may not be valid cause for others to do the same.

 

The products in the above rejection list, in this reviewer's opinion, each suffer from at least one (and in a few cases a good few more than one) of the following negative factors:

 

Very poor at out-of-sample predictions.

Flaky and/or bug-ridden software.

Frequent program crashes

Overly complex user interface (some are possibly targeted primarily at academic users)

Very poor user support (sometimes NO user support)

Sample Task 2 results (horse racing favourites spreads):

R-squared

 

0.09133 MARS

0.09108 Tiberius

0.09018 WARD genetic mode

0.08953 GeneXProTools

0.08654 Discipulus best 'team'

0.08076 Discipulus best program

0.07090 WARD neural mode

Sum of actual errors

 

20.79 WARD genetic mode

-21.13 Discipulus best program

22.85 Discipulus best 'team'

109.20 Tiberius

132.71 GeneXProTools

173.85 MARS

201.80 WARD neural mode

Sum of raw errors

 

7173.65 Tiberius

7188.63 MARS

7203.16 WARD genetic mode

7221.83 GeneXProTools

7232.79 Discipulus best program

7239.55 Discipulus best 'team'

7315.22 WARD neural mode
