Creating a Training File from scratch (continued)
Before looking to enhance the basic training file we constructed on the previous page, we’ll clarify some terms the newcomer will encounter regarding file Outputs. The Output, remember, is the training file element that represents “what happened” when specific inputs were in place: the item we will ultimately be looking to predict.
OUTPUT TYPES
In broad terms, Outputs will be either Classification or Continuous. The most common implementation is classification, which in its simplest form means the result under scrutiny will be one of two specific states, e.g. yes or no, on or off, black or white, win or lose, malignant or benign.
If the output is instead a quantity measured on a scale, it becomes continuous, a type of problem also frequently referred to as ‘regression’. Adding a middle option (yes/maybe/no, black/grey/white) is a first step in that direction when the options are treated as points along a scale; where they are simply distinct categories, the problem remains one of classification. To be fair, continuous/regression problems usually have far more than three variants: duration in minutes of a rain storm, runs scored in a cricket match, etc. The output range may have upper & lower limits, but the actual figure can be anywhere between the two extremes (i.e. Continuous).
The generally accepted wisdom is that knowledge discovery algorithms are at their best when learning (and ultimately predicting) a single output. For example, we could decide our case study football project could use the very same set of input data to learn & predict home team goal advantage AND total goals scored. In such cases it is preferable to train one model on home advantage, then another separate model to learn the likely total goals scored.
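As a rough illustration of the point above, the following Python sketch builds two separate training files from the same inputs, one per output. The field names (inputs, home_goals, away_goals), the layout of the input values and the figures themselves are assumptions made purely for this example; they are not taken from the case study data.

```python
# Hypothetical match records: the same inputs, plus the final score.
# All values are illustrative only.
matches = [
    {"inputs": [12, 10, 2, 0, 12, 2, 3, 7], "home_goals": 3, "away_goals": 0},
    {"inputs": [12, 5, 4, 3, 12, 6, 3, 3], "home_goals": 1, "away_goals": 2},
]


def build_training_files(records):
    """Return two training sets that share inputs but carry ONE output each."""
    goal_advantage_rows = []
    total_goals_rows = []
    for record in records:
        advantage = record["home_goals"] - record["away_goals"]
        total = record["home_goals"] + record["away_goals"]
        # The output is appended as the last column of each row.
        goal_advantage_rows.append(record["inputs"] + [advantage])
        total_goals_rows.append(record["inputs"] + [total])
    return goal_advantage_rows, total_goals_rows


advantage_file, totals_file = build_training_files(matches)
```

Each file would then be used to train its own model, so neither model is asked to chase two targets at once.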
Classifiers too, where there are more than two output states, say Black-Grey-White, should be segregated into models aimed at two-state examples, Black/Not-black, Grey/Not-grey and White/Not-white.
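The segregation described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the outputs are held as a simple list of text labels; the function name one_vs_rest_targets and the sample labels are hypothetical.

```python
def one_vs_rest_targets(labels):
    """Turn multi-state labels into one two-state (1/0) output column per state."""
    states = sorted(set(labels))
    return {
        state: [1 if label == state else 0 for label in labels]
        for state in states
    }


# Illustrative outputs for five training rows.
labels = ["Black", "Grey", "White", "Grey", "Black"]
targets = one_vs_rest_targets(labels)

# targets["Black"] is the Black/Not-black output, targets["Grey"] the
# Grey/Not-grey output, and targets["White"] the White/Not-white output.
```

Each of the three columns would serve as the single output of its own two-state model.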
There are packages which allow the user to train on multiple outputs simultaneously, but even where this is the case, ignoring the option still seems to be the better way to proceed. Even though training two models rather than just the one may take twice as long, resist the temptation to cut corners. The AI modeller should be more interested in a robust finished product than in getting a half-job done more quickly, where the software will inevitably be making some sort of compromise to achieve more than a single goal. Training a model to converge on a single target allows it to achieve more consistent results.
Back to our football case study though . . .
TWEAKING THE DATA FOR BETTER PERFORMANCE
We decided that using each team’s league results data would be a good starting point from which to allow a prediction to be made for a game played between the two. After all, when a team that has won 10 games from 12 is playing another with only 2 games won from 12, it should be a simple task to spot the likely winner. But hold on a minute . . . if our problem is this easy to solve, who needs artificial intelligence? The built-in neural device we all carry around above our shoulders is more than up to the task.
This is of course true, but would the task be as easy if the two teams had much more closely matched records? Or if we wanted to put a numeric value on exactly how much superiority one team had over its opponent? Knowledge discovery software is an attempt to appraise a situation in an unbiased way, and one which is not burdened by emotion.
In an effort to get the very best from artificial intelligence software, it is well worth stepping back to take an overview of the data we’re about to train with. Which elements are important, and which would maybe benefit from a little pre-processing? We can examine this further by dissecting the played/won/draw/lose format from our case study.
Which elements of Manchester United’s record of 12, 10, 2, 0, allow an insight into their strength and/or weakness? Well, the fact they’ve played 12 games, in isolation, is of little value. Does a team who have played 12 games have more or less merit than a team who have played 10 games, or 20? Clearly not.
Similarly, how should we compare a team who have won TEN games against one who have won just FIVE . . . well it can’t be done with any certainty unless we know their respective ‘games played’ figures. Played 12, won TEN, compared with played 12, won FIVE is one thing. But Played 20, won TEN, against played 5, won FIVE puts a completely different complexion on things - yet both compare two teams who have won TEN & FIVE games respectively. Which is where the stepping back overview is important. In order to make more sense of the 12, 10, 2, 0 sequence our own brain performs further tasks, often without us even realising the fact.
Won, draw, lose PER GAME (each figure divided by the number of games played) carries an equal amount of useful & relevant information in three elements as the original presentation did using four. But this time we are ‘forcing’ the software to view the facts in a form we judge to be much more meaningful.
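The per-game conversion amounts to dividing each of the won, draw and lose figures by games played. A minimal Python sketch, with the function name per_game_rates chosen purely for illustration:

```python
def per_game_rates(played, won, drawn, lost):
    """Convert a played/won/draw/lose record into won, draw, lose PER GAME."""
    if played <= 0:
        raise ValueError("a team must have played at least one game")
    return (won / played, drawn / played, lost / played)


# Manchester United's record of 12, 10, 2, 0 from the case study.
rates = per_game_rates(12, 10, 2, 0)
print([round(r, 3) for r in rates])  # roughly [0.833, 0.167, 0.0]
```

On the same basis, played 20, won TEN gives a win rate of 0.5, while played 5, won FIVE gives 1.0, which makes the difference between those two records obvious at a glance.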
No matter whether we’re looking to train on outputs which are classification or continuous, knowledge discovery software simply shakes, cajoles and manipulates the presented data, comparing and combining the training file elements into a format which best helps predict the output we have provided. If we know or presume particular combinations of our proposed inputs will be more informative when processed a little further, it is better to do it ourselves than to rely upon the software doing it for us.
Remember, whatever the software, it will see a series of inputs as no more, no less, than individual and independent numeric values, plain & simple. It cannot hope to have the prior knowledge that we have of obvious and perhaps significant inter-relationships. What we do want of the software is that it can hunt out and exploit the more subtle dependencies, ones we would otherwise have missed - or even have had little or no chance of discovering. This is the area where it can excel.
The implementation of AI algorithms in modern software has improved immeasurably over the past decade, but these algorithms are not, and should never be viewed as, the modern day version of a crystal ball.
YOUR PRE-PROCESSING IS MORE IMPORTANT THAN THE SOFTWARE
Success, it is claimed, is usually 10% inspiration and 90% perspiration. Successfully built and utilised AI models can be viewed in the same way: it is NOT all down to the software used. It is not even a 50/50 relationship. Rather, 90% of the success you may (or may not) achieve from using these products will come from the skill you develop in putting the correct pieces in place for a specific problem. The final 10% finishing touch will, we hope, be provided by the product.