Historically, humans have had a long and complicated relationship with predicting the future. The many different methods and attempts to predict it, from the occultist to the astrologist, illustrate how an intrinsic need to foresee the future is ingrained within the very fabric of human nature.
Anticipation is the biological means of predicting what may come next, and it is often grounded in prior experience: unwittingly, we anticipate on the basis of some past experience or knowledge. It follows that more experience and a greater amount of knowledge should result in better predictions, a widely accepted notion. Today, in an ever-growing, data-centric world, prediction is growing in importance, as are the problems it is used to solve, such as fraud detection, risk analysis and sales optimisation.
A typical problem is normally solved using some form of ‘learning’ from prior data; a model is then built which enables prediction on the back of this learning procedure. A classic scenario consists of an outcome that needs to be predicted based on a set of features that can inform the model on what the outcome should, or could, be. The data that is used to ‘learn’ from is often referred to as a training set. In general, the more data available, the better the prediction. This is analogous to a child not knowing that touching a hot pan will hurt: as the child gets older it learns, either from experience or from being told enough times, that in order to avoid getting hurt it should not touch the hot pan.
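As a minimal sketch of this learning procedure (assuming the scikit-learn library and a synthetic data set, not any particular real problem), the snippet below fits the same model on progressively larger slices of a training set; its accuracy on unseen data generally improves as more examples become available.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic 'prior data': 2000 examples, 10 features, a binary outcome.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the same model on progressively larger training sets and score
# each one on data it has never seen.
for n in (20, 200, len(X_train)):
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(n, round(model.score(X_test, y_test), 3))
```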
Within predictive modelling there are many different goals and many different approaches. Broadly speaking, a machine learning algorithm is one that is able to learn from the data (Goodfellow et al., 2016). This differs from traditional statistical analysis in its emphasis: machine learning focuses on prediction, while statistical methods tend to focus on explaining how the prediction is made from the input data. Under statistical methods, compromising the accuracy of the final prediction is acceptable in exchange for a greater understanding of the data.
There are many different types of problems within machine learning; however, every machine learning task requires a system to learn from examples, often collections of features that have been quantitatively measured from some previous event or object and that the machine learning system must process. An example is normally given in vector form, where each component of the vector is a feature (Goodfellow et al., 2016). For instance, for an image the individual pixel values are the features, and the system must learn to identify the image from these input features.
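As a small sketch of this (assuming NumPy and a hypothetical 28×28 greyscale image), an image becomes one example vector by flattening its pixel grid:

```python
import numpy as np

# A hypothetical 28x28 greyscale image: each pixel brightness is one feature.
image = np.random.rand(28, 28)

# Flatten it into a single example vector x in R^784 for the learning system.
x = image.flatten()
print(x.shape)  # (784,)
```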
Types of problems
Common tasks that machine learning can be used to solve include the following:
- Classification: Here the task is to identify which category the input belongs to. This can be a simple pass or fail, yes or no, or more categories; in general there can be k categories. The algorithm produces a function ƒ : ℝⁿ → {1,…,k}. When y = ƒ(x), the descriptive vector x is assigned the category identified by y (a minimal sketch follows this list). An example of a classification problem is object recognition, such as recognising from pixel brightness and location what the image is. This is the basis of facial recognition technology (Taigman et al., 2014).
- Classification with missing inputs: If there is no guarantee that the input contains a measurement for every feature, then solving the classification task becomes more challenging. The learning algorithm must now learn a set of functions, each classifying x with a different subset of inputs missing. This process is often known as pattern classification and is common in medical diagnosis, where a range of different tests could be run on a patient but some are too expensive or too invasive to carry out in every case (a sketch of one practical workaround follows this list). Missing inputs can occur in many of the other tasks, and this setting generalises to most of the tasks in this section.
- Regression: Here the task differs from classification in the response variable: a numerical value is to be predicted given an input or a set of inputs. To solve this, the learning algorithm produces a function ƒ : ℝⁿ → ℝ (see the regression sketch after this list). Examples include predicting the price of a house, a diamond, or even a company’s stock on the stock exchange.
- Transcription: Here the task is to observe an unstructured representation of the data and transcribe it into a structured, readable, textual form. An example of this is speech recognition, where a learning algorithm recognises the different speech patterns and transforms them into a textual format (Goodfellow et al., 2016).
- Machine translation: A translation task takes the input symbols of one language and converts them into the correct sequence of symbols in another language.
- Structured output: The task of outputting a vector with important relationships between its elements is known as structured output, since the output contains several values that are all closely inter-related. An example of this is parsing, which involves taking a natural-language sentence and mapping it into a tree whose nodes are the verbs, nouns and adverbs. The tree gives an overall structured description of the sentence.
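To make the classification task concrete, here is a minimal sketch (assuming scikit-learn and synthetic data) of a learned function ƒ : ℝⁿ → {1,…,k} with k = 3 categories:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 300 examples, each a descriptive vector x in R^5,
# labelled with one of k = 3 categories.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

x_new = X[:1]              # one descriptive vector x
print(clf.predict(x_new))  # the category y = f(x) it is assigned
```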
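The missing-inputs setting can be handled in several ways. As one practical sketch (again assuming scikit-learn, and using a model that tolerates missing values natively rather than literally learning one function per missing-input pattern), a histogram-based gradient-boosting classifier accepts NaN feature values directly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Knock out roughly 10% of the measurements, as if some tests were
# too expensive or too invasive to run on every patient.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan

# This model handles NaN inputs natively, so x can still be classified.
clf = HistGradientBoostingClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```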
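Similarly, a minimal regression sketch (synthetic data standing in for, say, house features and prices) of a learned function ƒ : ℝⁿ → ℝ:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic data: 200 examples with 3 features and a noisy numeric target.
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
reg = LinearRegression().fit(X, y)

print(reg.predict(X[:1]))  # a single real-valued prediction
```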
There are many different tasks to which a learning algorithm could be applied; the list above is only a sample of these tasks and is by no means exhaustive.
In general, tasks with a quantitative response tend to be solved using regression techniques. There is a broad range of problems to solve and a host of different issues within each task. Dealing with every type of problem and every scenario is not only beyond the scope of this post; it would be a near-impossible task owing to the many different problems that can fall under the regression category.
References
I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1708, 2014.