Ablative analysis

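Ablative analysis is a technique for evaluating machine learning models: components such as input features or processing stages are removed one at a time and the change in error is measured, which helps an analyst trade off accuracy against complexity.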

Suppose a given model, or hypothesis, $H_{0}$, shows acceptably low error after being trained on empirical inputs $x_{train}$ and the associated ground-truth outputs $y_{train}$. Here, $x_{train}$ is an $m\times f$ matrix of $m$ input examples, each having $f$ scalar features, and $y_{train}$ is a column vector of $m$ scalar outputs. If the model is, e.g., a binary classifier, then each output will be 0 or 1.

We then input previously unseen $x_{test}$ data and record $H_{0}$'s generalization error, measured here as the RMS difference between $H_{0}(x_{test})$ and the empirically measured $y_{test}$ ground truth.
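Written out, the RMS error described above takes the following form. This is a sketch under stated assumptions: $m_{test}$ (the number of test examples) and the superscript $(i)$ (the $i$-th row of $x_{test}$ or entry of $y_{test}$) are notation introduced here for illustration and do not appear elsewhere in the article.

$$E_{test}(H_{0}) = \sqrt{\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \left( H_{0}\left(x_{test}^{(i)}\right) - y_{test}^{(i)} \right)^{2}}$$

For a binary classifier with outputs 0 or 1, each squared term is 1 for a misclassified example and 0 otherwise, so $E_{test}^{2}$ equals the fraction of misclassified test examples.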

There are several motivations for simplifying $H_{0}$ while achieving similar error. We may be able to reduce the cost of gathering input data by using fewer input features. We may also be able to use a machine learning algorithm with reduced space or time complexity for training or for predictions, perhaps by shrinking a processing pipeline buried within $H_{0}$. Finally, a simpler model may avoid over-fitting and exhibit lower generalization error.

Ablative analysis is a technique for generating such models, which will compete with $H_{0}$. Choose a column of the feature matrix to discard, remove it from both $x_{train}$ and $x_{test}$, and use the modified $x_{train}$ to train a new model $H_{1}$. Now record $H_{1}$'s generalization error, the RMS difference between its output on the modified $x_{test}$ and $y_{test}$. If the error is still acceptably low, we might choose to simplify our model, adopting the model $H_{1}$ that does not depend on the uninformative feature we discarded.
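The feature-removal procedure can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than part of the technique itself: the nearest-centroid classifier is a toy stand-in for whatever learner produced the original model, the function names are hypothetical, and the tiny data set is hand-made so that feature 0 carries the label, feature 1 is only weakly related to it, and feature 2 is uninformative.

```python
import math

def rms_error(predicted, actual):
    """Root-mean-square difference between predictions and ground truth."""
    total = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(total / len(actual))

def drop_column(x, j):
    """Return a copy of matrix x (a list of rows) without feature column j."""
    return [row[:j] + row[j + 1:] for row in x]

def train_nearest_centroid(x, y):
    """Toy stand-in learner: store the mean feature vector of each class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, target in zip(x, y) if target == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign each row to the class whose centroid is closest."""
    def sq_dist(row, centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return [min(centroids, key=lambda c: sq_dist(row, centroids[c]))
            for row in x]

def feature_ablation(x_train, y_train, x_test, y_test):
    """Map None to the full model's error and j to the error without feature j."""
    full = train_nearest_centroid(x_train, y_train)
    errors = {None: rms_error(predict(full, x_test), y_test)}
    for j in range(len(x_train[0])):
        # Retrain on the reduced inputs; drop the same column from the test set.
        model = train_nearest_centroid(drop_column(x_train, j), y_train)
        errors[j] = rms_error(predict(model, drop_column(x_test, j)), y_test)
    return errors

# Hand-made illustrative data (an assumption, not measured data).
X_TRAIN = [[0.1, 0.3, 0.5], [0.2, 0.6, 0.9], [0.0, 0.4, 0.1], [0.3, 0.5, 0.5],
           [0.9, 0.5, 0.5], [0.8, 0.6, 0.9], [1.0, 0.4, 0.1], [0.7, 0.7, 0.5]]
Y_TRAIN = [0, 0, 0, 0, 1, 1, 1, 1]
X_TEST = [[0.1, 0.6, 0.2], [0.25, 0.4, 0.8], [0.3, 0.7, 0.5],
          [0.9, 0.3, 0.3], [0.75, 0.4, 0.7], [0.8, 0.6, 0.5]]
Y_TEST = [0, 0, 0, 1, 1, 1]

errors = feature_ablation(X_TRAIN, Y_TRAIN, X_TEST, Y_TEST)
for removed, err in errors.items():
    print("removed feature:", removed, "RMS test error:", round(err, 3))
```

On this toy data, removing feature 2 leaves the test error unchanged, so a model without it would be a candidate $H_{1}$, while removing feature 0 raises the error sharply.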

Alternatively, suppose that all features are necessary for acceptable performance, and that $H_{0}$ contains an ensemble of models, or a pipeline of several processing stages. A stage might clip values to a range, or condition one variable upon another. As a model is developed and debugged, such special cases may accrete over time. Choose an ensemble member or a stage, and discard it, to form model $H_{2}$. As before, if its measured error is acceptably low, we might prefer it over $H_{0}$. For the features or stages we choose to retain, we can now quantify how much each contributes to the success of $H_{0}$, relative to other parts of the model, by the increase in error observed when it is removed.
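Stage removal can be sketched the same way. The pipeline below is a hypothetical, hand-written example: the four stages, their names, and the test data are assumptions chosen for illustration, and nothing is retrained after a stage is removed, whereas a real pipeline with learned stages would usually need retraining.

```python
import math

def rms_error(predicted, actual):
    """Root-mean-square difference between predictions and ground truth."""
    total = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(total / len(actual))

def run_pipeline(stages, x):
    """Apply every (name, function) stage in order to each input value."""
    outputs = []
    for value in x:
        for _name, stage in stages:
            value = stage(value)
        outputs.append(value)
    return outputs

# Hypothetical stages standing in for the parts of H0.
STAGES = [
    ("scale", lambda v: 2.0 * v),
    ("offset", lambda v: v + 1.0),
    ("clip_low", lambda v: max(v, 0.0)),    # special case for negative values
    ("clip_high", lambda v: min(v, 10.0)),  # special case for large values
]

def stage_ablation(stages, x_test, y_test):
    """Return the baseline error and, per stage, the error increase without it."""
    baseline = rms_error(run_pipeline(stages, x_test), y_test)
    increase = {}
    for i, (name, _stage) in enumerate(stages):
        reduced = stages[:i] + stages[i + 1:]  # the pipeline minus one stage
        ablated = rms_error(run_pipeline(reduced, x_test), y_test)
        increase[name] = ablated - baseline
    return baseline, increase

# Illustrative test data (an assumption): y = min(2x + 1, 10), no negatives.
PIPE_X_TEST = [0.0, 1.0, 2.0, 3.0, 6.0]
PIPE_Y_TEST = [1.0, 3.0, 5.0, 7.0, 10.0]

baseline, increase = stage_ablation(STAGES, PIPE_X_TEST, PIPE_Y_TEST)
print("baseline RMS error:", baseline)
for name, delta in sorted(increase.items(), key=lambda kv: -kv[1]):
    print(name, "error increase when removed:", round(delta, 3))
```

With this data the `clip_low` stage can be removed at no cost, which is the kind of accreted special case the analysis is meant to expose, and the remaining stages are ranked by how much the error grows without them.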

References

Hastie: 7.2 Bias, Variance and Model Complexity

External links

Andrew Ng, Advice for applying Machine Learning, pp. 23-24.

Held, Thrun, and Savarese, Learning to Track at 100 FPS with Deep Regression Networks, section 6.4.

Chen Sun et al., Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

Minghuang Ma et al., Going Deeper into First-Person Activity Recognition

Sara Maatta, Predicting groundwater levels using linear regression and neural networks

Jan Overgoor et al., Predicting Negative CouchSurfing Experiences using Local Information

Pamela Bhattacharya et al., Automated, highly-accurate, bug assignment using machine learning and tossing graphs

This article "Ablative analysis" is from Wikipedia. The list of its authors can be seen in its history and/or the page Edithistory:Ablative analysis. Articles copied from the Draft namespace on Wikipedia can be seen in Wikipedia's Draft namespace, not the main one.