
# Ablative analysis


Ablative analysis is a technique for evaluating machine learning models that helps an analyst trade off accuracy against complexity.

Suppose a given model, or hypothesis, ${\displaystyle H_{0}}$, shows acceptably low error after being trained on empirical inputs ${\displaystyle x_{train}}$ and the associated ground-truth outputs ${\displaystyle y_{train}}$. Here, ${\displaystyle x_{train}}$ is an ${\displaystyle m\times f}$ matrix of ${\displaystyle m}$ input examples, each having ${\displaystyle f}$ scalar features, and ${\displaystyle y_{train}}$ is a column vector of ${\displaystyle m}$ scalar outputs. If the model is, e.g., a binary classifier, then each output will be 0 or 1.

We then feed in previously unseen ${\displaystyle x_{test}}$ data and record ${\displaystyle H_{0}}$'s generalization error: the RMS difference between ${\displaystyle H_{0}(x_{test})}$ and the empirically measured ground truth ${\displaystyle y_{test}}$.
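To make the setup concrete, here is a minimal sketch using NumPy, with an ordinary least-squares fit standing in for ${\displaystyle H_{0}}$; the data shapes, true weights, and noise scale are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: m examples, each with f scalar features.
# The true weights and noise scale are assumptions for this sketch.
m, f = 200, 3
true_w = np.array([2.0, -1.0, 0.5])
x_train = rng.normal(size=(m, f))
y_train = x_train @ true_w + rng.normal(scale=0.1, size=m)

# H0: an ordinary least-squares fit stands in for any trained model.
w, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)
h0 = lambda x: x @ w

# Generalization error: RMS difference between H0(x_test) and
# the measured ground truth y_test on previously unseen data.
x_test = rng.normal(size=(50, f))
y_test = x_test @ true_w + rng.normal(scale=0.1, size=50)
rmse = np.sqrt(np.mean((h0(x_test) - y_test) ** 2))
```

With this much training data the fitted weights land close to the true ones, so the test RMSE stays near the noise level.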

There are several motivations for simplifying ${\displaystyle H_{0}}$ while achieving similar error: using fewer input features may reduce the cost of gathering input data; a machine learning algorithm with reduced complexity may need less space or time for training or prediction, perhaps by shrinking a processing pipeline buried within ${\displaystyle H_{0}}$; and a simpler model may avoid over-fitting and so exhibit better generalization error.

Ablative analysis generates such competing models. Choose a column of the feature vector to discard, and use the modified ${\displaystyle x_{train}}$ to train a new model ${\displaystyle H_{1}}$. Now record ${\displaystyle H_{1}}$'s generalization error, the RMS difference between its output and ${\displaystyle y_{test}}$. If the error is still acceptably low, we might choose to simplify our model, adopting ${\displaystyle H_{1}}$, which does not depend on the uninformative feature we discarded.
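One pass of this feature ablation can be sketched as follows (a least-squares model on synthetic data, invented for illustration). Feature column 2 is constructed to be uninformative, with a true weight of zero, so dropping it should leave the test error nearly unchanged, while dropping an informative column should not:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: feature 2 is uninformative (true weight zero).
m, f = 300, 3
true_w = np.array([1.5, -2.0, 0.0])
x_train = rng.normal(size=(m, f))
y_train = x_train @ true_w + rng.normal(scale=0.1, size=m)
x_test = rng.normal(size=(100, f))
y_test = x_test @ true_w + rng.normal(scale=0.1, size=100)

def rmse_without(col):
    """Train H1 with one feature column removed; return its test RMSE."""
    xtr = np.delete(x_train, col, axis=1)
    xte = np.delete(x_test, col, axis=1)
    w, *_ = np.linalg.lstsq(xtr, y_train, rcond=None)
    return np.sqrt(np.mean((xte @ w - y_test) ** 2))

# Ablate each feature in turn and compare the resulting errors.
errors = {col: rmse_without(col) for col in range(f)}
```

Here `errors[2]` stays near the noise level while `errors[0]` and `errors[1]` rise sharply, identifying column 2 as the safe one to discard.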

Alternatively, suppose that all features are necessary for acceptable performance, and that ${\displaystyle H_{0}}$ contains an ensemble of models, or a pipeline of several processing stages. A stage might clip values to a range, or condition one variable upon another. As a model is developed and debugged, such special cases tend to accrete. Choose an ensemble member or a stage and discard it to form model ${\displaystyle H_{2}}$. As before, if its measured error is acceptably low, we might prefer it over ${\displaystyle H_{0}}$. For the features or stages we choose to retain, we can now quantify how much each contributes to the success of ${\displaystyle H_{0}}$, relative to other parts of the model.
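Ablating a pipeline stage follows the same pattern. In this sketch (the clipping stage and the synthetic target are invented for illustration), ${\displaystyle H_{0}}$ is a linear fit followed by a clipping stage, and ${\displaystyle H_{2}}$ is the same fit with the clip removed; comparing the two errors quantifies the stage's contribution:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative target: a linear signal that saturates at +/-1,
# so a clipping stage in the pipeline genuinely helps here.
x_train = rng.normal(size=(200, 2))
y_train = np.clip(x_train @ np.array([1.0, 1.0]), -1.0, 1.0) \
    + rng.normal(scale=0.05, size=200)
x_test = rng.normal(size=(100, 2))
y_test = np.clip(x_test @ np.array([1.0, 1.0]), -1.0, 1.0) \
    + rng.normal(scale=0.05, size=100)

w, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)

def h0(x):
    # Full pipeline: linear stage, then clipping stage.
    return np.clip(x @ w, -1.0, 1.0)

def h2(x):
    # Ablated pipeline: clipping stage discarded.
    return x @ w

def rmse(h):
    return np.sqrt(np.mean((h(x_test) - y_test) ** 2))
```

The gap between `rmse(h2)` and `rmse(h0)` measures how much the clipping stage contributes; if the gap were negligible, we would prefer the simpler ${\displaystyle H_{2}}$.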

## References

Hastie, Tibshirani, and Friedman, *The Elements of Statistical Learning*, §7.2: Bias, Variance and Model Complexity