There has been interesting research on making machine learning models more understandable, such as "Unmasking Clever Hans predictors and assessing what machines really learn." See also practical implementations of this approach:
- Heatmaps showing which features most strongly influenced handwriting, image, or text classification
- Analyzing these heatmaps can reveal undesired correlations between samples and labels in the training data. For example, an image classifier for trains might rely on objects that happen to appear in every training picture (such as train tracks) while being absent from counterexamples (such as horses). Such an artifact in the collected data set may be subtle and easy for a human to miss, but it becomes visible in a heatmap that highlights the features in each image that drove the classification.
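The idea behind such heatmaps can be sketched with occlusion sensitivity: slide a blank patch over the input and record how much the model's score drops at each location. Below is a minimal, hypothetical toy example (the `classify` function and the image layout are assumptions, not a real model) where a "classifier" secretly depends on a watermark-like corner patch; the resulting heatmap exposes that spurious feature.

```python
import numpy as np

def classify(img):
    """Toy 'classifier' score. It secretly relies mostly on a
    spurious corner patch (e.g. a watermark-like artifact),
    and only a little on the 'real' central texture."""
    watermark = img[0:2, 0:2].sum()   # spurious feature
    texture = img[4:8, 4:8].mean()    # intended feature
    return 0.9 * watermark + 0.1 * texture

def occlusion_heatmap(img, patch=2):
    """Occlusion sensitivity: zero out each patch and record
    how much the classification score drops there."""
    base = classify(img)
    heat = np.zeros_like(img, dtype=float)
    for i in range(0, img.shape[0], patch):
        for j in range(0, img.shape[1], patch):
            occluded = img.copy()
            occluded[i:i + patch, j:j + patch] = 0.0
            heat[i:i + patch, j:j + patch] = base - classify(occluded)
    return heat

img = np.ones((8, 8))
heat = occlusion_heatmap(img)
# The corner containing the "watermark" dominates the heatmap,
# revealing that the model keys on the artifact rather than the
# intended features.
```

A practitioner inspecting this heatmap would see the score concentrated on the corner artifact rather than the object itself, which is exactly the kind of "Clever Hans" behavior the research above aims to surface.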