EMC Proven Professional – Data Science and Big Data Analytics — Question 22
You have fit a decision tree classifier using 12 input variables. The resulting tree uses 7 of the 12 variables and is 5 levels deep. Some of the nodes contain only 3 data points. The AUC of the model is 0.85. What is your evaluation of this model?
Answer options
- A. The tree is probably overfit. Try fitting shallower trees and using an ensemble method.
- B. The AUC is high, and the small nodes are all very pure. This is an accurate model.
- C. The tree did not split on all the input variables. You need a larger data set to get a more accurate model.
- D. The AUC is high, so the overall model is accurate. It is not well-calibrated, because the small nodes will give poor estimates of probability.
Correct answer: A
Explanation
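The correct answer, A, identifies overfitting. A 5-level tree whose leaves hold as few as 3 data points has very likely split on noise in the training data, so an AUC of 0.85 measured on that same data overstates how the model will perform on new data; it should be checked on a held-out set. Shallower trees (or a larger minimum leaf size) reduce variance, and an ensemble such as bagging or a random forest averages many trees to generalize better. Answer B mistakes purity for accuracy: a 3-point leaf can be pure by chance, and even 3 positives out of 3 is consistent with a true positive rate as low as about 0.44 (95% Wilson interval). Answer C is wrong because a tree that does not split on every input is normal; the algorithm uses only the variables that improve its splits, and the problem here is the model's complexity, not the amount of data. Answer D treats the high AUC as proof of overall accuracy and ignores overfitting; the poor probability estimates it attributes to small nodes are a symptom of that same overfitting, not a reason to accept the model.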
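To make the overfitting argument concrete, here is a minimal pure-Python sketch (not from the exam material) that compares training and held-out AUC for three models on hypothetical synthetic data: one informative feature, four noise features, and 20% of labels flipped at random. The tree is a simple Gini-based, CART-style learner written only for illustration (it is not scikit-learn's API), and the ensemble is plain bagging of shallow trees; a random forest would additionally subsample features at each split. The data sizes, seed, and hyperparameters are all assumptions chosen for the demonstration.

```python
import random


def gini(pos, n):
    """Gini impurity of a node holding `pos` positives out of `n` points."""
    p = pos / n
    return 2.0 * p * (1.0 - p)


def fit_tree(X, y, max_depth, min_leaf, depth=0):
    """Grow a CART-style tree. A leaf is the fraction of positives it holds;
    an internal node is (feature, threshold, left_subtree, right_subtree)."""
    n, pos = len(y), sum(y)
    too_deep = max_depth is not None and depth >= max_depth
    if too_deep or pos == 0 or pos == n or n < 2 * min_leaf:
        return pos / n
    best = None  # (weighted Gini, feature index, threshold)
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_pos = 0
        for k in range(1, n):  # k = number of points sent to the left child
            left_pos += y[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if k < min_leaf or n - k < min_leaf or lo == hi:
                continue
            score = k * gini(left_pos, k) + (n - k) * gini(pos - left_pos, n - k)
            if best is None or score < best[0]:
                best = (score, f, (lo + hi) / 2)
    if best is None:  # no admissible split: make this node a leaf
        return pos / n
    _, f, thr = best
    left = [i for i in range(n) if X[i][f] <= thr]
    right = [i for i in range(n) if X[i][f] > thr]
    if not left or not right:  # guard against a degenerate float threshold
        return pos / n

    def grow(idx):
        return fit_tree([X[i] for i in idx], [y[i] for i in idx],
                        max_depth, min_leaf, depth + 1)

    return (f, thr, grow(left), grow(right))


def predict(node, x):
    """Walk down to a leaf and return its estimated P(y = 1)."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_bagged(X, y, n_trees, max_depth, min_leaf, rng):
    """Bagging: fit each tree on a bootstrap resample of the training rows."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx],
                              max_depth, min_leaf))
    return trees


def predict_bagged(trees, x):
    """Average the trees' probability estimates."""
    return sum(predict(t, x) for t in trees) / len(trees)


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_data(n, rng, n_noise=4, flip=0.2):
    """Hypothetical data: y depends only on x[0]; labels flipped with prob `flip`."""
    X, y = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(1 + n_noise)]
        label = 1 if x[0] > 0.5 else 0
        if rng.random() < flip:
            label = 1 - label
        X.append(x)
        y.append(label)
    return X, y


rng = random.Random(0)
X_tr, y_tr = make_data(300, rng)
X_te, y_te = make_data(1000, rng)

deep = fit_tree(X_tr, y_tr, max_depth=None, min_leaf=1)    # grown until pure
shallow = fit_tree(X_tr, y_tr, max_depth=2, min_leaf=20)   # depth-limited
bagged = fit_bagged(X_tr, y_tr, n_trees=25, max_depth=3, min_leaf=10, rng=rng)


def evaluate(score_fn):
    """Return (training AUC, held-out AUC) for a scoring function."""
    return (auc([score_fn(x) for x in X_tr], y_tr),
            auc([score_fn(x) for x in X_te], y_te))


results = {
    "deep": evaluate(lambda x: predict(deep, x)),
    "shallow": evaluate(lambda x: predict(shallow, x)),
    "bagged": evaluate(lambda x: predict_bagged(bagged, x)),
}
for name, (tr, te) in results.items():
    print(f"{name:8s} train AUC={tr:.3f}  held-out AUC={te:.3f}")
```

Under these assumptions the fully grown tree reaches a perfect training AUC, because every leaf is pure, while its held-out AUC should drop sharply; the depth-limited tree and the bagged ensemble should show a much smaller train/held-out gap. Exact figures depend on the seed and the synthetic data.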