EMC Proven Professional – Data Science and Big Data Analytics — Question 22

You have fit a decision tree classifier using 12 input variables. The resulting tree uses 7 of the 12 variables and is 5 levels deep. Some of its nodes contain only 3 data points. The AUC of the model is 0.85. What is your evaluation of this model?

Answer options

Correct answer: A

Explanation

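The correct answer, A, identifies overfitting as the main concern. A tree that is 5 levels deep and has nodes holding only 3 data points has most likely fit noise in the training data, which suggests that shallower trees and ensemble methods would generalize better. Answer B incorrectly assumes that a pure node guarantees accurate predictions. With only 3 data points, a node's class proportion is a very unreliable estimate of the true proportion. Answer C claims that more data is needed, but the primary issue is the model's complexity. Answer D acknowledges the high AUC but misreads it as a sign of overall accuracy. AUC measures how well the model ranks positive cases above negative ones, not how often its classifications are correct, and a high AUC does not rule out overfitting, especially if it was computed on the training data.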
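The two statistical points in the explanation can be checked with a short, dependency-free Python sketch. The first point is that a pure 3-point node is weak evidence, and the second is that AUC is a ranking measure rather than an accuracy measure. The sketch makes several assumptions that are not part of the question. It uses a 95% Wilson score interval to quantify the uncertainty in a node's class proportion. The helper names `wilson_interval`, `rank_auc`, and `accuracy_at` are hypothetical. The scores in the AUC demo are made-up illustrative numbers.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% coverage when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def rank_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy_at(pos_scores: list[float], neg_scores: list[float], threshold: float = 0.5) -> float:
    """Accuracy when every score >= threshold is classified as positive."""
    correct = sum(s >= threshold for s in pos_scores) + sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


if __name__ == "__main__":
    # A "pure" node: every training point that reaches it is in the positive class.
    for n in (3, 300):
        lo, hi = wilson_interval(n, n)
        print(f"pure node with n={n}: 95% interval for the true positive share = [{lo:.3f}, {hi:.3f}]")

    # Hypothetical scores: every positive outranks every negative,
    # but all scores fall below the default 0.5 threshold.
    pos = [0.45, 0.40, 0.35]
    neg = [0.30, 0.20, 0.10, 0.05]
    print(f"AUC = {rank_auc(pos, neg):.2f}")                # perfect ranking
    print(f"accuracy at 0.5 = {accuracy_at(pos, neg):.2f}")  # only the negatives are classified correctly
```

Running the script should print an interval of about [0.438, 1.000] for n = 3 and about [0.987, 1.000] for n = 300. A pure 3-point node is therefore consistent with a true positive share well under one half to one, which is why purity alone says little (answer B). The script also prints an AUC of 1.00 alongside an accuracy of only 0.57. A model can rank cases perfectly and still classify poorly at a given threshold (answer D). The pairwise AUC loop is O(n·m) and is meant only for illustration. For real data, a sort-based computation or a library routine would be the usual choice.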