Databricks Certified Machine Learning Associate — Question 38
A machine learning engineer is converting a decision tree from sklearn to Spark ML. They notice that they are receiving different results despite all of their data and manually specified hyperparameter values being identical.
Which of the following describes a reason that the single-node sklearn decision tree and the Spark ML decision tree can differ?
Answer options
- A. Spark ML decision trees test every feature variable in the splitting algorithm
- B. Spark ML decision trees automatically prune overfit trees
- C. Spark ML decision trees test more split candidates in the splitting algorithm
- D. Spark ML decision trees test a random sample of feature variables in the splitting algorithm
- E. Spark ML decision trees test binned feature values as representative split candidates
Correct answer: E
Explanation
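The correct answer is E. To scale across a cluster, Spark ML decision trees discretize each continuous feature into bins (controlled by the maxBins parameter, which defaults to 32) and evaluate only the bin boundaries as split candidates. The sklearn decision tree with its default splitter instead considers a threshold between every pair of adjacent distinct values of a feature. With identical data and hyperparameters, the two libraries can therefore choose different thresholds and produce different trees. The other options do not explain the discrepancy. A describes behavior the two share, since by default both a single Spark ML tree and an sklearn tree consider every feature at each split. B does not apply, because Spark ML controls overfitting through hyperparameters such as maxDepth and minInstancesPerNode rather than by automatically pruning overfit trees. C is backwards: binning means Spark ML usually tests fewer split candidates, not more. D describes the feature subsampling used by random forests, not by a single decision tree.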
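The difference in split candidates can be illustrated with a small pure-Python sketch. It is not Spark's or sklearn's actual implementation: the helper names exhaustive_candidates and binned_candidates are hypothetical, and the evenly spaced quantile rule is a simplified stand-in for Spark ML's binning, which works from approximate quantiles.

```python
def exhaustive_candidates(values):
    """Midpoints between adjacent distinct sorted values (exact, sklearn-style search)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binned_candidates(values, max_bins):
    """At most max_bins - 1 thresholds taken at evenly spaced quantiles.

    Simplified stand-in for Spark ML's binning, not its real algorithm.
    """
    ordered = sorted(values)
    n = len(ordered)
    thresholds = []
    for i in range(1, max_bins):
        t = ordered[(i * n) // max_bins]
        # Skip duplicate thresholds that arise from repeated values.
        if not thresholds or t != thresholds[-1]:
            thresholds.append(t)
    return thresholds


feature = [float(x) for x in range(100)]  # 100 distinct values

exact = exhaustive_candidates(feature)
binned = binned_candidates(feature, max_bins=4)

print(len(exact))  # 99 candidate thresholds
print(binned)      # [25.0, 50.0, 75.0]
```

If the best threshold lies between two bin boundaries, the binned search cannot select it, which is one way the two trees end up different. In Spark ML, raising maxBins narrows this gap at the cost of more computation and communication.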