Databricks Certified Machine Learning Associate — Question 38

A machine learning engineer is converting a decision tree from sklearn to Spark ML. They notice that they are receiving different results despite all of their data and manually specified hyperparameter values being identical.
Which of the following describes a reason that the single-node sklearn decision tree and the Spark ML decision tree can differ?

Answer options

Correct answer: E

Explanation

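The correct answer is E. Spark ML decision trees discretize continuous features into bins (the number of bins is controlled by the maxBins parameter) and consider only the bin boundaries as split candidates, whereas sklearn evaluates thresholds between the actual sorted feature values. With identical data and identical manually specified hyperparameters, the two implementations can therefore choose different split points and produce different trees. The other options do not describe this difference: A and D misstate how features are evaluated, while B and C are not directly related to the cause of the differing results.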
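
The effect of binning on split candidates can be illustrated with a small, library-free sketch. It is not Spark's or sklearn's actual code: the helper names exact_split_candidates and binned_split_candidates are hypothetical, and the simple equal-frequency binning only approximates the idea of choosing thresholds from quantile-style bin boundaries.

```python
def exact_split_candidates(values):
    """Midpoints between consecutive distinct sorted values.

    Assumption: this mimics an exhaustive single-node search over a
    continuous feature; it is not sklearn's implementation.
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binned_split_candidates(values, max_bins):
    """At most max_bins - 1 thresholds taken at equal-frequency positions.

    Assumption: a simplified stand-in for binning a continuous feature;
    a distributed implementation would typically approximate these
    boundaries from a sample of the data.
    """
    ordered = sorted(values)
    n = len(ordered)
    thresholds = []
    for i in range(1, max_bins):
        t = ordered[(i * n) // max_bins]
        # Skip duplicate boundaries caused by repeated values.
        if not thresholds or t != thresholds[-1]:
            thresholds.append(t)
    return thresholds


values = [float(v) for v in range(1, 101)]  # one feature, 100 distinct values
exact = exact_split_candidates(values)
binned = binned_split_candidates(values, max_bins=4)

print(len(exact))  # 99 candidate thresholds
print(binned)      # [26.0, 51.0, 76.0]
```

With only three binned thresholds available, the best split found by the exhaustive search (for example 1.5 or 33.5) may simply not be a candidate in the binned search, so the two trees can diverge from the first split onward. Raising the number of bins narrows the gap at the cost of more computation.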