A machine learning engineer is converting a decision tree from sklearn to Spark ML. They…

Question

A machine learning engineer is converting a decision tree from sklearn to Spark ML. They notice that they are receiving different results despite all of their data and manually specified hyperparameter values being identical.
Which of the following describes a reason that the single-node sklearn decision tree and the Spark ML decision tree can differ?

Accepted Answer

Correct answer: E. Spark ML decision trees test binned feature values as representative split candidates. Spark ML discretizes each continuous feature into bins (the number is capped by the maxBins parameter) and considers only the bin boundaries as split candidates, whereas sklearn evaluates thresholds between the actual sorted feature values. With identical data and identical manually specified hyperparameters, the two libraries can therefore choose different splits and produce different trees. The other options do not accurately describe the differences between the algorithms: A and D incorrectly state how features are evaluated, while B and C do not relate to the core reason for the differing results.
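
To make the effect of binning concrete, here is a minimal pure-Python sketch that compares the two sets of candidate thresholds for a single continuous feature. It does not call sklearn or Spark. The helper names exact_candidates and binned_candidates are hypothetical, the quantile rule is a simplification of what Spark ML actually does (Spark may also compute its bins on a sample of the data), and max_bins=32 is used here on the assumption that it matches Spark ML's default maxBins.

```python
import random


def exact_candidates(values):
    """Midpoints between consecutive distinct sorted values.

    Roughly how an exhaustive single-node search (sklearn-style)
    enumerates thresholds for a continuous feature.
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binned_candidates(values, max_bins):
    """Approximate quantile boundaries, at most max_bins - 1 thresholds.

    A simplified stand-in for Spark ML's feature binning, not its
    actual implementation.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = {ordered[(i * n) // max_bins] for i in range(1, max_bins)}
    return sorted(cuts)


random.seed(0)
feature = [random.uniform(0, 100) for _ in range(1000)]

exact = exact_candidates(feature)
binned = binned_candidates(feature, max_bins=32)

# The exhaustive search has one candidate per gap between distinct values;
# the binned search has far fewer, so the best threshold it can pick is
# usually not the same one.
print(len(exact), len(binned))
```

Because the binned search can only pick from the coarser set, the split chosen at the root can differ, and every split below it then sees different data. In this sketch, raising max_bins moves the binned candidates closer to the exhaustive set.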

Databricks Certified Machine Learning Associate — Question 38

Explanation