Databricks Certified Machine Learning Associate — Question 15
Which of the following Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?
Answer options
- A. TrainValidationSplit
- B. DataFrame.where
- C. CrossValidator
- D. TrainValidationSplitModel
- E. DataFrame.randomSplit
Correct answer: E
Explanation
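The correct answer is E, DataFrame.randomSplit. It takes a list of weights and an optional seed, randomly assigns each row to one of the splits, and returns a list of DataFrames, one per weight; weights that do not sum to 1 are normalized. The other options do not return a training DataFrame and a test DataFrame. TrainValidationSplit and CrossValidator are estimators for hyperparameter tuning: they split the data internally during fitting, but what they return is a fitted model, not the split DataFrames. TrainValidationSplitModel is the fitted model that TrainValidationSplit produces. DataFrame.where filters rows by a condition, so on its own it does not perform a random split.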