Databricks Certified Machine Learning Associate — Question 15

Which of the following Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?

Answer options

Correct answer: E

Explanation

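The correct answer is E, DataFrame.randomSplit. This method takes a list of weights (for example, [0.8, 0.2]) and an optional seed, and returns a list of DataFrames whose rows are randomly assigned in roughly those proportions. Other options such as TrainValidationSplit and CrossValidator are model-selection tools in Spark ML: they split the data internally while fitting and evaluating an estimator during hyperparameter tuning, and they do not return training and test DataFrames for downstream use.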