Databricks Certified Associate Developer for Apache Spark — Question 224

Which of the following operations can be used to return a DataFrame with no duplicate rows? Please select the most complete answer.

Answer options

Correct answer: E

Explanation

The correct answer is E because it lists all three DataFrame methods that return a new DataFrame with duplicate rows removed: DataFrame.dropDuplicates(), DataFrame.distinct(), and DataFrame.drop_duplicates(). In PySpark, drop_duplicates() is an alias for dropDuplicates(), and calling either one with no arguments gives the same result as distinct(). Options A, B, C, and D are incomplete because they do not cover all of these methods.