Databricks Certified Associate Developer for Apache Spark — Question 224
Which of the following operations can be used to return a DataFrame with no duplicate rows? Please select the most complete answer.
Answer options
- A. DataFrame.distinct()
- B. DataFrame.dropDuplicates() and DataFrame.distinct()
- C. DataFrame.dropDuplicates()
- D. DataFrame.drop_duplicates()
- E. DataFrame.dropDuplicates(), DataFrame.distinct() and DataFrame.drop_duplicates()
Correct answer: E
Explanation
The correct answer is E. In the PySpark API, DataFrame.distinct(), DataFrame.dropDuplicates(), and DataFrame.drop_duplicates() all return a new DataFrame with duplicate rows removed. drop_duplicates() is an alias for dropDuplicates(), and when called without a column subset both give the same result as distinct(). Options A, B, C, and D each name valid methods but leave out at least one of the three, so none of them is the most complete answer.