Databricks Certified Associate Developer for Apache Spark — Question 2
Which of the following operations fails to return a DataFrame with no duplicate rows?
Answer options
- A. DataFrame.dropDuplicates()
- B. DataFrame.distinct()
- C. DataFrame.drop_duplicates()
- D. DataFrame.drop_duplicates(subset = None)
- E. DataFrame.drop_duplicates(subset = "all")
Correct answer: E
Explanation
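The correct answer is E. The `subset` parameter of `drop_duplicates()` (an alias for `dropDuplicates()`) expects a list or tuple of column names, and PySpark has no special "all" keyword. Passing the string "all" therefore makes the call fail with an error instead of returning a DataFrame. Recent PySpark releases reject a bare string for `subset` with a `TypeError`. Options A, B, C, and D all return a DataFrame with no duplicate rows. `dropDuplicates()` and its alias `drop_duplicates()` compare every column when no subset is given. `subset = None` is just the explicit form of that default. `distinct()` returns only the distinct rows.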
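The sketch below contrasts options A through D with option E. Running real PySpark needs a local JVM, so this is not PySpark itself. It is a hypothetical plain-Python model: the function `drop_duplicates` and the sample `rows`/`columns` are illustrative only. The model assumes the behavior described above. With `subset=None`, every column is compared, which is also what `distinct()` does. A value that is not a list or tuple, such as `"all"`, raises a `TypeError`. In PySpark the corresponding calls would be `df.dropDuplicates()`, `df.distinct()`, `df.drop_duplicates(subset=None)`, and `df.drop_duplicates(subset="all")`.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def drop_duplicates(rows: List[Row], columns: Sequence[str],
                    subset: Optional[Sequence[str]] = None) -> List[Row]:
    """Hypothetical plain-Python model of PySpark's dropDuplicates/drop_duplicates.

    subset=None compares every column, the same result distinct() gives.
    Anything other than a list or tuple (for example the string "all") is
    rejected, mirroring the list-or-tuple check in recent PySpark releases.
    """
    if subset is not None and not isinstance(subset, (list, tuple)):
        raise TypeError("subset must be a list or tuple of column names")
    keys = list(columns) if subset is None else list(subset)
    seen = set()
    result = []
    for row in rows:
        signature = tuple(row[k] for k in keys)
        if signature not in seen:
            seen.add(signature)
            # This model keeps the first occurrence; Spark itself does not
            # guarantee which of several duplicate rows survives.
            result.append(row)
    return result


columns = ["name", "age"]
rows = [
    {"name": "Alice", "age": 30},
    {"name": "Alice", "age": 30},  # exact duplicate of the first row
    {"name": "Bob", "age": 25},
]

deduped_default = drop_duplicates(rows, columns)            # models options A, B, C
deduped_none = drop_duplicates(rows, columns, subset=None)  # models option D
print(len(deduped_default), len(deduped_none))              # prints "2 2"

error_e = None
try:
    drop_duplicates(rows, columns, subset="all")            # models option E
except TypeError as exc:
    error_e = exc
print(type(error_e).__name__)                                # prints "TypeError"
```

In this model, the first four calls agree and return two rows, while option E never produces a result at all. That failure, not a partially deduplicated DataFrame, is why E is the answer.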