Databricks Certified Associate Developer for Apache Spark — Question 2

Which of the following operations fails to return a DataFrame with no duplicate rows?

Answer options

Correct answer: E

Explanation

The correct answer is E. The subset parameter expects a list of column names (or None, meaning all columns), so the string "all" is not interpreted as "every column". The call with subset = "all" therefore does not return a deduplicated DataFrame; in recent PySpark versions it raises an error instead. Options A, B, C, and D each return a DataFrame with unique rows, either by dropping duplicates across all columns or by selecting distinct rows.