Databricks Certified Associate Developer for Apache Spark — Question 78
The code block shown below contains an error. The code block is intended to return the exact number of distinct values in column division in DataFrame storesDF. Identify the error.
Code block:
storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct"))
Answer options
- A. The approx_count_distinct() operation needs a second argument to set the rsd parameter to ensure it returns the exact number of distinct values.
- B. There is no alias() operation for the approx_count_distinct() operation's output.
- C. There is no way to return an exact distinct number in Spark because the data is distributed across partitions.
- D. The approx_count_distinct() operation is not a standalone function; it should be used as a method from a Column object.
- E. The approx_count_distinct() operation cannot determine an exact number of distinct values in a column.
Correct answer: E
Explanation
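The correct answer is E. approx_count_distinct() returns an estimate of the number of distinct values, not a guaranteed exact count, so it cannot satisfy the stated requirement; an exact aggregate such as countDistinct() is needed instead. Option A is wrong because rsd is optional and only sets the maximum allowed relative standard deviation of the estimate (default 0.05); tightening it does not make the result exact. Option B is wrong because approx_count_distinct() returns a Column, and any Column can be renamed with alias(). Option C is wrong because Spark can compute exact distinct counts across partitions, for example with countDistinct(). Option D is wrong because approx_count_distinct() is a standalone function among Spark's built-in SQL functions, not a Column method.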