Question

Which of the following code blocks will most quickly return an approximation for the number of distinct values in column division in DataFrame storesDF?

Accepted Answer

Correct answer: C. storesDF.agg(approx_count_distinct(col("division"), 0.15).alias("divisionDistinct"))

Option C is correct because the second argument to approx_count_distinct is the maximum relative error allowed in the estimate, and C sets it higher (0.15) than any other option. A looser error bound lets Spark return the approximation faster. Options A and D do not pass a relative error argument, so they do not relax the bound for a quicker estimate. Options B and E set lower relative errors than C, which requires more computation.
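To see why a looser error bound is cheaper, here is a rough sketch in plain Python. It assumes the textbook HyperLogLog relationship between relative error and register count (error of about 1.04 / sqrt(m) for m registers). Spark's approx_count_distinct is based on a HyperLogLog variant, but its exact constants and limits may differ, so the numbers below are illustrative only. The helper name hll_registers and the floor of 16 registers are assumptions made for this example, not part of any Spark API.

```python
import math


def hll_registers(rsd: float) -> int:
    """Smallest power-of-two register count m with 1.04 / sqrt(m) <= rsd.

    Hypothetical helper for illustration; not a Spark function.
    """
    if rsd <= 0:
        raise ValueError("rsd must be positive")
    # Textbook HyperLogLog bound: rsd ~= 1.04 / sqrt(m), so m >= (1.04 / rsd) ** 2.
    min_registers = (1.04 / rsd) ** 2
    # Registers are indexed by hash bits, so the count is a power of two.
    # The floor of 4 bits (16 registers) is an assumption for this sketch.
    precision_bits = max(4, math.ceil(math.log2(min_registers)))
    return 1 << precision_bits


# Compare a loose bound (0.15) with tighter ones.
for rsd in (0.15, 0.05, 0.01):
    print(f"rsd={rsd}: about {hll_registers(rsd)} registers")
```

Under these assumptions, moving from a 0.05 bound to 0.15 cuts the number of registers the sketch has to maintain and merge by roughly a factor of eight, which is the intuition behind choosing the largest relative error when speed matters most.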

Databricks Certified Associate Developer for Apache Spark — Question 9
