Databricks Certified Associate Developer for Apache Spark — Question 9

Which of the following code blocks will most quickly return an approximation for the number of distinct values in column division in DataFrame storesDF?

Answer options

Correct answer: C

Explanation

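Option C is correct because it allows the largest relative error (0.15). In Spark this value is the maximum relative standard deviation, passed as the rsd argument of approx_count_distinct. A looser error bound needs a smaller internal sketch, so the approximation returns faster than with the tighter bounds in the other options. Options A and D do not pass a relative error parameter, so they do not request this looser, faster estimate. Options B and E specify smaller relative errors than C, which require more work and therefore take longer to compute.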
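The link between relative error and speed can be made concrete with a little arithmetic. The sketch below is a rough illustration, not Spark code: it assumes Spark's HyperLogLog++ implementation picks its precision as p = ceil(2 * log2(1.106 / rsd)), keeps 2^p registers, and rejects p below 4. The function name hll_registers is hypothetical. The value 0.05 is included because it is the documented default rsd of approx_count_distinct.

```python
import math


def hll_registers(rsd: float) -> int:
    """Estimate how many HyperLogLog++ registers a given rsd would need.

    Assumption (not taken from the question): precision is
    p = ceil(2 * log2(1.106 / rsd)) and the sketch holds 2**p registers.
    """
    if rsd <= 0:
        raise ValueError("rsd must be positive")
    p = math.ceil(2.0 * math.log2(1.106 / rsd))
    if p < 4:
        # Assumed lower bound: HLL++ needs at least 4 addressing bits.
        raise ValueError("rsd is too large for HLL++ under this sizing rule")
    return 2 ** p


# Compare the looser bound from option C with tighter ones.
for rsd in (0.15, 0.05, 0.01):
    print(f"rsd={rsd}: {hll_registers(rsd)} registers")
```

Under these assumptions, an rsd of 0.15 needs 64 registers, 0.05 needs 512, and 0.01 needs 16384, so each tightening of the bound means more state to update and merge per group.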