Databricks Certified Associate Developer for Apache Spark — Question 116

QUESTION NO: 75

Which of the following code blocks returns a DataFrame where column divisionDistinct is the approximate number of distinct values in column division from DataFrame storesDF?

Answer options

Correct answer: C

Explanation

The correct answer, C, calls agg to apply approx_count_distinct to the division column and aliases the result as divisionDistinct, so it returns a new DataFrame whose only column is the requested approximate distinct count. Option A is wrong because it uses withColumn instead of agg. Option B misuses approx_count_distinct by trying to apply it directly on the column reference. Option D also uses withColumn inappropriately, and option E applies a redundant alias to the count-distinct call.