Databricks Certified Associate Developer for Apache Spark — Question 116
QUESTION NO: 75
Which of the following code blocks returns a DataFrame where column divisionDistinct is the approximate number of distinct values in column division from DataFrame storesDF?
Answer options
- A. storesDF.withColumn("divisionDistinct", approx_count_distinct(col("division")))
- B. storesDF.agg(col("division").approx_count_distinct("divisionDistinct"))
- C. storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct"))
- D. storesDF.withColumn("divisionDistinct", col("division").approx_count_distinct())
- E. storesDF.agg(col("division").approx_count_distinct().alias("divisionDistinct"))
Correct answer: C
Explanation
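Option C is correct. approx_count_distinct is a standalone function from the Spark SQL functions module that takes a column and returns an aggregate expression. Passing that expression to agg, with no preceding groupBy, aggregates over the whole of storesDF and returns a one-row DataFrame, and alias names the resulting column divisionDistinct.
Option A passes the aggregate to withColumn, which expects a per-row column expression. Without a window specification, Spark cannot combine the aggregate with the other, non-aggregated columns of storesDF, so the call fails instead of returning the aggregated value.
Options B, D, and E all call approx_count_distinct as if it were a method on a Column object. Column has no such method, so none of them runs. B additionally passes the output name as an argument instead of using alias, and D repeats the withColumn problem from A. The alias call in E is not the error: it is the same aliasing used in C, applied to an expression that is invalid to begin with.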