Databricks Certified Associate Developer for Apache Spark — Question 115
Which of the following code blocks returns the number of rows in DataFrame storesDF for each distinct combination of values in column division and column storeCategory?
Answer options
- A. storesDF.groupBy(Seq(col("division"), col("storeCategory"))).count()
- B. storesDF.groupBy(division, storeCategory).count()
- C. storesDF.groupBy("division", "storeCategory").count()
- D. storesDF.groupBy("division").groupBy("StoreCategory").count()
- E. storesDF.groupBy(Seq("division", "storeCategory")).count()
Correct answer: C
Explanation
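The correct answer is C. groupBy accepts one or more column names as strings (or one or more Column objects), and calling count() on the grouped result returns one row per distinct combination of 'division' and 'storeCategory', together with the number of rows in that group. Options A and E pass a single Seq, which matches no groupBy overload because the method takes variable-length arguments; a Seq of Column objects would have to be expanded with ": _*" first. Option B refers to division and storeCategory as bare identifiers, which do not compile unless variables with those names have been defined. Option D chains a second groupBy onto the result of the first, but that result is a grouped dataset rather than a DataFrame and has no groupBy method. The capital 'S' in 'StoreCategory' is not the real problem, because Spark resolves column names case-insensitively by default.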
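The behavior described above can be tried with a minimal sketch like the one below. It assumes a local Spark installation; the object name GroupByCountSketch, the helper names, and the sample rows are hypothetical and are not part of the original question. It shows the string-based call from option C next to a Seq of Column objects expanded with ": _*", which is one way the idea behind option A could be made to compile.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, for illustration only.
object GroupByCountSketch {

  // Option C: group by two column names given as strings, then count rows per group.
  def countsByName(df: DataFrame): DataFrame =
    df.groupBy("division", "storeCategory").count()

  // A Seq[Column] must be expanded into varargs with ": _*" before groupBy accepts it.
  def countsBySeq(df: DataFrame): DataFrame = {
    val groupCols = Seq(col("division"), col("storeCategory"))
    df.groupBy(groupCols: _*).count()
  }

  // Made-up sample data containing three distinct (division, storeCategory) pairs.
  def sampleStores(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("north", "grocery"),
      ("north", "grocery"),
      ("north", "apparel"),
      ("south", "grocery")
    ).toDF("division", "storeCategory")
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can run on one machine.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupByCountSketch")
      .getOrCreate()

    val storesDF = sampleStores(spark)

    // Each output row holds division, storeCategory, and a "count" column.
    countsByName(storesDF).show()
    countsBySeq(storesDF).show()

    spark.stop()
  }
}
```

Both helpers return the same three groups for this sample. The string form is shorter when the column names are known up front, while the expanded Seq is convenient when the grouping columns are assembled at runtime.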