Databricks Certified Associate Developer for Apache Spark — Question 107
The code block shown below contains an error. The code block is intended to return a collection of summary statistics for column sqft in DataFrame storesDF. Identify the error.
Code block:
storesDF.describe(col("sqft"))
Answer options
- A. The describe() operation doesn't compute summary statistics for a single column — the summary() operation should be used instead.
- B. The column sqft should be subsetted from DataFrame storesDF prior to computing summary statistics on it alone.
- C. The describe() operation does not accept a Column object as an argument outside of a list — the list [col("sqft")] should be specified instead.
- D. The describe() operation does not accept a Column object as an argument — the column name string "sqft" should be specified instead.
- E. The describe() operation doesn't compute summary statistics for numeric columns — the summary() operation should be used instead.
Correct answer: D
Explanation
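The correct answer is D. The describe() operation takes column names as strings, so the call should be storesDF.describe("sqft"); passing the Column object col("sqft") fails instead of being resolved to a column name. Options A and E are wrong because describe() does compute summary statistics (count, mean, stddev, min, and max) for a single column and for numeric columns. Option B is wrong because the column does not need to be subsetted first: naming it in describe() is enough. Option C is wrong because wrapping the Column object in a list does not help, since describe() still expects strings.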
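The corrected call can be sketched in PySpark as follows. This is a minimal illustration, not part of the original question: the helper names sqft_summary and demo, the sample rows, and the storeId column are assumptions, since the question does not give the schema of storesDF.

```python
def sqft_summary(stores_df):
    """Return the describe() statistics for the sqft column."""
    # describe() takes column names as strings, not Column objects,
    # so "sqft" is passed directly instead of col("sqft").
    return stores_df.describe("sqft")


def demo():
    # Imported inside the function so the helper above can be read and
    # reused without a Spark installation being required at import time.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("describe-demo")
        .getOrCreate()
    )

    # Hypothetical sample data standing in for storesDF.
    stores_df = spark.createDataFrame(
        [(1, 12000), (2, 8500), (3, 15250)],
        ["storeId", "sqft"],
    )

    # Shows one row per statistic: count, mean, stddev, min, max.
    sqft_summary(stores_df).show()

    spark.stop()
```

Calling demo() on a machine with PySpark installed should print a small table with a summary column and a sqft column. Selecting the column first, as in stores_df.select("sqft").describe(), should give the same statistics, but it is not needed to fix the code block.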