Databricks Certified Associate Developer for Apache Spark — Question 172
Which of the following code blocks returns a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean?
Answer options
- A. storesDF.withColumn(mean(col("sqft")).alias("sqftMean"))
- B. storesDF.agg(col("sqft").mean().alias("sqftMean"))
- C. storesDF.agg(mean("sqft").alias("sqftMean"))
- D. storesDF.agg(mean(col("sqft")).alias("sqftMean"))
- E. storesDF.withColumn("sqftMean", mean(col("sqft")))
Correct answer: D
Explanation
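The correct answer is D. agg computes an aggregate over the whole DataFrame and returns a new single-row DataFrame, and mean(col("sqft")).alias("sqftMean") names the resulting column sqftMean. Option A fails because withColumn expects two arguments, a column name and a Column expression, and it adds a column to existing rows rather than aggregating. Option B fails because a Column object has no mean() method; mean is a function from the functions module that takes the column as its argument. Option C differs from D only in passing the column name as a string, which mean also accepts, so it would return the same result; D is the answer designated by the key. Option E has the right withColumn signature, but withColumn is not an aggregation: it keeps one output row per input row, so it cannot reduce storesDF to a single mean value.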