Databricks Certified Associate Developer for Apache Spark — Question 29
The code block shown below contains an error. The code block is intended to return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Identify the error.
Code block:
storesDF.agg(mean("sqft").alias("sqftMean"))
Answer options
- A. The argument to the mean() operation should be a Column object rather than a string column name.
- B. The argument to the mean() operation should not be quoted.
- C. The mean() operation is not a standalone function – it’s a method of the Column object.
- D. The agg() operation is not appropriate here – the withColumn() operation should be used instead.
- E. The only way to compute a mean of a column is with the mean() method from a DataFrame.
Correct answer: A
Explanation
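The correct answer is A: the question expects mean() to be called with a Column object, such as col("sqft"), rather than with a bare column-name string. Note that in PySpark, pyspark.sql.functions.mean() also accepts a column name as a string, so the code as written would run in practice; A is the keyed answer, and the remaining options are clearly wrong. Option B is incorrect because an unquoted sqft would be read as a variable name, not as a column reference. Option C is incorrect because mean() is a standalone function in the Spark SQL functions module, not a method of the Column object. Option D is incorrect because agg() is the appropriate operation for computing an aggregate over the whole DataFrame, whereas withColumn() adds or replaces a per-row column and does not reduce the DataFrame to a single value. Option E is incorrect because there are several ways to compute a mean, including the aggregation functions used with agg().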