Databricks Certified Associate Developer for Apache Spark — Question 25
Which of the following code blocks returns a new DataFrame with a new column employeesPerSqft that is the quotient of column numberOfEmployees and column sqft, both of which are from DataFrame storesDF? Note that column employeesPerSqft is not in the original DataFrame storesDF.
Answer options
- A. storesDF.withColumn("employeesPerSqft", col("numberOfEmployees") / col("sqft"))
- B. storesDF.withColumn("employeesPerSqft", "numberOfEmployees" / "sqft")
- C. storesDF.select("employeesPerSqft", "numberOfEmployees" / "sqft")
- D. storesDF.select("employeesPerSqft", col("numberOfEmployees") / col("sqft"))
- E. storesDF.withColumn(col("employeesPerSqft"), col("numberOfEmployees") / col("sqft"))
Correct answer: A
Explanation
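The correct answer is A. withColumn takes the new column's name as a string and a Column expression for its values, and col("numberOfEmployees") / col("sqft") is a Column expression that divides the two existing columns row by row. The call returns a new DataFrame and leaves storesDF unchanged. Option B is incorrect because "numberOfEmployees" / "sqft" attempts to divide two plain strings rather than two columns, so it fails before Spark ever evaluates a column expression. Option C has the same problem. Options C and D are also incorrect because select treats "employeesPerSqft" as the name of an existing column, and storesDF has no such column. select can add a derived column, but only when the expression itself is given the new name with alias. Option E is incorrect because withColumn expects the new column's name as a string, not as a Column created with col().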
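The reasoning above can be sketched in PySpark. This is an illustrative sketch rather than part of the exam material: the helper names (divides_plain_strings, add_employees_per_sqft, demo), the storeId column, and the sample rows are all assumptions made up for the example. The Spark imports sit inside the functions so that the plain-string check can run even where pyspark is not installed.

```python
def divides_plain_strings(left: str, right: str) -> bool:
    """Return True if `left / right` works on plain Python strings."""
    try:
        left / right
    except TypeError:
        # str has no division operator, so options B and C fail in Python
        # before Spark is involved at all.
        return False
    return True


def add_employees_per_sqft(stores_df):
    """Option A: return a new DataFrame with the extra column employeesPerSqft."""
    # Imported here so the string check above runs without pyspark installed.
    from pyspark.sql.functions import col

    return stores_df.withColumn(
        "employeesPerSqft", col("numberOfEmployees") / col("sqft")
    )


def demo() -> None:
    """Build a small, made-up storesDF and apply option A (needs pyspark and Java)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("q25-demo").getOrCreate()
    stores_df = spark.createDataFrame(
        [(1, 10, 2000.0), (2, 25, 5000.0)],  # hypothetical sample rows
        ["storeId", "numberOfEmployees", "sqft"],
    )
    result = add_employees_per_sqft(stores_df)
    result.show()
    # The original DataFrame is unchanged; only `result` has the new column.
    print(stores_df.columns)
    print(result.columns)
    spark.stop()
```

Calling demo() in an environment with pyspark and a Java runtime prints the resulting rows and both column lists, which makes it easy to confirm that employeesPerSqft appears only in the returned DataFrame.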