Databricks Certified Associate Developer for Apache Spark — Question 84
Which of the following code blocks fails to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employeesDF on both the storeId and employeeId columns?
Answer options
- A. storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))
- B. storesDF.join(employeesDF, Seq("storeId", "employeeId"))
- C. storesDF.join(employeesDF, storesDF("storeId") === employeesDF("storeId") and storesDF("employeeId") === employeesDF("employeeId"))
- D. storesDF.join(employeesDF, Seq("storeId", "employeeId"), "inner")
- E. storesDF.alias("s").join(employeesDF.alias("e"), col("s.storeId") === col("e.storeId") and col("s.employeeId") === col("e.employeeId"))
Correct answer: A
Explanation
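The correct answer is A. In the Scala API, Dataset.join has overloads that take a single column name (String), a sequence of column names (Seq[String], optionally followed by a join type), or a join condition (Column, optionally followed by a join type). None of them accepts a Seq[Column], so Seq(col("storeId"), col("employeeId")) is a type mismatch and option A does not compile. The other options are all valid inner joins. B and D pass the key names as a Seq[String]; the join type defaults to "inner", and D simply states it explicitly. C and E pass a single Column condition that combines two equality tests with and. C references each DataFrame's columns directly, and E references them through the aliases "s" and "e".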
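The sketch below compares the four valid forms side by side. It assumes a local Spark 3.x session and uses made-up sample rows. The region and name columns, all values, and the JoinForms and validJoins names are hypothetical; only storesDF, employeesDF, storeId, and employeeId come from the question. Option A is left as a comment because it would not compile.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object JoinForms {
  // Builds the valid joins from options B to E on small, hypothetical sample data.
  def validJoins(spark: SparkSession): Map[String, DataFrame] = {
    import spark.implicits._

    // Hypothetical rows: only the storeId and employeeId column names come from the question.
    val storesDF = Seq((1, 10, "north"), (1, 11, "north"), (2, 20, "south"))
      .toDF("storeId", "employeeId", "region")
    val employeesDF = Seq((1, 10, "Ana"), (2, 20, "Bo"), (3, 30, "Cy"))
      .toDF("storeId", "employeeId", "name")

    // Option A would not compile, because no join overload accepts a Seq[Column]:
    // storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))

    Map(
      // B: Seq[String] overload; the join type defaults to "inner".
      "B" -> storesDF.join(employeesDF, Seq("storeId", "employeeId")),
      // C: one Column condition; `and` combines the two equality tests.
      "C" -> storesDF.join(
        employeesDF,
        storesDF("storeId") === employeesDF("storeId") and
          storesDF("employeeId") === employeesDF("employeeId")),
      // D: same as B, with the join type given explicitly.
      "D" -> storesDF.join(employeesDF, Seq("storeId", "employeeId"), "inner"),
      // E: the aliases "s" and "e" let col() name each side's columns.
      "E" -> storesDF.alias("s").join(
        employeesDF.alias("e"),
        col("s.storeId") === col("e.storeId") and
          col("s.employeeId") === col("e.employeeId"))
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("join-forms").getOrCreate()
    // Print the row count and output columns of each valid join form.
    for ((option, df) <- validJoins(spark).toSeq.sortBy(_._1)) {
      println(s"$option: ${df.count()} rows, columns = ${df.columns.mkString(", ")}")
    }
    spark.stop()
  }
}
```

With this sample data, only the key pairs (1, 10) and (2, 20) appear on both sides, so every valid form should return two rows. The forms still differ in their output schema. B and D are "using" joins, so they keep a single copy of storeId and employeeId. C and E keep both sides' copies of each key column, which you may need to drop or rename afterwards.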