Databricks Certified Associate Developer for Apache Spark — Question 121
Which of the following pairs of arguments cannot be used in DataFrame.join() to perform an inner join on two DataFrames, storesDF and employeesDF, aliased as "a" and "b" respectively, using the two key columns column1 and column2?
Answer options
- A. joinExprs = col("a.column1") === col("b.column1") and col("a.column2") === col("b.column2")
- B. usingColumns = Seq(col("column1"), col("column2"))
- C. All of these options can be used to perform an inner join with two key columns.
- D. joinExprs = storesDF("column1") === employeesDF("column1") and storesDF("column2") === employeesDF("column2")
- E. usingColumns = Seq("column1", "column2")
Correct answer: B
Explanation
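The correct answer is B. The usingColumns overload of join() takes a Seq[String] of column names, so passing Column objects built with col() does not type-check. Option A builds the join condition from the aliased columns ("a.column1", "b.column1", and so on), option D builds the same condition by referencing each DataFrame's columns directly, and option E lists the shared key names as strings, which also keeps a single copy of each key column in the result. Because B is invalid, option C, which claims that every option works, is also wrong.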
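A minimal sketch of the valid forms, assuming the Spark 3.x Scala API and a local SparkSession. The sample rows, the extra `store` and `employee` columns, and the object name `JoinKeysDemo` are hypothetical; only `storesDF`, `employeesDF`, the aliases "a"/"b", and `column1`/`column2` come from the question. Option B is left commented out because it would not compile.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object JoinKeysDemo {
  val spark: SparkSession = SparkSession.builder()
    .appName("two-key-inner-join")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data; only column1/column2 come from the question.
  val storesDF: DataFrame = Seq((1, "x", "store-1x"), (1, "y", "store-1y"), (2, "x", "store-2x"))
    .toDF("column1", "column2", "store")
  val employeesDF: DataFrame = Seq((1, "x", "alice"), (2, "y", "bob"), (2, "x", "carol"))
    .toDF("column1", "column2", "employee")

  val a: DataFrame = storesDF.as("a")
  val b: DataFrame = employeesDF.as("b")

  // Option A: a joinExprs Column built from the aliases "a" and "b".
  // `and` binds more loosely than `===`, so each equality is evaluated first.
  val viaA: DataFrame =
    a.join(b, col("a.column1") === col("b.column1") and col("a.column2") === col("b.column2"))

  // Option D: the same condition, referencing each DataFrame's columns directly.
  val viaD: DataFrame = storesDF.join(
    employeesDF,
    storesDF("column1") === employeesDF("column1") and storesDF("column2") === employeesDF("column2"))

  // Option E: usingColumns as a Seq[String]; each key column appears once in the output.
  val viaE: DataFrame = a.join(b, Seq("column1", "column2"))

  // Option B does not compile: no join() overload accepts a Seq[Column].
  // a.join(b, Seq(col("column1"), col("column2")))

  def main(args: Array[String]): Unit = {
    viaE.show()
    spark.stop()
  }
}
```

With this data, only the key pairs (1, "x") and (2, "x") exist on both sides, so each valid join returns two rows. Options A and D keep both copies of column1 and column2 (six columns in total), while option E merges them into one copy each.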