Databricks Certified Associate Developer for Apache Spark — Question 50
Which of the following values for the on argument cannot be used in DataFrame.join() to perform an inner join on two key columns between two DataFrames that are named and aliased "a" and "b", respectively?
Answer options
- A. on = [a.column1 == b.column1, a.column2 == b.column2]
- B. on = [col("column1"), col("column2")]
- C. on = [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")]
- D. All of these options can be used to perform an inner join with two key columns.
- E. on = ["column1", "column2"]
Correct answer: B
Explanation
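The correct answer is B. When on is a list of Column objects, join() treats each element as a join condition and combines the elements with AND, so each element must be a boolean expression, such as an equality between a column of a and a column of b. col("column1") and col("column2") are bare column references: they compare nothing, and because both DataFrames contain columns with those names, the references are also ambiguous. Options A, C, and E are all valid. A and C give explicit equality conditions between the two DataFrames (A through the DataFrame variables, C through the aliases), and E passes the key names as strings, which join() accepts when both DataFrames contain columns with those names and which performs an equi-join on them. Because B fails, D is also incorrect.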