Databricks Certified Associate Developer for Apache Spark — Question 49
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employeesDF on column storeId. Identify the error.
Code block:
storesDF.join(employeesDF, "inner", "storeID")
Answer options
- A. The key column storeID needs to be wrapped in the col() operation.
- B. The key column storeID needs to be in a list like ["storeID"].
- C. The key column storeID needs to be specified in an expression of both DataFrame columns like storesDF.storeId == employeesDF.storeId.
- D. There is no DataFrame.join() operation – DataFrame.merge() should be used instead.
- E. The key column is the second parameter to join() and the type of join is the third parameter to join(); the second and third arguments should be switched.
Correct answer: E
Explanation
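The correct answer is E. In PySpark, the signature is DataFrame.join(other, on=None, how=None), so when arguments are passed positionally the key column (on) comes second and the join type (how) comes third. The code block passes them in the opposite order, so "inner" is treated as the join key and "storeID" as the join type. The corrected call is storesDF.join(employeesDF, "storeID", "inner"). The other options are wrong: join() accepts a plain column-name string, so neither col() (A) nor a list (B) is required; a join expression (C) is allowed but not needed; and DataFrame.join() does exist on Spark DataFrames (D).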
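The effect of swapping positional arguments can be illustrated without a Spark session. The sketch below is pure Python and is not Spark code: join_rows is a hypothetical stand-in whose parameter order (other, on, how) is assumed to mirror DataFrame.join(), and the sample rows and the error message are invented for illustration. Spark's actual error for the swapped call will read differently.

```python
# Hypothetical stand-in for illustration only; this is not Spark code.
# Its parameter order mirrors PySpark's DataFrame.join(other, on=None, how=None).
SUPPORTED_JOIN_TYPES = {"inner"}  # the sketch implements inner joins only


def join_rows(left, right, on=None, how="inner"):
    """Join two lists of dict rows on a single shared key column."""
    if how not in SUPPORTED_JOIN_TYPES:
        raise ValueError(f"unsupported join type: {how!r}")
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[on] == r_row[on]:
                # Merge the matching rows; the key column appears once.
                joined.append({**l_row, **r_row})
    return joined


# Invented sample data.
stores = [{"storeId": 1, "city": "Austin"}, {"storeId": 2, "city": "Boise"}]
employees = [
    {"storeId": 1, "employee": "Ana"},
    {"storeId": 1, "employee": "Raj"},
    {"storeId": 3, "employee": "Lee"},
]

# Correct order: key column second, join type third.
result = join_rows(stores, employees, "storeId", "inner")

# Swapped order, as in the question: "storeId" lands in the join-type slot.
try:
    join_rows(stores, employees, "inner", "storeId")
    swapped_error = None
except ValueError as exc:
    swapped_error = str(exc)
```

Passing the arguments by keyword, for example on="storeId" and how="inner", removes the dependence on position, which is one way to avoid this class of mistake.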