Databricks Certified Associate Developer for Apache Spark — Question 174
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of an outer join between DataFrame storesDF and DataFrame employeesDF on column storeId. Identify the error.
Code block:
storesDF.join(employeesDF, "storeId")
Answer options
- A. The default argument to the how parameter is "inner" – an additional argument of "outer" must be specified.
- B. The key column storeId needs to be wrapped in the col() operation.
- C. The key column storeId needs to be specified in an expression of both DataFrame columns like storesDF.storeId == employeesDF.storeId.
- D. The key column storeId needs to be in a list like ["storeId"].
- E. There is no DataFrame.join() operation – DataFrame.merge() should be used instead.
Correct answer: A
Explanation
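The correct answer is A. DataFrame.join() performs an inner join when the how parameter is not specified, so the code block as written returns only rows whose storeId appears in both DataFrames. Passing "outer" as the how argument, as in storesDF.join(employeesDF, "storeId", "outer"), produces the intended outer join. Option B is incorrect because the join key can be given as a plain column-name string. Options C and D describe valid alternative ways to specify the join key (a Column expression or a list of column names), but neither is required and neither changes the join type. Option E is incorrect because DataFrame.join() does exist in Spark; DataFrame.merge() is a pandas method.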