Databricks Certified Associate Developer for Apache Spark — Question 11
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a cross join between DataFrame storesDF and DataFrame employeesDF. Identify the error.
Code block:
storesDF.join(employeesDF, "cross")
Answer options
- A. A cross join is not implemented by the DataFrame.join() operations – the standalone CrossJoin() operation should be used instead.
- B. There is no direct cross join in Spark, but it can be implemented by performing an outer join on all columns of both DataFrames.
- C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
- D. There is no key column specified – the key column "storeId" should be the second argument.
- E. A cross join is not implemented by the DataFrame.join() operations – the standalone join() operation should be used instead.
Correct answer: C
Explanation
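The correct answer is C. Spark provides a dedicated method, DataFrame.crossJoin(), for cross joins, so the intended call is storesDF.crossJoin(employeesDF). The code block as written does not perform a cross join because the second positional argument of DataFrame.join() is the join column, so "cross" is treated as a column name rather than a join type. Options A and E are incorrect because there is no standalone CrossJoin() or join() function; joins are methods on a DataFrame. Option B is incorrect because Spark does support cross joins directly, and an outer join on all columns would match rows on equal values rather than pair every row with every other row. Option D is incorrect because a cross join pairs every row of one DataFrame with every row of the other and therefore takes no key column.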
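To make the semantics concrete, here is a minimal sketch that models a cross join in plain Python rather than in Spark. The sample rows and the helper name cross_join are hypothetical and only stand in for storesDF and employeesDF; the point is that every row on the left is paired with every row on the right, with no key column involved.

```python
from itertools import product

# Hypothetical sample rows standing in for storesDF and employeesDF.
stores = [{"storeId": 1}, {"storeId": 2}, {"storeId": 3}]
employees = [{"employeeId": "a"}, {"employeeId": "b"}]


def cross_join(left, right):
    """Pair every left row with every right row (Cartesian product)."""
    # No join key is consulted: each combination becomes one output row.
    return [{**l, **r} for l, r in product(left, right)]


rows = cross_join(stores, employees)

# The result size is the product of the input sizes: 3 * 2 rows.
print(len(rows))

# The corresponding Spark call for the question above would be:
#   storesDF.crossJoin(employeesDF)
```

Because the output size is the product of the two input sizes, a cross join on large DataFrames can become very expensive, which is one reason an explicit method name for it is helpful.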