Databricks Certified Associate Developer for Apache Spark — Question 22
The code block shown below contains an error. The code block is intended to return a DataFrame containing all columns from DataFrame storesDF except for column sqft and column customerSatisfaction. Identify the error.
Code block:
storesDF.drop(sqft, customerSatisfaction)
Answer options
- A. The drop() operation only works if one column name is called at a time – there should be two calls in succession like storesDF.drop("sqft").drop("customerSatisfaction").
- B. The drop() operation only works if column names are wrapped inside the col() function like storesDF.drop(col(sqft), col(customerSatisfaction)).
- C. There is no drop() operation for storesDF.
- D. The sqft and customerSatisfaction column names should be quoted like "sqft" and "customerSatisfaction".
- E. The sqft and customerSatisfaction column names should be subset from the DataFrame storesDF like storesDF."sqft" and storesDF."customerSatisfaction".
Correct answer: D
Explanation
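The correct answer is D. The drop() operation accepts column names as strings, so each name must be enclosed in quotes: storesDF.drop("sqft", "customerSatisfaction"). Without quotes, sqft and customerSatisfaction are read as variable references, and no such variables are defined. Option A is incorrect because drop() accepts several column names in a single call, so chaining is not required. Option B is incorrect because col() is not required, and its example still passes unquoted names to col(). Option C is incorrect because drop() is a valid DataFrame operation. Option E is incorrect because storesDF."sqft" is not valid syntax.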