Databricks Certified Associate Developer for Apache Spark — Question 88
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30?
Answer options
- A. storesDF.filter(col("sqft") <= 25000 and col("customerSatisfaction") >= 30)
- B. storesDF.filter(col("sqft") <= 25000 | col("customerSatisfaction") >= 30)
- C. storesDF.filter(col(sqft) <= 25000 or col(customerSatisfaction) >= 30)
- D. storesDF.filter(sqft <= 25000 | customerSatisfaction >= 30)
- E. storesDF.filter(col("sqft") <= 25000 or col("customerSatisfaction") >= 30)
Correct answer: B
Explanation
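The correct answer is B because it combines the two conditions with the | operator, which PySpark overloads on Column objects as a logical OR, so the filter keeps rows that satisfy either condition. In runnable PySpark each comparison must also be wrapped in parentheses, as in (col("sqft") <= 25000) | (col("customerSatisfaction") >= 30), because | binds more tightly than <= and >= in Python. Options A and E use the Python keywords and/or, which try to convert a Column to a single boolean and raise an error; A would also express the wrong logic (both conditions rather than either). Option C has the same problem with or, and it also passes sqft and customerSatisfaction to col() as undefined Python names rather than strings. Option D refers to the columns as bare names without col(), so they are never resolved as columns of storesDF.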
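The grouping behind these operator choices can be checked without a Spark session. The following is a minimal sketch that uses only Python's standard ast module to parse the expressions (nothing is evaluated, so col does not need to exist); the helper name describe and the labels it returns are illustrative assumptions, not part of PySpark.

```python
import ast


def describe(expr: str) -> str:
    """Report how Python groups the top level of an expression (hypothetical helper)."""
    node = ast.parse(expr, mode="eval").body
    # a <= x | b >= y parses as one chained comparison: a <= (x | b) >= y
    if isinstance(node, ast.Compare) and len(node.ops) > 1:
        return "chained comparison"
    # (a <= x) | (b >= y) parses as a single | applied to two comparisons
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return "bitwise OR of two operands"
    # a <= x or b >= y uses the keyword, which needs plain booleans
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.Or):
        return "Python 'or' (needs real booleans)"
    return type(node).__name__


unparenthesized = 'col("sqft") <= 25000 | col("customerSatisfaction") >= 30'
parenthesized = '(col("sqft") <= 25000) | (col("customerSatisfaction") >= 30)'
keyword_or = 'col("sqft") <= 25000 or col("customerSatisfaction") >= 30'

print(describe(unparenthesized))  # chained comparison
print(describe(parenthesized))    # bitwise OR of two operands
print(describe(keyword_or))       # Python 'or' (needs real booleans)
```

Only the parenthesized form hands PySpark a single | between two Column comparisons. With col imported from pyspark.sql.functions, that form would be written as storesDF.filter((col("sqft") <= 25000) | (col("customerSatisfaction") >= 30)).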