Databricks Certified Machine Learning Associate — Question 6
A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column discount is less than or equal to 0.
Which of the following code blocks will accomplish this task?
Answer options
- A. spark_df.loc[:,spark_df["discount"] <= 0]
- B. spark_df[spark_df["discount"] <= 0]
- C. spark_df.filter(col("discount") <= 0)
- D. spark_df.loc(spark_df["discount"] <= 0, :]
Correct answer: C
Explanation
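The correct answer is C. filter is the Spark DataFrame method for selecting rows that satisfy a condition, and col("discount") <= 0 builds that condition as a column expression (col must be imported from pyspark.sql.functions). Options A and D both rely on loc, a pandas indexer that Spark DataFrames do not provide. A also places the condition in the column position rather than the row position, and D opens loc with a parenthesis but closes it with a bracket, which is a syntax error. Option B is pandas-style boolean indexing. PySpark does accept a Column condition in square brackets and treats it as a filter, but filter is the explicit DataFrame method for this task, which makes C the intended answer.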