Databricks Certified Machine Learning Associate — Question 2

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column price is greater than 0.
Which of the following code blocks will accomplish this task?

Answer options

Correct answer: B

Explanation

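The correct answer is B because it uses the filter method, which is the appropriate way to select rows that satisfy a condition in a Spark DataFrame. filter (or its alias where) returns a new DataFrame containing only the matching rows and leaves spark_df unchanged. Option A is incorrect because it uses syntax that Spark does not support. Option C is a SQL query, which cannot be applied directly to a Spark DataFrame object; the DataFrame would first have to be registered as a temporary view and queried through spark.sql. Options D and E use loc, which is a pandas indexer and is not available on Spark DataFrames.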