
Question

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column price is greater than 0.
Which of the following code blocks will accomplish this task?

Accepted Answer

Correct answer: B. spark_df.filter(col("price") > 0)

B is correct because filter is the DataFrame method for selecting rows that satisfy a condition, and it returns a new DataFrame rather than modifying spark_df. Here col comes from pyspark.sql.functions and must be imported. Option A is incorrect because it uses syntax that Spark DataFrames do not support. Option C is a SQL query rather than a DataFrame method call, so it does not work on spark_df directly. Options D and E use loc, which is a pandas indexer and is not available on Spark DataFrames.

Databricks Certified Machine Learning Associate — Question 2
