Databricks Certified Associate Developer for Apache Spark — Question 207

A developer wants to refactor some older Spark code in order to take advantage of built-in functions introduced in Spark 3.5.0.

The developer comes across the following existing DataFrame code:

import pyspark.sql.functions as F

min_price = 110.50

result_df = prices_df \
    .filter(F.col("spot_price") >= F.lit(min_price)) \
    .agg(F.count("*"))

Which code block should the developer use to refactor the code?

Answer options

Correct answer: B

Explanation

Option B is correct because it uses the `count_if` function, added in Spark 3.5.0, to count the rows where `spot_price` is greater than or equal to `min_price`. This folds the original filter-then-count logic into a single aggregate expression. Option A creates a new column instead of counting directly, while options C and D do not count the rows that meet the condition.