A developer wants to refactor some older Spark code in order to take advantage of built-i…

Question

A developer wants to refactor some older Spark code in order to take advantage of built-in functions introduced in Spark 3.5.0. The developer comes across the following existing DataFrame code:

    import pyspark.sql.functions as F

    min_price = 110.50

    result_df = prices_df \
        .filter(F.col("spot_price") >= F.lit(min_price)) \
        .agg(F.count("*"))

Which code block should the developer use to refactor the code?

Accepted Answer

Correct answer: B.

    result_df = prices_df \
        .agg(F.count_if(F.col("spot_price") >= F.lit(min_price)))

Option B is correct because it uses `count_if`, available in `pyspark.sql.functions` as of Spark 3.5.0, to count the rows where `spot_price` meets or exceeds `min_price`. This folds the original filter-then-count logic into a single aggregation. Option A creates a new column instead of directly counting, while options C and D do not count the prices that meet the condition.
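To check that the refactor preserves behavior, the two versions can be run side by side on a small DataFrame. The following is a minimal sketch, not part of the original question: the sample data, the helper names `expected_count` and `compare_on_spark`, and the local SparkSession settings are all assumptions made for illustration. It assumes PySpark 3.5.0 or later is installed when `compare_on_spark` is called.

```python
# Hypothetical sample data; the None stands in for a missing spot price.
MIN_PRICE = 110.50
SAMPLE_PRICES = [99.0, 110.50, 120.25, None, 150.0]


def expected_count(prices, min_price):
    """Plain-Python reference: count non-null prices that are >= min_price."""
    return sum(1 for p in prices if p is not None and p >= min_price)


def compare_on_spark(prices, min_price):
    """Return (old_count, new_count) from the original and refactored code.

    PySpark is imported here so that expected_count stays usable
    on machines without Spark installed.
    """
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    # Assumed local session, for illustration only.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("count_if_sketch")
        .getOrCreate()
    )
    try:
        prices_df = spark.createDataFrame(
            [(p,) for p in prices], "spot_price double"
        )
        condition = F.col("spot_price") >= F.lit(min_price)

        # Original pattern: filter the rows first, then count what is left.
        old_count = prices_df.filter(condition).agg(F.count("*")).first()[0]

        # Refactored pattern: count rows where the condition is true.
        new_count = prices_df.agg(F.count_if(condition)).first()[0]

        return old_count, new_count
    finally:
        spark.stop()
```

Calling `compare_on_spark(SAMPLE_PRICES, MIN_PRICE)` should return the same number twice, matching `expected_count(SAMPLE_PRICES, MIN_PRICE)`. Rows with a null `spot_price` are excluded in both versions, because the comparison evaluates to null there, and neither `filter` nor `count_if` treats null as true.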

Databricks Certified Associate Developer for Apache Spark — Question 207
