A data scientist has been given an incomplete notebook from the data engineering team. Th…

Question

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

Accepted Answer

Correct answer: A

import pyspark.pandas as ps
df = ps.DataFrame(spark_df)

Option A is correct: it imports the pyspark.pandas module and converts the Spark DataFrame spark_df into a pandas-on-Spark DataFrame, which exposes pandas-style methods while the data stays distributed in Spark. Option B is incorrect because it attempts to convert the Spark DataFrame to a pandas DataFrame, which is not what the question asks for. Option C is not relevant because it deals with SQL operations. Option D uses the standard pandas DataFrame constructor, which does not work with Spark DataFrames. Option E is also incorrect because it tries to convert the Spark DataFrame directly to a pandas DataFrame.

Databricks Certified Machine Learning Associate — Question 11
