Databricks Certified Machine Learning Associate — Question 26

A data scientist has written a data cleaning notebook that utilizes the pandas library, but their colleague has suggested that they refactor their notebook to scale with big data.
Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to scale with big data?

Answer options

Correct answer: E

Explanation

The correct answer is E. The pandas API on Spark (the pyspark.pandas module, available since Spark 3.2) exposes pandas-style syntax that runs on Spark's distributed engine, so the data scientist can keep most of the existing cleaning code and rely on their existing pandas knowledge. Options A, B, C, and D require more substantial changes to the notebook or learning a new API, which would take more time to implement.
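
As a rough illustration of why this refactor is small, the sketch below writes the cleaning steps against methods that both pandas and the pandas API on Spark provide. The column names (id, name, amount), the sample data, and the helpers clean and clean_at_scale are hypothetical and are not taken from the question. The notebook's real cleaning logic may use pandas features that pyspark.pandas does not cover, so treat this as a sketch under those assumptions.

```python
import pandas as pd


def clean(df):
    """Cleaning steps that use only methods shared by pandas and pyspark.pandas."""
    df = df.dropna(subset=["id"])           # drop rows with no id
    df = df.drop_duplicates(subset=["id"])  # keep the first row for each id
    return df.assign(
        name=df["name"].str.strip().str.lower(),  # normalize whitespace and case
        amount=df["amount"].fillna(0.0),          # treat a missing amount as zero
    )


def clean_at_scale(path):
    """Hypothetical helper: run the same steps on Spark.

    `path` is a placeholder for a CSV location readable by Spark.
    Requires Spark 3.2 or later; not called in this sketch.
    """
    import pyspark.pandas as ps  # pandas API on Spark

    return clean(ps.read_csv(path))


# Small in-memory sample (made-up data) to exercise the function with plain pandas.
raw = pd.DataFrame(
    {
        "id": [1, 1, 2, None],
        "name": [" Ann ", " Ann ", "BOB", "x"],
        "amount": [10.0, 10.0, None, 5.0],
    }
)
cleaned = clean(raw)
print(cleaned)
```

In this sketch the only Spark-specific lines are the import and the read call inside clean_at_scale; the cleaning function itself is unchanged, which is the point of option E.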