Databricks Certified Machine Learning Associate — Question 17

A data scientist wants to explore the Spark DataFrame spark_df. The exploration should include visual histograms displaying the distribution of numeric features.
Which of the following lines of code can the data scientist run to accomplish the task?

Answer options

Correct answer: E

Explanation

The correct answer is E: it calls the summarize function on the DataFrame directly, which produces the desired visual histograms. Options A and D return summaries of the DataFrame but do not generate visualizations. Option B formats the function call incorrectly, and option C wrongly claims that the task cannot be completed in a single line.
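As a rough illustration of the difference between a visual profile and a tabular summary, here is a minimal sketch. It assumes that option E corresponds to the Databricks utility call dbutils.data.summarize(spark_df); the option text is not shown above, so this mapping is an assumption. The helper name explore and its dbutils parameter are hypothetical and exist only for this example. In a Databricks notebook, dbutils is already available as a global.

```python
def explore(spark_df, dbutils=None):
    """Profile spark_df and report which path was taken (hypothetical helper)."""
    if dbutils is not None:
        # Databricks notebook path: a single call that renders summary
        # statistics together with per-column histograms.
        dbutils.data.summarize(spark_df)
        return "visual"
    # Outside Databricks: describe() gives count, mean, stddev, min and max
    # as a table only, with no plots.
    spark_df.describe().show()
    return "tabular"
```

In a Databricks notebook this would be called as explore(spark_df, dbutils). Without the dbutils handle, only the tabular fallback runs, which is why a plain statistics summary does not satisfy the histogram requirement in the question.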