Databricks Certified Machine Learning Professional — Question 29
A data scientist is utilizing MLflow to track their machine learning experiments. After completing a series of runs for the experiment with experiment ID exp_id, the data scientist wants to programmatically work with the experiment run data in a Spark DataFrame. They have an active MLflow Client client and an active Spark session spark.
Which of the following lines of code can be used to obtain run-level results for exp_id in a Spark DataFrame?
Answer options
- A. client.list_run_infos(exp_id)
- B. spark.read.format("delta").load(exp_id)
- C. There is no way to programmatically return row-level results from an MLflow Experiment.
- D. mlflow.search_runs(exp_id)
- E. spark.read.format("mlflow-experiment").load(exp_id)
Correct answer: E
Explanation
The correct option is E. On Databricks, the "mlflow-experiment" Spark data source reads the runs of an experiment directly into a Spark DataFrame, with one row per run. Option A is incorrect because client.list_run_infos returns a Python list of RunInfo objects, not a Spark DataFrame. Option B is incorrect because exp_id is an experiment ID, not the path to a Delta table. Option C is false: option E retrieves run-level results programmatically. Option D is incorrect because mlflow.search_runs returns a pandas DataFrame by default (or a list of Run objects), not a Spark DataFrame.
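A minimal sketch of option E wrapped in a helper is shown below. It assumes a Databricks runtime, where the "mlflow-experiment" data source is available (it is not part of open-source Spark or MLflow), and that spark is an active SparkSession and exp_id is the experiment ID string. The helper name load_experiment_runs is hypothetical.

```python
# Minimal sketch, assuming a Databricks runtime: the "mlflow-experiment"
# Spark data source is provided by Databricks, not by open-source Spark/MLflow.


def load_experiment_runs(spark, exp_id):
    """Return run-level results for one MLflow experiment as a Spark DataFrame.

    `spark` is an active SparkSession; `exp_id` is the experiment ID string.
    The helper name `load_experiment_runs` is hypothetical (illustration only).
    """
    # Option E: the experiment ID is passed to load() where a path would
    # normally go; the result is a Spark DataFrame with one row per run.
    return spark.read.format("mlflow-experiment").load(exp_id)
```

For contrast, mlflow.search_runs(experiment_ids=[exp_id]) yields a pandas DataFrame, which would still need an explicit conversion such as spark.createDataFrame(...) before it could be used as a Spark DataFrame.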