Databricks Certified Machine Learning Associate — Question 1
A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model by comparing the label predictions to the actual price values, the data scientist notices that the RMSE for the second model is much larger than the RMSE of the first model.
Which of the following possible explanations for this difference is invalid?
Answer options
- A. The second model is much more accurate than the first model
- B. The data scientist failed to exponentiate the predictions in the second model prior to computing the RMSE
- C. The data scientist failed to take the log of the predictions in the first model prior to computing the RMSE
- D. The first model is much more accurate than the second model
- E. The RMSE is an invalid evaluation metric for regression problems
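Correct answer: E
Explanation
The correct answer is E because RMSE is a standard evaluation metric for regression problems, so the claim that it is invalid cannot explain the difference. Option B is a plausible explanation: the second model predicts log(price), so its predictions must be exponentiated before they are compared to the actual price values; otherwise the RMSE is computed across mismatched scales and comes out inflated. Option A is also possible for that reason, since an RMSE computed on mismatched scales says nothing about which model is actually more accurate. Options C and D are plausible explanations for the RMSE difference.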
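To illustrate the scale mismatch described in option B, here is a minimal sketch in Python with NumPy. All prices and predictions are hypothetical values invented for this example, and the `rmse` helper is an assumption of the sketch, not part of the question. It shows how comparing log-scale predictions directly to actual prices can inflate the RMSE, even when the log(price) model is the more accurate one after exponentiation.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical actual prices (illustrative values only).
actual_price = np.array([100.0, 250.0, 400.0, 800.0])

# Hypothetical predictions from the first model, trained on price directly.
pred_price = np.array([120.0, 230.0, 420.0, 760.0])

# Hypothetical predictions from the second model, trained on log(price).
# These values are on the log scale, not the price scale.
pred_log_price = np.log(np.array([105.0, 245.0, 410.0, 790.0]))

# First model: predictions and labels are both on the price scale.
rmse_model_1 = rmse(actual_price, pred_price)

# Second model, mistake from option B: log-scale predictions are compared
# to price-scale labels without exponentiating first.
rmse_model_2_wrong = rmse(actual_price, pred_log_price)

# Second model, corrected: exponentiate to return to the price scale.
rmse_model_2_right = rmse(actual_price, np.exp(pred_log_price))

print("Model 1 RMSE:", rmse_model_1)
print("Model 2 RMSE without exponentiating:", rmse_model_2_wrong)
print("Model 2 RMSE after exponentiating:", rmse_model_2_right)
```

With these made-up numbers, the second model looks far worse than the first until its predictions are exponentiated, at which point it has the lower RMSE. Real data may behave differently; the point is only that both RMSE values need to be computed on the same scale before they are compared.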