Databricks Certified Machine Learning Associate — Question 31
The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.
Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?
Answer options
- A. Logistic regression
- B. Spark ML cannot distribute linear regression training
- C. Iterative optimization
- D. Least-squares method
- E. Singular value decomposition
Correct answer: C
Explanation
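The correct answer is C. When the matrix-decomposition solver is impractical because the dataset has too many variables, Spark ML falls back to iterative optimization (L-BFGS in the LinearRegression estimator). Each iteration computes partial gradients on the data partitions in parallel and aggregates them to update the coefficients, so the work scales across the cluster. The other options are incorrect: logistic regression (A) is a different algorithm, used for classification; option B is false because Spark ML can distribute linear regression training; the least-squares method (D) names the objective being minimized, not a way to distribute the computation; and singular value decomposition (E) is a matrix decomposition technique, which is the kind of approach the question says does not scale.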
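To illustrate why iterative optimization distributes well, here is a minimal sketch in plain Python. It is not Spark's implementation: it uses simple gradient descent instead of L-BFGS, and the "partitions" are ordinary lists standing in for data held on different workers. The names partition_gradient, combine, and fit, along with the toy data, are hypothetical and chosen only for this example.

```python
from functools import reduce

def partition_gradient(rows, w, b):
    """Sum the squared-error gradient over one partition (the 'map' step)."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in rows:
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        for j, xj in enumerate(x):
            gw[j] += err * xj
        gb += err
    return gw, gb, len(rows)

def combine(a, c):
    """Merge two partial results (the 'reduce' step)."""
    return [p + q for p, q in zip(a[0], c[0])], a[1] + c[1], a[2] + c[2]

def fit(partitions, n_features, lr=0.1, iters=2000):
    """Gradient descent: each iteration needs only per-partition sums."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(iters):
        partials = (partition_gradient(p, w, b) for p in partitions)
        gw, gb, n = reduce(combine, partials)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Hypothetical data generated from y = 2*x1 - 3*x2 + 1,
# split into two lists that stand in for partitions on separate workers.
partitions = [
    [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0)],
    [((0.0, 1.0), -2.0), ((1.0, 1.0), 0.0)],
]
w, b = fit(partitions, n_features=2)
print(w, b)
```

Each iteration touches every row only once and exchanges just a small gradient vector between partitions, which is why this style of training scales with the number of rows, whereas a decomposition-based solver must build and factor a matrix whose size grows with the square of the number of variables.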