A machine learning engineer is trying to scale a machine learning pipeline by distributin…

Question

A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?

Accepted Answer

Correct answer: D. Imputing missing feature values with the true median. This task is the least efficient to distribute because the exact median is an order statistic: it cannot be assembled from independent per-partition results, so the values from the entire dataset must be brought together and ordered to compute it. In contrast, tasks like one-hot encoding and creating binary indicator features can be performed independently on subsets of the data, which makes them better suited to distribution.
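The difference can be seen in a small sketch. It uses plain Python lists to stand in for data partitions on separate workers; the values and the function names (`distributed_mean`, `median_of_partition_medians`, `true_median`) are hypothetical and chosen only for illustration. It is not how any particular framework implements imputation.

```python
from statistics import median

# Hypothetical feature values, split across three workers (partitions).
partitions = [
    [1.0, 2.0, 100.0],
    [3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]


def distributed_mean(parts):
    """Mean distributes well: each worker returns only (sum, count)."""
    partials = [(sum(p), len(p)) for p in parts]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def median_of_partition_medians(parts):
    """Tempting shortcut: combine per-partition medians. Not exact in general."""
    return median(median(p) for p in parts)


def true_median(parts):
    """Exact median: every value must be gathered and ordered together."""
    all_values = [v for p in parts for v in p]
    return median(all_values)


print(distributed_mean(partitions))
print(median_of_partition_medians(partitions))  # 3.5
print(true_median(partitions))                  # 5.0
```

Here the shortcut gives 3.5 while the true median is 5.0, so small per-partition summaries are not enough. An exact answer needs a global ordering of the data, which in a distributed system generally means moving values between workers.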

Databricks Certified Machine Learning Associate — Question 9
